Optimal Structure of Multi-state Systems with Uncovered Failures
Gregory Levitin, Senior Member, IEEE
Abstract – The paper presents a model of a multi-state system in which some groups of elements can be affected by uncovered failures that cause outages of the entire group. It shows that the maximal reliability of such a system can be achieved by a proper balance between two types of task parallelization: parallel task execution with work sharing, and redundant task execution. A procedure for finding the optimal balance for series-parallel multi-state systems is suggested. The procedure is based on a universal generating function technique and a genetic algorithm. Illustrative examples are presented.
Index Terms – multi-state system, uncovered failures, universal generating function, genetic algorithm
Nomenclature
1(x) unity function: 1(TRUE) = 1, 1(FALSE) = 0
C* minimum allowed MSS capacity
e element (subsystem) performance in the case of failure
Em number of parallel elements in subsystem m
f(V, θ*) acceptability function
Gj random performance of system element j
gj set of possible realizations of Gj
gjh h-th realization of Gj
M number of subsystems connected in series
n total number of system elements
pjh Pr{Gj = gjh}
Pr{e} probability of event e
Qi Pr{V = vi }
R(θ*) system reliability: Pr{f(V, θ*) = 1}
T* maximum allowed MSS task execution time
uj(z) u-function representing pmf of Gj
ũj(z) reduced u-function representing pmf of Gj except the state of uncovered failure
Uβ(z) u-function representing pmf of performance of the group of elements belonging to set β
U(z) u-function representing pmf of V
V random system performance
vi i-th realization of V
α {α1, …, αM} vector of the distribution of elements among the WSG
φ system structure function: V = φ(G1, …, Gn)
βm set of elements belonging to subsystem m
βmi set of elements belonging to the i-th WSG of subsystem m
|βmi| cardinality of set βmi
ω, π, σ, ρ performance composition functions (functions that determine the performance of pairs of elements)
θ* system demand
⊗ω composition operator over u-functions
Acronyms & abbreviations[1]
GA genetic algorithm
MSS multi-state system
pmf probability mass function
u-function universal generating function
WSG work sharing group (group of elements affected by uncovered failures)
I. Introduction
Fault-tolerant system design is aimed at preventing failure of the entire system even when some of its elements fail. Usually, fault tolerance is implemented by providing sufficient redundancy and by using automatic recovery and reconfiguration mechanisms. However, some failures can remain undetected, or uncovered, which can lead to the total failure of the entire system or of its subsystems. Examples of this effect of uncovered faults can be found in computing systems, electrical power distribution networks, etc. [1].
Systems with imperfect fault coverage have been studied intensively in [1-6]. It was shown that system reliability can decrease as redundancy is increased beyond a certain limit if the system is subject to imperfect fault coverage [2]. As a result, system structure optimization problems arise. Some of these problems have been formulated and solved for binary parallel and k-out-of-n systems [4, 5].
In many cases the system and its elements can function at different states characterized by different levels of performance. Such systems are referred to as multi-state systems (MSS). An MSS can also be subject to uncovered faults that lead to the total failure of the entire system or of its subsystems.
Consider for example a multi-channel data transmission system in which data packages are divided into sub-packages transmitted through different channels. This method allows the work to be shared among the channels, which results in accelerating the data transmission process. If some channels fail, the automatic data exchange management system is able to distribute the transmitted data among the available channels. In this case the system performance (bandwidth) lowers, but the system remains operating. However, when a failure of any channel remains undetected during the system mission time, the management system cannot make the proper reconfiguration (cover the failure) and still assigns some sub-package to the unavailable channel. Therefore, some information is lost and the entire data transmission task fails.
MSS with imperfect failure coverage was first considered in [6], where an algorithm for evaluating system reliability based on ordered binary decision diagrams was suggested. This algorithm requires formulating combinatorial performance requirements for the system being operational at any performance level, which makes it difficult to implement. It also assumes that the entire system fails in the case of any uncovered failure, whereas in many real systems uncovered failures cause failures only of some groups of elements.
In many cases two types of task parallelization are possible in the MSS [7]:
1. Parallel task execution with work sharing: the entire system task is divided into subtasks executed by the parallel elements. This type of parallelization allows the subsystem performance to be considerably improved. On the other hand, undetected failures of some elements remain uncovered, so that some portions of the work are not completed.
2. Parallel task execution without work sharing (redundant task execution): the same entire task is executed by all of the parallel elements. This type of parallelization provides high task reliability since failures of any group of elements are covered as long as at least one of the elements remains operable. However, the absence of work sharing reduces the system performance.
In an MSS, reliability is considered to be a measure of the ability of the system to provide a required performance level [7]. For example, in applications where the execution time of each task is of critical importance, the system reliability R(T*) is defined (according to the performability concept [8]) as the probability that the task is completed in a time less than T*. In applications where the system productivity/capacity is of interest, the system reliability R(C*) is defined as the probability that the capacity is greater than C*.
Since in the MSS the system performance and reliability are interconnected, the greatest reliability of MSS with uncovered failures can be achieved by providing proper balance between the two types of task parallelization.
An efficient approach to the reliability analysis of complex multi-state systems is based on the universal generating function (UGF) technique [9-11]. This approach allows obtaining the performance distribution of a complex MSS using a generalized reliability block diagram method (recursively aggregating multi-state elements and replacing them with single equivalent ones) [10]. This paper suggests a UGF-based algorithm for evaluating the reliability of series-parallel MSS with uncovered failures and an optimization procedure aimed at obtaining the greatest system reliability by an optimal combination of the two types of task parallelization.
II. Model description and problem formulation
Consider a system consisting of M subsystems connected in series. Each subsystem contains different elements connected in parallel. The total number of parallel elements in each subsystem m is Em.
Any system element j can have kj+1 different states corresponding to the performance rates, represented by the set gj = {gj0, gj1, …, gjkj}, where gjh is the performance rate of element j in state h, h ∈ {0, 1, …, kj}. The performance rate Gj of element j at any time instant is a random variable that takes its values from gj: Gj ∈ gj. The probabilities associated with the different states of the system element j can be represented by the set
pj = {pj0, pj1, …, pjkj}, (1)
where
pjh = Pr{Gj = gjh}. (2)
The pmf of performance of any system element j can be represented by the pair of vectors gj, pj. Since the element's states compose a complete group of mutually exclusive events (meaning that the element can always be in one and only one of its kj+1 states),
∑_{h=0}^{kj} pjh = 1. (3)
It is assumed that the states of MSS elements are mutually independent.
According to [6], undetected failures can be modeled by assuming that state 0 of each MSS element corresponds to an undetected failure (if an undetected failure of the element is impossible, the corresponding state probability is set to zero: pj0 = 0). The element performance in the state of undetected failure is gj0.
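As an illustration, the following minimal sketch (in Python, with hypothetical numbers and a dictionary layout that is not prescribed by the paper) shows how the pmf of a single element, with state 0 reserved for the undetected failure, might be stored:

```python
# Hypothetical representation of the pmf (gj, pj) of one multi-state element.
# State 0 is reserved for the undetected (uncovered) failure, as in the model;
# the performance in that state is the failure performance e (infinity for a
# processing-time system, since the task is then never completed).

INF = float("inf")

element = {
    "g": [INF, INF, 10.0],    # gj0 (uncovered failure), gj1 (covered failure), gj2 (nominal time)
    "p": [0.01, 0.04, 0.95],  # pj0, pj1, pj2; must sum to 1 by Eq. (3)
}

assert abs(sum(element["p"]) - 1.0) < 1e-12
```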
The elements belonging to the same subsystem can be separated into independent work sharing groups (WSG). The number of WSG in a subsystem m can vary from 1 (all the elements belong to the same group) to Em (each element constitutes a separate group). The available elements belonging to the same WSG perform their task by sharing the work in the optimal way (providing the greatest performance of the entire group). In the case of detected failures of some elements the resource management system is able to redistribute the task among the available elements. Undetected failure of any element belonging to a WSG cannot be covered within this WSG and causes the failure of the entire group.
Different WSG belonging to the same subsystem perform the same task in parallel providing the task execution redundancy.
At each moment, the system elements have certain performance rates corresponding to their states. The performance rate of the entire system is unambiguously determined by its structure and by the performance rates of its elements. Assume that the entire system has K+1 different states and that vi is the entire system performance rate in state i, i ∈ {0, …, K}. The MSS performance rate is a random variable V that takes values from the set {v0, …, vK}. The system structure function V = φ(G1, …, Gn), which maps the space of the elements' performance rates into the space of the system's performance rates, is determined by the system structure. In our model the system structure function is affected by the distribution of elements among the WSG in each subsystem.
The distribution of elements among the WSG in each subsystem m can be considered as a problem of partitioning a set βm of Em items into a collection of Em mutually disjoint subsets βmi, i.e. such that
∪_{i=1}^{Em} βmi = βm, (4)
βmi ∩ βmj = ø, i ≠ j. (5)
Each set can contain from 0 to Em elements. The partition of the set βm can be represented by the vector αm = {αmj, 1 ≤ j ≤ Em}, where αmj is the index of the subset (WSG) to which element j belongs.
The concatenation of these vectors, α = {α1, …, αM}, determines the distribution of elements among the WSG for the entire system. For any given α and given pmf of the system elements one can obtain the pmf of the entire system performance V in the form
Qi, vi, 0 ≤ i ≤ K, where Qi = Pr{V = vi}. (6)
The acceptability of a system state can usually be defined by the acceptability function f(V, θ*) representing the desired relation between the system performance V and some limit value θ* named the system demand (f(V, θ*) = 1 if the system performance is acceptable and f(V, θ*) = 0 otherwise). The MSS reliability is defined as its expected acceptability (the probability that the MSS satisfies the demand) [10]. Having the pmf of the system performance (6), one can obtain the reliability as
R(θ*) = ∑_{i=0}^{K} Qi·f(vi, θ*). (7)
For example, in applications where the system performance is defined as the task execution time and θ* ≡ T* is the maximum allowed task execution time, Eq. (7) takes the form
R(T*) = ∑_{i=0}^{K} Qi·1(vi < T*), (8)
whereas in applications where the system performance is defined as its productivity/capacity and θ* ≡ C* is the minimum allowed capacity, Eq. (7) takes the form
R(C*) = ∑_{i=0}^{K} Qi·1(vi > C*). (9)
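The following sketch (Python; the pmf values are hypothetical) shows how Eqs. (7)-(9) reduce to a simple summation over the system states once the pmf {Qi, vi} is available:

```python
def reliability(pmf, acceptable):
    """Eq. (7): sum Q_i over the system states whose performance v_i is acceptable."""
    return sum(Q for v, Q in pmf if acceptable(v))

# Hypothetical pmf of task completion time; Eq. (8) with T* = 40.
pmf_time = [(32.0, 0.70), (45.0, 0.25), (float("inf"), 0.05)]
print(reliability(pmf_time, lambda v: v < 40.0))      # -> 0.7

# Hypothetical pmf of transmission capacity; Eq. (9) with C* = 30.
pmf_capacity = [(0.0, 0.05), (30.0, 0.40), (60.0, 0.55)]
print(reliability(pmf_capacity, lambda v: v > 30.0))  # -> 0.55
```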
The MSS structure optimization problem is formulated as follows: find the vector α = {α1, …, αM} that maximizes the MSS reliability R(θ*) for a given demand θ*:
α = arg max R(α, θ*). (10)
III. Evaluating reliability of series-parallel MSS with uncovered failures
A. Universal generating function (u-function) technique
The u-function representing the pmf of a discrete random variable X is defined as a polynomial
u(z) = ∑_{h=0}^{H} qh·z^{xh}, (11)
where the variable X has H+1 possible values xh and qh = Pr{X = xh}.
To obtain the u-function representing the pmf of a function ω(X, Y) of two independent random variables X and Y, the following composition operator is used:
⊗ω(uX(z), uY(z)) = ⊗ω(∑_{h=0}^{H} qh·z^{xh}, ∑_{k=0}^{L} rk·z^{yk}) = ∑_{h=0}^{H} ∑_{k=0}^{L} qh·rk·z^{ω(xh, yk)}. (12)
This polynomial represents all of the possible mutually exclusive combinations of realizations of the variables X and Y by relating the probability of each combination to the value of the function ω(X, Y) for that combination.
In the case of an MSS, the u-functions
uj(z) = ∑_{h=0}^{kj} pjh·z^{gjh} (13)
represent the pmf of random performances of system elements (gj, pj). If for any pair of elements connected in series or in parallel their cumulative performance is defined as a function of individual performances of the elements, the pmf of the entire system performance can be obtained using the following recursive procedure [10].
1. Find any pair of system elements (i and j) connected in parallel or in series in the MSS.
2. Obtain the u-function of this pair using the corresponding composition operator ⊗ω over the two u-functions of the elements:
ui,j(z) = ⊗ω(ui(z), uj(z)) = ∑_{h=0}^{ki} ∑_{l=0}^{kj} pih·pjl·z^{ω(gih, gjl)}, (14)
where the function ω is determined by the nature of the interaction between the elements' performances.
3. Replace the pair with a single element having the u-function obtained in step 2.
4. If the MSS contains more than one element, return to step 1.
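As an illustrative sketch (Python; the dictionary-based u-function representation and the function names are assumptions, not notation used in the paper), the composition operator of Eqs. (12) and (14) can be written as follows:

```python
from collections import defaultdict

def compose(u1, u2, omega):
    """Composition operator of Eqs. (12) and (14).
    A u-function is represented as a dict {performance value: probability}; the
    operator enumerates all combinations of realizations of the two independent
    variables and applies omega to each combination."""
    result = defaultdict(float)
    for x, p in u1.items():
        for y, q in u2.items():
            result[omega(x, y)] += p * q  # like terms z^omega(x, y) are collected
    return dict(result)

# Hypothetical pair of parallel transmission channels sharing the work (Eq. (19)):
u1 = {0.0: 0.1, 30.0: 0.9}
u2 = {0.0: 0.2, 40.0: 0.8}
print(compose(u1, u2, lambda x, y: x + y))
# -> {0.0: 0.02, 40.0: 0.08, 30.0: 0.18, 70.0: 0.72}
```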
B. Performance composition functions
The choice of the function ω depends on the type of connection between the elements and on the type of the system. In our model we have to distinguish three different functions: one for redundant parallel connection without work sharing (ω ≡ π), one for parallel connection with work sharing (ω ≡ σ), and one for series connection (ω ≡ ρ).
Consider, for example, a task processing system with performance defined as the task completion time. Assume that each element j can complete the task by a random time Gj (the case of total failure of the element corresponds to Gj = ∞). If two elements i and j perform the same task in parallel (providing task execution redundancy), the task completion time is equal to the time when the fastest element completes the task. The performance of the pair of elements in this case is determined by the function
π(Gi, Gj) = min(Gi, Gj). (15)
As shown in [10], if two parallel elements can share the work by dividing the task in proportion to their processing speeds, the task completion time for the pair of elements is determined by the function
σ(Gi, Gj) = GiGj/(Gi+Gj). (16)
If two elements consecutively execute different subtasks (series connection of the elements), the entire task completion time for the pair of elements is equal to the sum of their individual execution times. The performance of the pair of elements in this case is determined as
ρ(Gi, Gj) = Gi+Gj. (17)
Another example is a data transmission system with performance defined as the transmission capacity (bandwidth). Assume that each element j has a random data transmission capacity Gj (the case of total failure of the element corresponds to Gj = 0). If two elements i and j transmit the same data (providing data transmission redundancy), the transmission capacity of the pair of elements is determined by the element with the greater performance. The performance of the pair of elements is determined by the function
π(Gi, Gj) = max(Gi, Gj). (18)
If the parallel elements share their work, the entire capacity that they provide is equal to the sum of their individual capacities. The performance of the pair of elements is determined by the function
σ(Gi, Gj) = Gi+Gj. (19)
If data flow is transmitted by two consecutive elements, the bandwidth of the slowest element becomes the bottleneck of the system. Therefore the performance of the two elements is determined by the minimum of their individual performances:
ρ(Gi, Gj) = min(Gi, Gj). (20)
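The six functions of Eqs. (15)-(20) can be sketched directly (Python; the names pi_*, sigma_*, rho_* follow the notation adopted above, and the handling of an infinite completion time under work sharing is an assumption consistent with the limit of Eq. (16)):

```python
INF = float("inf")

# Task processing system (performance = completion time, failure performance e = inf).
def pi_time(x, y):     # redundant execution of the same task, Eq. (15)
    return min(x, y)

def sigma_time(x, y):  # work sharing in proportion to speed, Eq. (16)
    if x == INF:       # a failed (covered) element leaves all the work to its partner
        return y
    if y == INF:
        return x
    return x * y / (x + y)

def rho_time(x, y):    # consecutive subtasks, Eq. (17)
    return x + y

# Data transmission system (performance = capacity, failure performance e = 0).
def pi_cap(x, y):      # redundant transmission, Eq. (18)
    return max(x, y)

def sigma_cap(x, y):   # work sharing, Eq. (19)
    return x + y

def rho_cap(x, y):     # series (bottleneck) connection, Eq. (20)
    return min(x, y)
```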
C. Incorporating uncovered failures into the UGF technique
According to our model, uncovered failures can occur only within a WSG. Let the element performance in the state of uncovered failure be gj0 = e. This performance value usually coincides with the element performance in the state of covered failure. However, it should be treated in a different way since, when element j is in state 0, the entire WSG affected by the uncovered failure is also in the state of failure with performance e.
This condition can be expressed in terms of the performance composition functions as follows:
ω(Gi, e) = ω(e, Gj) = e (21)
for any pair of elements belonging to the subsystem affected by the uncovered failure.
Let the pmf of any system element j be represented by the u-function
uj(z) = ũj(z) + pj0·z^e, (22)
where the reduced u-function ũj(z) represents all of the states of element j except the state of uncovered failure (for elements without uncovered failures, ũj(z) = uj(z)).
Let the elements belonging to the set βmi = {j1, …, j|βmi|} compose a WSG that fails in the case of an uncovered failure of any element from βmi. According to [12], the probability of the entire WSG failure caused by uncovered failures of its elements is
p0(βmi) = 1 − ∏_{j∈βmi} (1 − pj0). (23)
The pmf of the nonempty WSG performance (excluding the states caused by uncovered failures) can be obtained by recursively applying the operator ⊗σ over the reduced u-functions ũj(z) of the elements belonging to the WSG:
Ũβmi(z) = ũj1(z), (24)
and, if |βmi| > 1, Ũβmi(z) = Ũβmi(z) ⊗σ ũjk(z) for k = 2, …, |βmi|.
The full pmf of the nonempty WSG performance can now be obtained as
Uβmi(z) = Ũβmi(z) + p0(βmi)·z^e. (25)
The pmf of the performance of any empty WSG can be defined as Uβmi(z) = z^e.
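A sketch of Eqs. (23)-(25) (Python; it reuses the compose() operator from the sketch in Section III.A, and the pairing of each element with its uncovered-failure probability is an illustrative data layout, not the paper's notation):

```python
from functools import reduce
# compose(u1, u2, omega) is assumed to be the operator sketched in Section III.A.

def wsg_u_function(elements, sigma, e):
    """Full u-function of a nonempty WSG, Eqs. (23)-(25).
    elements: list of (pj0, u_tilde_j) pairs, where pj0 is the uncovered-failure
    probability of element j and u_tilde_j is its reduced u-function, Eq. (22)."""
    # Eq. (23): probability that an uncovered failure brings down the whole group.
    no_uncovered = 1.0
    for p0, _ in elements:
        no_uncovered *= (1.0 - p0)
    p0_group = 1.0 - no_uncovered
    # Eq. (24): recursive composition of the reduced u-functions with sigma.
    u_tilde = reduce(lambda a, b: compose(a, b, sigma), (u for _, u in elements))
    # Eq. (25): add the single term z^e carrying the uncovered-failure probability.
    u = dict(u_tilde)
    u[e] = u.get(e, 0.0) + p0_group
    return u

# An empty WSG is represented simply by z^e, i.e. {e: 1.0}.
```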
D. Algorithm for evaluating MSS reliability
Based on the recursive technique presented in sections III.A and III.C one can obtain the entire MSS reliability for any given distribution of elements among WSG by applying the following algorithm:
1. Define the reduced u-functions ũj(z) for each system element j according to (22).
2. For any subsystem m (1 ≤ m ≤ M):
2.1. For i = 1, …, Em:
If the set βmi is not empty, determine the u-function Uβmi(z) using procedures (24) and (25); otherwise define Uβmi(z) = z^e.
2.2. Obtain the u-function Um(z) of subsystem m using the recursive procedure:
Um(z) = Uβm1(z), (26)
and, if Em > 1, Um(z) = Um(z) ⊗π Uβmi(z) for i = 2, …, Em.
3. Obtain the u-function U(z) of the entire system using the recursive procedure:
U(z) = U1(z), (27)
and, if M > 1, U(z) = U(z) ⊗ρ Um(z) for m = 2, …, M.
4. From the u-function U(z) representing the pmf of the entire MSS performance (6), obtain the system reliability using Eq. (7).
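A compact sketch of steps 2-4 (Python; it assumes the compose() helper and WSG u-functions built as in the previous sketches, and treats each subsystem simply as the list of u-functions of its WSG):

```python
# compose(u1, u2, omega) is the operator sketched in Section III.A.

def mss_u_function(subsystems, pi, rho):
    """Steps 2-3 of the algorithm: the WSG of a subsystem are redundant and are
    composed with pi (Eq. (26)); subsystems are connected in series and are
    composed with rho (Eq. (27)). Each subsystem is given as a list of WSG
    u-functions (dicts {performance: probability})."""
    system_u = None
    for wsg_us in subsystems:
        sub_u = wsg_us[0]
        for u in wsg_us[1:]:
            sub_u = compose(sub_u, u, pi)  # Eq. (26)
        system_u = sub_u if system_u is None else compose(system_u, sub_u, rho)  # Eq. (27)
    return system_u

def mss_reliability(system_u, acceptable):
    """Step 4: Eq. (7) applied to the u-function of the entire system."""
    return sum(Q for v, Q in system_u.items() if acceptable(v))
```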
IV. Optimization technique
Equation (10) formulates a complicated NP-complete set partitioning problem. An exhaustive examination of all possible solutions is not realistic given reasonable time limitations. As in most combinatorial optimization problems, the quality of a given solution is the only information available during the search for the optimal solution. Therefore, a heuristic search algorithm is needed that uses only estimates of solution quality and does not require derivative information to determine the next direction of the search.
A. Genetic Algorithm
The genetic algorithm (GA) has proven to be an effective optimization tool for a large number of complicated problems in reliability engineering [7, 11, 13]. The basic notions of the GA were originally inspired by biological genetics. The GA operates with a "chromosomal" representation of solutions, to which crossover, mutation and selection procedures are applied. Unlike various constructive optimization algorithms that use sophisticated methods to obtain a good singular solution, the GA deals with a set of solutions (a population) and tends to manipulate each solution in the simplest manner. The "chromosomal" representation requires the solution to be coded as a finite-length string.
Detailed information on GAs and their basic operators can be found in the books [7, 14, 15]. The basic structure of the version of the GA referred to as GENITOR [16] is as follows:
First, an initial population of Ns randomly constructed solutions (strings) is generated. Within this population, new solutions are obtained during the genetic cycle by using crossover and mutation operators. The crossover produces a new solution (offspring) from a randomly selected pair of parent solutions, facilitating the inheritance of some basic properties from the parents by the offspring (the detailed description of the crossover procedure used in this work can be found in [17]). Mutation results in slight changes to the offspring’s structure and maintains a diversity of solutions. This procedure avoids premature convergence to a local optimum and facilitates jumps in the solution space. In our GA, the mutation procedure swaps elements initially located in two randomly chosen positions on the string.
Each new solution is decoded and its objective function (fitness) values are estimated. These values, which are a measure of quality, are used to compare different solutions.
The comparison is accomplished by a selection procedure that determines which solution is better: the newly obtained solution or the worst solution in the population. The better solution joins the population, while the other is discarded. If the population contains equivalent solutions following selection, redundancies are eliminated and the population size decreases as a result.
After new solutions are produced Nrep times, new randomly constructed solutions are generated to replenish the shrunken population, and a new genetic cycle begins.
The GA is terminated after Nc genetic cycles. The final population contains the best solution achieved. It also contains different near-optimal solutions which may be of interest in the decision-making process.
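A schematic sketch of this GENITOR-style loop (Python; the operator interfaces and the point at which equivalent solutions are removed are simplifications of the description above, not the exact implementation used in the paper):

```python
import random

def genitor(fitness, random_solution, crossover, mutate, Ns=100, Nrep=2000, Nc=5):
    """Steady-state GA loop following the GENITOR scheme described above.
    fitness, random_solution, crossover and mutate are supplied by the caller;
    Ns, Nrep and Nc mirror the parameter names used in Section V.C."""
    population = [random_solution() for _ in range(Ns)]
    for _ in range(Nc):                          # genetic cycles
        while len(population) < Ns:              # replenish the shrunken population
            population.append(random_solution())
        for _ in range(Nrep):                    # new solutions produced per cycle
            p1, p2 = random.sample(population, 2)
            child = mutate(crossover(p1, p2))
            worst = min(population, key=fitness)
            if fitness(child) > fitness(worst):  # the better solution joins the population
                population.remove(worst)
                population.append(child)
        # eliminate equivalent solutions; the population shrinks as a result
        population = [list(s) for s in {tuple(s) for s in population}]
    return max(population, key=fitness)          # best solution in the final population
```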
To apply the genetic algorithm to a specific problem, a solution representation and decoding procedure must be defined.
B. Solution representation
In the considered problem, the element separation is determined by the vector α, which contains n = E1 + … + EM items corresponding to the elements composing the entire system. In our GA, solutions are represented by integer strings S = {s1, …, sn}. For each i = 1, …, n, item si of the string corresponds to an item αmj of the vector α and determines the number of the WSG to which the j-th element of the m-th subsystem belongs. Therefore, all the items si of the string S corresponding to subsystem m should vary in the range (1, Em). Since the random solution generation procedure produces strings with elements randomized within the same range for all subsystems, a transformation procedure that maps each string element into the proper range must be used to provide solution feasibility. This procedure determines the value of αmj as given by Eq. (28) below. The range of values produced by the random generation procedure should be (1, max{E1, …, EM}).
C. Solution decoding procedure.
The following procedure determines the fitness value for an arbitrary solution defined by integer string S={s1, …, sn}.
1. For each subsystem m=1, …, M:
1.1. Determine the WSG index for each element of the m-th subsystem:
αmj = 1 + mod(s_{c+j} − 1, Em), 1 ≤ j ≤ Em, where c = ∑_{l=1}^{m−1} El. (28)
1.2. For each WSG i (1 ≤ i ≤ Em), create the set βmi using the recursive procedure
βmi = ø, (29)
for j = 1, …, Em: if αmj = i, then βmi = βmi ∪ {c + j}.
2. Determine the system reliability by the algorithm presented in section III.D. Assign the obtained system reliability to the solution fitness.
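A sketch of this decoding step (Python; the modulo form of the range transformation in Eq. (28) is a reconstruction, and numbering elements and WSG from 1 is an assumption):

```python
def decode(S, E):
    """Decode an integer string S (one item per system element) into the WSG
    sets beta_mi of every subsystem m. E[m] is the number of parallel elements
    E_m in subsystem m; the modulo transformation keeps every WSG index inside
    the range 1..E_m, as in Eq. (28)."""
    wsgs = []  # wsgs[m][i] = set of global element numbers assigned to WSG i+1
    c = 0      # number of elements in the preceding subsystems
    for Em in E:
        alpha = [1 + (S[c + j] - 1) % Em for j in range(Em)]  # Eq. (28)
        beta = [set() for _ in range(Em)]                     # Eq. (29)
        for j, a in enumerate(alpha):
            beta[a - 1].add(c + j + 1)                        # elements numbered from 1
        wsgs.append(beta)
        c += Em
    return wsgs

# Hypothetical example: two subsystems with 3 and 2 parallel elements.
print(decode([1, 3, 1, 2, 2], [3, 2]))
# -> [[{1, 3}, set(), {2}], [set(), {4, 5}]]
```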
V. Illustrative Examples
A. Data processing system
Consider a data processing system consisting of five specialized multiprocessor units performing consecutive subtasks. The system is aimed at solving discrete tasks. The system performance is defined as the total task processing time. The performance of any element (processor) or subsystem in the state of failure is equal to infinity: e = ∞ (the task is never completed). The distributions of the performances (processing times) of the elements composing each unit are presented in Table 1. Some processors have one working state (corresponding to their nominal performance) and two failure states: detected and undetected. Some processors have two working and two failure states; the second working state corresponds to reduced processor performance caused by transient failures (restart of the data processing if the task is not completed after a fixed time).
Any subset of processors belonging to the same unit can compose a WSG in which the task is divided into subtasks executed by different processors. Undetected failures within any WSG remain uncovered. Different WSG of the same unit execute the same task in parallel. The task completion time should not exceed T*.
Table 1. Performance distribution of processors
(For each subsystem and element, the table gives the probability and processing time of each working state and the probabilities of the covered and uncovered failure states.)
Table 2. Parameters and structures of the obtained solutions of the data processing system

|                        | Solution 1        | Solution 2        | Solution 3              | Solution 4              |
| Min time               | 32.627            | 36.071            | 37.5                    | 40                      |
| R(40)                  | 0.9121            | 0.8881            | 0.5149                  | 0                       |
| R(50)                  | 0.9451            | 0.9806            | 0.9659                  | 0.9298                  |
| R(60)                  | 0.9464            | 0.9844            | 0.9867                  | 0.9862                  |
| R(70)                  | 0.9464            | 0.9845            | 0.9869                  | 0.9870                  |
| Structure: subsystem 1 | (1), (2,4), (3,5) | (1), (2,4), (3,5) | (1), (2), (4), (3), (5) | (1), (2), (4), (3), (5) |
| subsystem 2            | (6), (7), (8)     | (6), (7), (8)     | (6), (7), (8)           | (6), (7), (8)           |
| subsystem 3            | (9), (10)         | (9), (10)         | (9), (10)               | (9), (10)               |
| subsystem 4            | (11,12), (13,14)  | (11,12), (13,14)  | (11), (12), (13,14)     | (11), (12), (13), (14)  |
| subsystem 5            | (15), (16,17)     | (15), (16), (17)  | (15), (16), (17)        | (15), (16), (17)        |
Fig. 1. Functions R(T*) for the obtained configurations of the data processing system
B. Continuous data transmission system
Consider a two-stage continuous data transmission system consisting of two consecutive multi-channel communication lines. Each channel is characterized by its nominal transmission capacity and its reliability. When a channel fails, its capacity is equal to zero: e = 0. The distributions of the performances (transmission capacities) of the channels composing each line are presented in Table 3. Any subset of channels belonging to the same line can compose a WSG in which the data packages are divided into sub-packages transmitted by different channels. Undetected failures within any WSG remain uncovered. Different WSG of the same line transmit the same data in parallel. The system transmission capacity should be greater than C*.
Table 3. Performance distribution of data transmission channels
(For each subsystem and element, the table gives the probability and transmission capacity of each working state and the probabilities of the covered and uncovered failure states.)
Table 4. Parameters and structures of the obtained solutions of the data transmission system

|                        | Solution 1                             | Solution 2              | Solution 3              | Solution 4            | Solution 5         |
| Max capacity           | 22                                     | 50                      | 44                      | 76                    | 128                |
| R(0)                   | ≈1.0                                   | 0.9876                  | 0.9879                  | 0.9398                | 0.4797             |
| R(30)                  | 0.0                                    | 0.9743                  | 0.9547                  | 0.9182                | 0.4797             |
| R(40)                  | 0.0                                    | 0.8337                  | 0.9456                  | 0.8853                | 0.4797             |
| R(50)                  | 0.0                                    | 0.0                     | 0.0000                  | 0.8619                | 0.4793             |
| Structure: subsystem 1 | (1), (2), (3), (4), (5), (6), (7), (8) | (1,3,8), (2,7), (4,5,6) | (1,4,6), (2,3,5), (7,8) | (1,3,4,5,7), (2,6,8)  | (1,2,3,4,5,6,7,8)  |
| subsystem 2            | (9), (10), (11), (12), (13), (14)      | (9,10,12), (11,13,14)   | (9,13,14), (10,11,12)   | (9,10,11,12), (13,14) | (9,10,11,12,13,14) |
Fig. 2. Functions R(C*) for the obtained configurations of the data transmission system
C. Computational effort and algorithm consistency
The GA parameters Ns = 100, Nrep = 2000 and Nc = 5 were chosen empirically by testing the algorithm on a set of randomly generated problems (the determination of the GA parameters is discussed in [7]). The computation time required to obtain the optimal solution on a Pentium IV PC did not exceed 20 seconds for any problem tested.
In order to demonstrate the consistency of the suggested algorithm, the GA was repeated 10 times with different starting solutions (initial populations) for each of several optimization problems. The coefficient of variation was calculated for the fitness values of the best-in-population solutions obtained during the genetic search by the different GA search processes. By the end of the GA search this index never exceeded 2%.
VI. Conclusion
This paper presents a model of a series-parallel multi-state system with two types of task parallelization: parallel task execution with work sharing, and redundant task execution. The model takes into account the existence of undetected failures that cannot be covered by subsystems performing their task with work sharing.
It was shown that the greatest system reliability (defined as the probability of meeting a certain demand) can be achieved by a proper balance between the two types of task parallelization.
A fast procedure based on the universal generating function was suggested for the system reliability optimization. This procedure can be used with the genetic algorithm to find the optimal system structure (the distribution of the different parallel elements among work sharing groups).
The illustrative examples show the results obtained by the optimization algorithm for two different types of system: a data processing system with performance defined as the task execution time, and a data transmission system with performance defined as the transmission capacity.
The presented models are limited to the cases when the work sharing is perfect. Perfect sharing can be achieved in systems performing a large number of similar independent procedures (see, for example, the BIOMAP project [18]). However, in many real systems such a situation is impossible. For example, in some task processing systems the major task cannot be partitioned arbitrarily, and subtasks often cannot be executed independently because of precedence constraints (some subtasks cannot be executed until they have received input data, which can be the result of other subtasks [19]). In a data transmission system a special network architecture can prevent full parallelization of the transmission process (several resources located in the same local-area network can use the same gateway to communicate outside the network [20]). In order to incorporate restrictions on the work sharing one has to modify the functions (16)-(20) using either task partition and scheduling models [21] or joint performance estimates elicited from empirical data. Further research can be devoted to developing such models.
References
[1] Amari S., Pham H., Dill G., "Optimal design of k-out-of-n:G subsystems subjected to imperfect fault-coverage," IEEE Transactions on Reliability, vol. 53, pp. 567-575, 2004.
[2] Amari S., Reliability, Risk and Fault-Tolerance of Complex Systems, PhD thesis, Indian Institute of Technology, Kharagpur, 1997.
[3] Amari S., Dugan J., Misra R., "A separable method for incorporating imperfect fault-coverage into combinatorial models," IEEE Transactions on Reliability, vol. 48, pp. 267-274, 1999.
[4] Amari S., Dugan J., Misra R., "Optimal reliability of systems subject to imperfect fault-coverage," IEEE Transactions on Reliability, vol. 48, pp. 275-284, 1999.
[5] Amari S., McLaughlin L., Yadlapati B., "Optimal cost-effective design of parallel systems subject to imperfect fault-coverage," in Proc. Reliability & Maintainability Symp., pp. 29-34, 2003.
[6] Chang Y., Amari S., Kuo S., "Reliability evaluation of multi-state systems subject to imperfect coverage using OBDD," in Proc. Pacific Rim International Symp. on Dependable Computing, Tsukuba, Japan, 2002.
[7] Lisnianski A., Levitin G., Multi-state System Reliability. Assessment, Optimization and Applications, World Scientific, 2003.
[8] Tai A., Meyer J., Avizienis A., "Performability enhancement of fault-tolerant software," IEEE Transactions on Reliability, vol. 42, no. 2, pp. 227-237, 1993.
[9] Ushakov I., "Optimal standby problems and a universal generating function," Soviet Journal of Computer Systems Science, vol. 25, pp. 79-82, 1987.
[10] Levitin G., Universal Generating Function in Reliability Analysis and Optimization, Springer-Verlag, London, 2005.
[11] Levitin G., Lisnianski A., Ben-Haim H., Elmakis D., "Redundancy optimization for series-parallel multi-state systems," IEEE Transactions on Reliability, vol. 47, pp. 165-172, 1998.
[12] Levitin G., "Block diagram method for analyzing multi-state systems with uncovered failures," Reliability Engineering & System Safety, to appear.
[13] Coit D., Smith A., "Reliability optimization of series-parallel systems using a genetic algorithm," IEEE Transactions on Reliability, vol. 45, pp. 254-266, 1996.
[14] Goldberg D., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[15] Gen M., Cheng R., Genetic Algorithms and Engineering Design, John Wiley & Sons, New York, 1997.
[16] Whitley D., "The GENITOR algorithm and selective pressure: why rank-based allocation of reproductive trials is best," in Proc. 3rd International Conf. on Genetic Algorithms, D. Schaffer, Ed., pp. 116-121, Morgan Kaufmann, 1989.
[17] Levitin G., Lisnianski A., "Optimal separation of elements in vulnerable multi-state systems," Reliability Engineering & System Safety, vol. 73, pp. 55-66, 2001.
[18] Dai Y., Palakal M., Hartanto S., Wang X., Guo Y., "A grid-based pseudo-cache solution for MISD biomedical problems with high confidentiality and efficiency," International Journal of Bioinformatics Research and Applications, vol. 2, no. 3, pp. 259-281, 2006.
[19] Levitin G., Dai Y., Ben-Haim H., "Reliability and performance of star topology grid service with precedence constraints on subtask execution," IEEE Transactions on Reliability, to appear.
[20] Dai Y., Levitin G., "Reliability and performance of tree-structured grid services," IEEE Transactions on Reliability, vol. 55, no. 2, pp. 337-349, 2006.
[21] Jędrzejowicz P., Czarnowski I., Skakowski A., Szreder H., "Evolution-based scheduling of multiple variant and multiple processor programs," Future Generation Computer Systems, vol. 17, pp. 405-414, 2001.
[1] Singular and plural are spelled the same.