A Framework for Soft Error Aware Voltage Scaling in Real-Time Embedded Systems

Abstract

Dynamic voltage scaling achieves significant power savings because dynamic power depends quadratically on supply voltage. In embedded systems consisting of scalable processors and several peripherals, however, lowering the processor voltage does not guarantee lower system-wide energy consumption, mainly because (i) the standby time (active time + idle time) of the other components increases, and (ii) static power grows as CMOS technology advances. More importantly, lowering the voltage dramatically reduces noise immunity, so aggressive voltage scaling can make critical embedded applications more vulnerable to transient faults induced by external radiation. Reliability-aware power management policies must therefore be devised for critical real-time applications on battery-operated embedded systems. We propose a soft-error aware voltage scaling algorithm that finds the supply voltage minimizing energy consumption among those that satisfy the application-required reliability and the time constraint, in the context of mission-critical real-time applications on embedded systems. Our experiments show that the proposed algorithm finds a supply voltage satisfying the reliability requirement with a very small energy overhead, 0.05% on average, over conventional dynamic voltage scaling, which may not guarantee the required reliability, and that it saves up to 23% energy over traditional redundancy techniques for high reliability.

Introduction

In the era of pervasive computing, real-time applications on embedded systems face not only low-power specifications but also high reliability requirements under hard or soft real-time constraints, since ambient systems, embedded in everyday life and closer to humans, make reliability an overarching concern.

Power management techniques have been investigated extensively in order to increase power savings in embedded systems. They can be classified primarily into scaling techniques [3][5][6][7] and shutdown methods [1][2]. Scaling techniques such as DVS (Dynamic Voltage Scaling) and DFS (Dynamic Frequency Scaling) are well known and effective: they exploit slack time through the quadratic relationship between supply voltage and dynamic power, whereas the relationship between voltage and delay is only linear. Shutdown methods keep a system or device in the sleep state as long as possible, since the sleep state consumes very little power; a component is put to sleep when no event is pending for it and the energy cost of keeping it idle exceeds that of transitioning it into sleep. However, static power will become a main concern as technology advances, since it is expected to account for more than half of the total power in CMOS [6][8][9]. Thus, in order to extend battery lifetime, the power manager in an embedded system must consider the specific tasks and the power characteristics of each device, and must coordinate the schedule of power states judiciously while meeting the system-level requirements.

On the other hand, conventional power management techniques significantly affect the reliability of embedded systems [10][11][12][13][14]. In the context of dynamic power management, low-power states such as idle and sleep suffer fewer soft errors than the active state, since soft errors, i.e., transient faults, propagate less during non-active states than during active states [13]. However, transitions between power states, which are costly in terms of power and delay, are more vulnerable to soft errors [13]. Voltage scaling increases the SER (soft error rate) exponentially, while frequency scaling reduces it linearly [15][16][19]. Further, low-power design at advanced technology nodes will make matters worse [17][18][20][21][22][23]. The cell size of memory products, for instance, continues to shrink in order to satisfy customers' insatiable demand for higher density, greater functionality and lower power consumption. This drives the supply voltage lower and reduces the capacitance inside the cell, which in turn decreases the critical charge, the minimum charge required to upset the data stored in a cell, and thus increases the SER exponentially. For example, the SER of SRAM in the next technology generation is expected to be 10 to 100 times worse than in the current one [15]. The same trend holds, or will be even worse, for sequential and combinational logic circuits, owing to their complexity and the stringent cost of fault tolerance in terms of area, performance and power consumption. Furthermore, ubiquitous computing will call for more embedded systems operating anywhere: in the air, under the sea, and even inside the human body. The physical computing environment and location, such as altitude and latitude, have a strong impact on soft errors. For instance, the SER in an airplane can be 100 to 1,000 times worse than on the ground [29].

From the perspective of reliability, different levels of failure probability during run time can be tolerated depending on the application requirements. Soft errors, unlike hard errors, which are permanent, are random, rarely catastrophic, and do not normally destroy systems. For instance, multimedia applications such as video or audio streaming tolerate bit upsets, since an occasional bad bit may be unnoticeable and unimportant to users. Critical applications such as control-system functions and financial transactions, however, cannot accept a single error in their data or instructions, since soft errors can cause loss of function and failures. Compared to desktop systems, embedded systems such as those used in portable and wireless products are generally less susceptible, since they lack dense, large memories and their processors operate at lower clock speeds. However, they are increasingly likely to serve as platforms for safety-critical applications and for consumer products close to humans, where reliability is a critical concern. Thus, judicious system-level coordination of both reliability and power is required to prolong the battery life of embedded systems and to adapt the parameters affecting reliability to the application specifics.

Three observations motivate this work: (i) low power is required at the system level, primarily because of rapidly increasing leakage power; (ii) low-power techniques and soft errors are strongly related, and the relationship will worsen as technology advances; and (iii) embedded applications demand various levels of reliability. Reducing energy consumption in embedded systems is therefore a challenge that must be addressed in tandem with meeting reliability and time constraints. A novel power management policy that limits its effect on soft errors is needed, and we propose an algorithm for soft-error aware voltage scaling. To the best of our knowledge, it is the first attempt to introduce an optimal system-level policy for power management, in particular voltage scaling, within reliability constraints as well as time limits for real-time embedded applications. The main contribution of this work is a voltage scaling algorithm that accounts for soft errors and guarantees minimum energy cost for a required system-level reliability, expressed in MTTF (Mean Time To Failure) [30], across diverse applications in embedded systems. In addition, experiments based on intuitive and practical soft error and power data for each component demonstrate the critical risks of voltage scaling methods that ignore soft errors, and evaluate redundancy approaches without any power management, by comparing both with the proposed reliability-aware approach in terms of power and soft errors. The proposed policy reduces energy dissipation by up to 23.4% compared to redundancy techniques on a set of generated tasks, and increases reliability by up to 36.8% compared to aggressive voltage scaling policies in real-time embedded systems.

The rest of this paper is organized as follows. Section 2 describes the preliminaries and formulates the problem. Section 3 presents the soft-error aware voltage scaling algorithm for real-time embedded applications that must meet both reliability and time constraints. Section 4 presents the experimental setup and results, and Section 5 concludes this work.

Model

2.1. SYSTEM MODEL

An embedded system consists of scalable processors and other resource components. All the resources have different characteristics with respect to power and SER.

Scalable processors such as Intel XScale and PXA270 [26] support variable voltage and frequency levels. Supply voltage and operating frequency are assumed to be tightly coupled through a linear relationship, so that Vcur/Vmax equals fcur/fmax, where fcur and fmax are the frequencies corresponding to Vcur and Vmax, respectively. We define the speed factor s as the normalized processor operating speed, i.e., the ratio of the current supply voltage Vcur to the maximum voltage Vmax. Since processors support discrete voltage levels, the speed factors are discrete points (Vmin/Vmax, …, Vmax/Vmax = 1) in the range (0,1]. For simplicity, we assume that the execution time of a task is inversely proportional to the operating frequency, and hence to the corresponding supply voltage. For example, the execution time doubles when the frequency or voltage is halved.
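As a minimal illustration of the speed factor, the sketch below normalizes a set of discrete voltage levels and applies the simplifying assumption that execution time grows as the speed factor shrinks. The voltage values and function names are hypothetical and chosen only for illustration.

```python
def speed_factors(voltages):
    """Normalize discrete supply voltages to speed factors in (0, 1]."""
    v_max = max(voltages)
    return [v / v_max for v in sorted(voltages)]


def scaled_execution_time(wcet_at_vmax, s):
    """Execution time at speed factor s, assuming time scales with 1/s."""
    if not 0.0 < s <= 1.0:
        raise ValueError("speed factor must lie in (0, 1]")
    return wcet_at_vmax / s


# Hypothetical voltage levels in volts, for illustration only.
levels = [0.8, 1.0, 1.2, 1.6]
factors = speed_factors(levels)

# Halving the speed factor doubles the execution time.
half_speed_time = scaled_execution_time(10.0, 0.5)
```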

In addition to the processor, the system includes a set of l resources, denoted R = {R1, …, Rl}, that capture peripheral devices such as memory, DSP and I/O circuits. Regardless of the power management policy in use, resources are assumed to be in the idle or active state while the application executes, since their activity is tied to the application execution [3]. Their active and idle times can thus be expressed in terms of processor execution cycles [6]. We also define a set of operating states M = {mA, mI, mS, mT}, which correspond to active, idle, sleep and transition, respectively.

2.2. APPLICATION MODEL

An application is denoted as A = {T, D, E, R}, where T is a set of n periodic real-time tasks, D is the application deadline, E is the energy budget, and R is the required reliability in FIT (Failures In Time), i.e., the number of failures per one billion hours of operation.

Each task ti in the set T = {t1, t2, …, tn} is represented by a three-tuple {Ti, Di, Wi}, where Ti is the period of the task, Di is its deadline, and Wi is its worst case execution time (WCET) at the maximum CPU speed. We assume that the task deadline equals the period (Ti = Di for each task ti), and the application is said to be schedulable if all tasks meet their deadlines and the application meets D. For simplicity, we assume that all tasks are independent and are scheduled on a single CPU. The utilization of the CPU by each task ti is defined as Ui = Wi/Ti ≤ 1, which is a necessary condition for schedulability and for applying voltage scaling.

Let $C^{i}_{CPU,m}$ be the time that task $t_i$ spends on the CPU in state $m \in M$. Since $W_i$ is the worst case execution time of $t_i$ at $V_{max}$, the slowed-down execution time at $V_{cur}$, with speed factor $s_i$, is $C^{i}_{CPU,m_A} = W_i/s_i$, which must not exceed $D_i$. The difference $(D_i - C^{i}_{CPU,m_A})$ is the slack, and the idle time as well as the sleep time depend on the power management policy. The total execution time of an application is therefore constrained by

$$ET(A) = \sum_{i=1}^{n} C^{i}_{CPU,m_A} \le D \qquad (1)$$

For simplicity, this work treats the idle delay, i.e., the minimum delay before entering sleep from the active state, as a constant, and forces the CPU to sleep when the energy cost of transition plus sleep is less than that of staying idle.
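The quantities above can be checked with a few lines of code. The following is a minimal sketch, assuming times in milliseconds and a hypothetical `Task` record; the task values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    period: float  # T_i, also the deadline D_i (the text assumes T_i = D_i)
    wcet: float    # W_i, worst-case execution time at maximum speed


def utilization(task):
    """U_i = W_i / T_i."""
    return task.wcet / task.period


def active_time(task, s):
    """Slowed-down execution time W_i / s_i at speed factor s."""
    return task.wcet / s


def slack(task, s):
    """D_i minus the slowed-down execution time."""
    return task.period - active_time(task, s)


def total_execution_time(tasks, speeds):
    """ET(A) of constraint (1): sum of the slowed-down execution times."""
    return sum(active_time(t, s) for t, s in zip(tasks, speeds))


# Hypothetical task set (milliseconds).
tasks = [Task(period=40.0, wcet=10.0), Task(period=100.0, wcet=20.0)]
speeds = [0.5, 1.0]
et = total_execution_time(tasks, speeds)  # 10/0.5 + 20/1.0
```

With these values the first task uses half of its period at speed factor 0.5, leaving an equal amount of slack for idle or sleep states.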

Let $C^{i}_{R_j} = \delta_j \, C^{i}_{CPU,m_A}$ be the total number of cycles that resource $R_j$ spends in the active and idle states during the execution of $t_i$, where $\delta_j$ expresses how strongly the standby state of $R_j$ depends on the CPU execution time. With $C^{i}_{R_j,m}$ denoting the number of cycles $R_j$ spends in state $m$ ($m = m_A$ or $m_I$) for $t_i$, we have $C^{i}_{R_j,m_A} + C^{i}_{R_j,m_I} = \delta_j \, C^{i}_{CPU,m_A}$. Similarly, $C^{i}_{R_j,m_S}$ is the sleep time of $R_j$, which depends on $\delta_j \, C^{i}_{CPU,m_A}$ and on the power management policy.

2.3. POWER MODEL

In embedded systems, power is consumed by the CPU, memory and other circuits. Although dynamic power, which depends quadratically on supply voltage and linearly on operating frequency, dominates in the CMOS power model, static power cannot be ignored, in particular at advanced technology nodes [9]. The power consumption P can be captured as:

$$P = P_{static} + P_{dynamic} \qquad (2)$$

$$P_{dynamic} = C_{eff} \, V^{2} f \qquad (3)$$

where Pstatic is the leakage current power which is consumed in idle state as well as in active state, Pdynamic is the dynamic power consumed during active state, Ceff is the effective switch capacitance, V is the supply voltage, and f is the operating frequency.
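To make the quadratic energy tradeoff concrete, the sketch below evaluates equations (2) and (3) and the dynamic energy of one task under the earlier assumptions that V and f scale together with the speed factor and that execution time scales with 1/s. All parameter values are hypothetical.

```python
def dynamic_power(c_eff, voltage, frequency):
    """Equation (3): P_dynamic = C_eff * V^2 * f."""
    return c_eff * voltage ** 2 * frequency


def total_power(p_static, c_eff, voltage, frequency):
    """Equation (2): P = P_static + P_dynamic."""
    return p_static + dynamic_power(c_eff, voltage, frequency)


def scaled_dynamic_energy(c_eff, v_max, f_max, s, time_at_max):
    """Dynamic energy of one task at speed factor s.

    Assumes V and f both scale by s and the execution time scales by 1/s,
    so the dynamic energy scales with s**2.
    """
    power = dynamic_power(c_eff, s * v_max, s * f_max)
    return power * (time_at_max / s)


# Hypothetical parameter values, for illustration only.
e_full = scaled_dynamic_energy(1e-9, 1.5, 500e6, 1.0, 0.010)
e_half = scaled_dynamic_energy(1e-9, 1.5, 500e6, 0.5, 0.010)
ratio = e_half / e_full  # close to 0.25: half the speed, a quarter of the dynamic energy
```

Static power is not reduced by this scaling and is paid for the longer execution time, which is why the system-wide optimum can lie above the lowest voltage.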

The power consumption of the CPU at speed factor $s$ in power state $m$ is denoted $P(CPU, s, m)$. Similarly, the power consumption of each component $R_j$ involved in task $t_i$ is denoted $P(R_j, m)$. The products $C^{i}_{CPU,m} \cdot P(CPU, s_i, m)$ and $C^{i}_{R_j,m} \cdot P(R_j, m)$ are the energy consumptions of the CPU and of $R_j$, respectively, in state $m$ at speed $s_i$. Thus, the system-wide energy dissipation for task $t_i$ with speed factor $s_i$ is:

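$$EC_i(s_i) = \sum_{m \in M} C^{i}_{CPU,m} \, P(CPU, s_i, m) + \sum_{R_j \in R_{t_i},\, m \in M} C^{i}_{R_j,m} \, P(R_j, m) \qquad (4)$$

and the total energy consumption of a system for an application A consisting of n tasks is:

$$EC(A) = \sum_{i=1}^{n} EC_i(s_i) \qquad (5)$$

where EC(A) is subject to the energy budget, $EC(A) \le E$ (6).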
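Equations (4) to (6) amount to summing time multiplied by power over every component and state. The following sketch shows one way to organize that computation; the dictionary layout, state names and numbers are assumptions made for illustration.

```python
STATES = ("active", "idle", "sleep", "transition")


def component_energy(time_in_state, power_in_state):
    """Sum over states m of (time in m) * (power in m) for one component."""
    return sum(
        time_in_state.get(m, 0.0) * power_in_state.get(m, 0.0) for m in STATES
    )


def task_energy(cpu_time, cpu_power, resources):
    """Equation (4): CPU energy plus the energy of every resource the task uses.

    resources is a list of (time_in_state, power_in_state) pairs.
    """
    return component_energy(cpu_time, cpu_power) + sum(
        component_energy(t, p) for t, p in resources
    )


def application_energy(per_task_energy, budget):
    """Equations (5) and (6): total energy and whether it fits the budget E."""
    total = sum(per_task_energy)
    return total, total <= budget


# Hypothetical values: times in seconds, powers in watts.
cpu_time = {"active": 20.0, "idle": 5.0}
cpu_power = {"active": 0.5, "idle": 0.1}
dram = ({"active": 2.0, "idle": 8.0}, {"active": 1.0, "idle": 0.25})

ec_task = task_energy(cpu_time, cpu_power, [dram])
ec_total, within_budget = application_energy([ec_task], budget=20.0)
```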

Thus, the existing energy-efficient voltage scaling algorithms minimize (5) in terms of system-wide energy consumption for a given set of tasks whereas voltage scaling methods only considering dynamic power of scalable processors focus on minimizing (3).

2.4. SOFT-ERROR MODEL

Soft errors, i.e., transient faults or single-event upsets caused by external radiation in semiconductor devices, have been studied extensively since the late 1970s [15][16][22]. When energetic particles such as alpha particles, cosmic neutrons, and thermal neutrons interacting with 10B in dielectric layers hit a sensitive region of a circuit, a high density of electron-hole pairs forms in their wake, and drift and diffusion mechanisms eventually cause a logic error or flip a bit value [21]. Semiconductor devices show different SER values and different trends as technology scales. For example, the bit SER of DRAM has been decreasing, whereas the SER of SRAM and core logic is a growing concern: it has increased significantly with each technology generation, and fault tolerance and recovery techniques for them are both complicated and expensive in terms of area and performance overhead [20][21][22][23].

For a soft error to occur at a specific node in a circuit, the charge Qcoll collected at that node must exceed the critical charge Qcrit. The critical charge is generally used to estimate the sensitivity to soft errors, with the SER modeled as follows [17][11]:

$$SER \propto N_{flux} \cdot CS \cdot \exp\left(-\frac{Q_{crit}}{Q_s}\right) \qquad (7)$$

where $N_{flux}$ is the intensity of the neutron flux, $CS$ is the cross-sectional area of the node, and $Q_s$ is the charge collection efficiency. $Q_{crit}$ is proportional to the node capacitance $C$ and the supply voltage $V$:

$$Q_{crit} \propto C \cdot V \qquad (8)$$

Therefore, according to expressions (7) and (8), smaller node capacitance and lower supply voltage increase the SER exponentially. Further, customer demand drives manufacturers to integrate ever denser chips operating at lower voltages, which will significantly reduce the natural resistance to soft errors as technology advances.
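The exponential dependence can be made explicit by substituting (8) into (7). The short derivation below introduces a proportionality constant $k$, which is an assumption of this sketch and not a symbol of the model above, and compares two supply voltages $V_1 > V_2$ for the same node, flux and cross section.

```latex
\begin{align*}
Q_{crit} &= k\,C\,V \\
SER(V) &\propto N_{flux}\cdot CS\cdot \exp\!\left(-\frac{k\,C\,V}{Q_s}\right) \\
\frac{SER(V_2)}{SER(V_1)} &= \exp\!\left(\frac{k\,C\,(V_1 - V_2)}{Q_s}\right)
\end{align*}
```

The ratio exceeds 1 whenever $V_2 < V_1$ and grows exponentially with the voltage reduction $V_1 - V_2$.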

The SER of the CPU is denoted $FR(CPU, s_i, m)$; it depends on the speed factor, which is determined by the supply voltage, and on the operating state. Other factors are treated as environmental and technology constants beyond the scope of this work. In the active state, the SER of the CPU is an exponential function of the supply voltage [12][16], which can be expressed as follows:

$$FR(CPU, s_i, m_A) = FR(CPU, 1, m_A) \cdot 10^{\,e(1 - s_i)} \qquad (9)$$

where $FR(CPU, 1, m_A)$ is the SER of the CPU in the active state at maximum speed, i.e., maximum supply voltage, and $e$ is an exponent constant that depends on the CPU technology. This model indicates that decreasing the supply voltage increases the fault rate exponentially, and that a larger $e$ means an error rate more sensitive to voltage scaling [12]. Since the SER in low-power states is lower than in the active state, we set, for simplicity, $FR(CPU, s_i, m_I) = FR(CPU, 1, m_I) = 0.1 \cdot FR(CPU, 1, m_A)$ and $FR(CPU, s_i, m_S) = FR(CPU, 1, m_S) = 0.001 \cdot FR(CPU, 1, m_A)$. The transition state, however, is comparatively vulnerable to soft errors [13] and is assigned the same rate as $FR(CPU, 1, m_A)$.
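The per-state CPU fault rate can be sketched as a small function. This sketch assumes that equation (9) reads as 10 raised to the power e(1 - s), and it uses hypothetical state names; the numeric inputs in the example are illustrative only.

```python
# Weights relative to the active-state rate at maximum speed, as stated in the text.
STATE_WEIGHT = {"idle": 0.1, "sleep": 0.001, "transition": 1.0}


def cpu_fault_rate(fr_active_max, s, state, exponent):
    """FR(CPU, s, m): equation (9) for the active state, fixed weights otherwise."""
    if state == "active":
        # Assumed reading of equation (9): FR * 10^(e * (1 - s)).
        return fr_active_max * 10 ** (exponent * (1.0 - s))
    return fr_active_max * STATE_WEIGHT[state]


# Illustrative values: at s = 0.5 with exponent 2 the active rate grows tenfold.
at_full_speed = cpu_fault_rate(50_000, 1.0, "active", exponent=2.0)
at_half_speed = cpu_fault_rate(50_000, 0.5, "active", exponent=2.0)
in_sleep = cpu_fault_rate(50_000, 0.5, "sleep", exponent=2.0)
```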

Similarly, FR(Rj,m) denotes the SER of resource Rj in state m and captures the soft error properties of the component: soft errors occur more often in the active state than in low-power states, and transitions between states are highly susceptible as well. The per-state weights of FR(Rj,m) are therefore the same as those of FR(CPU,si,m).

From a system view, a failure conservatively results from any single soft error occurring in any component during the period. In other words, the number of failures is a function of time and SER, so $FN_i(s_i)$ is defined as the number of possible soft errors occurring in any component of the system during the period $D_i$ of task $t_i$ at speed $s_i$:

$$FN_i(s_i) = \sum_{m \in M} C^{i}_{CPU,m} \, FR(CPU, s_i, m) + \sum_{R_j \in R_{t_i},\, m \in M} C^{i}_{R_j,m} \, FR(R_j, m) \qquad (10)$$

and the total number of possible failures in a system for an application consisting of n tasks is:

$$FN(A) = \sum_{i=1}^{n} FN_i(s_i) \qquad (11)$$

where the anticipated system SER must satisfy $FN(A)/D \le R$ (12) in order to meet the application-specific reliability. Since FR is an SER in FIT, FN(A) is the number of possible failures during the application deadline, and FN(A)/D is the average system SER. R is the application-specified reliability in FIT. We define three classes of reliability: low (114,155 FIT), mid (38,054 FIT) and high (11,415 FIT), which correspond to MTTFs of 1 year, 3 years and 10 years, respectively.
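The reliability classes follow from the definition of FIT as failures per one billion hours. The sketch below shows the conversion, assuming a year of 8,760 hours, together with a direct check of constraint (12); the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, leap years ignored (an assumption)


def mttf_years_to_fit(years):
    """FIT = 1e9 hours divided by the MTTF in hours."""
    return 1e9 / (years * HOURS_PER_YEAR)


def meets_reliability(fn_total, deadline, required_fit):
    """Constraint (12): the average system SER FN(A)/D must not exceed R."""
    return fn_total / deadline <= required_fit


low = mttf_years_to_fit(1)    # about 114,155 FIT
high = mttf_years_to_fit(10)  # about 11,415 FIT
```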

We propose a framework for soft-error aware voltage scaling that satisfies (12), (6) and (1), so that the system meets its reliability, energy and time budgets, respectively.

Soft Error Aware Voltage Scaling

In this section, existing power management techniques, with and without voltage scaling, are reviewed and compared with the proposed soft-error aware voltage scaling with respect to execution time, energy consumption and reliability.

3.1. POWER MANAGEMENT WITHOUT VOLTAGE SCALING (Initial)

A power management scheme without voltage scaling runs all tasks at the maximum voltage and puts the CPU and resources into sleep states when no task is left and the slack time is long enough to save energy. It is therefore the best solution in terms of performance, but it incurs the highest CPU energy consumption.

3.2. OPTIMAL VOLTAGE SCALING IN TERMS OF CPU ENERGY (CPU-Energy Optimal)

If we consider only the energy consumption of the CPU, in particular its dynamic energy, scaling may lengthen the running time of other resources such as DRAM and thus increase the system-wide energy consumption. Further, this CPU-oriented voltage scaling significantly raises the likelihood of transient faults because of the exponential relationship between supply voltage and SER. Tasks with long deadlines in particular can degrade both energy efficiency and reliability, since a longer CPU execution time lengthens the running time of the dependent resources, which increases the rate of transient faults across the system.

3.3. OPTIMAL VOLTAGE SCALING IN TERMS OF SYSTEM ENERGY (System-Energy Optimal)

By considering the static energy of the CPU as well as the energy dissipation of the system resources, we can find a CPU supply voltage that is optimal in terms of system-wide energy consumption. In [6], the CPU speed is optimized by starting from the CPU-energy optimal setting and repeatedly increasing the speed of the one task, among multiple tasks, that incurs the minimum overhead with respect to energy and delay. However, this approach does not take reliability into account, in particular the effect of supply voltage on SER.

3.4. SOFT-ERROR AWARE VOLTAGE SCALING (SER Optimal)

System-Energy Optimal may happen to satisfy the soft error reliability requirement, but this is not guaranteed. We therefore need an algorithm that finds a solution satisfying the reliability constraint as well as the time limit at minimal energy cost for real-time embedded applications.

The soft-error aware voltage scaling algorithm proceeds in three steps, presented in Figure 1: (i) it begins with the CPU-energy optimal setup for all tasks constituting the application, which is assumed to be schedulable within D; (ii) an energy-efficient voltage is selected by increasing the CPU supply voltage of one task at a time so as to minimize the system-wide energy dissipation, which must stay below E, as in [6]; (iii) a soft-error aware voltage is chosen by checking FN(A)/D against R and, while R is violated, increasing the supply voltage of the task that minimizes ∆EC while maximizing ∆FN, where ∆EC is the energy increase and ∆FN the reduction in failures between the current voltage and the next higher setting of a task.

Algorithm 1: Compute Soft Error Aware Supply Voltage

    Begin with A meeting ET(A) ≤ D
    Find Vi for each task ti minimizing EC(A) of the system;
    WHILE ( FN(A)/D dissatisfies R ) DO
        IF ( all Vi = Vmax ) THEN
            RETURN FAILURE;
        END IF
        FOR each task ti with Vi < Vmax DO
            Compute ∆ECi and ∆FNi;
        END FOR
        Find tm with minimal ( ∆ECm/∆FNm );
        IF ( multiple tm ) THEN
            Select tm such that ∆ECm is minimal;
        END IF
        Increase Vm for tm;
    END WHILE
    RETURN A;

Figure 1 Soft Error Aware Voltage Scaling Algorithm
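The greedy step of Figure 1 can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: `energy` and `failures` are hypothetical callables standing in for equations (4) and (10), the starting point is assumed to be the system-energy optimal setup, and the toy models at the end are invented purely to exercise the loop.

```python
def soft_error_aware_scaling(levels, start, energy, failures, deadline, required_fit):
    """Greedy sketch of the soft-error aware step.

    levels   : ascending list of speed factors (last entry is 1.0)
    start    : level index per task (assumed system-energy optimal setup)
    energy   : callable(task_index, speed) -> energy of that task
    failures : callable(task_index, speed) -> expected failures of that task
    Returns the chosen level indices, or None if maximum voltage is not enough.
    """
    idx = list(start)
    n = len(idx)
    while True:
        fn_total = sum(failures(i, levels[idx[i]]) for i in range(n))
        if fn_total / deadline <= required_fit:
            return idx
        best = None
        for i in range(n):
            if idx[i] + 1 >= len(levels):
                continue  # task already runs at maximum voltage
            cur, nxt = levels[idx[i]], levels[idx[i] + 1]
            d_ec = energy(i, nxt) - energy(i, cur)       # energy increase
            d_fn = failures(i, cur) - failures(i, nxt)   # reduction in failures
            if d_fn <= 0:
                continue  # raising the voltage does not help this task
            key = (d_ec / d_fn, d_ec)  # ratio first, then smaller energy increase
            if best is None or key < best[0]:
                best = (key, i)
        if best is None:
            return None  # reliability cannot be met by voltage scaling alone
        idx[best[1]] += 1


# Toy models, invented for illustration only.
FAIL_WEIGHT = [1.0, 3.0]


def toy_energy(i, s):
    return s ** 2


def toy_failures(i, s):
    return FAIL_WEIGHT[i] / s


levels = [0.5, 0.75, 1.0]
result = soft_error_aware_scaling(levels, [0, 0], toy_energy, toy_failures, 1.0, 5.5)
infeasible = soft_error_aware_scaling(levels, [0, 0], toy_energy, toy_failures, 1.0, 3.5)
```

In the toy run only the second task is raised, because it buys the largest reduction in failures per unit of added energy; the stricter requirement cannot be met even at maximum voltage and the sketch reports failure.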

Note that, because we begin with a schedule that satisfies the deadline and the soft-error aware step only increases voltages, the time constraint remains fulfilled while each step yields the maximum gain in system reliability for the minimum energy overhead. This step does increase the system-wide energy consumption, since the solution of the second step is the most energy-efficient one, so the result must be checked against the energy budget. Through these three steps, we find the CPU supply voltages that consume the minimum system-level energy among those that satisfy the required reliability within the time constraint. However, when all tasks run at the maximum supply voltage and the reliability requirement is still not met, reliability approaches such as redundancy techniques must be applied. In that case we rerun the algorithm as described in Figure 2, adding one scheme at a time to one of the system modules until the reliability requirement is guaranteed at minimum energy cost within the deadline.

Algorithm 2: Select Soft Error Aware RA with Min EC

    WHILE ( FN(A)/D dissatisfies R ) DO
        Apply the Redundancy Approach (RA) that minimizes ∆ECRA/∆FNRA;
        Run Algorithm 1 with RA;
    END WHILE

Figure 2 Soft Error and Power Aware Reliability Approach
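The outer loop of Figure 2 can be sketched in the same spirit. The cost model below is a deliberate simplification: each redundancy option is described by a hypothetical pair (extra energy, SER reduction), and `achieved_ser` is a stand-in for rerunning Algorithm 1 with the options applied so far. All names and numbers are invented for illustration.

```python
def select_redundancy(options, achieved_ser, required_fit):
    """Greedy sketch of the redundancy selection loop.

    options      : dict name -> (extra_energy, ser_reduction), hypothetical costs
    achieved_ser : callable(applied_names) -> best system SER reachable by
                   voltage scaling with those options applied
    Returns the applied option names, or None if all options are exhausted.
    """
    applied = []
    remaining = dict(options)
    while achieved_ser(applied) > required_fit:
        if not remaining:
            return None
        # Pick the option with the smallest energy cost per unit of SER reduction.
        name = min(remaining, key=lambda k: remaining[k][0] / remaining[k][1])
        applied.append(name)
        del remaining[name]
    return applied


# Invented numbers, for illustration only.
options = {
    "dram_ecc": (2.0, 100.0),
    "sram_ecc": (3.0, 120.0),
    "cpu_checkpoint": (10.0, 50.0),
}


def toy_achieved_ser(applied):
    return 300.0 - sum(options[name][1] for name in applied)


chosen = select_redundancy(options, toy_achieved_ser, required_fit=100.0)
```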

Experiments

4.1. EXPERIMENTAL SETUP

To evaluate our algorithm, we assume a simple embedded system consisting of a scalable CPU [26], DRAM [27] and SRAM [28], which are the most common modules of major concern for soft errors in embedded systems.

[pic]

Figure 3 System Model

For high reliability, DRAM and SRAM provide data redundancy functions such as ECC (Error Correction Coding), and the CPU can execute applications with spatial and temporal redundancy such as checkpointing [30]. For CPU redundancy, power and SER are assumed to double, with a 10% time overhead for temporal checkpointing, which also increases the system energy dissipation. For DRAM ECC, the power overhead is assumed to be 1/8, since 8 check bits are required per 64-bit word [21], with a 3% delay overhead; for SRAM ECC, the same 1/8 rate is assumed with a 20% delay overhead [24]. The DVS processor, an Intel PXA270, supports 8 supply voltage levels from 0.85 V to 1.55 V and dissipates between 44.5 mW and 925 mW. All SERs are assumed values chosen to be reasonable according to various references; they do not represent soft error rates of real products, since such data are not available. For instance, the SER of the CPU in the active state at maximum speed is set to 50,000 FIT based on the statement in [25]. We select 2.7 as the exponent that characterizes the sensitivity of the CPU SER to voltage scaling [12][16]. The exponential impact is applied to only 10% of the 50,000 FIT, however, since Mitra et al. [23] report that combinational logic contributes approximately 10% of the overall SER at state-of-the-art technology. The SER per Mbit of recent SRAM (DRAM) is assumed to be 1,000 (100) FIT [15][20][21], which yields 128,000 FIT for a 128-Mbit SRAM (102,400 FIT for a 1,024-Mbit DRAM). The specific data are summarized in Table 1. Note that SER is expected to increase by several orders of magnitude as technology advances.
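The arithmetic behind these figures is easy to reproduce. The sketch below computes the memory SERs from the per-Mbit rates and models the CPU SER with the exponential term applied to a 10% share only; treating the remaining 90% as independent of voltage, and reading the exponential as 10 raised to the power e(1 - s), are assumptions of this sketch.

```python
def memory_fit(fit_per_mbit, size_mbits):
    """Device SER as per-Mbit SER times capacity in Mbits."""
    return fit_per_mbit * size_mbits


sram_fit = memory_fit(1_000, 128)   # 128,000 FIT
dram_fit = memory_fit(100, 1_024)   # 102,400 FIT


def cpu_active_fit(total_fit, s, exponent=2.7, scaled_share=0.10):
    """Active-state CPU SER when only scaled_share grows with voltage scaling.

    Assumption: the remaining share stays constant across speed factors.
    """
    fixed = total_fit * (1.0 - scaled_share)
    scaled = total_fit * scaled_share * 10 ** (exponent * (1.0 - s))
    return fixed + scaled


at_vmax = cpu_active_fit(50_000, 1.0)
at_vmin = cpu_active_fit(50_000, 0.85 / 1.55)  # speed factor at the lowest voltage
```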

Table 1 System Specification

[pic]

For simulation, an application consists of up to 5 randomly generated periodic tasks. The period of each task is selected from 10 msec to 120 msec, and its WCET is obtained by multiplying the period by a utilization value chosen at random from 0.05 to 0.5. For simplicity, all tasks are assumed to run for their full WCET. Low system utilization and generous deadlines let us focus on the relationship between energy consumption and failures, since task scheduling is beyond the scope of this work. The deadline D is therefore set to the sum of the task periods, and we simulate three levels of reliability: low, medium and high.
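A task set of this kind could be generated as follows. The sketch assumes uniform distributions and a random task count between 1 and 5, neither of which is stated in the setup; the function name and dictionary keys are hypothetical.

```python
import random


def generate_tasks(rng, max_tasks=5):
    """Generate a random periodic task set and its application deadline.

    Assumptions: uniform sampling, task count drawn from 1..max_tasks.
    Times are in milliseconds.
    """
    n = rng.randint(1, max_tasks)
    tasks = []
    for _ in range(n):
        period = rng.uniform(10.0, 120.0)
        util = rng.uniform(0.05, 0.5)
        tasks.append({"period": period, "wcet": util * period})
    deadline = sum(t["period"] for t in tasks)  # D is the sum of the periods
    return tasks, deadline


# A fixed seed keeps the example reproducible.
tasks, deadline = generate_tasks(random.Random(0))
```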

The typical standby time of a resource, including its active and idle states, is assumed to lie in the range [20%,60%] of the task execution time for DRAM and [10%,90%] for SRAM [6]. The active time uniformly takes up 20% of the standby time, and the remainder is idle. All transitions in every resource are assumed to have a time overhead of 1 msec and the same power dissipation and SER as the active state.
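The split of a resource's standby time into active and idle parts can be written down directly. In this sketch `delta` is the standby fraction of the task execution time and the function name is hypothetical; the example values are illustrative.

```python
def resource_times(cpu_active_time, delta, active_share=0.20):
    """Split a resource's standby time into active and idle parts.

    delta is the standby time as a fraction of the task execution time;
    active_share of the standby time is active, the rest is idle.
    """
    standby = delta * cpu_active_time
    active = active_share * standby
    return {"active": active, "idle": standby - active}


# A task running 100 msec with a DRAM standby fraction of 40%.
dram_times = resource_times(100.0, 0.4)
```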

4.2. RESULTS

The initial experiments demonstrate the importance of considering SER at design time with our framework. Figure 4(a) displays the exponential effect of voltage scaling on the system-wide SER. Figure 4(b) depicts how the gap between the SERs at maximum and minimum voltage grows dramatically as technology advances, primarily because the SER of SRAM increases exponentially every generation and the SER of core logic becomes a significant concern, even though the SER of DRAM has saturated [21]. Further, Figure 4(a) indicates that voltage scaling that ignores soft errors may violate the reliability requirement of an application. In this specific example, all voltages satisfy low reliability but none satisfies high reliability, and only some of them satisfy mid reliability. Our proposed voltage scaling algorithm in Figure 1, however, provides a vehicle to evaluate the configurations and to find the optimal one that matches the required reliability within the gap between the maximum and minimum soft error rates, while also considering energy and time limits. These observations strongly suggest that designers should consider SER carefully, with the help of this framework, in order to manage reliability together with low power and high performance.

[pic]

Figure 4 Initial Exploration of Voltage Scaling and Advanced Technology Effects on Soft Error Rate.

The second experiment highlights the danger of aggressive voltage scaling. Figures 4(a) and 5(a) confirm that blind DVS worsens SER, mainly because of its exponential increase with lower voltage and because of the longer execution time, which linearly increases the probability of soft errors in the resources. The simulated experiments in Figures 5(b) to 5(d) compare the execution time of an application, the system-wide energy consumption and the reliability during execution for each algorithm: Initial, CPU-energy optimal, System-energy optimal and Soft-error aware optimal. Note that “optimal” means minimal energy consumption under the specified consideration. For instance, CPU-energy optimal yields the supply voltage with minimal CPU energy consumption, while Soft-error aware optimal yields the supply voltage with minimal energy consumption among those satisfying the required reliability in FIT. The results show that the soft-error aware scaling algorithm satisfies the required reliability (Mid) at a minimal energy cost (around 12% overhead compared to System-energy optimal), and that it avoids the energy wasted on redundancy techniques when the application does not request high reliability. CPU-energy optimal and System-energy optimal, on the other hand, exhibit failure rates that are 2.7 times and 1.4 times worse, respectively, than that of the soft-error aware voltage configuration. Thus, our proposed framework evaluates all configurations between minimum and maximum voltage scaling and provides the best solution that satisfies reliability within the time and energy constraints of real-time embedded applications. Note that Figure 5(a) shows that the system-wide energy consumption fluctuates beyond a certain point because of the strong dependence of system energy on static power, which indicates where the system-wide energy optimal solution lies.

[pic]

Figure 5 Profiling Execution Time, System-level Energy Consumption, and Average System SER for each algorithm: Initial, CPU-energy Optimal, System-energy Optimal, and SER Optimal

When an application demands mid reliability, we can tune the supply voltage to find the optimal solution, provided the requirement lies between the FIT values obtained at the maximum and minimum voltages. While the soft-error aware voltage scaling algorithm in Figure 1 searches for the optimal voltage within these bounds, Algorithm 2 in Figure 2 extends Algorithm 1 to the application-demanded reliability. For instance, no combination of task voltages can reach high reliability, so reliability approaches such as temporal redundancy and data redundancy must be added, even though they are expensive in terms of power, performance and area. Figure 6 displays the average system-level SER for various combinations of redundancy approaches and voltage scaling policies used to achieve the application-demanded reliability. The configurations without redundancy (Normal, left-most bar), with a single fault-tolerance technique on one module (ECC on DRAM, ECC on SRAM, and dual CPU with checkpointing, next 3 bars), and with ECC on both DRAM and SRAM (5th bar from the left) fail to meet Mid reliability, although all of them satisfy Low reliability. The experiments show that the SER optimal configuration with redundancy on the CPU as well as on DRAM or on SRAM (6th or 7th bar from the left) is the judicious combination in terms of both reliability and power consumption: it is up to 4.74 times more reliable, at a relatively small energy overhead, than the System-energy optimal configuration, which ignores reliability, and it prolongs system life by avoiding the unnecessary power dissipation of the extra redundancy present in the approach that implements all redundancy techniques (the right-most bar in Figure 6).

Experimental results show that the proposed framework for soft-error aware voltage scaling provides an energy-efficient and reliability-aware solution within the large gap opened by blind voltage scaling and technology advancement, with minimal cost overhead in terms of energy and time. For high reliability applications, however, the combined energy and soft-error aware approach can judiciously satisfy the critical application when the proposed soft-error aware algorithm cannot meet the constraints without an additional reliability approach.

[pic]

Figure 6 Average System SER with or without Reliability Approaches composed in Voltage Scaling Algorithms

Conclusion

This paper opens a new avenue of research in composing reliability and power management techniques to find the right blend of SER and voltage scaling for real-time embedded applications. Well-known voltage scaling techniques may fail mission-critical applications from a reliability perspective, since lowering the voltage raises the SER dramatically. On the other hand, reliability approaches such as redundancy can introduce unnecessary energy overhead for applications that are not reliability-critical, such as multimedia and legacy applications, unless energy cost is taken into account. Therefore, power and reliability should be considered together in order to satisfy emerging applications in pervasive real-time embedded systems, where technology advancement will demand much more attention to soft errors. Based on these observations, we first propose a soft-error aware voltage scaling algorithm for periodic tasks in real-time embedded systems, which limits aggressive voltage reduction in order to meet the required reliability. Further, we present combined soft-error and power aware approaches for reliability-critical embedded applications, which can guide the selection of the best methods in terms of system-wide energy consumption and reliability. This work thus not only addresses the reliability limitation but also prolongs battery life for real-time embedded applications.

We will extend this work to cross-layer approaches that consider other properties and network status by coordinating power and reliability attributes at each layer, which will widen the set of alternatives and compositions available for finding the best solution that satisfies multiple parameters.

References

[1] T. Simunic, L. Benini, P. Glynn, and G. De Micheli, “Event-Driven Power Management,” IEEE Trans. on CAD, pp. 840-857, July 2001.

[2] S. Irani, S. Shukla, and R. Gupta, “Online Strategies for Dynamic Power Management in Systems with Multiple Power-Saving States,” ACM Trans. on Embedded Computing Systems, 2(3):325-346, 2003.

[3] V. Delaluz, M. Kandemir, N. Vijaykrishnan, A. Sivasubramaniam, and M. Irwin, “Hardware and Software Techniques for Controlling DRAM Power Modes,” IEEE Trans. on Computers, 50(11):1154-1173, 2001.

[4] Y. Shin, K. Choi, and T. Sakurai, “Power Optimization of Real-Time Embedded Systems on Variable Speed Processors,” Proc. of ICCAD, pp. 365-368, Nov. 2000.

[5] H. Aydin, R. Melhem, D. Mossé, and P. M. Alvarez, “Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics,” Proc. of EuroMicro Conference on Real-Time Systems, Jun. 2001.

[6] R. Jejurikar and R. Gupta, “Dynamic Voltage Scaling for Systemwide Energy Minimization in Real-Time Embedded Systems,” Proc. of ISLPED, pp. 78-81, 2004.

[7] J. Zhuo and C. Chakrabarti, “System-Level Energy-Efficient Dynamic Task Scheduling,” Proc. of DAC, 2005.

[8] S. Thompson, P. Packan, and M. Bohr, “MOS Scaling: Transistor Challenges for the 21st Century,” Intel Technology Journal, Q3, 1998.

[9] N. S. Kim et al., “Leakage Current: Moore’s Law Meets Static Power,” IEEE Computer, 36(12), pp. 68-75, Dec. 2003.

[10] T. Simunic, K. Mihic, and G. De Micheli, “Optimization of Reliability and Power Consumption in Systems on a Chip,” Proc. of PATMOS, 2005.

[11] L. Li, V. Degalahal, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, “Soft Error and Energy Consumption Interactions: A Data Cache Perspective,” Proc. of ISLPED, pp. 132-137, 2004.

[12] D. Zhu, R. Melhem, and D. Mossé, “The Effects of Energy Management on Reliability in Real-Time Embedded Systems,” Proc. of ICCAD, Nov. 2004.

[13] K. Mihic, T. Simunic, and G. De Micheli, “Reliability and Power Management of Integrated Systems,” Proc. of EuroMicro Symposium on Digital System Design, 2004.

[14] D. Zhu, R. Melhem, D. Mossé, and E. Elnozahy, “Analysis of an Energy Efficient Optimistic TMR Scheme,” Proc. of ICPADS, Jul. 2004.

[15] W. Leung, F. C. Hsu, and M. E. Jones, “The Ideal SoC Memory: 1T-SRAM,” Proc. of IEEE SoC/ASIC Conference, pp. 32-36, Sep. 2000.

[16] N. Seifert, D. Moyer, N. Leland, and R. Hokinson, “Historical Trend in Alpha-Particle Induced Soft Error Rates for the Alpha Microprocessor,” IEEE annual IRPS, pp. 259-265, 2001.

[17] P. Hazucha and C. Svensson, “Impact of CMOS Technology Scaling on the Atmospheric Neutron Soft Error Rate,” IEEE Trans. on Nuclear Science, 47(6):2586-2594, 2000.

[18] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, “Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic,” Proc. of DSN, 2002.

[19] K. J. Hass, J. W. Gambles, B. Walker, and M. Zampaglione, “Mitigating Single Event Upsets from Combinational Logic,” Proc. of the NASA Symposium on VLSI Design, 1998.

[20] P. E. Dodd, M. R. Shaneyfelt, J. R. Schwank, and G. L. Hash, “Neutron-Induced Latchup in SRAMs at Ground Level,” IRPS, 2003.

[21] R. Baumann, “Soft Errors in Advanced Computer Systems,” IEEE Design and Test of Computers, pp. 258-266, 2005.

[22] R. Baumann, “The Impact of Technology Scaling on Soft Error Rate Performance and Limits to the Efficacy of Error Correction,” IEDM, 2002.

[23] S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” IEEE Computer, pp. 43-51.

[24] “Soft Errors in Electronic Memory,” Tezzaron White Paper available at .

[25] R. Phelan, “Addressing Soft Errors in ARM Core-based Designs,” ARM white paper available at , 2003.

[26] Intel PXA 270 Processor at .

[27] Rambus RDRAM at .

[28] Hitachi SRAM at .

[29] R. Mastipuram and E. C. Wee, “Soft Errors’ Impact on System Reliability,” available at , Sep. 2004.

[30] D. K. Pradhan, “Fault-Tolerant Computer System Design,” Prentice Hall, ISBN:0-13-057887-8, 1995.
