VLSI Testing and Verification



Module 1
Introduction to Testing: Testing Philosophy, Role of Testing, Digital and Analog VLSI Testing, VLSI Technology Trends Affecting Testing. Faults: single stuck-at faults, temporary faults, bridging faults, transient faults. Fault modeling: fault equivalence, dominance, and collapsing. Fault simulation: parallel, concurrent, and deductive simulation.

1.1 Testing Philosophy

If you are a student now, or were in the past, you are quite familiar with the word test, and probably hate it. Understanding the teacher's point of view will help. The teacher sets a domain of knowledge for testing, called the course syllabus. It may be the contents of a book, class notes, lectures, or some (arbitrary!) combination of all those. Next comes the testing method. The teacher asks questions and analyzes the responses, perhaps by matching answers to correct ones from the book. The quality of such a test system depends upon how well the test questions cover the syllabus. In VLSI testing also, one should know the specification (synonymous with the course syllabus) of the object being tested and then devise tests such that, if the object produces the expected response, its conformance to the specification can be guaranteed.

Returning to our student analogy: since no one has infinite time, the number of questions must be limited, and they should be cleverly devised. The teacher now makes certain assumptions. Certain typical errors, ones that the student is likely to commit, are assumed. Questions are devised especially to uncover those errors and, if the student's answers are correct, the teacher grants the benefit of doubt, showing confidence in the implicit error model. Electronic testing also uses fault modeling, and tests are generated for the assumed fault models. In testing, successful experience with a fault model gives it credibility, and eventually people expect reliability when a high percentage of the modeled faults is tested.

Finally, remember that, if you fail, you must repeat the course. This is similar to redesign and remake in our "Algorithm: Perfect." Of course, you can do better by asking your teacher, right at the beginning, about (1) the course syllabus and (2) error models (i.e., what you will be tested for), and then plan your studies to succeed. In VLSI, that is called design for testability.

Example 1.1 Testing of students. In a course on xyzeeology, 70% of the students deserve to pass. We will call them "pass quality" students. Assuming that the number of students in the class is large, we will study the test process using a statistical basis. For a randomly selected student from the class, we define the following events:

    PQ: student is pass quality
    P:  student passes the test
    FQ: student is fail quality
    F:  student fails the test

In our example, Prob(PQ) = 0.7. Assuming that only pass/fail grades are awarded, the remaining 30% of students are of "fail quality," i.e., Prob(FQ) = 0.3. As we know, it is impossible to design a perfect test. However, our teacher does quite well and 95% of pass quality students actually pass the test. This is represented by the conditional probabilities Prob(P|PQ) = 0.95 and Prob(F|PQ) = 0.05. Similarly, the test correctly fails 95% of the fail quality students, i.e., Prob(F|FQ) = 0.95 and Prob(P|FQ) = 0.05. A reader not familiar with basic concepts of probability theory may wish to consult any basic text on the subject [677].
The diagram of Figure 1.1 illustrates the state transition caused by the test. The initial state, on the left, consists of all students in one group. The test separates them into two groups, shown on the right as "passed" and "failed."

Figure 1.1: A pass/fail test.

Sizes of the passed and failed groups created by the test are given by the total probabilities of passing and failing, respectively. The total probability of passing is:

    Prob(P) = Prob(P|PQ) Prob(PQ) + Prob(P|FQ) Prob(FQ)
            = 0.95 × 0.7 + 0.05 × 0.3 = 0.68                         (1.1)

Similarly, the total probability of failing is found to be Prob(F) = 1 - Prob(P) = 0.32. Notice that the original group had 70% pass quality students, and the test has passed only 68%. Obviously, some pass quality students have been failed. But are all passed students of pass quality?

We will examine the conditional probability Prob(FQ|P) of a student belonging to the fail quality subgroup, given that he or she has passed. The joint probability of events FQ and P is given by:

    Prob(FQ, P) = Prob(P|FQ) Prob(FQ)                                (1.2)

and, therefore:

    Prob(FQ|P) = Prob(FQ, P) / Prob(P) = Prob(P|FQ) Prob(FQ) / Prob(P)   (1.3)

where Prob(P) comes from Equation 1.1. Equation 1.3 is known as Bayes' rule [677] and is commonly used for drawing inferences from statistical data. We obtain Prob(FQ|P) = 0.05 × 0.3 / 0.68 = 0.022. That is, 2.2% of passed students are of fail quality. We will call this the "teacher's risk." The teacher can reduce this risk by making the test more difficult, decreasing Prob(P|FQ) closer to 0. However, that test can potentially fail a few more pass quality students. So, let us examine the "student's risk." Applying Bayes' rule, we obtain:

    Prob(PQ|F) = Prob(F|PQ) Prob(PQ) / Prob(F) = 0.05 × 0.7 / 0.32 = 0.11

This shows that 11% of failed students should have passed. We call this the "student's risk." Obviously, a pass quality student would not like to end up in the failed group. To reduce the student's risk, the probability Prob(F|PQ) will have to be reduced. This can be done by making the test easier. That will, however, also pass a few more fail quality students, worsening the quality of the passing batch. Thus, teacher's risk and student's risk are opposing criteria, requiring practical compromises. An ideal test, one that minimizes both risks, should be so "tuned" that it fails no pass quality student and passes no fail quality student. Devising such a test is no mean task for our teacher.

Testing of electronic systems differs only slightly from the above scenario. A student may pass by correctly answering most, but not necessarily all, questions on the test. If a small number of answers is wrong, the teacher gives the student the benefit of doubt, for he or she may be having a bad day, or else could even learn correct answers in the future. There is no such benefit of doubt for a VLSI chip. Being inanimate, it cannot be having a bad day and certainly cannot learn. So, even a single incorrect test response will fail a VLSI chip. However, electronic tests are not perfect either. They may not cover certain faults and some bad chips will pass. We may also use some "nonfunctional tests" to prevent those bad chips from passing. Nonfunctional tests do not execute the specified function; an example is the quiescent current test discussed in Chapter 13. These tests can, in turn, fail some good chips. For VLSI, failing of good chips by tests is known as yield loss, which increases the cost of manufacturing.

In electronic testing, the teacher's risk is synonymous with the consumer's risk. It is related to bad chips being shipped to the consumer. The student's risk in the above example is similar to the manufacturer's risk, since failing of good devices increases the cost. We will examine these aspects of electronic testing in Chapter 3.
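The arithmetic of Example 1.1 is simple enough to script. The following minimal Python sketch (the variable names are ours, not the text's; all probabilities are taken from the example above) computes the total pass probability and both risks using Bayes' rule:

    # Teacher's risk and student's risk from Example 1.1 (illustrative sketch).

    p_pq = 0.70          # Prob(PQ): student is pass quality
    p_fq = 1 - p_pq      # Prob(FQ): student is fail quality
    p_p_given_pq = 0.95  # Prob(P|PQ): pass quality student passes
    p_p_given_fq = 0.05  # Prob(P|FQ): fail quality student passes

    # Total probability of passing (Equation 1.1)
    p_p = p_p_given_pq * p_pq + p_p_given_fq * p_fq
    p_f = 1 - p_p

    # Bayes' rule (Equation 1.3): teacher's risk = Prob(FQ|P)
    teachers_risk = p_p_given_fq * p_fq / p_p
    # Student's risk = Prob(PQ|F)
    students_risk = (1 - p_p_given_pq) * p_pq / p_f

    print(f"Prob(P) = {p_p:.2f}")                   # 0.68
    print(f"Teacher's risk = {teachers_risk:.3f}")  # ~0.022
    print(f"Student's risk = {students_risk:.3f}")  # ~0.109

Lowering Prob(P|FQ) in this script (a harder test) shrinks the teacher's risk but inflates the student's risk, which is exactly the trade-off discussed above.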
1.2 Role of Testing

If you design a product, fabricate and test it, and it fails the test, then there must be a cause for the failure. Either (1) the test was wrong, (2) the fabrication process was faulty, (3) the design was incorrect, or (4) the specification had a problem. Anything can go wrong. The role of testing is to detect whether something went wrong, and the role of diagnosis is to determine exactly what went wrong and where the process needs to be altered. Therefore, correctness and effectiveness of testing are most important for quality products (another name for perfect products).

If the test procedure is good and the product fails, then we suspect the fabrication process, the design, or the specification. If all students in a class fail, then it is often considered the teacher's failure. If only some fail, we assume that the teacher is competent, but some students are having difficulty. To select students likely to succeed, teachers may use prerequisites or admission tests for screening. Distributed testing along a product realization process catches the defect-producing causes as soon as they become active, and before they have done much damage. A well thought out test strategy is crucial to the economical realization of products.

The benefits of testing are quality and economy. These two attributes are not independent and neither can be defined without the other. Quality means satisfying the user's needs at a minimum cost. A good test process can weed out all bad products before they reach the user. However, if too many bad items are being produced, then the cost of those bad items will have to be recovered from the price charged for the few good items that are produced. It will be impossible for an engineer to design a quality product without a profound understanding of the physical principles underlying the processes of manufacturing and test.

1.3 Digital and Analog VLSI Testing

Before we shoot a picture we should examine the scenery. The scenery here is the process for realizing VLSI chips, shown crudely in Figure 1.2.

Figure 1.2: VLSI realization process (a naive version).

Requirements are the user needs satisfied by the chip. They are often derived from the function of the particular application, for example controlling fuel injection in a car, controlling a robot arm, or processing pictures from a space shuttle. One sets down the specifications of various types, which include function (input-output characteristics), operating characteristics (power, frequency, noise, etc.), physical characteristics (packaging, etc.), environmental characteristics (temperature, humidity, reliability, etc.), and other characteristics (volume, cost, price, availability, etc.).

The objective of design is to produce the data necessary for the next steps of fabrication and testing. Design has several stages. The first, known as architectural design, produces a system-level structure of realizable blocks to implement the functional specification. The second, called logic design, further decomposes blocks into logic gates. Finally, the gates are implemented as physical devices (e.g., transistors) and a chip layout is produced during physical design. The physical layout is converted into photo masks that are directly used in the fabrication of silicon VLSI chips. Fabrication consists of processing silicon wafers through a series of steps involving photoresist, exposure through masks, etching, ion implantation, etc.
It is naive to think that every fabricated chip will be good. Impurities and defects in materials, equipment malfunctions, and human errors are some causes of defects. The likelihood and consequences of defects are the main reasons for testing!

Another very important function of testing is process diagnosis. We must find what went wrong with each faulty chip, be it in fabrication, in design, or in testing. Or, we may have started with unrealizable specifications. The faulty chip analysis is called failure mode analysis (FMA). FMA uses many different test types, including examination through optical and electron microscopes, to determine the failure cause and fix the process.

Examine Figure 1.2 now. The arrows out of the FMA block represent the corrective actions applied to the faulty steps of the realization process. Consider the process as a pipeline (or assembly line) with the flow direction from top to bottom; the effort between the point where an error occurred and the point of testing, where it was detected, is wasted. At the time the error is detected, the portion of the pipeline between these two points is filled with faulty product, which will be either reworked or discarded. Wasted effort and material add to the product cost. Testing should, therefore, be placed closest to the point of error. Many companies emphasize doing it right the first time, or pursue the goal of zero defects. This does not mean that humans, or even machines, cannot make mistakes. These goals are achievable in an error-prone environment when errors are detected and corrected before damage occurs.

The VLSI realization process of Figure 1.3 has a distributed form of testing. The dotted lines (representing screening) show testing. Depending on the context, we give testing different names. Requirements and specifications are audited, design and tests are verified, and fabricated parts are tested. Each testing level performs two functions and involves different technical personnel. The first function ascertains that the work still conforms to the objectives of previous levels and meets customer requirements. The second ascertains that things have been done according to the capabilities of the later process levels. For example, verification of design and test procedures should ensure that the design meets all functional and other specifications, and that it is also manufacturable, testable, and repairable.

Figure 1.3: A realistic VLSI realization process.

Figure 1.3 also shows the level of involvement of various types of engineering personnel through the lifetime of a VLSI device. While this figure is typical for an application specific integrated circuit (ASIC), it applies to many other electronic devices as well. The process begins with a dialogue between the customer and the marketing engineer. As specifications are prepared, some involvement of those responsible for later activities (design, manufacture, and test) is advisable to ensure a realizable specification. The systems engineer then begins by constructing an architectural block diagram. The architecture is verified by high-level simulation and each block is synthesized at the logic level. The logic circuit is simulated for the same stimuli (often produced by testbenches) as used for the high-level simulation. A testbench is hardware description language (HDL) code that, when executed, produces stimuli for the designed circuit [381] (see Sections 5.3 and 5.4).
Vectors generated by testbenches are compacted or augmented and run through a fault simulator to satisfy some specified fault coverage requirement. The VLSI design engineer generates a layout and verifies the timing against the specification. Manufacturing and test engineers then fabricate and test wafers, and package and test chips. All through this process, any failure modes are identified and process improvements are made to ensure a high yield of good devices. Finally, the sales and field application engineers interact with the customer. As "verification and test" related activities are distributed throughout the lifetime of the device, it is necessary that all engineering personnel have a knowledge of test principles.

1.4 VLSI Technology Trends Affecting Testing

The complexity of VLSI technology has reached the point where we are trying to put 100 million transistors on a single chip, and we are trying to increase the on-chip clock frequency to 1 GHz. Table 1.1 shows the proposed roadmap of the Semiconductor Industries Association (SIA). These trends have a profound effect on the cost and difficulty of chip testing.

Rising Chip Clock Rates. Figure 1.4 shows microprocessor clock rate trends over the last 16 years. Microprocessors represent the leading edge in the VLSI technology trend of Table 1.1. The exponentially rising clock rate indicates several changes in testing over the next 10 years:

1. At-Speed Testing. It has been established that stuck-fault tests are more effective when applied at the circuit's rated clock speed [501], rather than at a lower speed. Stuck-fault testing covers all (or most) circuit signals assuming that a faulty signal may be permanently stuck-at logic 0 or 1. For a reliable high-speed test, the automatic test equipment (ATE) must operate as fast as, or faster than, the circuit-under-test (CUT).

2. ATE Cost. At the time of writing (year 2000), a state of the art ATE can apply vectors at a clock rate of 1 GHz. The cost of such a tester rises roughly at the rate of $3,000 per pin. In addition, there is a fixed cost of function generators needed for mixed-signal circuits that can range between 0.5 and 1.2 million dollars [65]. Thus, devices with rated speeds up to 1 GHz can be tested, though at a high cost. The semiconductor industry, however, faces two types of problems. First, the installed test capability in many factories around the world still allows only about a 100 MHz clock rate. By the time the present equipment is replaced by new systems, clock rates of chips are likely to go beyond 1 GHz. Second, as Figure 1.4 shows, the microprocessor clock rate in the year 2000 has already approached 1 GHz, exceeding the present state of the art of the ATE.

As the development of faster ATE continues, other test methods are also emerging. An embedded-ATE method [482], in which ATE functions such as high-speed vector generation and response analysis are added to the chip hardware, has been proposed. In another method, controllable delays are inserted in the chip hardware such that the critical path delay can be measured by a slow-speed tester [29]. A modified scan design (see Chapter 14) for at-speed test of Motorola's MPC7400 microprocessor has been described recently [654].

Example 1.2 Testing cost. A state of the art ATE in the year 2000 applies test vectors at clock rates up to 500 MHz. It contains analog instruments (function generators, A/D converters, and waveform analyzers).
The price of this tester for a 1,024 pin configuration is estimated as:

    Purchase price = $1.2M + 1,024 pins × $3,000/pin ≈ $4.27M

Figure 1.4: Microprocessor clock rates.

We compute the yearly running cost of the ATE by assuming a linear depreciation over five years and an annual maintenance cost of 2% of the purchase price. The operating cost of the building, facilities, auxiliary equipment (wafer and chip handlers, fixtures, etc.), and personnel is estimated to be $0.5M. Thus:

    Running cost per year = depreciation + maintenance + operating cost
                          = $4.27M/5 + 0.02 × $4.27M + $0.5M ≈ $1.44M

The tester is used in three eight-hour shifts per day and on all days of the year. Therefore:

    Testing cost = $1.44M / (365 × 24 × 3,600 seconds) ≈ 4.5 cents per second

The test time for a digital ASIC (application specific integrated circuit) is 6 seconds. That gives the test cost as 27 cents. Since the bad chips are not sold, their test cost must be recovered from the sale of good chips. If the yield is 65%, then the test component in the sale price of a good chip is 0.27/0.65 ≈ 41.5 cents.

The test time of a chip depends on the types of tests conducted. These may include parametric tests (leakage, contact, voltage levels, etc.) applied at a slow speed, and vector tests (also called "functional tests" in the ATE environment) applied at high speed. The time of parametric tests is proportional to the number of pins, since these tests must be applied to all active pins of the chip. The vector test time depends on the number of vectors and the clock rate. The total test time for digital chips ranges between 3 and 8 seconds. In general, mixed-signal or analog circuits have fewer pins than digital chips. However, the tests are conducted at slower rates, and test times can lie in the 3 to 6 second range. The handling of chips and probes, though mechanical, occurs at the speed of the machine. Such times are kept small by pipelining and parallelism used in handling. The time of handling is, however, included in the test time estimate.
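The cost model of Example 1.2 fits in a few lines of Python. The sketch below restates the example's assumptions as constants (the names are ours) and reproduces the 27-cent figure before rounding:

    # ATE test-cost model from Example 1.2 (illustrative sketch).

    PIN_COST = 3_000            # dollars per digital pin
    MIXED_SIGNAL_FIXED = 1.2e6  # fixed cost of analog instruments, dollars
    PINS = 1_024

    purchase_price = MIXED_SIGNAL_FIXED + PINS * PIN_COST   # ~ $4.27M

    DEPRECIATION_YEARS = 5
    MAINTENANCE_RATE = 0.02     # 2% of purchase price per year
    OPERATING_COST = 0.5e6      # building, handlers, personnel; dollars/year

    running_cost_per_year = (purchase_price / DEPRECIATION_YEARS
                             + MAINTENANCE_RATE * purchase_price
                             + OPERATING_COST)               # ~ $1.44M

    seconds_per_year = 365 * 24 * 3600      # three 8-hour shifts = 24 h/day
    cost_per_second = running_cost_per_year / seconds_per_year   # ~ $0.045

    test_time = 6.0             # seconds per digital ASIC
    yield_fraction = 0.65

    cost_per_chip = cost_per_second * test_time        # about 27 cents
    cost_per_good_chip = cost_per_chip / yield_fraction  # about 42 cents

    print(f"cost per tested chip: {100 * cost_per_chip:.1f} cents")
    print(f"test component in sale price: {100 * cost_per_good_chip:.1f} cents")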
3. EMI. A chip operating in the GHz frequency range must be tested for electromagnetic interference (EMI). This is a problem because inductance in the wiring becomes active at these higher frequencies, whereas it could be ignored at lower frequencies. The inherent difficulties are: (1) ringing in signal transitions along the wiring, because signal transitions are reflected from the ends of a bus and bounce back to the source, where they are reflected again; (2) interference with signal propagation through the wiring, caused by the dielectric permeability and the dielectric permittivity of the chip package; and (3) delay testing of paths requires propagation of sharp signal transitions, resulting in high-frequency currents through interconnects, causing radiation coupling. Delay testing is necessary because many factors may delay a signal propagating along a path. We must also test the chip interconnect carefully for radiation noise induced errors.

Increasing Transistor Density. Transistor feature sizes on a VLSI chip reduce roughly by 10.5% per year, resulting in a transistor density increase of roughly 22.1% every year. An almost equal amount of increase is provided by wafer and chip size increases and by circuit design and process innovations [278]. This is evident in Figure 1.5, which shows a nearly 44% increase in transistors on microprocessor chips every year. This amounts to a little over doubling every two years. The doubling of transistors on an integrated circuit every 18 to 24 months has been known as Moore's Law [475, 476, 477, 478] since the mid-1970s. Although many have predicted its end, it continues to hold, which leads to several results:

1. Test complexity. Testing difficulty increases as the transistor density increases. This occurs because the internal chip modules (particularly embedded memories) become increasingly difficult to access. Also, test patterns for sub-assemblies on the chip interfere with each other, due to the need to observe sub-assembly A through sub-assembly B while stimulating both sub-assemblies A and B from circuit inputs. Later chapters will show that test pattern generation computation time, in the worst case, rises exponentially with the number of chip primary inputs (PIs) and with the number of on-chip flip-flops.

Example 1.3 Transistors versus pins. Consider a chip with a square area whose linear dimension on the side is d. The number of transistors, T, that can be placed on the chip is proportional to the chip area, d². The number of input/output (I/O) pins, P, is proportional to 4d, since pins are placed on the periphery of the chip. We can thus express an approximate relation between T and P as:

    T = K × P²

where K is a constant. This simple relation was first observed empirically by Rent at IBM and is known as Rent's rule [680]. It has many applications, and in Chapter 18 we will use a generalized form (Equation 18.3) to represent the number of terminal signals for a block of logic gates. As we shrink the feature size, for the same chip area, both T and P increase, but the number of transistors increases faster. Multilayer wiring allows more of the chip area to be utilized by transistors, but does not increase the number of pins, which must be placed at the chip boundary (an exception is the flip-chip technology, not considered here, in which the pins are placed over the chip area [54]). Since any test procedure must now access a larger number of devices (transistors or gates) and interconnects through a proportionately smaller number of pins, the test problem becomes more complex with the higher level of integration. Though it is not a very effective measure, the increase of test complexity is sometimes expressed as the ratio T/P. From the SIA roadmap data of Table 1.1, this ratio for the largest chip more than doubles every five or six years.
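To see why the ratio in Example 1.3 grows, note that T = K × P² implies T/P = sqrt(K × T), so the ratio grows as the square root of the transistor count. A small Python sketch (K and the annual growth rate are arbitrary illustrative values, not roadmap data):

    # Transistor count vs. pin count under Rent's rule (illustrative sketch).
    # T = K * P**2 (Example 1.3); K is an arbitrary illustrative constant.
    import math

    K = 300.0

    def pins_for(transistors: float) -> float:
        """Invert Rent's rule T = K * P^2 to estimate the pin count."""
        return math.sqrt(transistors / K)

    t0 = 10e6   # start from a hypothetical 10M-transistor chip
    for year in range(0, 13, 2):
        t = t0 * (1.44 ** year)   # ~44% transistor growth/year (Figure 1.5)
        p = pins_for(t)
        print(f"year +{year:2d}: T = {t:9.3e}, P = {p:7.0f}, T/P = {t / p:9.1f}")

Pins grow only as the square root of transistors, so the devices reachable per pin, and hence the test access problem, keep getting worse.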
2. Feature scaling and power dissipation. The power density (power dissipation per unit area) of a CMOS chip is given by [185]:

    Power density = C × VDD² × f                                    (1.6)

where C is the combined node capacitance per unit area that is switched per clock cycle, VDD is the supply voltage, and f is the clock frequency. In general, C is proportional to the number of transistors per unit area and the average switching probability of signals. The basic objective of shrinking the device features is to increase the circuit speed and transistor density [185]. Suppose that the feature dimensions are divided by a constant α (α > 1). The speed improves by a factor α because the individual node capacitance is reduced as 1/α, due to shorter wires and smaller devices. Reduced features increase the transistor density by a factor α², resulting in an increase in C by a factor α. Since an increased electric field within transistors can degrade reliability, to keep the electric field unchanged the supply voltage is scaled down by a factor α. This scaling, known as constant electric-field (CE) scaling, keeps the power density constant [185]. One can easily verify this fact by substituting C → αC, VDD → VDD/α, and f → αf in Equation 1.6. In the smaller submicron region, CE scaling is not practical because the threshold voltage of the transistor does not scale down with dimensions. As the supply voltage gets closer to the threshold voltage, the switching speed drops, defeating at least one purpose of scaling. In practice, therefore, VDD is scaled by a smaller factor β, where 1 ≤ β < α. Although the speed-up of the clock frequency is restored, this increases the power density [185], causing a significant impact on testing:

- Verification testing must check for power buses overloaded by excessive current. This causes a brown-out in the chip, just as overloading the electric power distribution network in a city causes a drop in supply voltage. This might cause the chip power bus lines to burn out due to metal migration, just as an old-fashioned fuse burns out in a fuse box.

- Application of the test vectors may cause excessive power dissipation on the chip and burn it out, so the vectors must be adjusted to reduce power.

- Shrinking features will eventually require the design of transistors with reduced threshold voltage. These devices have higher leakage current [652], which reduces the effectiveness of testing (discussed next).
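The CE-scaling claim is easy to check numerically by substituting the scaled quantities into Equation 1.6. A minimal Python sketch in normalized units (the α and β values are arbitrary examples; the f → αf frequency gain of CE scaling is assumed in both cases):

    # Check of CMOS power-density scaling (Equation 1.6), normalized units.

    def power_density(c, vdd, f):
        """Equation 1.6: switched capacitance/area x VDD^2 x frequency."""
        return c * vdd**2 * f

    base = power_density(c=1.0, vdd=1.0, f=1.0)

    alpha = 1.4   # feature shrink factor (example value)

    # CE scaling: C -> alpha*C, VDD -> VDD/alpha, f -> alpha*f
    ce = power_density(c=alpha, vdd=1.0 / alpha, f=alpha)
    print(f"CE scaling: {ce / base:.3f}x power density")   # 1.000x: constant

    # Practical scaling: VDD reduced only by beta < alpha, so VDD^2 no
    # longer cancels the growth of C and f; density rises as (alpha/beta)^2.
    beta = 1.2    # example value, 1 <= beta < alpha
    practical = power_density(c=alpha, vdd=1.0 / beta, f=alpha)
    print(f"Practical scaling: {practical / base:.3f}x power density")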
3. Current testing. A very successful recent approach to testing chips is to check for elevated quiescent current. This method is called IDDQ testing (see Chapter 13). While switching, CMOS circuits exhibit an elevated current in the digital logic, which dies out quickly to a small quiescent current after the gate output settles to a steady state. Faults, such as transistors stuck-on, shorted wires, shorts from transistor gates to drains, etc., elevate the quiescent current. IDDQ testing marks the chip as faulty if the measured quiescent current through the ground busses of the chip exceeds a prespecified threshold.

Integration of Analog and Digital Devices onto One Chip. We seek this to reduce costs (i.e., one part is cheaper to manufacture and assemble into an electronic system than two separate parts, one analog and the other digital). We also increase speed with this approach, by eliminating the chip-to-chip delay between an A/D converter and the digital signal processor (DSP) that processes the digitized data. When data goes between two chips, the driving chip inserts a delay to amplify and buffer the output signals, and the receiving chip inserts a delay to condition the signals and propagate them through a "lightning arrester" to eliminate voltage surges coming from people handling the chip. Integration onto one chip eliminates a significant delay, but brings new issues of testing mixed-signal circuits on one chip.

FAULTS IN LOGIC CIRCUITS

A failure is said to have occurred in a logic circuit or system if it deviates from its specified behavior [1]. A fault, on the other hand, refers to a physical defect in a circuit. For example, a short between two signal lines in the circuit or a break in a signal line is a physical defect. An error is usually the manifestation of a fault in the circuit; thus a fault may change the value of a signal in a circuit from 0 (correct) to 1 (erroneous) or vice versa. However, a fault does not always cause an error; in that case, the fault is considered to be latent.

A fault is characterized by its nature, value, extent, and duration [2]. The nature of a fault can be classified as logical or nonlogical. A logical fault causes the logic value at a point in a circuit to become opposite to the specified value. Nonlogical faults include the rest of the faults, such as the malfunction of the clock signal, power failure, etc. The value of a logical fault at a point in the circuit indicates whether the fault creates fixed or varying erroneous logical values. The extent of a fault specifies whether the effect of the fault is localized or distributed. A local fault affects only a single variable, whereas a distributed fault affects more than one. A logical fault, for example, is a local fault, whereas the malfunction of the clock is a distributed fault. The duration of a fault refers to whether the fault is permanent or temporary.

1.1.1 Stuck-At Fault

The most common model used for logical faults is the single stuck-at fault. It assumes that a fault in a logic gate results in one of its inputs or its output being fixed at either logic 0 (stuck-at-0) or logic 1 (stuck-at-1). Stuck-at-0 and stuck-at-1 faults are often abbreviated as s-a-0 and s-a-1, respectively.

Let us assume that in Figure 1.1 the A input of the NAND gate is s-a-1. The NAND gate perceives the A input as a logic 1 irrespective of the logic value placed on the input. For example, the output of the NAND gate is 0 for the input pattern A=0 and B=1 when input A is s-a-1. In the absence of the fault, the output will be 1. Thus, AB=01 can be considered as the test for the A input s-a-1, since there is a difference between the outputs of the fault-free and faulty gates.

Figure 1.1: Two-input NAND gate.

The single stuck-at fault model is often referred to as the classical fault model and offers a good representation for the most common types of defects [e.g., shorts and opens in complementary metal oxide semiconductor (CMOS) technology]. Figure 1.2 illustrates the CMOS realization of the two-input NAND gate.

Figure 1.2: Two-input NAND gate in CMOS.

The number 1 in the figure indicates an open, whereas the numbers 2 and 3 identify the short between the output node and VDD and the short between the output node and the ground, respectively. A short in a CMOS circuit results if not enough metal is removed by the photolithography, whereas over-removal of metal results in an open circuit [3]. Fault 1 in Figure 1.2 will disconnect input A from the gate of transistors T1 and T3. It has been shown that in such a situation one transistor may conduct and the other remain nonconducting [4]. Thus, the fault can be represented by a stuck-at value of A: if A is s-a-0, T1 will be ON and T3 OFF, and if A is s-a-1, T1 will be OFF and T3 ON. Fault 2 forces the output node to be shorted to VDD, that is, the fault can be considered as an s-a-1 fault. Similarly, fault 3 forces the output node to be s-a-0.
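The claim that AB=01 tests input A s-a-1 can be checked by exhaustive comparison of the fault-free and faulty gates. A minimal Python sketch (the function names are ours):

    # Exhaustive test search for a single stuck-at fault on a 2-input NAND
    # (illustrative sketch).
    from itertools import product

    def nand(a: int, b: int) -> int:
        return 0 if (a and b) else 1

    def nand_a_sa1(a: int, b: int) -> int:
        """NAND gate with input A stuck-at-1: the gate always sees A = 1."""
        return nand(1, b)

    tests = [(a, b) for a, b in product((0, 1), repeat=2)
             if nand(a, b) != nand_a_sa1(a, b)]
    print(tests)   # [(0, 1)] -- only AB=01 exposes the fault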
The stuck-at model is also used to represent multiple faults in circuits. In a multiple stuck-at fault, it is assumed that more than one signal line in the circuit is stuck at logic 1 or logic 0; in other words, a group of stuck-at faults exists in the circuit at the same time. A variation of the multiple fault is the unidirectional fault. A multiple fault is unidirectional if all of its constituent faults are either s-a-0 or s-a-1, but not both simultaneously.

The stuck-at model has gained wide acceptance in the past, mainly because of its relative success with small scale integration. However, it is not very effective in accounting for all faults in present day very large scale integrated (VLSI) circuits, which mainly use CMOS technology. Faults in CMOS circuits do not necessarily produce logical faults that can be described as stuck-at faults [5, 6, 7]. For example, in Figure 1.2, faults 2 and 3 create stuck-on transistor faults. As a further example, we consider Figure 1.3, which represents a CMOS implementation of the Boolean function:

    Z = (A + B)(C + D) · EF

Figure 1.3: CMOS implementation of Z = (A + B)(C + D) · EF.

Two possible shorts, numbered 1 and 2, and two possible opens, numbered 3 and 4, are indicated in the diagram. Short number 1 can be modeled by s-a-1 of input E; open number 3 can be modeled by s-a-0 of input E, input F, or both. On the other hand, short number 2 and open number 4 cannot be modeled by any stuck-at fault because they involve a modification of the network function. For example, in the presence of short number 2, the network function will change to:

    Z = (A + C)(B + D) · EF

and open number 4 will change the function to:

    Z = (AC) + (BD) · EF

For the same reason, a perfect short between the outputs of the two NAND gates in Figure 1.4 cannot be modeled by a stuck-at fault. Without the short, the outputs of gates Z1 and Z2 are:

    Z1 = (AB)′ and Z2 = (CD)′

whereas with the short (a wired-AND of the two outputs):

    Z1 = Z2 = (AB + CD)′

Figure 1.4: CMOS implementation of Z1 = (AB)′ and Z2 = (CD)′.

1.1.2 Bridging Faults

Bridging faults form an important class of permanent faults that cannot be modeled as stuck-at faults. A bridging fault is said to have occurred when two or more signal lines in a circuit are accidentally connected together. Earlier studies of bridging faults concentrated only on the shorting of signal lines in gate-level circuits. It was shown that the shorting of lines resulted in wired logic at the connection.

Bridging faults at the gate level have been classified into two types: input bridging and feedback bridging. An input bridging fault corresponds to the shorting of a certain number of primary input lines. A feedback bridging fault results if there is a short between an output and an input line. A feedback bridging fault may cause a circuit to oscillate, or it may convert it into a sequential circuit.
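Returning to the shorted NAND outputs of Figure 1.4 above: the wired-AND result can be verified exhaustively. A short Python sketch (the helper names are ours; wired-AND is the behavior assumed in the text for the shorted outputs):

    # Wired-AND model of a bridging fault between two NAND outputs
    # (Figure 1.4), checked exhaustively -- an illustrative sketch.
    from itertools import product

    nand = lambda x, y: 1 - (x & y)

    for a, b, c, d in product((0, 1), repeat=4):
        z1, z2 = nand(a, b), nand(c, d)
        bridged = z1 & z2                        # wired-AND at the short
        expected = 1 - ((a & b) | (c & d))       # (AB + CD)'
        assert bridged == expected
    print("short(Z1, Z2) == (AB + CD)' for all 16 input patterns")

By De Morgan's law, (AB)′ · (CD)′ = (AB + CD)′, which is exactly the modified function quoted above, and no single stuck-at fault on either gate produces this truth table.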
Bridging faults in a transistor-level circuit may occur between the terminals of a transistor or between two or more signal lines. Figure 1.5 shows the CMOS logic realization of the Boolean function:

    Z = (AB + CD)′

Figure 1.5: CMOS implementation of Z(A, B, C, D) = (AB + CD)′.

A short between two lines, as indicated by the dotted line in the diagram, will change the function of the circuit. The effect of bridging among the terminals of transistors is technology-dependent. For example, in CMOS circuits, such faults manifest as either stuck-at or stuck-open faults, depending on the physical location and the value of the bridging resistance.

1.2.2 Stuck-On and Stuck-Open Faults

A stuck-on transistor fault implies the permanent closing of the path between the source and the drain of the transistor. Although a stuck-on transistor, in practice, behaves in a similar way to a stuck-closed transistor, there is a subtle difference. A stuck-on transistor has the same drain-source resistance as the on-resistance of a fault-free transistor, whereas a stuck-closed transistor exhibits a drain-source resistance that is significantly lower than the normal on-resistance. In other words, in the case of a stuck-closed transistor, the short between the drain and the source is almost perfect, and this is not true for a stuck-on transistor. A transistor stuck-on (stuck-closed) fault may be modeled as a bridging fault from the source to the drain of the transistor.

A stuck-open transistor implies the permanent opening of the connection between the source and the drain of a transistor. The drain-source resistance of a stuck-open transistor is significantly higher than the off-resistance of a nonfaulty transistor. If the drain-source resistance of a faulty transistor is approximately equal to that of a fault-free transistor, then the transistor is considered to be stuck-off. For all practical purposes, transistor stuck-off and stuck-open faults are functionally equivalent.

A stuck-open transistor fault, like a feedback bridging fault, can turn a combinational circuit into a sequential circuit [10]. Figure 1.7 shows a two-input CMOS NOR gate. A stuck-open fault can cause the output to be connected neither to GND nor to VDD. If, for example, transistor T2 is open-circuited, then for input AB=00 the pull-up circuit will not be active and there will be no change in the output voltage. In fact, the output retains its previous logic state; however, the length of time the state is retained is determined by the leakage current at the output node.

Figure 1.7: A two-input CMOS NOR gate.

Table 1.1 shows the truth table for the two-input CMOS NOR gate. The fault-free output is shown in column Z; the three columns to the right represent the outputs in the presence of three stuck-open (s-op) faults. The first, As-op, is caused by any input, drain, or source connection missing from the pull-down FET T3. The second, Bs-op, is caused by any input, drain, or source connection missing from the pull-down FET T4. The third, VDDs-op, is caused by an open anywhere in the series p-channel pull-up connection to VDD. The symbol Zt indicates that the output retains the previous logic value.

Table 1.1: Truth table of a two-input CMOS NOR gate with and without stuck-open faults

    A B | Z | Z (As-op) | Z (Bs-op) | Z (VDDs-op)
    0 0 | 1 |     1     |     1     |     Zt
    0 1 | 0 |     0     |     Zt    |     0
    1 0 | 0 |     Zt    |     0     |     0
    1 1 | 0 |     0     |     0     |     0
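Because a stuck-open output can float, its value depends on the previous state, so detecting the fault takes an ordered pair of vectors. The Python sketch below (our own modeling choice: a floating node simply keeps its old value, the Zt entries of Table 1.1) reproduces the As-op column and shows a two-pattern test:

    # Two-input CMOS NOR with input A's pull-down transistor stuck-open:
    # an illustrative switch-level sketch.

    def nor_a_sop(a: int, b: int, prev: int) -> int:
        pull_up = (a == 0 and b == 0)   # series PFETs conduct only for A=B=0
        pull_down = (b == 1)            # T4 still works; T3 (input A) is open
        if pull_up:
            return 1
        if pull_down:
            return 0
        return prev                     # floating node: memory effect (Zt)

    # Two-pattern test: AB=00 charges the output to 1, then AB=10 should
    # discharge it in a fault-free gate but leaves it floating at 1 here.
    z = nor_a_sop(0, 0, prev=0)         # -> 1 (same as fault-free)
    z = nor_a_sop(1, 0, prev=z)         # -> 1, but a fault-free NOR gives 0
    print(z)                            # 1: the fault is detected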
BASIC CONCEPTS OF FAULT DETECTION

Fault detection in a logic circuit is carried out by applying a sequence of tests and observing the resulting outputs. A test is an input combination together with the expected response that a fault-free circuit should produce. If the observed response is different from the expected response, a fault is present in the circuit. The aim of testing at the gate level is to verify that each logic gate in the circuit is functioning properly and that the interconnections are good. Henceforth, we will deal with stuck-at faults only, unless mentioned otherwise. If only a single stuck-at fault is assumed to be present in the circuit under test, then the problem is to construct a test set that will detect the fault by utilizing only the inputs and the outputs of the circuit.

As indicated above, a test detects a fault in a circuit if and only if the output produced by the circuit in the presence of the fault is different from the output observed when the fault is not present. To illustrate, let us assume that input a of the NAND gate shown in Figure 1.8 is stuck-at-1. The output responses of the gate to all input combinations, for both fault-free and fault-present conditions, are shown in Table 1.2.

Figure 1.8: A NAND gate with a stuck-at-1 fault.

Table 1.2: Output response of the NAND gate

    Input | Output
    a  b  | c (Fault-Free) | c (Fault-Present)
    0  0  |       1        |        1
    0  1  |       1        |        0
    1  0  |       1        |        1
    1  1  |       0        |        0

It can be seen in Table 1.2 that only for the input combination ab=01 is the output in the presence of the fault a s-a-1 different from the fault-free output.

In order to detect a fault in a circuit, the fault must first be excited; that is, a certain input combination must be applied to the circuit so that the logic value appearing at the fault location is opposite to the fault value. Next, the fault must be sensitized; that is, the effect of the fault must be propagated through the circuit to an observable output. For example, in Figure 1.9, the input combination abc=111 must be applied for the excitation of the fault, and d=1 for sensitizing the fault to output Z. Thus, the test for the s-a-1 fault is abcd=1111. This input combination is also a test for other faults (e.g., gate 1 s-a-0, gate 3 s-a-1, input a s-a-0, etc.).

Figure 1.9: Circuit with a single s-a-1 fault.
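Excitation and sensitization can be demonstrated on any small circuit by brute force. The Python sketch below uses a hypothetical two-gate circuit of our own, w = NAND(a, b, c) feeding Z = AND(w, d) (not the circuit of Figure 1.9, whose structure is not reproduced here), and searches for vectors that detect a stuck-at-1 fault on the internal line w:

    # Excitation + sensitization by exhaustive search (illustrative sketch).
    # Hypothetical circuit: w = NAND(a, b, c); Z = AND(w, d).
    # Fault: internal line w stuck-at-1.
    from itertools import product

    def circuit(a, b, c, d, w_sa1=False):
        w = 1 - (a & b & c)       # gate 1: three-input NAND
        if w_sa1:
            w = 1                 # inject the stuck-at-1 fault on line w
        return w & d              # gate 2: AND, observable output Z

    tests = [v for v in product((0, 1), repeat=4)
             if circuit(*v) != circuit(*v, w_sa1=True)]
    print(tests)   # [(1, 1, 1, 1)]: abc=111 excites (w = 0), d=1 sensitizes

Only abc=111 drives w to 0 (opposite to the stuck value), and only d=1 lets the faulty value reach Z, so the single test vector is abcd=1111, mirroring the reasoning used for Figure 1.9.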
1.3.3 Equivalent Faults

A test, in general, can detect more than one fault in a circuit, and many tests in a set detect the same faults. In other words, the subsets of faults detected by each test from a test set are not disjoint. Thus, a major objective in test generation is to reduce the total number of faults to be considered by grouping equivalent faults in subsets. It is then sufficient to test only one fault from each equivalent set to cover all faults in the set, thus avoiding redundancy in the test generation process.

In an m-input gate, there can be 2(m+1) single stuck-at faults. Thus, the total number of single stuck-at faults in the two-input NOR gate shown in Figure 1.11a is 6 (= 2×3): a s-a-0, b s-a-0, a s-a-1, b s-a-1, c s-a-0, and c s-a-1. However, a stuck-at fault on an input may be indistinguishable from a stuck-at fault at the output. For example, in a NOR gate (Figure 1.11a), any input s-a-1 fault is indistinguishable from the output s-a-0; similarly, in a NAND gate (Figure 1.11b), an input s-a-0 fault is indistinguishable from the output s-a-1.

Figure 1.11: (a) NOR gate. (b) NAND gate.

Two faults are considered to be equivalent if every test for one fault also detects the other. In the two-input NOR gate shown in Figure 1.11a, a stuck-at-1 fault on either of the inputs a or b is equivalent to output c stuck-at-0; thus all three faults belong to the same equivalence set. A test for any of these three faults will also detect the presence of the other two. The equivalence sets for the NOR gate are:

    {a s-a-1, b s-a-1, c s-a-0}, {a s-a-0, c s-a-1}, {b s-a-0, c s-a-1}

and the equivalence sets for the NAND gate are:

    {a s-a-0, b s-a-0, c s-a-1}, {a s-a-1, c s-a-0}, {b s-a-1, c s-a-0}

Because there are three equivalence fault sets for both the NOR and NAND gates, it is sufficient to derive tests for three faults only in each case, i.e., one fault from each set. In general, an m-input gate can have a total of m+2 logically distinct faults; however, only m+1 equivalent sets of faults need to be considered.
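Equivalence classes can be computed mechanically: two single stuck-at faults of a gate are functionally equivalent exactly when the faulty gates have identical truth tables. A short Python sketch for the two-input NOR gate (the helper names are ours):

    # Collapse the six single stuck-at faults of a 2-input NOR gate into
    # classes by comparing faulty truth tables (illustrative sketch).
    from itertools import product

    def nor_with_fault(a, b, fault=None):
        """fault is (line, value) with line in {'a', 'b', 'c'}, or None."""
        if fault and fault[0] == 'a':
            a = fault[1]
        if fault and fault[0] == 'b':
            b = fault[1]
        c = 1 - (a | b)
        if fault and fault[0] == 'c':
            c = fault[1]
        return c

    faults = [(line, v) for line in 'abc' for v in (0, 1)]
    classes = {}   # map faulty truth table -> faults producing it
    for f in faults:
        table = tuple(nor_with_fault(a, b, f)
                      for a, b in product((0, 1), repeat=2))
        classes.setdefault(table, []).append(f)

    for group in classes.values():
        print(group)
    # [('a', 1), ('b', 1), ('c', 0)] come out as one class: any input
    # s-a-1 is indistinguishable from the output s-a-0, as stated above.

The script finds four distinct faulty truth tables, matching the m+2 logically distinct faults of an m-input gate noted above.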
1.3.4 Temporary Faults

As stated earlier, an error is a manifestation of a fault. A temporary fault can result in an intermittent or a transient error. Transient errors are a major source of failures in VLSI chips. They are nonrecurring and are not repairable because there is no physical damage to the hardware. Very deep submicron technology has enabled the packing of millions of transistors on a VLSI chip by reducing the transistor dimensions. However, the reduction of transistor sizes also reduces their noise margins. As a result, they become more vulnerable to noise, cross-talk, etc., which in turn result in transient errors. In addition, small transistors are affected by terrestrial radiation and suffer temporary malfunction, thereby increasing the rate of transient errors.

Intermittent faults are recurring faults that reappear on a regular basis. Such faults can occur due to loose connections, partially defective components, or poor designs. Intermittent faults occurring due to deteriorating or aging components may eventually become permanent. Some intermittent faults also occur due to environmental conditions such as temperature, humidity, vibration, etc. The likelihood of such intermittent faults depends on how well the system is protected from its physical environment through shielding, filtering, cooling, etc. An intermittent fault in a circuit causes a malfunction of the circuit only if it is active; if it is inactive, the circuit operates correctly. A circuit is said to be in a fault-active state if a fault present in the circuit is active, and in a fault-not-active state if a fault is present but inactive [11]. Because intermittent faults are random, they can be modeled only by using probabilistic methods.

References

[1] Anderson, T., and P. Lee, Fault-Tolerance: Principles and Practice, Prentice-Hall International (1981).
[2] Avizienis, A., "Fault-tolerant systems," IEEE Trans. Comput., 1304-11 (December 1976).
[3] Shoji, M., CMOS Digital Circuit Technology, Prentice-Hall (1988).
[4] Maly, W., P. Nag, and P. Nigh, "Testing oriented analysis of CMOS ICs with opens," Proc. Intl. Conf. CAD, 344-7 (1988). doi:10.1109/ICCAD.1988.122525
[5] Ferguson, F. J., and J. Shen, "A CMOS fault extractor for inductive fault analysis," IEEE Trans. CAD, 1181-94 (November 1988). doi:10.1109/43.9188
[6] David, M. W., "An optimized delay testing technique for LSSD-based VLSI logic circuits," IEEE VLSI Test Symp., 239-46 (1991).
[7] Wadsack, R. L., "Fault modelling and logic simulation of CMOS and MOS integrated circuits," Bell Syst. Technol. Jour., 1149-75 (May-June 1978).
[8] Ferguson, F. J., M. Taylor, and T. Larrabee, "Testing for parametric faults in static CMOS circuits," Proc. Intl. Test Conf., 436-42 (1990). doi:10.1109/TEST.1990.114052
[9] Maly, W., "Realistic fault modeling for VLSI testing," Proc. 24th ACM/IEEE Design Automation Conf., 173-80 (1987).
[10] Ferguson, F. J., and J. Shen, "Extraction and simulation of realistic CMOS faults using inductive fault analysis," Proc. Intl. Test Conf., 475-84 (1988). doi:10.1109/TEST.1988.207759
[11] Malaiya, Y. K., and S. Y. H. Su, "A survey of methods for intermittent fault analysis," Proc. Nat. Comput. Conf., 577-84 (1979).