Multiple Bit Parity Based Concurrent Error Detection ...



APPENDIX: OVERVIEW OF VLSI

Very-large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining hundreds of thousands of transistors or devices into a single chip. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device. Before the introduction of VLSI technology, most ICs had a limited set of functions they could perform. An electronic circuit might consist of a CPU, ROM, RAM and other glue logic. VLSI lets IC designers add all of these into one chip.

The history of the transistor dates to the 1920s, when several inventors attempted devices that were intended to control current in solid-state diodes and convert them into triodes. Success came after World War II, when the use of silicon and germanium crystals as radar detectors led to improvements in fabrication and theory. Scientists who had worked on radar returned to solid-state device development. With the invention of the transistor at Bell Labs in 1947, the field of electronics shifted from vacuum tubes to solid-state devices.

With the small transistor at their hands, electrical engineers of the 1950s saw the possibility of constructing far more advanced circuits. However, as the complexity of circuits grew, problems arose. One problem was the size of the circuit. A complex circuit like a computer was dependent on speed. If the components were large, the wires interconnecting them had to be long, and the electric signals took time to go through the circuit, thus slowing the computer. The invention of the integrated circuit by Jack Kilby and Robert Noyce solved this problem by making all the components and the chip out of the same block (monolith) of semiconductor material. The circuits could be made smaller, and the manufacturing process could be automated. This led to the idea of integrating all components on a single-crystal silicon wafer, which led to small-scale integration (SSI) in the early 1960s, medium-scale integration (MSI) in the late 1960s, and then large-scale integration (LSI) as well as VLSI in the 1970s and 1980s, with tens of thousands of transistors on a single chip (later hundreds of thousands, then millions, and now billions, 10^9).

The first semiconductor chips held two transistors each. Subsequent advances added more transistors and, as a consequence, more individual functions or systems were integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic gates on a single device. Now known retrospectively as small-scale integration (SSI), improvements in technique led to devices with hundreds of logic gates, known as medium-scale integration (MSI). Further improvements led to large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved far past this mark, and today's microprocessors have many millions of gates and billions of individual transistors.

At one time, there was an effort to name and calibrate various levels of large-scale integration above VLSI. Terms like ultra-large-scale integration (ULSI) were used, but the huge number of gates and transistors available on common devices has rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of integration are no longer in widespread use. In 2008, billion-transistor processors became commercially available.
This became more commonplace as semiconductor fabrication advanced from the then-current generation of 65 nm processes. Current designs, unlike the earliest devices, use extensive design automation and automated logic synthesis to lay out the transistors, enabling higher levels of complexity in the resulting logic functionality. Certain high-performance logic blocks, like the SRAM (static random-access memory) cell, are still designed by hand to ensure the highest efficiency.

System Requirement: The first step of any design process is to lay down the specifications of the system. The system specification is a high-level representation of the system. The factors to be considered in this process include performance, functionality, and physical dimensions (size of the die, i.e. the chip). The fabrication technology and design techniques are also considered. The specification of a system is a compromise between market requirements, technology and economic viability. The end results are specifications for the size, speed, power, and functionality of the VLSI system.

Fig: VLSI Design Flow

Architectural Design: The basic architecture of the system is designed in this step. This includes such decisions as RISC (Reduced Instruction Set Computer) versus CISC (Complex Instruction Set Computer), number of ALUs, floating-point units, number and structure of pipelines, and size of caches, among others. The outcome of architectural design is a Micro-Architectural Specification (MAS). While the MAS is a textual (English-like) description, architects can accurately predict the performance, power and die size of the design based on such a description.

Behavioral or Functional Design: In this step, the main functional units of the system are identified. This also identifies the interconnect requirements between the units. The area, power, and other parameters of each unit are estimated. The behavioral aspects of the system are considered without implementation-specific information. For example, it may specify that a multiplication is required, but exactly in which mode such multiplication is executed is not specified, as we may use a variety of multiplication hardware depending on the speed and word size requirements. The key idea is to specify behavior, in terms of input, output and timing of each unit, without specifying its internal structure. The outcome of functional design is usually a timing diagram or other relationships between units. This information leads to improvement of the overall design process and reduction of the complexity of subsequent phases. Functional or behavioral design provides quick emulation of the system and allows fast debugging of the full system. Behavioral design is largely a manual step with little or no automation help available.

Logic Design: In this step the control flow, word widths, register allocation, arithmetic operations, and logic operations of the design that represent the functional design are derived and tested. This description is called a Register Transfer Level (RTL) description. RTL is expressed in a Hardware Description Language (HDL), such as VHDL or Verilog. This description can be used in simulation and verification. It consists of Boolean expressions and timing information. The Boolean expressions are minimized to achieve the smallest logic design that conforms to the functional design. This logic design of the system is simulated and tested to verify its correctness. In some special cases, logic design can be automated using high-level synthesis tools.
These tools produce an RTL description from a behavioral description of the design.

Circuit Design: The purpose of circuit design is to develop a circuit representation based on the logic design. The Boolean expressions are converted into a circuit representation by taking into consideration the speed and power requirements of the original design. Circuit simulation is used to verify the correctness and timing of each component. The circuit design is usually expressed in a detailed circuit diagram. This diagram shows the circuit elements (cells, macros, gates, transistors) and the interconnections between these elements. This representation is also called a net list. Tools used to manually enter such a description are called schematic capture tools. In many cases, a net list can be created automatically from the logic (RTL) description by using logic synthesis tools.

Physical Design: In this step the circuit representation (or net list) is converted into a geometric representation. As stated earlier, this geometric representation of a circuit is called a layout. A layout is created by converting each logic component (cells, macros, gates, transistors) into a geometric representation (specific shapes in multiple layers) which performs the intended logic function of the corresponding component. Connections between different components are also expressed as geometric patterns, typically lines in multiple layers. The exact details of the layout also depend on design rules, which are guidelines based on the limitations of the fabrication process and the electrical properties of the fabrication materials. Physical design is a very complex process and is therefore usually broken down into various sub-steps. Various verification and validation checks are performed on the layout during physical design. In many cases, physical design can be completely or partially automated, and the layout can be generated directly from the net list by layout synthesis tools. Layout synthesis tools, while fast, do carry an area and performance penalty, which limits their use to some designs. Manual layout, while slow and labor-intensive, gives better area and performance than synthesized layout. However, this advantage may dissipate as larger and larger designs undermine human capability to comprehend and obtain globally optimized solutions.

Fabrication: After layout and verification, the design is ready for fabrication. Since layout data is typically sent to fabrication on a tape, the event of release of the data is called Tape Out. Layout data is converted (or fractured) into photo-lithographic masks, one for each layer. Masks identify spaces on the wafer where certain materials need to be deposited, diffused or even removed. Silicon crystals are grown and sliced to produce wafers. The extremely small dimensions of VLSI devices require that the wafers be polished to near perfection. The fabrication process consists of several steps involving deposition and diffusion of various materials on the wafer. During each step one mask is used; several dozen masks may be used to complete the fabrication process. A large wafer is 20 cm (8 inches) in diameter and can be used to produce hundreds of chips, depending on the size of the chip. Before the chip is mass-produced, a prototype is made and tested.
Industry is rapidly moving towards 30 cm (12 inch) wafers, allowing even more chips per wafer and leading to a lower cost per chip.

Packaging, Testing and Debugging: Finally, the wafer is fabricated and diced into individual chips in a fabrication facility. Each chip is then packaged and tested to ensure that it meets all the design specifications and that it functions properly. Chips used in Printed Circuit Boards (PCBs) are packaged in Dual In-line Packages (DIP), Pin Grid Arrays (PGA), Ball Grid Arrays (BGA), and Quad Flat Packages (QFP). Chips used in Multi-Chip Modules (MCM) are not packaged, since MCMs use bare or naked chips.

FRONT-END DESIGN FLOW

In the VLSI industry, design is mainly divided into two major domains: the first is called the VLSI front end and the second the VLSI back end; this series is organized according to practical approaches, labs and works. The VLSI front end covers all the logical design and verification work. In simple words, all the work up to gate-level or RTL-level design and verification is considered VLSI front-end design and verification; there are multiple approaches to the logical design of an IC in the VLSI front end.

Fig: Front-end design flow

Y Chart: The Gajski-Kuhn Y-chart is a model which captures the considerations in designing semiconductor devices. The three domains of the Gajski-Kuhn Y-chart lie on radial axes. Each of the domains can be divided into levels of abstraction, using concentric rings. At the top level (outer ring), we consider the architecture of the chip; at the lower levels (inner rings), we successively refine the design into finer detailed implementations. Creating a structural description from a behavioral one is achieved through the processes of high-level synthesis or logic synthesis. Creating a physical description from a structural one is achieved through layout synthesis.

Fig: Y chart

The design hierarchy involves the principle of "divide and conquer": dividing the task into smaller tasks until each reaches its simplest level. This process is most suitable because the lowest level of the design becomes so simple that its manufacture becomes easier. We can place the given task into the domains of the design flow process (behavioral, structural, and geometrical). To understand this, let's take the example of designing a 16-bit adder, as shown in the figure below.

Fig: 16-bit Adder

Here, the whole chip of the 16-bit adder is divided into four modules of 4-bit adders. Further, each 4-bit adder is divided into 1-bit adders, or half adders. One-bit addition is the simplest design step, and its internal circuit is also easy to fabricate on the chip. Now, by connecting four 1-bit adders we can build a 4-bit adder and, moving on, a 16-bit adder; the sketch below models this composition.
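As an illustration of this hierarchy, the following C sketch models the same composition in software (the function names and the ripple-carry structure are choices made only for this example, not part of the design flow itself): a 1-bit full adder is the leaf cell, four of them are chained into a 4-bit adder, and four 4-bit adders form the 16-bit adder.

#include <stdio.h>
#include <stdint.h>

/* Leaf cell: a 1-bit full adder described by its gate equations. */
static unsigned full_adder(unsigned a, unsigned b, unsigned cin, unsigned *cout)
{
    unsigned s = a ^ b ^ cin;              /* sum bit */
    *cout = (a & b) | (cin & (a ^ b));     /* carry out */
    return s;
}

/* Four chained 1-bit cells make a 4-bit adder module. */
static uint16_t adder4(uint16_t a, uint16_t b, unsigned cin, unsigned *cout)
{
    uint16_t s = 0;
    for (int i = 0; i < 4; i++)
        s |= (uint16_t)(full_adder((a >> i) & 1u, (b >> i) & 1u, cin, &cin) << i);
    *cout = cin;
    return s;
}

/* Four chained 4-bit modules make the 16-bit adder. */
static uint16_t adder16(uint16_t a, uint16_t b, unsigned *cout)
{
    unsigned c = 0;
    uint16_t s = 0;
    for (int blk = 0; blk < 4; blk++)
        s |= (uint16_t)(adder4((a >> (4 * blk)) & 0xFu, (b >> (4 * blk)) & 0xFu, c, &c) << (4 * blk));
    *cout = c;
    return s;
}

int main(void)
{
    unsigned carry;
    printf("0x1234 + 0x4321 = 0x%04X\n", adder16(0x1234, 0x4321, &carry)); /* 0x5555 */
    return 0;
}

The same divide-and-conquer structure carries over directly to the layout: only the 1-bit leaf cell has to be designed and verified in detail, and the larger adders are assembled by replication and interconnection.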
BACK-END DESIGN FLOW

The second domain is the VLSI back end, which covers all the design and verification work after logical design, i.e. after gate-level or RTL-level design, as shown in the figure. This may include floorplanning, power planning, clock tree synthesis, place and route, timing optimization, and DRC and LVS checks; all the foundry work, such as fabrication and packaging, also comes under the VLSI back end.

Fig: Back-end design flow

CHAPTER 1
INTRODUCTION

1.1 NETWORK LAYERS

OSI stands for Open Systems Interconnection. It was developed by the ISO (International Organization for Standardization) in the year 1974. It is a 7-layer architecture, with each layer having specific functionality to perform. All these 7 layers work collaboratively to transmit data from one person to another across the globe.

Fig 1.1: Layers in the OSI Model

1.1.1 Physical Layer (Layer 1): The lowest layer of the OSI reference model is the physical layer. It is responsible for the actual physical connection between the devices. The physical layer contains information in the form of bits. When receiving data, this layer gets the received signal, converts it into 0s and 1s, and sends them to the Data Link layer, which puts the frame back together.

The functions of the physical layer are:
Bit synchronization: The physical layer provides synchronization of the bits by providing a clock. This clock controls both sender and receiver, thus providing synchronization at the bit level.
Bit rate control: The physical layer also defines the transmission rate, i.e. the number of bits sent per second.
Physical topologies: The physical layer specifies the way in which the different devices/nodes are arranged in a network, i.e. bus, star or mesh topology.
Transmission mode: The physical layer also defines the way in which the data flows between the two connected devices. The possible transmission modes are simplex, half-duplex and full-duplex.

Hubs, repeaters, modems and cables are physical layer devices. The Network Layer, Data Link Layer and Physical Layer are also known as the Lower Layers or Hardware Layers.

1.1.2 Data Link Layer (DLL) (Layer 2): The data link layer is responsible for node-to-node delivery of the message. The main function of this layer is to make sure data transfer is error-free from one node to another, over the physical layer. When a packet arrives in a network, it is the responsibility of the DLL to transmit it to the host using its MAC address.

The Data Link Layer is divided into two sublayers:
Logical Link Control (LLC)
Media Access Control (MAC)

The packet received from the Network layer is further divided into frames depending on the frame size of the NIC (Network Interface Card). The DLL also encapsulates the sender's and receiver's MAC addresses in the header. The receiver's MAC address is obtained by placing an ARP (Address Resolution Protocol) request onto the wire asking "Who has that IP address?"; the destination host replies with its MAC address.

The functions of the data link layer are:
Framing: Framing is a function of the data link layer. It provides a way for a sender to transmit a set of bits that are meaningful to the receiver. This can be accomplished by attaching special bit patterns to the beginning and end of the frame.
Physical addressing: After creating frames, the data link layer adds the physical addresses (MAC addresses) of the sender and/or receiver in the header of each frame.
Error control: The data link layer provides the mechanism of error control, in which it detects and retransmits damaged or lost frames.
Flow control: The data rate must be constant on both sides, or else the data may get corrupted; flow control coordinates the amount of data that can be sent before receiving an acknowledgement.
Access control: When a single communication channel is shared by multiple devices, the MAC sublayer of the data link layer helps to determine which device has control over the channel at a given time.

1.1.3 Network Layer (Layer 3): The network layer works for the transmission of data from one host to another located in a different network. It also takes care of packet routing, i.e. selection of the shortest path to transmit the packet from the number of routes available.
The sender's and receiver's IP addresses are placed in the header by the network layer.

The functions of the network layer are:
Routing: The network layer protocols determine which route is suitable from source to destination. This function of the network layer is known as routing.
Logical addressing: In order to identify each device on an internetwork uniquely, the network layer defines an addressing scheme. The sender's and receiver's IP addresses are placed in the header by the network layer. Such an address distinguishes each device uniquely and universally.

A segment in the network layer is referred to as a packet. The network layer is implemented by networking devices such as routers.

1.1.4 Transport Layer (Layer 4): The transport layer provides services to the application layer and takes services from the network layer. The data in the transport layer is referred to as segments. It is responsible for end-to-end delivery of the complete message. The transport layer also provides acknowledgement of successful data transmission and retransmits the data if an error is found.

At the sender's side: The transport layer receives the formatted data from the upper layers, performs segmentation, and also implements flow and error control to ensure proper data transmission. It also adds the source and destination port numbers in its header and forwards the segmented data to the network layer. Note: the sender needs to know the port number associated with the receiver's application. Generally, this destination port number is configured, either by default or manually. For example, when a web application makes a request to a web server, it typically uses port number 80, because this is the default port assigned to web applications. Many applications have default ports assigned.

At the receiver's side: The transport layer reads the port number from its header and forwards the received data to the respective application. It also performs sequencing and reassembling of the segmented data.

The functions of the transport layer are:
Segmentation and reassembly: This layer accepts the message from the (session) layer and breaks the message into smaller units. Each of the segments produced has a header associated with it. The transport layer at the destination station reassembles the message.
Service point addressing: In order to deliver the message to the correct process, the transport layer header includes a type of address called a service point address or port address. By specifying this address, the transport layer makes sure that the message is delivered to the correct process.

The services provided by the transport layer are:
Connection-oriented service: This is a three-phase process comprising connection establishment, data transfer, and termination/disconnection. In this type of transmission, the receiving device sends an acknowledgement back to the source after a packet or group of packets is received. This type of transmission is reliable and secure.
Connectionless service: This is a one-phase process consisting only of data transfer. In this type of transmission, the receiver does not acknowledge receipt of a packet.

Data in the transport layer is called segments. The transport layer is operated by the operating system; it is a part of the OS and communicates with the application layer by making system calls. The transport layer is called the heart of the OSI model.

1.1.5 Session Layer (Layer 5): This layer is responsible for the establishment of connections and the maintenance of sessions; it handles authentication and also ensures security.
The functions of the session layer are:
Session establishment, maintenance and termination: The layer allows two processes to establish, use and terminate a connection.
Synchronization: This layer allows a process to add checkpoints, considered synchronization points, into the data. These synchronization points help to identify errors so that the data is resynchronized properly, the ends of messages are not cut prematurely, and data loss is avoided.
Dialog controller: The session layer allows two systems to start communication with each other in half-duplex or full-duplex.

The three layers from the Session Layer up are integrated as the single "Application Layer" in the TCP/IP model. Implementation of these three layers is done by the network application itself. They are also known as the Upper Layers or Software Layers.

Scenario: Let's consider a scenario where a user wants to send a message through some messenger application running in his browser. The "messenger" here acts as the application layer, which provides the user with an interface to create the data. This message, or so-called data, is compressed and, if it is sensitive, encrypted.

Fig 1.1.5: General Block Diagram of Communication

1.1.6 Presentation Layer (Layer 6): The presentation layer is also called the translation layer. The data from the application layer is extracted here and manipulated into the format required for transmission over the network.

The functions of the presentation layer are:
Translation: For example, ASCII to EBCDIC.
Encryption/decryption: Data encryption translates the data into another form or code. The encrypted data is known as ciphertext and the decrypted data is known as plaintext. A key value is used for encrypting as well as decrypting the data.
Compression: Reduces the number of bits that need to be transmitted on the network.

1.1.7 Application Layer (Layer 7): At the very top of the OSI reference model stack of layers we find the application layer, which is implemented by the network applications. These applications produce the data that has to be transferred over the network. This layer also serves as a window for the application services to access the network and for displaying the received information to the user. Its services include:
Network virtual terminal
FTAM (file transfer, access and management)
Mail services
Directory services

The OSI model acts as a reference model and is not implemented on the Internet because of its late invention; the model currently in use is the TCP/IP model. There are many factors, such as noise and cross-talk, which may cause data to get corrupted during transmission. The upper layers work on some generalized view of the network architecture and are not aware of actual hardware data processing. Hence, the upper layers expect error-free transmission between the systems. Most applications would not function as expected if they received erroneous data. Applications such as voice and video may not be that affected; even with some errors they may still function well. The data link layer uses an error control mechanism to ensure that frames (data bit streams) are transmitted with a certain level of accuracy.
But to understand how errors are controlled, it is essential to know what types of errors may occur.

1.2 TYPES OF ERRORS

There may be three types of errors:
Single-bit error: Only one bit, anywhere in the frame, is corrupted.
Multiple-bit error: The frame is received with more than one bit in a corrupted state.
Burst error: The frame contains more than one consecutive corrupted bit.

An error control mechanism may involve two possible strategies: error detection and error correction.

1.2.1 Error Detection: Errors in the received frames are detected by means of a parity check or a cyclic redundancy check (CRC). In both cases, a few extra bits are sent along with the actual data to confirm that the bits received at the other end are the same as they were sent. If the counter-check at the receiver's end fails, the bits are considered corrupted.

1.2.2 Parity Check: One extra bit is sent along with the original bits to make the number of 1s either even, in the case of even parity, or odd, in the case of odd parity. The sender, while creating a frame, counts the number of 1s in it. For example, if even parity is used and the number of 1s is even, then one bit with value 0 is added; this way the number of 1s remains even. If the number of 1s is odd, a bit with value 1 is added to make it even. The receiver simply counts the number of 1s in a frame. If the count of 1s is even and even parity is used, the frame is considered not corrupted and is accepted. If the count of 1s is odd and odd parity is used, the frame is likewise considered not corrupted. If a single bit flips in transit, the receiver can detect it by counting the number of 1s; but when more than one bit is erroneous, it is very hard for the receiver to detect the error.

1.2.3 Cyclic Redundancy Check (CRC): CRC is a different approach to detecting whether the received frame contains valid data. This technique involves binary division of the data bits being sent. The divisor is generated using polynomials. The sender performs a division operation on the bits being sent and calculates the remainder. Before sending the actual bits, the sender appends the remainder to the end of the actual bits. The actual data bits plus the remainder are called a codeword, and the sender transmits the data bits as codewords. At the other end, the receiver performs a division operation on the codewords using the same CRC divisor. If the remainder contains all zeros, the data bits are accepted; otherwise, it is assumed that some data corruption occurred in transit.

Error Correction: In the digital world, error correction can be done in two ways:
Backward error correction: When the receiver detects an error in the data received, it requests the sender to retransmit the data unit.
Forward error correction: When the receiver detects some error in the data received, it executes an error-correcting code, which helps it to auto-recover and correct some kinds of errors.

The first approach, backward error correction, is simple and can be used efficiently only where retransmission is not expensive, for example over fiber optics. In the case of wireless transmission, retransmission may cost too much, so forward error correction is used. To correct an error in a data frame, the receiver must know exactly which bit in the frame is corrupted. To locate the bit in error, redundant bits are used as parity bits for error detection. For example, if we take ASCII words (7 data bits), then eight kinds of information are needed: seven to tell us which bit is in error, and one more to tell us that there is no error. One classical realization of this idea, a Hamming code, is sketched below.
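The following C sketch uses the classical Hamming(7,4) code as an illustration (the choice of this particular code is an assumption of the example; the text above does not fix one). Three parity bits at positions 1, 2 and 4 cover four data bits, and the recomputed parities form a syndrome equal to the position of a single corrupted bit, or zero when no error is seen.

#include <stdio.h>
#include <stdint.h>

/* Encode 4 data bits d3..d0 into a 7-bit Hamming(7,4) codeword.
   Bit positions 1..7; even-parity bits sit at positions 1, 2 and 4. */
static uint8_t hamming74_encode(uint8_t d)
{
    uint8_t b[8] = {0};                 /* b[1]..b[7] hold the codeword bits */
    b[3] = (d >> 3) & 1;  b[5] = (d >> 2) & 1;
    b[6] = (d >> 1) & 1;  b[7] = d & 1;
    b[1] = b[3] ^ b[5] ^ b[7];          /* covers positions 1,3,5,7 */
    b[2] = b[3] ^ b[6] ^ b[7];          /* covers positions 2,3,6,7 */
    b[4] = b[5] ^ b[6] ^ b[7];          /* covers positions 4,5,6,7 */
    uint8_t w = 0;
    for (int i = 1; i <= 7; i++) w |= (uint8_t)(b[i] << (7 - i));
    return w;
}

/* Recompute the three parities; the syndrome is the position (1..7)
   of a single corrupted bit, or 0 when no error is detected. */
static int hamming74_syndrome(uint8_t w)
{
    uint8_t b[8];
    for (int i = 1; i <= 7; i++) b[i] = (w >> (7 - i)) & 1;
    int s1 = b[1] ^ b[3] ^ b[5] ^ b[7];
    int s2 = b[2] ^ b[3] ^ b[6] ^ b[7];
    int s4 = b[4] ^ b[5] ^ b[6] ^ b[7];
    return (s4 << 2) | (s2 << 1) | s1;
}

int main(void)
{
    uint8_t w   = hamming74_encode(0xB);     /* data bits 1011 */
    uint8_t bad = w ^ (uint8_t)(1 << (7 - 6));  /* flip the bit at position 6 */
    printf("syndrome of clean word: %d\n", hamming74_syndrome(w));   /* 0 */
    printf("syndrome of bad word:   %d\n", hamming74_syndrome(bad)); /* 6 */
    return 0;
}

Here m = 4 and r = 3, and the syndrome distinguishes exactly m + r + 1 = 8 cases, which is the counting argument continued below.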
For m data bits, r redundant bits are used. These r bits can provide 2^r combinations of information. In an m+r bit codeword, there is the possibility that the r bits themselves may get corrupted. So the r bits used must be able to indicate any of the m+r bit locations plus the no-error case, i.e. m+r+1 states.

1.2.4 Generator Polynomials

Why is the predetermined (c+1)-bit divisor that's used to calculate a CRC called a generator polynomial? In my opinion, far too many explanations of CRCs actually try to answer that question. This leads their authors and readers down a long path that involves tons of detail about polynomial arithmetic and the mathematical basis for the usefulness of CRCs. This academic material is not important for understanding CRCs well enough to implement and/or use them, and serves only to create potential confusion. So I'm not going to answer that question here. Suffice it to say only that the divisor is sometimes called a generator polynomial, and that you should never make up the divisor's value on your own. Several mathematically well-understood generator polynomials have been adopted as parts of various international communications standards; you should always use one of those. If you have a background in polynomial arithmetic, then you know that certain generator polynomials are better than others for producing strong checksums. The ones that have been adopted internationally are among the best of these.

Table 1 lists some of the most commonly used generator polynomials for 16- and 32-bit CRCs. Remember that the width of the divisor is always one bit wider than the remainder. So, for example, you'd use a 17-bit generator polynomial whenever a 16-bit checksum is required.

Table 1: International standard CRC polynomials
Checksum   | Width   | Generator Polynomial
CRC-CCITT  | 16 bits | 10001000000100001
CRC-16     | 16 bits | 11000000000000101
CRC-32     | 32 bits | 100000100110000010001110110110111

As is the case with other types of checksums, the width of the CRC plays an important role in the error detection capabilities of the algorithm. Ignoring special types of errors that are always detected by a particular checksum algorithm, the percentage of detectable errors is limited strictly by the width of the checksum. A checksum of c bits can take only one of 2^c unique values. Since the number of possible messages is significantly larger than that, the potential exists for two or more messages to have an identical checksum. If one of those messages is somehow transformed into one of the others during transmission, the checksum will appear correct and the receiver will unknowingly accept a bad message. The chance of this happening is directly related to the width of the checksum: specifically, the chance of such an error is 1/2^c. Therefore, the probability of any random error being detected is 1 − 1/2^c.

To repeat, the probability of detecting any random error increases as the width of the checksum increases. Specifically, a 16-bit checksum will detect 99.9985% of all errors. This is far better than the 99.6094% detection rate of an 8-bit checksum, but not nearly as good as the 99.9999% detection rate of a 32-bit checksum. All of this applies to both CRCs and addition-based checksums. What really sets CRCs apart, however, is the number of special cases that can be detected 100% of the time. For example, two opposite bit inversions (one bit becoming 0, the other becoming 1) in the same column of an addition-based checksum would cause the error to go undetected. Well, that's not the case with a CRC.
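As a concrete illustration of the long-division procedure described in Section 1.2.3, the following C sketch computes the remainder bit by bit, using the CRC-CCITT generator from Table 1 (a zero initial register and MSB-first processing are assumptions of this sketch; real standards differ in such conventions). The receiver-side check in main() shows that dividing the codeword, message plus appended remainder, leaves a remainder of zero.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* CRC-CCITT generator from Table 1, x^16 + x^12 + x^5 + 1,
   with its high-order bit dropped as is conventional. */
#define POLY 0x1021u

/* Bit-serial long division: each message bit enters the 16-bit register
   from the left; whenever the register's top bit is 1, the divisor is
   subtracted (XORed, since arithmetic is over GF(2)). */
static uint16_t crc16_bits(const uint8_t *msg, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)msg[i] << 8);   /* next byte into register */
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ POLY)
                                  : (uint16_t)(crc << 1);
    }
    return crc;   /* the remainder: this is the check value to append */
}

int main(void)
{
    const uint8_t msg[] = "123456789";
    uint16_t r = crc16_bits(msg, 9);
    printf("remainder = 0x%04X\n", r);

    /* Receiver side: dividing message-plus-remainder leaves remainder zero. */
    uint8_t codeword[11];
    memcpy(codeword, msg, 9);
    codeword[9]  = (uint8_t)(r >> 8);
    codeword[10] = (uint8_t)(r & 0xFFu);
    printf("codeword check = 0x%04X\n", crc16_bits(codeword, 11));  /* 0x0000 */
    return 0;
}

With a zero initial value and no bit reflection, these conventions correspond to the XMODEM variant of the CCITT polynomial; other variants change the initial value or bit order but not the division itself.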
By using one of the mathematically well-understood generator polynomials, like those in Table 1, to calculate a checksum, it's possible to state that the following types of errors will be detected without fail:
A message with any one bit in error
A message with any two bits in error (no matter how far apart, which column, and so on)
A message with any odd number of bits in error (no matter where they are)
A message with an error burst as wide as the checksum itself

The first class of detectable error is also detected by an addition-based checksum, or even a simple parity bit. However, the middle two classes of errors represent much stronger detection capabilities than those other types of checksum. The fourth class of detectable error sounds at first to be similar to a class of errors detected by addition-based checksums, but in the case of CRCs, any odd number of bit errors will be detected. So the set of error bursts too wide to detect is now limited to those with an even number of bit errors. All other types of errors fall into the relatively high 1 − 1/2^c probability of detection.

1.2.5 Ethernet, SLIP, and PPP

Ethernet, like most physical layer protocols, employs a CRC rather than an additive checksum. Specifically, it employs the CRC-32 algorithm. The likelihood of an error in a packet sent over Ethernet being undetected is, therefore, extremely low. Many types of common transmission errors are detected 100% of the time, with the less likely ones detected 99.9999% of the time. Even if an error somehow managed to get through at the Ethernet layer, it would probably be detected by the IP layer checksum (if the error is in the IP header) or by the TCP or UDP layer checksum above that. After all, the chance of two or more different checksum algorithms failing to detect the same error is extremely remote.

However, many embedded systems that use TCP/IP do not employ Ethernet. Instead, they use either the Serial Line Internet Protocol (SLIP) or the Point-to-Point Protocol (PPP) to send and receive IP packets directly over a serial connection of some sort. Unfortunately, SLIP does not add a checksum or a CRC to the data from the layers above. So unless a pair of modems with error correction capabilities sits between the two communicating systems, any transmission errors must hope to be detected by the relatively weak, addition-based Internet checksum. The newer, compressed SLIP (CSLIP) shares this weakness with its predecessor. PPP, on the other hand, does include a 16-bit CRC in each of its frames, which can carry the same maximum-size IP packet as an Ethernet frame. So while PPP doesn't offer the same amount of error detection capability as Ethernet, by using PPP you'll at least avoid the much larger number of undetected errors that may occur with SLIP or CSLIP.

CHAPTER 2
CRC (CYCLIC REDUNDANCY CHECK)

2.1 INTRODUCTION

The recent advancements in wireless technology have brought forth very high-speed transmission, which faces a major challenge from noisy channels. Factors such as signal attenuation, multiple-access interference, inter-symbol interference and Doppler shift degrade the quality of data to a great extent. As a result, the received signal has a chance of getting corrupted. Consequently, for the alleviation of this problem, error control coding is not only desirable but has become a must to achieve an acceptable Bit Error Rate (BER).
For the purpose of checking the integrity of data being stored or sent over noisy channels, the most widely adopted of all the error control codes is the Cyclic Redundancy Check (CRC) code. CRCs, first introduced by Peterson and Brown in 1961, are used for the detection of any corruption of a digital signal during its production, transmission, processing and storage. Recent wireless technologies, namely Asynchronous Transfer Mode (ATM) and IEEE communication standards such as wired Ethernet (IEEE 802.3), Wi-Fi (IEEE 802.11) and WiMAX (IEEE 802.16), employ CRCs for error detection. A very well-written treatment of CRC computation, collecting most of the published hardware and software implementations of CRCs, can be found in the literature. Various parallel hardware structures have been described, and several works have proposed highly efficient architectures aimed at performance enhancement as well as resource utilization; the list includes Two-Step, Cascade, Look-Ahead, State-Space Transformed, and Retimed architectures. A three-step LFSR architecture has also been proposed which makes it easy to address the two major issues of the large-fanout limitation and the iteration-bound limitation in parallel LFSR architectures, an integral part of CRC computation, together with an improved solution to the fanout problem; its authors have shown that the proposed architecture achieves significant improvement in terms of processing speed and hardware efficiency. Other authors have extended the above idea and presented a mathematical proof showing that a transformation exists in state space that can reduce the complexity of the parallel LFSR feedback loop, along with a new idea for high-speed parallel implementation of linear feedback shift registers based on parallel IIR filters. In other work, a novel parallel implementation of long BCH encoders has been proposed to achieve significant speedup by eliminating the effect of large fanout, and the authors have mentioned that a similar technique can also be used for CRCs.

Rapid advancement of semiconductor technologies results in rapidly increasing soft error rates as area, timing and power budgets shrink. Temporary faults caused by cosmic radiation, known as single event upsets (SEU), are the main sources of errors. A lot of attention has been paid to the reliability improvement of the elementary components, reflected in their much lower failure rates; but the large number and dense population of these elements on a chip degrade reliability to a large extent. While error detection codes and error correction codes have long been used for checking errors, the reliability of the error correction or detection encoder itself has come up as a major issue. High-speed data transmission requires high-speed error detection and parallel processing at the cost of increased hardware, which increases the probability of internal faults. As a result, fault detection for encoders and decoders has gained a lot of importance. The problem of designing parallel fault-secure encoders has been addressed by generating not only error-correcting check bits but also, independently and in parallel, error-detecting check bits. The next step involves the comparison of these error detection check bits with another set of error-detecting check bits generated from the error correction check bits. This design is significantly more cost-effective than duplication with comparison.
Several other works deal with similar topics. Some papers discuss the design of a fault-secure encoder placed between a fault-secure RAM and a faulty channel. Error detection and error correction using the Residue Number System have been discussed, and descriptions of fault detection for Reed-Solomon encoders can be found. A new approach has been presented for the automated synthesis of multilevel circuits with concurrent error detection using a parity-check code; its authors developed a new procedure, namely structure-constrained logic optimization, and proved that this design, along with a checker, forms a self-checking circuit.

Our proposed method, a multiple-bit parity-based concurrent fault detection scheme for parallel CRC computation, inherently exploits the non-overlapping feature of the parity generation circuits. We have presented a design concept of multiple-bit parity-based concurrent fault detection for a parallel CRC architecture. In the proposed scheme, the generated CRC output is divided into a few blocks, based on the number of parity bits, and made to go through a multiple-bit parity comparison unit. The actual parity bits are then compared with the predicted multiple-bit parity of the blocks to incorporate the fault detection architecture; the sketch below models this comparison step.
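The following C sketch is a simplified software model of that comparison step. It is an illustration only: the thesis targets a hardware realization, and the 16-bit CRC width, the 4-bit blocks and the even-parity convention are assumptions made for this example. Here the fault-free recomputation stands in for the independent parity-prediction circuit of the real architecture.

#include <stdio.h>
#include <stdint.h>

#define CRC_WIDTH   16
#define BLOCK_BITS   4                      /* 4 blocks of 4 bits each */
#define NUM_BLOCKS  (CRC_WIDTH / BLOCK_BITS)

/* Even parity of one block of the CRC register. */
static int block_parity(uint16_t crc, int blk)
{
    uint16_t bits = (uint16_t)((crc >> (blk * BLOCK_BITS)) & 0xFu);
    int p = 0;
    while (bits) { p ^= bits & 1u; bits >>= 1; }
    return p;
}

/* Compare predicted block parities against parities recomputed from the
   CRC output; a nonzero return flags a detected internal fault. */
static int parity_check(uint16_t crc, const int predicted[NUM_BLOCKS])
{
    int fault = 0;
    for (int b = 0; b < NUM_BLOCKS; b++)
        fault |= (block_parity(crc, b) != predicted[b]);
    return fault;
}

int main(void)
{
    uint16_t crc = 0x31C3;                   /* some CRC output */
    int predicted[NUM_BLOCKS];
    for (int b = 0; b < NUM_BLOCKS; b++)     /* fault-free prediction */
        predicted[b] = block_parity(crc, b);

    printf("fault-free:  %d\n", parity_check(crc, predicted));            /* 0 */
    printf("bit flipped: %d\n", parity_check(crc ^ 0x0040u, predicted));  /* 1 */
    return 0;
}

Because the parity generation circuits of the blocks do not overlap, a single internal fault disturbs at most one block's parity, which is what makes the per-block comparison an effective concurrent check.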
In the cyclic redundancy check, a fixed number of check bits, often called a checksum, is appended to the message that needs to be transmitted. The data receivers receive the data and inspect the check bits for any errors. Mathematically, data receivers verify the attached check value by finding the remainder of the polynomial division of the transmitted contents. If it appears that an error has occurred, a negative acknowledgement is transmitted, asking for data retransmission.

A cyclic redundancy check is also applied to storage devices like hard disks. In this case, check bits are allocated to each block on the hard disk. When a corrupt or incomplete file is read by the computer, a cyclic redundancy error is reported. This could be from another storage device or from CDs/DVDs. Common causes of errors include system crashes and incomplete or corrupt files. CRC polynomial designs depend on the length of the block to be protected, the desired error protection features, the resources available for CRC implementation, and performance.

A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data. Blocks of data entering these systems get a short check value attached, based on the remainder of a polynomial division of their contents. On retrieval, the calculation is repeated and, in the event the check values do not match, corrective action can be taken against data corruption. CRCs can also be used for error correction. CRCs are so called because the check (data verification) value is a redundancy (it expands the message without adding information) and the algorithm is based on cyclic codes. CRCs are popular because they are simple to implement in binary hardware, easy to analyze mathematically, and particularly good at detecting common errors caused by noise in transmission channels. Because the check value has a fixed length, the function that generates it is occasionally used as a hash function. The CRC was invented by W. Wesley Peterson in 1961; the 32-bit CRC function, used in Ethernet and many other standards, is the work of several researchers and was published in 1975.

CRCs are based on the theory of cyclic error-correcting codes. The use of systematic cyclic codes, which encode messages by adding a fixed-length check value, for the purpose of error detection in communication networks was first proposed by W. Wesley Peterson in 1961. Cyclic codes are not only simple to implement but have the benefit of being particularly well suited to the detection of burst errors: contiguous sequences of erroneous data symbols in messages. This is important because burst errors are common transmission errors in many communication channels, including magnetic and optical storage devices. Typically an n-bit CRC applied to a data block of arbitrary length will detect any single error burst not longer than n bits, and the fraction of all longer error bursts that it will detect is (1 − 2^−n).

Specification of a CRC code requires definition of a so-called generator polynomial. This polynomial becomes the divisor in a polynomial long division, which takes the message as the dividend and in which the quotient is discarded and the remainder becomes the result. The important caveat is that the polynomial coefficients are calculated according to the arithmetic of a finite field, so the addition operation can always be performed bitwise in parallel (there is no carry between digits). In practice, all commonly used CRCs employ the Galois field of two elements, GF(2). The two elements are usually called 0 and 1, comfortably matching computer architecture.

A CRC is called an n-bit CRC when its check value is n bits long. For a given n, multiple CRCs are possible, each with a different polynomial. Such a polynomial has highest degree n, which means it has n + 1 terms; in other words, the polynomial has a length of n + 1, and its encoding requires n + 1 bits. Note that most polynomial specifications drop either the MSB or the LSB, since they are always 1. The CRC and associated polynomial typically have a name of the form CRC-n-XXX, as in the table below. The simplest error-detection system, the parity bit, is in fact a 1-bit CRC: it uses the generator polynomial x + 1 (two terms), and has the name CRC-1.

2.2 DATA INTEGRITY

CRCs are specifically designed to protect against common types of errors on communication channels, where they can provide quick and reasonable assurance of the integrity of messages delivered. However, they are not suitable for protecting against intentional alteration of data. Firstly, as there is no authentication, an attacker can edit a message and recompute the CRC without the substitution being detected. When stored alongside the data, CRCs and cryptographic hash functions by themselves do not protect against intentional modification of data. Any application that requires protection against such attacks must use cryptographic authentication mechanisms, such as message authentication codes or digital signatures (which are commonly based on cryptographic hash functions).
Secondly, unlike cryptographic hash functions, the CRC is an easily reversible function, which makes it unsuitable for use in digital signatures.[4] Thirdly, the CRC is a linear function with the property that crc(x ⊕ y ⊕ z) = crc(x) ⊕ crc(y) ⊕ crc(z); as a result, even if the CRC is encrypted with a stream cipher that uses XOR as its combining operation (or a mode of a block cipher which effectively turns it into a stream cipher, such as OFB or CFB), both the message and the associated CRC can be manipulated without knowledge of the encryption key. This was one of the well-known design flaws of the Wired Equivalent Privacy (WEP) protocol.

2.3 DESIGNING POLYNOMIALS

The selection of the generator polynomial is the most important part of implementing the CRC algorithm. The polynomial must be chosen to maximize the error-detecting capabilities while minimizing the overall collision probability. The most important attribute of the polynomial is its length (the largest degree (exponent) of any one term, plus 1), because of its direct influence on the length of the computed check value. The most commonly used polynomial lengths are:
9 bits (CRC-8)
17 bits (CRC-16)
33 bits (CRC-32)
65 bits (CRC-64)

A CRC is called an n-bit CRC when its check value is n bits. For a given n, multiple CRCs are possible, each with a different polynomial. Such a polynomial has highest degree n, and hence n + 1 terms (the polynomial has a length of n + 1); the remainder has length n. The CRC has a name of the form CRC-n-XXX. The design of the CRC polynomial depends on the maximum total length of the block to be protected (data + CRC bits), the desired error protection features, the type of resources available for implementing the CRC, and the desired performance.

A common misconception is that the "best" CRC polynomials are derived from either irreducible polynomials or irreducible polynomials times the factor 1 + x, which adds to the code the ability to detect all errors affecting an odd number of bits.[7] In reality, all the factors described above should enter into the selection of the polynomial, and they may lead to a reducible polynomial. However, choosing a reducible polynomial will result in a certain proportion of missed errors, due to the quotient ring having zero divisors.

The advantage of choosing a primitive polynomial as the generator of a CRC code is that the resulting code has maximal total block length in the sense that all 1-bit errors within that block length have different remainders (also called syndromes); therefore, since the remainder is a linear function of the block, the code can detect all 2-bit errors within that block length. If r is the degree of the primitive generator polynomial, then the maximal total block length is 2^r − 1, and the associated code is able to detect any single-bit or double-bit errors. We can improve this situation. If we use the generator polynomial g(x) = p(x)(1 + x), where p(x) is a primitive polynomial of degree r − 1, then the maximal total block length is 2^(r−1) − 1, and the code is able to detect single, double, triple and any odd number of errors. A polynomial g(x) that admits other factorizations may then be chosen so as to balance the maximal total block length against the desired error detection power. The BCH codes are a powerful class of such polynomials. They subsume the two examples above.
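As a worked instance of these block-length bounds (taking r = 16 as an assumed generator degree, not a value fixed by the text above):

  With a primitive g(x) of degree r = 16:
    maximal total block length = 2^16 − 1 = 65535 bits (data + CRC),
    with all single- and double-bit errors detected.

  With g(x) = p(x)(1 + x), p(x) primitive of degree 15:
    maximal total block length = 2^15 − 1 = 32767 bits,
    with single, double, triple and all odd-count errors detected.

The factor (1 + x) thus halves the usable block length (plus one bit) in exchange for guaranteed detection of every odd-weight error pattern.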
Regardless of the reducibility properties of a generator polynomial of degree r, if it includes the "+1" term, the code will be able to detect error patterns that are confined to a window of r contiguous bits. These patterns are called "error bursts".

2.4 SPECIFICATION

The concept of the CRC as an error-detecting code gets complicated when an implementer or standards committee uses it to design a practical system. Here are some of the complications:
Sometimes an implementation prefixes a fixed bit pattern to the bitstream to be checked. This is useful when clocking errors might insert 0-bits in front of a message, an alteration that would otherwise leave the check value unchanged.
Usually, but not always, an implementation appends n 0-bits (n being the size of the CRC) to the bitstream to be checked before the polynomial division occurs. Such appending is explicitly demonstrated in the Computation of CRC article. This has the convenience that the remainder of the original bitstream with the check value appended is exactly zero, so the CRC can be checked simply by performing the polynomial division on the received bitstream and comparing the remainder with zero. Due to the associative and commutative properties of the exclusive-or operation, practical table-driven implementations can obtain a result numerically equivalent to zero-appending without explicitly appending any zeroes, by using an equivalent, faster algorithm that combines the message bitstream with the stream being shifted out of the CRC register.
Sometimes an implementation exclusive-ORs a fixed bit pattern into the remainder of the polynomial division.
Bit order: Some schemes view the low-order bit of each byte as "first", which then during polynomial division means "leftmost", which is contrary to our customary understanding of "low-order". This convention makes sense when serial-port transmissions are CRC-checked in hardware, because some widespread serial-port transmission conventions transmit bytes least-significant bit first.
Byte order: With multi-byte CRCs, there can be confusion over whether the byte transmitted first (or stored in the lowest-addressed byte of memory) is the least-significant byte (LSB) or the most-significant byte (MSB). For example, some 16-bit CRC schemes swap the bytes of the check value.
Omission of the high-order bit of the divisor polynomial: Since the high-order bit is always 1, and since an n-bit CRC must be defined by an (n + 1)-bit divisor which overflows an n-bit register, some writers assume that it is unnecessary to mention the divisor's high-order bit.
Omission of the low-order bit of the divisor polynomial: Since the low-order bit is always 1, authors such as Philip Koopman represent polynomials with their high-order bit intact, but without the low-order bit (the x^0 term). This convention encodes the polynomial, complete with its degree, in one integer.
These complications mean that there are three common ways to express a polynomial as an integer: the first two, which are mirror images in binary, are the constants found in code; the third is the number found in Koopman's papers. In each case, one term is omitted. So the polynomial x^4 + x + 1 may be transcribed as:
0x3 = 0b0011, representing x^4 + (0x^3 + 0x^2 + 1x^1 + 1x^0) (MSB-first code)
0xC = 0b1100, representing (1x^0 + 1x^1 + 0x^2 + 0x^3) + x^4 (LSB-first code)
0x9 = 0b1001, representing (1x^4 + 0x^3 + 0x^2 + 1x^1) + x^0 (Koopman notation)
In the table below they are shown as:

Table 2: CRC Representations

2.5 STANDARDS AND COMMON USE

Numerous varieties of cyclic redundancy checks have been incorporated into technical standards. By no means does one algorithm, or one of each degree, suit every purpose; Koopman and Chakravarty recommend selecting a polynomial according to the application requirements and the expected distribution of message lengths. The number of distinct CRCs in use has confused developers, a situation which authors have sought to address.[7] There are three polynomials reported for CRC-12, nineteen conflicting definitions of CRC-16, and seven of CRC-32.

The polynomials commonly applied are not the most efficient ones possible. Since 1993, Koopman, Castagnoli and others have surveyed the space of polynomials between 3 and 64 bits in size, finding examples that have much better performance (in terms of Hamming distance for a given message size) than the polynomials of earlier protocols, and publishing the best of these with the aim of improving the error detection capacity of future standards. In particular, iSCSI and SCTP have adopted one of the findings of this research, the CRC-32C (Castagnoli) polynomial.

The design of the 32-bit polynomial most commonly used by standards bodies, CRC-32-IEEE, was the result of a joint effort for the Rome Laboratory and the Air Force Electronic Systems Division by Joseph Hammond, James Brown and Shyan-Shiang Liu of the Georgia Institute of Technology and Kenneth Brayer of the MITRE Corporation. The earliest known appearances of the 32-bit polynomial were in their 1975 publications: Technical Report 2956 by Brayer for MITRE, published in January and released for public dissemination through DTIC in August, and Hammond, Brown and Liu's report for the Rome Laboratory, published in May. Both reports contained contributions from the other team. During December 1975, Brayer and Hammond presented their work in a paper at the IEEE National Telecommunications Conference: the IEEE CRC-32 polynomial is the generating polynomial of a Hamming code and was selected for its error detection performance. Even so, the Castagnoli CRC-32C polynomial used in iSCSI and SCTP matches its performance on messages from 58 bits to 131 kbits, and outperforms it in several size ranges, including the two most common sizes of Internet packet. The ITU-T G.hn standard also uses CRC-32C to detect errors in the payload (although it uses CRC-16-CCITT for PHY headers). CRC-32C computation is implemented in hardware as an operation of the SSE4.2 instruction set, first introduced in the Nehalem microarchitecture of Intel processors.
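A small C sketch of that hardware path (assuming an x86 processor with SSE4.2 and a compiler flag such as -msse4.2; the 0xFFFFFFFF initial value and final inversion follow the common CRC-32C convention rather than anything mandated above):

#include <stdio.h>
#include <stdint.h>
#include <nmmintrin.h>   /* SSE4.2 intrinsics */

/* CRC-32C (Castagnoli), one byte at a time, using the SSE4.2 crc32
   instruction mentioned above. */
static uint32_t crc32c(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;          /* conventional initial value */
    for (size_t i = 0; i < n; i++)
        crc = _mm_crc32_u8(crc, p[i]);   /* hardware CRC-32C step */
    return ~crc;                          /* conventional final inversion */
}

int main(void)
{
    const uint8_t msg[] = "123456789";
    printf("CRC-32C = 0x%08X\n", crc32c(msg, 9));   /* expected 0xE3069283 */
    return 0;
}

Wider variants of the instruction (_mm_crc32_u32, _mm_crc32_u64) process four or eight bytes per step, which is how high-throughput iSCSI and SCTP implementations use it in practice.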
CHAPTER 3
LINEAR-FEEDBACK SHIFT REGISTER

3.1 FIBONACCI LFSR

In computing, a linear-feedback shift register (LFSR) is a shift register whose input bit is a linear function of its previous state. The most commonly used linear function of single bits is exclusive-or (XOR). Thus, an LFSR is most often a shift register whose input bit is driven by the XOR of some bits of the overall shift register value. The initial value of the LFSR is called the seed, and because the operation of the register is deterministic, the stream of values produced by the register is completely determined by its current (or previous) state. Likewise, because the register has a finite number of possible states, it must eventually enter a repeating cycle. However, an LFSR with a well-chosen feedback function can produce a sequence of bits that appears random and has a very long cycle. Applications of LFSRs include generating pseudo-random numbers, pseudo-noise sequences, fast digital counters, and whitening sequences. Both hardware and software implementations of LFSRs are common. The mathematics of a cyclic redundancy check, used to provide a quick check against transmission errors, are closely related to those of an LFSR.

The bit positions that affect the next state are called the taps. In the diagram the taps are [16,14,13,11]. The rightmost bit of the LFSR is called the output bit. The taps are XOR'd sequentially with the output bit and then fed back into the leftmost bit. The sequence of bits in the rightmost position is called the output stream. A maximum-length LFSR produces an m-sequence (i.e., it cycles through all 2^m − 1 possible states within the shift register except the state where all bits are zero), unless it contains all zeros, in which case it will never change. As an alternative to XOR-based feedback in an LFSR, one can also use XNOR.[2] This function is an affine map, not strictly a linear map, but it results in an equivalent polynomial counter whose state is the complement of the state of an LFSR. A state with all ones is illegal when using XNOR feedback, in the same way as a state with all zeroes is illegal when using XOR. This state is considered illegal because the counter would remain "locked up" in it. The sequence of numbers generated by an LFSR or its XNOR counterpart can be considered a binary numeral system just as valid as Gray code or the natural binary code.

The arrangement of taps for feedback in an LFSR can be expressed in finite field arithmetic as a polynomial mod 2. This means that the coefficients of the polynomial must be 1s or 0s. This is called the feedback polynomial or reciprocal characteristic polynomial. For example, if the taps are at the 16th, 14th, 13th and 11th bits (as shown), the feedback polynomial is x^16 + x^14 + x^13 + x^11 + 1. The "one" in the polynomial does not correspond to a tap; it corresponds to the input to the first bit (i.e. x^0, which is equivalent to 1). The powers of the terms represent the tapped bits, counting from the left. The first and last bits are always connected as an input and output tap respectively.

The LFSR is maximal-length if and only if the corresponding feedback polynomial is primitive. This means that the following conditions are necessary (but not sufficient):
The number of taps is even.
The set of taps is setwise co-prime; i.e., there must be no divisor other than 1 common to all taps.
Tables of primitive polynomials from which maximum-length LFSRs can be constructed are given below and in the references. There can be more than one maximum-length tap sequence for a given LFSR length. Also, once one maximum-length tap sequence has been found, another automatically follows.
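Before turning to mirror tap sequences, here is a minimal C sketch of the 16-bit Fibonacci LFSR with taps [16,14,13,11] described above (the seed 0xACE1 is an arbitrary nonzero choice):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Feedback polynomial x^16 + x^14 + x^13 + x^11 + 1: the taps,
       counted from the output end, are at shifts 0, 2, 3 and 5. */
    uint16_t lfsr = 0xACE1u;   /* any nonzero seed */
    for (int i = 0; i < 16; i++) {
        unsigned bit = ((lfsr >> 0) ^ (lfsr >> 2) ^ (lfsr >> 3) ^ (lfsr >> 5)) & 1u;
        printf("%u", lfsr & 1u);                     /* output stream */
        lfsr = (uint16_t)((lfsr >> 1) | (bit << 15)); /* feed back into the left */
    }
    printf("\n");
    return 0;
}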
3.2 GALOIS LFSRs
Named after the French mathematician Évariste Galois, an LFSR in Galois configuration, which is also known as modular, internal XORs, or one-to-many LFSR, is an alternate structure that can generate the same output stream as a conventional LFSR (but offset in time). In the Galois configuration, when the system is clocked, bits that are not taps are shifted one position to the right unchanged. The taps, on the other hand, are XORed with the output bit before they are stored in the next position. The new output bit is the next input bit. The effect of this is that when the output bit is zero, all the bits in the register shift to the right unchanged, and the input bit becomes zero. When the output bit is one, the bits in the tap positions all flip (if they are 0, they become 1, and if they are 1, they become 0), and then the entire register is shifted to the right and the input bit becomes 1.

To generate the same output stream, the order of the taps must be the counterpart (see above) of the order for the conventional LFSR; otherwise the stream will be in reverse. Note that the internal state of the LFSR is not necessarily the same. The Galois register shown has the same output stream as the Fibonacci register in the first section. A time offset exists between the streams, so a different start point will be needed to get the same output each cycle. Galois LFSRs do not concatenate every tap to produce the new input (the XORing is done within the LFSR, and no XOR gates are run in serial; the propagation time is therefore reduced to that of one XOR rather than a whole chain), so it is possible for each tap to be computed in parallel, increasing the speed of execution. In a software implementation of an LFSR, the Galois form is more efficient, as the XOR operations can be implemented a word at a time: only the output bit must be examined individually. A sketch in the same style as the Fibonacci example follows the figure.

Fig 3.2: A 16-bit Galois LFSR.
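For comparison, a C sketch of the Galois form (again our own illustration). Assuming the same taps [16,14,13,11], the toggle mask is 0xB400, i.e. bits 15, 13, 12 and 10 of the register; all taps are applied in one word-wide XOR, which is exactly the software advantage described above.

    #include <stdint.h>

    /* One period of the equivalent 16-bit Galois LFSR of Fig 3.2. The mask
       0xB400 sets the internal XOR positions for taps [16,14,13,11]. */
    unsigned galois_lfsr_period(void) {
        const uint16_t start = 0xACE1u;  /* any nonzero seed */
        uint16_t lfsr = start;
        unsigned period = 0;
        do {
            uint16_t lsb = lfsr & 1u;    /* the output bit */
            lfsr >>= 1;                  /* shift the whole register */
            if (lsb)                     /* output 1: flip all tap bits at once */
                lfsr ^= 0xB400u;
            period++;
        } while (lfsr != start);
        return period;  /* 65535, the same maximal period as the Fibonacci form */
    }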
3.3 MATRIX FORMS
Binary LFSRs of both Fibonacci and Galois configurations can be expressed as linear functions using matrices over F_2 (that is, with all arithmetic taken mod 2). Using the companion matrix A of the characteristic polynomial of the LFSR and denoting the seed as a column vector (a_0, a_1, ..., a_{n−1})^T, the state of the register in Fibonacci configuration after k steps is given by A^k (a_0, a_1, ..., a_{n−1})^T.

3.4 SOME POLYNOMIALS FOR MAXIMAL LFSRs
The following table lists maximal-length polynomials for shift-register lengths up to 24. Note that more than one maximal-length polynomial may exist for any given shift-register length. A list of alternative maximal-length polynomials for shift-register lengths 4–32 (beyond which it becomes infeasible to store or transfer them) can be found in the references.
Table 3: General Polynomials

3.5 OUTPUT-STREAM PROPERTIES
Ones and zeroes occur in "runs". The output stream 1110010, for example, consists of four runs of lengths 3, 2, 1, 1, in order. In one period of a maximal LFSR, 2^(n−1) runs occur (in the example above, the 3-bit LFSR has 4 runs). Exactly half of these runs are one bit long, a quarter are two bits long, and so on, up to a single run of zeroes n − 1 bits long and a single run of ones n bits long. This distribution almost equals the statistical expectation value for a truly random sequence. However, the probability of finding exactly this distribution in a sample of a truly random sequence is rather low.

LFSR output streams are deterministic. If the present state and the positions of the XOR gates in the LFSR are known, the next state can be predicted. This is not possible with truly random events. With maximal-length LFSRs, it is much easier to compute the next state, as there are only an easily limited number of them for each length. The output stream is also reversible; an LFSR with mirrored taps will cycle through the output sequence in reverse order. The run distribution above can be verified directly in software; a short program for a 4-bit maximal LFSR follows.
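A self-contained check (our illustration) that one period of the maximal 4-bit LFSR with taps [4,3] (feedback polynomial x^4 + x^3 + 1, a primitive polynomial) contains exactly 2^3 = 8 runs: four of length 1, two of length 2, one run of zeroes of length 3 and one run of ones of length 4.

    #include <stdint.h>
    #include <stdio.h>

    /* Count runs in one 15-bit period of the maximal 4-bit LFSR with taps
       [4,3], treating the output stream as cyclic. */
    int main(void) {
        uint8_t lfsr = 0x1, bits[15];
        for (int i = 0; i < 15; i++) {
            bits[i] = lfsr & 1u;                           /* output bit */
            uint8_t fb = ((lfsr >> 0) ^ (lfsr >> 1)) & 1u; /* taps 4 and 3 */
            lfsr = (uint8_t)(((lfsr >> 1) | (fb << 3)) & 0xFu);
        }
        int runs[5] = {0};  /* runs[k] = number of runs of length k */
        int start = 0;
        /* align to a run boundary: predecessor of `start` must differ */
        while (bits[start] == bits[(start + 14) % 15]) start++;
        for (int i = 0, len = 1; i < 15; i++, len++) {
            if (bits[(start + i) % 15] != bits[(start + i + 1) % 15]) {
                runs[len]++;
                len = 0;   /* the for-increment restarts it at 1 */
            }
        }
        printf("%d %d %d %d\n", runs[1], runs[2], runs[3], runs[4]); /* 4 2 1 1 */
        return 0;
    }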
3.6 APPLICATIONS
LFSRs can be implemented in hardware, and this makes them useful in applications that require very fast generation of a pseudo-random sequence, such as direct-sequence spread spectrum radio. LFSRs have also been used for generating an approximation of white noise in various programmable sound generators.

3.6.1 USES AS COUNTERS
The repeating sequence of states of an LFSR allows it to be used as a clock divider or as a counter when a non-binary sequence is acceptable, as is often the case where computer index or framing locations need to be machine-readable.[7] LFSR counters have simpler feedback logic than natural binary counters or Gray-code counters, and therefore can operate at higher clock rates. However, it is necessary to ensure that the LFSR never enters an all-zeros state, for example by presetting it at start-up to any other state in the sequence. The table of primitive polynomials shows how LFSRs can be arranged in Fibonacci or Galois form to give maximal periods. Any other period can be obtained by adding logic to an LFSR of longer period that shortens the sequence by skipping some states.

3.6.2 USES IN CRYPTOGRAPHY
LFSRs have long been used as pseudo-random number generators for use in stream ciphers (especially in military cryptography), due to the ease of construction from simple electromechanical or electronic circuits, long periods, and very uniformly distributed output streams. However, an LFSR is a linear system, leading to fairly easy cryptanalysis. For example, given a stretch of known plaintext and the corresponding ciphertext, an attacker can intercept and recover a stretch of the LFSR output stream used in the system, and from that stretch can construct an LFSR of minimal size that simulates the intended receiver by using the Berlekamp–Massey algorithm. This LFSR can then be fed the intercepted stretch of output stream to recover the remaining plaintext. Three general methods are employed to reduce this problem in LFSR-based stream ciphers:
non-linear combination of several bits from the LFSR state;
non-linear combination of the output bits of two or more LFSRs (see also: shrinking generator), or the use of evolutionary algorithms to introduce non-linearity;[8]
irregular clocking of the LFSR, as in the alternating step generator.

3.6.3 TEST-PATTERN GENERATION
Complete LFSRs are commonly used as pattern generators for exhaustive testing, since they cover all possible inputs for an n-input circuit. Maximal-length LFSRs and weighted LFSRs are widely used as pseudo-random test-pattern generators for pseudo-random test applications.

3.6.4 SIGNATURE ANALYSIS
In built-in self-test (BIST) techniques, storing all the circuit outputs on chip is not possible, but the circuit output can be compressed to form a signature that will later be compared to the golden signature (of the good circuit) to detect faults. Since this compression is lossy, there is always a possibility that a faulty output generates the same signature as the golden signature, in which case the fault cannot be detected. This condition is called error masking or aliasing. BIST is accomplished with a multiple-input signature register (MISR or MSR), which is a type of LFSR. A standard LFSR has a single XOR or XNOR gate, where the input of the gate is connected to several "taps" and the output is connected to the input of the first flip-flop. A MISR has the same structure, but the input to every flip-flop is fed through an XOR/XNOR gate. For example, a 4-bit MISR has a 4-bit parallel output and a 4-bit parallel input. The input of the first flip-flop is XOR/XNORed with parallel input bit zero and the "taps". Every other flip-flop input is XOR/XNORed with the preceding flip-flop output and the corresponding parallel input bit. Consequently, the next state of the MISR depends on the last several states, as opposed to just the current state. Therefore, a MISR will always generate the same golden signature given that the input sequence is the same every time. A software sketch of such a compactor follows.
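A minimal C sketch of a 4-bit MISR-style compactor, reusing the x^4 + x^3 + 1 taps from Section 3.5. This is our own simplified illustration, not a circuit from the literature: the function names are ours, and a real MISR would fix the tap set to the chosen signature polynomial.

    #include <stdint.h>

    /* One clock of a 4-bit MISR: an ordinary LFSR shift with feedback from
       taps 4 and 3, with the 4-bit parallel input (one circuit output per
       flip-flop) XORed into the new state. */
    static uint8_t misr_step(uint8_t state, uint8_t parallel_in) {
        uint8_t fb = ((state >> 0) ^ (state >> 1)) & 1u;   /* taps 4 and 3 */
        uint8_t next = (uint8_t)(((state >> 1) | (fb << 3)) & 0xFu);
        return next ^ (parallel_in & 0xFu);                /* fold outputs in */
    }

    /* Usage: start from a known state, clock in one circuit response word per
       test vector, then compare the final signature with the golden value:
           uint8_t sig = 0;
           for (i = 0; i < n_vectors; i++) sig = misr_step(sig, response[i]);
           fault_detected = (sig != golden_signature);
       Aliasing occurs exactly when a faulty response stream still yields
       sig == golden_signature. */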
3.6.5 USES IN DIGITAL BROADCASTING
To prevent short repeating sequences (e.g., runs of 0s or 1s) from forming spectral lines that may complicate symbol tracking at the receiver or interfere with other transmissions, the data bit sequence is combined with the output of a linear-feedback register before modulation and transmission. This scrambling is removed at the receiver after demodulation. When the LFSR runs at the same bit rate as the transmitted symbol stream, this technique is referred to as scrambling. When the LFSR runs considerably faster than the symbol stream, the LFSR-generated bit sequence is called a chipping code. The chipping code is combined with the data using exclusive-or before transmitting using binary phase-shift keying or a similar modulation method. The resulting signal has a higher bandwidth than the data, and therefore this is a method of spread-spectrum communication. When used only for the spread-spectrum property, this technique is called direct-sequence spread spectrum; when used to distinguish several signals transmitted in the same channel at the same time and frequency, it is called code division multiple access. Neither scheme should be confused with encryption or encipherment; scrambling and spreading with LFSRs do not protect the information from eavesdropping. They are instead used to produce equivalent streams that possess convenient engineering properties to allow robust and efficient modulation and demodulation.

Digital broadcasting systems that use linear-feedback registers:
ATSC Standards (digital TV transmission system – North America)
DAB (Digital Audio Broadcasting system – for radio)
DVB-T (digital TV transmission system – Europe, Australia, parts of Asia)
NICAM (digital audio system for television)

Other digital communications systems using LFSRs:
INTELSAT business service (IBS)
Intermediate data rate (IDR)
SDI (Serial Digital Interface transmission)
Data transfer over PSTN (according to the ITU-T V-series recommendations)
CDMA (Code Division Multiple Access) cellular telephony
100BASE-T2 "fast" Ethernet (scrambles bits using an LFSR)
1000BASE-T Ethernet, the most common form of Gigabit Ethernet (scrambles bits using an LFSR)
PCI Express 3.0
SATA

CHAPTER 4
PROPOSED METHOD

4.1 INTRODUCTION
In this section, we present the architecture of the proposed concurrent error detection for the parallel CRC structure, based on multiple-bit parity prediction. We have already formulated multiple-bit parity prediction expressions for l > m, where l is the number of input message bits in each iteration and m is the number of CRC bits.

Fig 4.1: Simplified block diagram of the proposed multiple-bit parity-based error detection structure of the parallel CRC architecture.

We apply the multiple-bit parity approach to the parallel CRC architecture adopted here to achieve concurrent fault detection. The multiple-bit parity checking circuit compares the predicted parity and the actual parity of the syndrome bits at the end of each iteration and raises an alarm if they do not match. The output of the parity calculation unit is the actual multiple-bit parity of the CRC syndrome bits after the nth iteration, whereas the output of the parity prediction unit is the predicted multiple-bit parity. The actual parity is obtained by XORing the outputs of the CRC unit, whereas the predicted parity is evaluated as a different function of the inputs. A sketch of the checking step appears below.
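The following C fragment sketches only this final comparison step, under our own simplifying assumptions (m = 16 syndrome bits and a single parity bit per block). The names actual_parity and concurrent_check are ours, and the predicted parity is passed in as a parameter because, in the proposed scheme, it is computed by the separate prediction unit from the inputs, not from the syndrome itself.

    #include <stdint.h>

    typedef uint16_t crc_state_t;  /* m = 16 syndrome bits, one parity block */

    /* XOR-reduce the CRC unit outputs to their parity bit (the "parity
       calculation unit" of Fig 4.1, for a single block). */
    static uint8_t actual_parity(crc_state_t s) {
        s ^= s >> 8;  s ^= s >> 4;  s ^= s >> 2;  s ^= s >> 1;
        return (uint8_t)(s & 1u);
    }

    /* Raise the alarm when predicted and actual parity disagree. */
    static int concurrent_check(crc_state_t syndrome, uint8_t predicted_parity) {
        return actual_parity(syndrome) != predicted_parity;  /* 1 = fault */
    }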
4.2 PARITY
Parity is the simplest form of error checking. It adds one bit to the pattern and then requires that the modulo-2 sum of all the bits of the pattern and the parity bit have a defined answer. The answer is 0 for even parity and 1 for odd parity. An alternative way of making the same statement is that odd (even) parity constrains there to be an odd (even) number of 1s in the pattern plus parity bit. A parity bit is sufficient to catch all single errors in the pattern plus parity bit, as such an error changes a single 1 to a 0 or vice versa and therefore upsets the parity calculation. A repeat of the parity calculation at any time will reveal this problem. However, the system will not catch any double errors (except the trivial case of two errors on the same bit), and these will be flagged as valid. For example, a pattern of 0110100 becomes 00110100 with the addition of a parity bit and enforcement of odd parity. A single error would, for example, change the pattern to 00110000, which has the wrong parity. However, a further error giving 10110000 looks valid but is in fact wrong. The length of pattern which each parity bit "guards" should be as long as possible, so that the parity bit takes up the least amount of extra storage, but not so long that a double error becomes likely. Thus, if the probability of a single bit being in error is 10^-6, the probability of an error in an 8-bit pattern (7 bits + parity) is about 8 × 10^-6, while the probability of an undetected double error is about 28 × (10^-6)^2 ≈ 3 × 10^-11 (there are C(8,2) = 28 ways to choose the two failing bits), which is quite small. The worked example above is reproduced in code below.
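A short C check (our illustration; popcount8 is our helper) of the odd-parity example above: the valid word 00110100 passes, the single-error word 00110000 fails, and the double-error word 10110000 wrongly passes again.

    #include <stdint.h>
    #include <stdio.h>

    /* Count the 1 bits in an 8-bit word (7 data bits + 1 parity bit). */
    static int popcount8(uint8_t v) {
        int c = 0;
        for (; v; v >>= 1) c += v & 1;
        return c;
    }

    int main(void) {
        uint8_t good    = 0x34;  /* 00110100: 3 ones, odd parity holds     */
        uint8_t one_err = 0x30;  /* 00110000: single error, parity fails   */
        uint8_t two_err = 0xB0;  /* 10110000: double error, parity "passes"*/
        printf("%d %d %d\n", popcount8(good) & 1,
                             popcount8(one_err) & 1,
                             popcount8(two_err) & 1);  /* prints 1 0 1 */
        return 0;
    }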
Fig 4.2: Parity prediction circuit of each block of the proposed multi-bit scheme for l < m.
Table 3: List of the Symbols Used

CHAPTER 5
RESULTS

5.1 SIMULATION RESULTS
16-bit Generator Polynomial: x^16 + x^15 + x^2 + 1
8-bit Message Bits: 8'b00000001
16-bit Theoretical CRC: 16'b0111110100000111

Fig 5.1: RTL schematic for top module
Fig 5.2: Internal structure for top module
Fig 5.3: Simulation result without introducing error
Fig 5.4: Simulation result with introduced error

5.2 SYNTHESIS REPORT
The design has been synthesized using the Genus compiler in 90 nm and 45 nm technologies. The results for leakage power, dynamic power, total power and area have been noted and compared. As the comparison in the table shows, the figures for the 45 nm technology are lower, making it better suited for chip fabrication.

Technology   Leakage power (µW)   Dynamic power (µW)   Total power (µW)   Area (microns)   Time (ns)
90 nm        4.629                544.14               548.779            911.308          1.4
45 nm        0.018                312.003              312.00             0.9*10^-3        1.3
Table 4: Comparison table for power and area

                      Serial CRC    Parallel CRC
Area (microns)        585           529
Total power (nW)      108005.943    57212.539
Timing slack (ps)     4809          15130
Table 5: Comparison of serial CRC and parallel CRC

It is evident from the above table that the area and power are reduced and the timing slack is improved to a great extent in the 90 nm technology. The parallel CRC implementation employing a multi-output LFSR is the better choice.

CHAPTER 6
CONCLUSIONS
A concurrent error detection design structure for parallel CRC architectures has been proposed. Two different cases, namely l > m and l < m, have been considered, and the formulation for the multiple-bit parity-based concurrent fault detection scheme has been derived, followed by the implementation of the corresponding architecture for each of them. The error and fault coverage of the proposed scheme have been analyzed in detail using error and fault models. We have also presented a formulation for the error detection capability of the scheme covering all the possible 2^m − 1 cases where the output can become erroneous. Moreover, a thorough analysis of all the possible single stuck-at fault locations has been discussed. Furthermore, we investigated the relation between stuck-at fault location and the error propagation resulting in an erroneous output, and the relation between this propagation and the CRC generator polynomial. Finally, the fault detection capability and the theoretical time and area overheads of the proposed schemes have been reported and confirmed by software simulation and ASIC implementation results. We have coded the proposed scheme in C for software simulation and in Verilog for ASIC implementation purposes. The proposed method is shown to be area efficient while at the same time having a high fault detection capability. The proposed multiple-bit parity prediction scheme has also been shown to have not only the ability to detect a fault, but also to point out the region of the fault. This potential advantage can be further utilized when extending the proposed scheme to correct the error online in the future, i.e., to achieve an error-free output even in the presence of a hardware error.

REFERENCES
1. W. Peterson and D. Brown, "Cyclic codes for error detection," Proc. IRE, vol. 49, pp. 228–235, 1961.
2. ATM Layer Specification, ITU-T Recommendation I.361, 1999.
3. IEEE Standard for Information Technology: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications, ANSI/IEEE Std 802.3-2005.
4. Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, ANSI/IEEE Std 802.11-1999.
5. IEEE Standard for Local and Metropolitan Area Networks: Air Interface for Fixed and Mobile Broadband Wireless Access Systems, ANSI/IEEE Std 802.16-2004.
6. C. Kennedy, "High performance hardware and software implementations of the cyclic redundancy check computation," Ph.D. thesis, University of Western Ontario, London, ON, 2009.
7. S. Lin and D. J. Costello, Error Control Coding. Englewood Cliffs, NJ: Prentice-Hall, 1983.
8. T. Ramabadran and S. Gaitonde, "A tutorial on CRC computations," IEEE Micro, no. 4, pp. 62–75, Aug. 1988.
9. T.-B. Pei and C. Zukowski, "High-speed parallel CRC circuits in VLSI," IEEE Trans. Commun., vol. 40, no. 4, pp. 653–657, Apr. 1992.
10. G. Albertengo and R. Sisto, "Parallel CRC generation," IEEE Micro, vol. 10, no. 5, pp. 63–71, Oct. 1990.
11. G. Campobello, G. Patane, and M. Russo, "Parallel CRC realization," IEEE Trans. Comput., vol. 52, no. 10, pp. 1312–1319, Oct. 2003.
12. M.-D. Shieh, M.-H. Sheu, C.-H. Chen, and H.-F. Lo, "A systematic approach for parallel CRC computations," J. Inf. Sci. Eng., vol. 17, no. 3, pp. 445–461, 2001.
13. M. Sprachmann, "Automatic generation of parallel CRC circuits," IEEE Des. Test Comput., vol. 18, no. 3, pp. 108–114, May/Jun. 2001.
14. C. Cheng and K. K. Parhi, "High-speed parallel CRC implementation based on unfolding, pipelining, and retiming," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 10, pp. 1017–1021, Oct. 2006.
15. C. Cheng and K. K. Parhi, "High speed VLSI architecture for general linear feedback shift register (LFSR) structures," in Proc. IEEE 43rd Asilomar Conf. Signals, Syst., Comput., 2009, pp. 713–717.
16. M. Ayinala and K. K. Parhi, "High-speed parallel architectures for linear feedback shift registers," IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4459–4469, Sep. 2011.
17. C. Kennedy and A. Reyhani-Masoleh, "High-speed parallel CRC circuits," in Proc. 42nd Asilomar Conf. Signals, Syst., Comput., Oct. 2008, pp. 1823–1829.
18. M. Braun, J. Friedrich, T. Grün, and J. Lembert, "Parallel CRC computation in FPGAs," in Proc. 6th Int. Workshop Field-Programm. Logic Appl., 1996, vol. 1142, pp. 156–165.
19. R. Glaise, "A two-step computation of cyclic redundancy code CRC-32 for ATM networks," IBM J. Res. Develop., vol. 41, no. 6, pp. 705–709, 1997.
20. J. Derby, "High-speed CRC computation using state-space transformations," in Proc. IEEE Global Telecommun. Conf., 2001, vol. 1, pp. 166–170.
21. P. Reviriego, S. Pontarelli, and J. A. Maestro, "Concurrent error detection for orthogonal Latin squares encoders and syndrome computation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 12, pp. 2334–2338, Dec. 2013.