How Computers Work (EMMA) Orientation



HS Computer Hardware Week #14

Random Access Memory – (RAM)

Review – How RAM Works

Dynamic RAM

Memory Refresh

CAS

RAS

SDRAM Latency – it takes time to access data after strobing the row and column

tCAS - Column Address Strobe Latency

tRCD - RAS to CAS Delay

tRP - RAS Precharge

tRAS - Active to Precharge Delay (minimum time a row must stay active)

DDR SDRAM

Double Data Rate SDRAM - The interface uses double pumping (transferring data on both the rising and falling edges of the clock signal) to double the transfer rate without raising the clock frequency. Higher transfer rates are made possible by stricter control of the timing of the electrical data and clock signals

DDR2 SDRAM

DDR2 has higher latencies than DDR, the trade-off for clocking the bus at twice the speed of the memory cells (still double pumped)

DDR2 runs at higher bus speeds which equates to an overall increase in throughput

DDR2 is not backwards compatible with DDR (notch offset on DDR2)

Faster DDR2 compatible with slower DDR2 (Bus runs at the slower speed)

Typical latency 5-5-5-15

DDR3 SDRAM

The primary benefit of DDR3 is the ability to transfer at twice the data rate of DDR2

DDR3 memory provides a reduction in power consumption of 30% compared to DDR2

Prefetch Buffer – gives quick and easy access to multiple datawords located on a common physical row in the memory. When a memory access occurs to a row, the buffer grabs a set of adjacent datawords on the row and reads them out ("bursts" them) in rapid-fire sequence, on the expectation that the CPU will need the adjacent data.

Typical latency 7-7-7-20

Manufacturer’s Product Page via

Documentation Online – DDR2, DDR3 SDRAM & SDRAM Latency plus Latency Supplements, How RAM Works, PreFetch

Homework Online - RAM Quiz

Newegg Wishlist – RAM Selection

How RAM Works

by Jeff Tyson and Dave Coustan

Random access memory (RAM) is the best known form of computer memory. RAM is considered "random access" because you can access any memory cell directly if you know the row and column that intersect at that cell.

The opposite of RAM is serial access memory (SAM). SAM stores data as a series of memory cells that can only be accessed sequentially (like a cassette tape). If the data is not in the current location, each memory cell is checked until the needed data is found. SAM works very well for memory buffers, where the data is normally stored in the order in which it will be used (a good example is the texture buffer memory on a video card). RAM data, on the other hand, can be accessed in any order.

Dynamic RAM

Similar to a microprocessor, a memory chip is an integrated circuit (IC) made of millions of transistors and capacitors. In the most common form of computer memory, dynamic random access memory (DRAM), a transistor and a capacitor are paired to create a memory cell, which represents a single bit of data. The capacitor holds the bit of information -- a 0 or a 1. The transistor acts as a switch that lets the control circuitry on the memory chip read the capacitor or change its state.

A capacitor is like a small bucket that is able to store electrons. To store a 1 in the memory cell, the bucket is filled with electrons. To store a 0, it is emptied. The problem with the capacitor's bucket is that it has a leak. In a matter of a few milliseconds a full bucket becomes empty. Therefore, for dynamic memory to work, either the CPU or the memory controller has to come along and recharge all of the capacitors holding a 1 before they discharge. To do this, the memory controller reads the memory and then writes it right back. This refresh operation happens automatically thousands of times per second.


This refresh operation is where dynamic RAM gets its name. Dynamic RAM has to be dynamically refreshed all of the time or it forgets what it is holding. The downside of all of this refreshing is that it takes time and slows down the memory.

Memory cells are etched onto a silicon wafer in an array of columns (bitlines) and rows (wordlines). The intersection of a bitline and wordline constitutes the address of the memory cell.


DRAM works by sending a charge through the appropriate column (CAS) to activate the transistor at each bit in the column. When writing, the row lines contain the state the capacitor should take on. When reading, the sense-amplifier determines the level of charge in the capacitor. If it is more than 50 percent, it reads it as a 1; otherwise it reads it as a 0. The counter tracks the refresh sequence based on which rows have been accessed in what order. The length of time necessary to do all this is so short that it is expressed in nanoseconds (billionths of a second). A memory chip rating of 70ns means that it takes 70 nanoseconds to completely read and recharge each cell.

Memory cells alone would be worthless without some way to get information in and out of them. So the memory cells have a whole support infrastructure of other specialized circuits. These circuits perform functions such as:

• Identifying each row and column (row address select and column address select)

• Keeping track of the refresh sequence (counter)

• Reading and restoring the signal from a cell (sense amplifier)

• Telling a cell whether it should take a charge or not (write enable)

The memory controller also handles a series of other tasks, such as identifying the type, speed and amount of memory installed and checking for errors.

Static RAM

Static RAM uses a completely different technology. In static RAM, a form of flip-flop holds each bit of memory (see How Boolean Logic Works for details on flip-flops). A flip-flop for a memory cell takes four or six transistors along with some wiring, but never has to be refreshed. This makes static RAM significantly faster than dynamic RAM. However, because it has more parts, a static memory cell takes up a lot more space on a chip than a dynamic memory cell. Therefore, you get less memory per chip, and that makes static RAM a lot more expensive.

Static RAM is fast and expensive, and dynamic RAM is less expensive and slower. So static RAM is used to create the CPU's speed-sensitive cache, while dynamic RAM forms the larger system RAM space.

Memory chips in desktop computers originally used a pin configuration called dual inline package (DIP). This pin configuration could be soldered into holes on the computer's motherboard or plugged into a socket that was soldered on the motherboard. This method worked fine when computers typically operated on a couple of megabytes or less of RAM, but as the need for memory grew, the number of chips needing space on the motherboard increased.

The solution was to place the memory chips, along with all of the support components, on a separate printed circuit board (PCB) that could then be plugged into a special connector (memory bank) on the motherboard. Most of these chips use a small outline J-lead (SOJ) pin configuration, but quite a few manufacturers use the thin small outline package (TSOP) configuration as well. The key difference between these newer pin types and the original DIP configuration is that SOJ and TSOP chips are surface-mounted to the PCB. In other words, the pins are soldered directly to the surface of the board, not inserted in holes or sockets.

Memory chips are normally only available as part of a card called a module. You've probably seen memory listed as 8x32 or 4x16. These numbers represent the number of the chips multiplied by the capacity of each individual chip, which is measured in megabits (Mb), or one million bits. Take the result and divide it by eight to get the number of megabytes on that module. For example, 4x32 means that the module has four 32-megabit chips. Multiply 4 by 32 and you get 128 megabits. Since we know that a byte has 8 bits, we need to divide our result of 128 by 8. Our result is 16 megabytes!
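As a quick sanity check of that arithmetic, here is a minimal Python sketch (the function name is ours, for illustration only):

def module_megabytes(num_chips, megabits_per_chip):
    # A label like "4x32" means: number of chips x megabits per chip.
    total_megabits = num_chips * megabits_per_chip
    return total_megabits / 8  # 8 bits per byte

print(module_megabytes(4, 32))  # 16.0 -- the 16 MB module from the example
print(module_megabytes(8, 32))  # 32.0 -- an 8x32 module holds 32 MB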

System RAM

System RAM speed is controlled by bus width and bus speed. Bus width refers to the number of bits that can be sent to the CPU simultaneously, and bus speed refers to the number of times a group of bits can be sent each second. A bus cycle occurs every time data travels from memory to the CPU. For example, a 100-MHz 32-bit bus is theoretically capable of sending 4 bytes (32 bits divided by 8 = 4 bytes) of data to the CPU 100 million times per second, while a 66-MHz 16-bit bus can send 2 bytes of data 66 million times per second. If you do the math, you'll find that simply changing the bus width from 16 bits to 32 bits and the speed from 66 MHz to 100 MHz in our example allows for three times as much data (400 million bytes versus 132 million bytes) to pass through to the CPU every second.
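The same bus math can be written out in a few lines of Python; this is just the worked example above, not a real benchmark:

def bus_bytes_per_second(bus_mhz, bus_width_bits):
    bytes_per_cycle = bus_width_bits // 8
    return bus_mhz * 1_000_000 * bytes_per_cycle

print(bus_bytes_per_second(100, 32))  # 400,000,000 bytes/s
print(bus_bytes_per_second(66, 16))   # 132,000,000 bytes/s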

In reality, RAM doesn't usually operate at optimum speed. Latency changes the equation radically. Latency refers to the number of clock cycles needed to read a bit of information. For example, RAM rated at 100 MHz is capable of sending a bit in 0.00000001 seconds, but may take 0.00000005 seconds to start the read process for the first bit. To compensate for latency, CPUs use a special technique called burst mode.

Burst mode depends on the expectation that data requested by the CPU will be stored in sequential memory cells. The memory controller anticipates that whatever the CPU is working on will continue to come from this same series of memory addresses, so it reads several consecutive bits of data together. This means that only the first bit is subject to the full effect of latency; reading successive bits takes significantly less time. The rated burst mode of memory is normally expressed as four numbers separated by dashes. The first number tells you the number of clock cycles needed to begin a read operation; the second, third and fourth numbers tell you how many cycles are needed to read each consecutive bit in the row, also known as the wordline. For example: 5-1-1-1 tells you that it takes five cycles to read the first bit and one cycle for each bit after that. Obviously, the lower these numbers are, the better the performance of the memory.
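To see what those four numbers cost in practice, here is a small hypothetical sketch that totals the cycles for a burst read under the 5-1-1-1 rating above:

def burst_read_cycles(timings, words):
    # timings: cycles for the first word, then cycles per following word
    first, *rest = timings
    return first + sum(rest[:words - 1])

print(burst_read_cycles((5, 1, 1, 1), 4))  # 8 cycles for a 4-word burst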

Burst mode is often used in conjunction with pipelining, another means of minimizing the effects of latency. Pipelining organizes data retrieval into a sort of assembly-line process. The memory controller simultaneously reads one or more words from memory, sends the current word or words to the CPU and writes one or more words to memory cells. Used together, burst mode and pipelining can dramatically reduce the lag caused by latency.

So why wouldn't you buy the fastest, widest memory you can get? The speed and width of the memory's bus should match the system's bus. You can use memory designed to work at 100 MHz in a 66-MHz system, but it will run at the 66-MHz speed of the bus so there is no advantage, and 32-bit memory won't fit on a 16-bit bus.

Even with a wide and fast bus, it still takes longer for data to get from the memory card to the CPU than it takes for the CPU to actually process the data. That's where caches come in.

Latencies

DDR2 memories work with higher latencies than DDR memories. In other words, they take more clock cycles to deliver requested data. Does this mean that DDR2 memories are slower than DDR memories? Not necessarily. As we said, they take more clock cycles, but not necessarily more time.

If we compare a DDR memory to a DDR2 memory running at the same clock, the one with the lower latency will be the faster. Thus, if you have a DDR400 CL3 memory and a DDR2-400 CL4 memory, your DDR400 will be faster.

Keep in mind that DDR2 memories have an additional parameter called AL (additional latency), which must be added to their nominal latency (CL) in order to get the total latency.

When comparing memories with different speeds, you need to consider the clock in your math.

On a DDR400 CL3 memory, the "3" means that the memory takes three clock cycles to start delivering the requested data. Since this memory runs at 200 MHz, each clock tick lasts 5 ns (T = 1/f). Thus its latency is 15 ns.

Now, on a DDR2-533 CL3 AL0 memory, the "3" also means that the memory takes three clock cycles to start delivering the requested data, but since this memory runs at 266 MHz, each clock tick lasts 3.75 ns, so its latency is 11.25 ns, making this memory faster at delivering data than our DDR400 CL3 memory. A DDR2-533 CL4 AL0 memory, on the other hand, has the same latency as a DDR400 CL3. Notice that we are assuming the additional latency is zero; otherwise we would need to take it into account, i.e., a DDR2 CL3 AL1 memory in reality has a latency of four clock cycles.

Some manufacturers announce their memory module latencies through a series of four numbers, like "4-4-4-12", "5-4-4-9" or "3-3-3-8". The latency we have been talking about (CL) is the first number in the sequence. The additional latency (AL) is usually found in the memory module's technical specs (usually a PDF file for download on the manufacturer's website). If you want to know what the other numbers mean, read our tutorial Understanding DDR Memories.

To make your calculations and comparisons easier, we prepared the following table with the clock tick duration for each memory type. Just take the number below for the memory type you want to compare and multiply it by the latency value to get the latency duration in nanoseconds, allowing you to compare latencies of memories with different clock speeds and to know which memory is faster.

|Memory              |Clock Tick Duration (each one) |
|DDR266              |7.5 ns                         |
|DDR333              |6 ns                           |
|DDR400 and DDR2-400 |5 ns                           |
|DDR2-533            |3.75 ns                        |
|DDR2-667            |3 ns                           |
|DDR2-800            |2.5 ns                         |
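Here is that multiplication as a short Python sketch, using the tick durations from the table (the function name is ours):

def latency_ns(cl_cycles, tick_ns):
    return cl_cycles * tick_ns

print(latency_ns(3, 5.0))   # DDR400 CL3   -> 15.0 ns
print(latency_ns(3, 3.75))  # DDR2-533 CL3 -> 11.25 ns
print(latency_ns(4, 3.75))  # DDR2-533 CL4 -> 15.0 ns, same as DDR400 CL3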

RAM and Latency - What you need to Know

By Nicholas Spriggs

No one likes a slow computer. Well, at least not anyone I know. People spend thousands of dollars on upgrades and cutting edge systems to use a faster computer. However, upgrading your computer is not the only way to have a quicker computing experience. For those looking to squeeze the last bit of performance out of their machines, there are many ways to optimize your system for speed.

Overclocking your CPU, tweaking your registry, and fine tuning the bios are some the most common ways to optimize your system. Another often overlooked method is reducing your RAM’s latency.

RAM latency occurs when the CPU needs to retrieve information from memory. In order to receive information from RAM, the CPU sends out a request through the front side bus (FSB). However, the CPU operates faster than the memory, so it must wait while the proper segment of memory is located and read before the data can be sent back.

RAM latency is measured in wasted FSB clock cycles, since the data is transferred through the FSB. The bigger the latency number, the more FSB clock cycles are wasted. The goal in reducing latency is to get the data back to the CPU in the fewest FSB clock cycles possible.

The easiest way to reduce RAM latency is to increase the speed of the front side bus. This means that the FSB can send and receive data between the CPU and memory faster. However, this also overclocks the CPU, RAM, and possibly the AGP bus as well.

Overclocking your system will void your computer’s warranties and could possibly damage and/or destroy your system, so only attempt it if you’re willing to risk frying your computer. Adjusting your PC’s FSB is usually performed through the BIOS or through jumpers on your motherboard, although not all motherboards support overclocking.

A safer method is to adjust your RAM's timings, although this can still potentially damage your system and usually only yields nominal performance gains. There's no simple way to say it, so I'll apologize in advance for spitting some tech jargon your way.

RAM timings are measured in CAS, RCD, RP, and RAS. CAS refers to the number of clock cycles needed to reach the correct column of memory, RCD refers to the number of cycles between RAS and CAS, RP refers to the number of cycles needed to close a row and open the next row for reading, and RAS refers to the smallest number of clock cycles a row must remain actively accessed. To simplify that explanation, remember that RAM timings are measured in FSB clock cycles, so the lower the numbers, the faster your system is.

For example, my RAM timings are 3-3-3-8 (CAS-RCD-RP-RAS). To optimize my timings, I first tried lowering my CAS in my BIOS to 2.5 and rebooting. Windows booted just fine and everything worked correctly, so I rebooted, went back into my BIOS and dropped my CAS down to 2, so now my timings were 2-3-3-8. Again, this setup seemed stable, so I went in and tried reducing my RAS, since it was pretty high. This was the last stable tweak that I could make. When I tried to go below 2-3-3-7 in any timing, my system either wouldn't boot or Windows would generate a mass of memory-related errors.

I used 3DMark 2005 to get an idea of how much improvement my memory adjustments made, if any. Before tweaking my timings I posted a 3DMark score of 2105; after the tweaking I scored 2114. Not much of a difference, but when it comes to optimizing your system for speed, every last bit helps.

While reducing RAM latency may not have a huge impact on system performance, it can give it a little extra kick, which, combined with other methods of optimization, can result in a much quicker PC. So until you're ready to buy a new computer, consider tuning up your current one to its fullest potential.

SDRAM latency

From Wikipedia, the free encyclopedia

SDRAM latency refers to the delays incurred when a computer tries to access data in SDRAM. SDRAM latency is often measured in memory bus clock cycles. Because a modern CPU is much faster than SDRAM, the CPU has to wait for a relatively long time for a memory access to complete before it can process the data. SDRAM latency contributes to total memory latency, which causes a significant bottleneck for system performance in modern computers.

SDRAM access

SDRAM is logically organized into a grid-like pattern, with "rows" and "columns". The data stored in SDRAM comes in blocks, defined by the coordinates of the row and column of the specific information. The steps for the memory controller to access data in SDRAM follow in order:

1. First, the SDRAM is in an idle state.

2. The controller issues the "Active" command. It activates a certain row, as indicated by the address lines, in the SDRAM chip for accessing. This command typically takes a few clock cycles.

3. After the delay, the column address and either a "Read" or "Write" command are issued. Typically the read or write command can be repeated every clock cycle for different column addresses (or a burst-mode read can be performed). The read data isn't available until a few clock cycles later, however, because the memory is pipelined.

4. When an access is requested to another row, the current row has to be deactivated by issuing the "Precharge" command. The precharge command takes a few clock cycles before a new "Active" command can be issued.

SDRAM access has four main measurements (quantified in FSB clock cycles) important in defining the SDRAM latency in a given computer (the 't' prefixes are for 'time'):

tCAS

The number of clock cycles needed to access a certain column of data in SDRAM. CAS latency, or simply CAS, stands for Column Address Strobe latency, sometimes referred to as tCL.

tRCD (RAS to CAS Delay)

The number of clock cycles needed between a Row Address Strobe (RAS) and a CAS. It is the time required between the computer defining the row and column of the given memory block and the actual read or write to that location. Stands for Row address to Column address Delay.

tRP (RAS Precharge)

The number of clock cycles needed to terminate access to an open row of memory, and open access to the next row. Stands for Row precharge time.

tRAS

The minimum number of clock cycles needed to access a certain row of data in RAM between the data request and the precharge command. Known as Active to Precharge Delay.

Historically, tRAS was defined as the time needed to establish the necessary potential between a bitline pair within the memory array until it was safe to write the data back to the memory cells of origin after a (destructive) read. Pay attention to the word read here. Memory, in many ways, is like a book: you can only read after opening the book to a certain page and paragraph within that page. The RAS pulse width is the time until a page can be closed again. Therefore, by definition, the minimum tRAS must be the RAS-to-CAS delay plus the read latency (CAS delay). That is fine for FPM and EDO memory with their single-word data transfers. With SDRAM, memory controllers started to output a chain of four consecutive quadwords on every access. With DDR, that number has increased to eight quadwords, which effectively are two consecutive bursts of four.

Now imagine someone closes the book you are reading from in the middle of a sentence. Right in your face! And does it over and over again. This is what happens if tRAS is set too short. So here is the really simple calculation: the second burst of four has at least to be initiated and prefetched into the output buffers (like getting a glimpse at the headline in a book) before the page can be closed without losing all the information. That means the minimum tRAS would be tRCD + CAS latency + 2 cycles (to output the first burst of four and make way for the second burst in the output buffers). Any tRAS setting lower than tRCD + CAS latency + 2 cycles will allow the memory controller to close the page "in your face!" over and over again, and that will cause a performance hit because of a truncated transfer that needs to be repeated. Along with those hassles comes the self-explanatory risk of data corruption.
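The "minimum tRAS" rule at the end of that discussion is easy to express in code. This is only a sketch of the rule as stated, not a BIOS tool:

def min_tras(trcd, cas_latency):
    # minimum tRAS = tRCD + CAS latency + 2 cycles (room for both bursts of four)
    return trcd + cas_latency + 2

def tras_is_safe(tras, trcd, cas_latency):
    return tras >= min_tras(trcd, cas_latency)

print(tras_is_safe(8, 3, 3))  # True:  3-3-3-8 satisfies 3 + 3 + 2 = 8
print(tras_is_safe(6, 3, 3))  # False: the page could be closed mid-transfer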

Measurements

As with almost all latency issues, the lower, the better. RAM speeds are given by the four numbers above, in the format "tCAS-tRCD-tRP-tRAS". So, for example, latency values given as 2.5-3-3-5 would indicate tCAS=2.5, tRCD=3, tRP=3, tRAS=5. (Note that .5 values of latency (such as 2.5) are only possible in Double data rate RAM, where two parts of each clock cycle are used)

Most computer users don't need to worry about SDRAM latency, because the computer can handle the auto-adjustment to RAM timing based on the Serial Presence Detect (SPD) ROM inside the RAM packaging that defines the four timing values, decided by the RAM manufacturer. Although the SDRAM latency timing can be adjusted manually, using lower latency settings than the module's rating (overclocking) may cause a computer to crash or fail to boot.

Ups and Downs: Memory Timings Put to the Test

Patrick Schmid, Bert Töpelt

January 19, 2004 12:00

RAM Games: Zooming In On Timings

It's a fact that modern systems need just scads of memory - 512 MB at least, although 1 GB isn't exactly rare anymore, either. Things get a bit more complicated once you stroll down to your local computer store, where they have a huge selection of DDR400 RAM modules from innumerable vendors in all imaginable variations. So what should you look out for? Should you really listen to what the friendly salesperson has to say?

In any discussion of RAM, somebody is bound to drop the term "CAS latency", or CL for short. But there are a slew of other factors that also affect how fast your RAM is. In this article, we'll take a closer look at these factors and explain the concepts behind the cryptic numbers given to the different modules.

Then we'll move along to the real purpose of this article - determining how a given system will perform on best-case, average and worst-case memory timings. We ran 19 individual benchmarks on all the available platforms (Athlon XP, Athlon 64, Athlon 64 FX, Pentium 4, Pentium 4 EE) in order to get you the dirt on the timings.


How SDRAM Works

State-of-the-art RAM modules generally transfer data in 64 bit chunks. They contain DRAM chips that send data synchronously with the clock pulse signal and generally use the double-data-rate method (DDR). The difference between DDR and SDR-SDRAMs is that the DDR modules transfer data during both the rising and the falling edges of the clock pulse. That means that DDR400 RAM really only sends data at 200 MHz using the DDR method.

A better measure of memory speed is the module's cycle time, which is the amount of time needed to complete one clock cycle. A cycle time of 10 ns means that 100 million cycles are possible per second, and the chips run at up to 100 MHz. To reach 133 MHz, you need 7.5 ns; for 166 MHz, 6.0 ns.

|Cycle Time T |Max. Frequency f |Bandwidth SDR*     |Bandwidth DDR*      |
|10 ns        |100 MHz          |800 MB/s (PC100)   |1,600 MB/s (DDR200) |
|7.5 ns       |133 MHz          |1,064 MB/s (PC133) |2,100 MB/s (DDR266) |
|6 ns         |166 MHz          |-                  |2,700 MB/s (DDR333) |
|5 ns         |200 MHz          |-                  |3,200 MB/s (DDR400) |

* Here's how to calculate bandwidth: frequency x interface width (64 bits are 8 Bytes). DDR RAM offers twice the transfer rate of SDR RAM.
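The table's relationship between cycle time, frequency and bandwidth can be checked with a few lines of Python (a sketch of the footnote's formula; the function names are ours):

def max_frequency_mhz(cycle_time_ns):
    return 1000.0 / cycle_time_ns  # f = 1/T

def bandwidth_mb_per_s(frequency_mhz, ddr):
    transfers_per_cycle = 2 if ddr else 1  # DDR moves data on both clock edges
    return frequency_mhz * transfers_per_cycle * 8  # 64-bit bus = 8 bytes

print(max_frequency_mhz(5))                # 200.0 MHz
print(bandwidth_mb_per_s(200, ddr=True))   # 3200 MB/s (DDR400)
print(bandwidth_mb_per_s(133, ddr=False))  # 1064 MB/s (PC133)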

Nomenclature: RAM Names

The name game was a lot easier with conventional SDR-SDRAM, which was simply named for the clock speed (PC100, PC133 SDRAM). The rules changed with the advent of DDR RAM. The modules are now titled using the maximum bandwidth (in MB/s). So PC2100 is DDR266, PC2700 is DDR333, etc. This sea change was based on the nomenclature used for Rambus DRAM (RDRAM), whose names - PC800 or PC1066 - were also derived from their frequency. The following table provides more information.

|Name   |Type         |Effective Clock Speed |Data Bus   |Bandwidth |
|PC66   |SDRAM        |66 MHz                |64 Bit     |0.5 GB/s  |
|PC100  |SDRAM        |100 MHz               |64 Bit     |0.8 GB/s  |
|PC133  |SDRAM        |133 MHz               |64 Bit     |1.06 GB/s |
|PC1600 |DDR200       |100 MHz               |64 Bit     |1.6 GB/s  |
|PC1600 |Dual-DDR200  |100 MHz               |2 x 64 Bit |3.2 GB/s  |
|PC2100 |DDR266       |133 MHz               |64 Bit     |2.1 GB/s  |
|PC2100 |Dual-DDR266  |133 MHz               |2 x 64 Bit |4.2 GB/s  |
|PC2700 |DDR333       |166 MHz               |64 Bit     |2.7 GB/s  |
|PC2700 |Dual-DDR333  |166 MHz               |2 x 64 Bit |5.4 GB/s  |
|PC3200 |DDR400       |200 MHz               |64 Bit     |3.2 GB/s  |
|PC3200 |Dual-DDR400  |200 MHz               |2 x 64 Bit |6.4 GB/s  |
|PC4200 |DDR533       |266 MHz               |64 Bit     |4.2 GB/s  |
|PC4200 |Dual-DDR533  |266 MHz               |2 x 64 Bit |8.4 GB/s  |
|PC800  |RDRAM Dual   |400 MHz               |2 x 16 Bit |3.2 GB/s  |
|PC1066 |RDRAM Dual   |533 MHz               |2 x 16 Bit |4.2 GB/s  |
|PC1200 |RDRAM Dual   |600 MHz               |2 x 16 Bit |4.8 GB/s  |
|PC800  |RDRAM Dual   |400 MHz               |2 x 32 Bit |6.4 GB/s  |
|PC1066 |RDRAM Dual   |533 MHz               |2 x 32 Bit |8.4 GB/s  |

How Memory Access Works

Information is stored by first separating the memory area into rows and columns. The capacity of the individual chips determines the number of rows and columns per module. When several arrays are combined, they create memory banks.

The chips are actually accessed by means of control signals such as row address strobe (RAS), column address strobe (CAS), write enable (WE), chip select (CS) and several additional commands (DQ). You also need to know something about which row is active in the memory matrix at any given moment.

In today's computers, a command rate is defined in BIOS - generally 1-2 cycles. This describes the amount of time it takes for the RAS to be executed after the memory chip has been selected.

The memory controller selects the active row. But before the row actually becomes active so that the columns can be accessed, the controller has to wait 2-3 cycles - tRCD (RAS-to-CAS delay). Then it sends the actual read command, which is also followed by a delay - the CAS latency. For DDR RAM, CAS latency is 2, 2.5 or 3 cycles. Once this time has elapsed, the data will be sent to the DQ pins. After the data has been retrieved, the controller has to deactivate the row again, which is done within tRP (RAS precharge time).

There is one more technical restriction - tRAS (active-to-precharge delay). This is the minimum number of cycles that a row has to remain active before it can be deactivated again. 5-8 cycles is about average for tRAS.

Memory timings are generally cited in order of importance: tCAS-tRCD-tRP-tRAS.

Conclusion - Consider Carefully

In most of the disciplines, you can see that it no longer matters as much what memory timings you have as it did only a few years ago, when SDRAM or the first DDR generation were still hot. Or, to put it another way, having faster or slower RAM will not tip the balance in favor of or against the latest AMD and Intel processors.

We observed one interesting result in many of the gaming benchmarks: while the Pentium 4 3.2 GHz is normally just a touch faster than the Athlon 64 3200+, it quickly falls behind the Athlon if you only use slow memory modules.

Things start getting untidy when you combine compute-intensive tasks with large quantities of data such as file compression. In such categories, the memory timings make or break performance - the Pentium 4 processors either take the lead or bring up the rear, depending on whether the memory timings are fast or slow. We were duly impressed by the Athlon 64 FX-51's scores, which maintained its ranking no matter what kind of memory it was given. This steadfastness is largely due to the integrated memory controller.

The moral of the story is clear: while we still recommend buying brand-name products to ensure compatibility (especially for dual-channel systems), you don't necessarily need the fastest timings. In today's market, you only need fast modules if your computer will be computing a lot or encoding video. For any other application, slower RAM will definitely cut the mustard.

DDR2 Memory Tutorial

Introduction

DDR2 memories are already supported on high-end motherboards. We compiled below a short list of the main differences between DDR2 and DDR memories.

• DDR memories are officially found in 266 MHz, 333 MHz and 400 MHz versions, while DDR2 memories are found in 400 MHz, 533 MHz, 667 MHz and 800 MHz versions. Both types transfer two data words per clock cycle. Because of that, the listed clocks are nominal clocks, not real ones; to get the real clock, divide the nominal clock by two. For example, DDR2-667 memory works in fact at 333 MHz.

• DDR2 memories have a lower power consumption compared to DDR memories: DDR memories are fed with 2.5 V, while DDR2 memories are fed with 1.8 V.

• On DDR memories the resistive termination necessary for making the memory work is located on the motherboard, while on DDR2 memories this circuit is located inside the memory chip. This is one of the reasons why it is not possible to install DDR2 memories in DDR sockets and vice versa.

• DDR modules have 184 contacts, while DDR2 modules have 240 contacts.

• On DDR memories the "CAS Latency" (CL) parameter (the time the memory takes to start delivering requested data) can be 2, 2.5 or 3 clock cycles. On DDR2 memories CL can be 3, 4 or 5 clock cycles.

• On DDR2 memories, depending on the chip, there is an additional latency (AL) of 0, 1, 2, 3, 4 or 5 clock cycles. So in a DDR2 memory with CL4 and AL1 the total latency is 5.

• On DDR2 memories the write latency equals the read latency (CL + AL) minus 1.

• Internally, the controller inside DDR memories preloads two data bits from the storage area (a task known as "prefetch"), while the controller inside DDR2 memories loads four bits in advance.

These are the main differences between DDR and DDR2. We will explore them a little more on the following pages.

Physical Aspect

DDR and DDR2 modules have the same physical size, but DDR modules have 184 contacts while DDR2 modules have 240. In Figure 1 you can compare the difference between DDR2 and DDR edge contacts.

Figure 1: Differences in edge contacts between DDR and DDR2 modules.

Thus there is no way to install a DDR2 module in a DDR socket and vice versa.

All DDR2 chips use BGA (Ball Grid Array) packaging, while DDR chips almost always use TSOP (Thin Small-Outline Package) packaging. There are DDR chips with BGA packaging on the market (like the ones from Kingmax), but they are not very common. In Figure 2 you can see what a TSOP chip on a DDR module looks like, while in Figure 3 you can see what a BGA chip on a DDR2 module looks like.

Figure 2: DDR chips almost always use TSOP packaging.

Figure 3: DDR2 chips use BGA packaging.

Resistive Termination

On DDR modules the necessary resistive termination is located on the motherboard, while on DDR2 modules this termination is located inside the memory chips - a technique called ODT, On-Die Termination.

This is done in order to make the signal "cleaner". In Figure 4 you can see the signal that reaches the memory chip. On the left-hand side are the signals in a system that uses motherboard termination (DDR memories), while on the right-hand side are the signals in a system that uses on-die termination (DDR2 memories). Even a layman can easily see that the signals on the right are cleaner and more stable than the signals on the left. The yellow square marks the time frame difference - this time frame is the time the memory has to read or write a piece of data. With on-die termination this time frame got wider, allowing higher clocks to be achieved, since the memory has more time to read or write a data chunk.

Figure 4: Comparison between motherboard termination and on-die termination.


DDR2 SDRAM

From Wikipedia, the free encyclopedia

DDR2 SDRAM or double-data-rate two synchronous dynamic random access memory is a random access memory technology used for high speed storage of the working data of a computer or other digital electronic device.

It is a part of the SDRAM (synchronous dynamic random access memory) family of technologies, which is one of many DRAM (dynamic random access memory) implementations, and is an evolutionary improvement over its predecessor, DDR SDRAM.

Its primary benefit is the ability to run its bus at twice the speed of the memory cells it contains, thus enabling faster bus speeds and higher peak throughputs than earlier technologies. This is achieved at the cost of higher latency.

Like all SDRAM implementations, DDR2 stores memory in memory cells that are activated with the use of a clock signal to synchronize their operation with an external data bus. Like DDR before it, DDR2 cells transfer data both on the rising and falling edge of the clock (a technique called double pumping). The key difference between DDR and DDR2 is that in DDR2 the bus is clocked at twice the speed of the memory cells, so four words of data can be transferred per memory cell cycle. Thus, without speeding up the memory cells themselves, DDR2 can effectively operate at twice the bus speed of DDR.

However, latency is greatly increased as a trade-off. While DDR SDRAM has typical read latencies of between 2 and 3 bus cycles, DDR2 may have read latencies between 3 and 9 cycles. Because of this higher latency, DDR SDRAM running at the same bus speed as DDR2 is generally considered superior; DDR2 is, however, able to run at substantially higher bus speeds which equates to an overall increase in throughput.

Chips and modules

For use in PCs, DDR2 SDRAM is supplied in DIMMs with 240 pins and a single locating notch. DIMMs are identified by their peak transfer capacity (often called bandwidth).

|Standard name |Memory clock |Cycle time |I/O bus clock |Data transfers per second |Module name |Peak transfer rate |
|DDR2-533      |133 MHz      |7.5 ns     |266 MHz       |533 million               |PC2-4200    |4.264 GB/s         |
|DDR2-667      |166 MHz      |6 ns       |333 MHz       |667 million               |PC2-5300    |5.336 GB/s         |
|DDR2-800      |200 MHz      |5 ns       |400 MHz       |800 million               |PC2-6400    |6.400 GB/s         |
|DDR2-1066     |266 MHz      |3.75 ns    |533 MHz       |1066 million              |PC2-8500    |8.500 GB/s         |

Note: DDR2-xxx (or DDR-xxx) denotes effective clockspeed, whereas PC2-xxxx (or PC-xxxx) denotes theoretical bandwidth (though it is often rounded up or down). Bandwidth is calculated by taking transfers per second and multiplying by eight. This is because DDR2 memory modules transfer data on a bus that is 64 data bits wide, and since a byte comprises 8 bits, this equates to 8 bytes of data per transfer.
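That naming rule is trivial to verify in Python; a minimal sketch of the multiply-by-eight calculation:

def pc2_peak_mb_per_s(millions_of_transfers_per_s):
    return millions_of_transfers_per_s * 8  # 64-bit bus moves 8 bytes per transfer

print(pc2_peak_mb_per_s(533))  # 4264 -> rounded in the name to PC2-4200
print(pc2_peak_mb_per_s(667))  # 5336 -> PC2-5300
print(pc2_peak_mb_per_s(800))  # 6400 -> PC2-6400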

Note: some manufacturers label their DDR2-667 sticks as PC2-5400 instead of PC2-5300. At least one manufacturer has reported that this reflects successful testing at a faster-than-standard speed.

Debut

DDR2 was introduced in the second quarter of 2003 at two initial speeds: 200 MHz (referred to as PC2-3200) and 266 MHz (PC2-4200). Both performed worse than the original DDR specification due to higher latency, which made total access times longer. However, the original DDR technology tops out at speeds around 266 MHz (533 MHz effective). Faster DDR chips exist, but JEDEC has stated that they will not be standardized. These modules are mostly manufacturer optimizations of highest-yielding chips, drawing significantly more power than slower-clocked modules, and usually do not offer much, if any, greater real-world performance.

DDR2 started to become competitive with the older DDR standard by the end of 2004, as modules with lower latencies became available.

Backwards compatibility

DDR2 DIMMs are not backwards compatible with DDR DIMMs. The notch on DDR2 DIMMs is in a different position than DDR DIMMs, and the pin density is slightly higher than DDR DIMMs. DDR2 is a 240-pin module, DDR is a 184-pin module.

Faster DDR2 DIMMs are, however, compatible with slower DDR2 DIMMs; the memory simply runs at the slower speed. Using slower DDR2 memory in a system capable of higher speeds results in the bus running at the speed of the slowest memory in use.

DDR3 SDRAM

From Wikipedia, the free encyclopedia

In computing, DDR3 SDRAM or double-data-rate three synchronous dynamic random access memory is a random access memory interface technology used for high bandwidth storage of the working data of a computer or other digital electronic devices. DDR3 is part of the SDRAM family of technologies and is one of the many DRAM (dynamic random access memory) implementations.

DDR3 SDRAM is an improvement over its predecessor, DDR2 SDRAM, and the two are not compatible. The primary benefit of DDR3 is the ability to transfer at twice the data rate of DDR2 (I/O at 8× the data rate of the memory cells it contains), thus enabling higher bus rates and higher peak rates than earlier memory technologies. In addition, the DDR3 standard allows for chip capacities of 512 megabits to 8 gigabits, effectively enabling a maximum memory module size of 16 gigabytes.

With data being transferred 64 bits at a time per memory module, DDR3 SDRAM gives a transfer rate of (memory clock rate) × 4 (for bus clock multiplier) × 2 (for data rate) × 64 (number of bits transferred) / 8 (number of bits/byte). Thus with a memory clock frequency of 100 MHz, DDR3 SDRAM gives a maximum transfer rate of 6400 MB/s.
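Written out in Python, the formula from the paragraph above looks like this (a sketch, not a vendor calculation):

def ddr3_peak_mb_per_s(memory_clock_mhz):
    # memory clock x 4 (bus clock multiplier) x 2 (double data rate)
    # x 64 bits per transfer / 8 bits per byte
    return memory_clock_mhz * 4 * 2 * 64 // 8

print(ddr3_peak_mb_per_s(100))  # 6400 MB/s, matching the text
print(ddr3_peak_mb_per_s(200))  # 12800 MB/s (DDR3-1600)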

DDR3 is a DRAM interface specification; the actual DRAM arrays that store the data are the same as in any other type of DRAM, and have similar performance.

DDR, DDR2 and DDR3 for Desktop PCs

DDR3 memory provides a reduction in power consumption of 30% compared to DDR2 modules due to DDR3's 1.5 V supply voltage, compared to DDR2's 1.8 V or DDR's 2.5 V. The 1.5 V supply voltage works well with the 90 nanometer fabrication technology used in the original DDR3 chips. Some manufacturers further propose using "dual-gate" transistors to reduce leakage of current.

The maximum recommended voltage is 1.575 volts and should be considered the absolute maximum when memory stability is the foremost consideration, such as in servers or other mission critical devices. In addition, JEDEC states that memory modules must withstand up to 1.975 volts before incurring permanent damage, although they are not required to function correctly at that level.

The main benefit of DDR3 comes from the higher bandwidth made possible by DDR3's 8-burst-deep prefetch buffer, in contrast to DDR2's 4-burst-deep or DDR's 2-burst-deep prefetch buffer.

DDR3 modules can transfer data at a rate of 800–2133 MT/s using both rising and falling edges of a 400–1066 MHz I/O clock. Sometimes, a vendor may misleadingly advertise the I/O clock rate by labeling the MT/s as MHz. The MT/s is normally twice that of MHz by double sampling, one on the rising clock edge, and the other, on the falling. In comparison, DDR2's current range of data transfer rates is 400–1066 MT/s using a 200–533 MHz I/O clock, and DDR's range is 200–400 MT/s based on a 100–200 MHz I/O clock. High-performance graphics was an initial driver of such bandwidth requirements, where high bandwidth data transfer between framebuffers is required.

DDR3 prototypes were announced in early 2005. Products in the form of motherboards appeared on the market in June 2007. The Intel Core i7, released in November 2008, connects directly to memory rather than via a chipset. The Core i7 supports only DDR3. AMD's first socket AM3 Phenom II X4 processors, released in February 2009, were their first to support DDR3.

DDR3 DIMMs have 240 pins, are electrically incompatible with DDR2 and have a different key notch location. DDR3 SO-DIMMs have 204 pins.

GDDR3 memory, which has a similar name but is an entirely different technology, has been in use for graphics cards. GDDR3 has sometimes been incorrectly referred to as "DDR3".

Latencies

While the typical latencies for a DDR2 device were 5-5-5-15, the standard latencies for the DDR3 devices are 7-7-7-20 for DDR3-1066 and 7-7-7-24 for DDR3-1333.

DDR3 latencies are numerically higher because the I/O bus clock cycles by which they are measured are shorter; the actual time interval is similar to DDR2 latencies (around 10 ns). There is some improvement because DDR3 generally uses more recent manufacturing processes, but this is not directly caused by the change to DDR3.

As with earlier memory generations, faster DDR3 memory became available after the release of the initial versions. DDR3-2000 memory with 9-9-9-28 latency (9 ns) was available in time to coincide with the Intel Core i7 release. CAS latency of 9 at 1000 MHz (DDR3-2000) is 9 ns, while CAS latency of 7 at 667 MHz (DDR3-1333) is 10.5 ns.

Example:

(CAS ÷ Frequency (MHz)) × 1000 = X ns

(7 ÷ 667) × 1000 = 10.4948 ns

Feature summary

DDR3 SDRAM components

• Introduction of asynchronous RESET pin

• Support of system-level flight-time compensation

• On-DIMM mirror-friendly DRAM pinout

• Introduction of CWL (CAS write latency) per clock bin

• On-die I/O calibration engine

• READ and WRITE calibration

DDR3 modules

• Fly-by command/address/control bus with on-DIMM termination

• High-precision calibration resistors

• Are not backwards compatible—DDR3 modules do not fit into DDR2 sockets; forcing them can damage the DIMM and/or the motherboard

Technological advantages compared to DDR2

• Higher bandwidth performance, up to 2133 MT/s standardized

• Slightly improved latencies as measured in nanoseconds

• Higher performance at low power (longer battery life in laptops)

• Enhanced low-power feature

Market penetration

Although DDR3 was launched in 2007, DDR3 sales are not expected to overtake DDR2 until the end of 2009, or possibly early 2010, according to Intel strategist Carlos Weissenberg, speaking during the early part of their roll-out in August 2008 (the same view had been stated by market intelligence company DRAMeXchange over a year earlier in April 2007.) The primary driving force behind the increased usage of DDR3 has been new Core i7 processors from Intel and Phenom II processors from AMD, both of which have internal memory controllers: the latter recommends DDR3, the former requires it. IDC stated in January 2009 that DDR3 sales will account for 29 percent of the total DRAM units sold in 2009, rising to 72% by 2011.

Successor


It was revealed at the Intel Developer Forum in San Francisco 2008 that the successor to DDR3 will be known as DDR4. It is currently in the design stage, and is expected to be released in 2012. When released, it is expected to run at 1.2 volts or less, versus the 1.5 volts of DDR3 chips and have in excess of 2 billion data transfers per second.

Prefetch buffer

From Wikipedia, the free encyclopedia

A prefetch buffer is a data buffer employed on modern DRAM chips that allows quick and easy access to multiple datawords located on a common physical row in the memory.

The prefetch buffer takes advantage of the specific characteristics of memory accesses to a DRAM. Typical DRAM memory operations involve three phases (bitline precharge, row access, column access). Row access is the heart of a read operation as it involves the careful sensing of the tiny signals in DRAM memory cells -- this is the long and slow phase of memory operation. However once a row is read, subsequent column accesses to that same row can be very quick, as the sense amplifiers also act as latches. For reference, a row of a 1Gb DDR3 device is 2,048 bits wide, so that internally 2,048 bits are read into 2,048 separate sense amplifiers during the row access phase. Row accesses might take 50 ns depending on the speed of the DRAM, whereas column accesses off an open row are less than 10 ns.

Traditional DRAM architectures have long supported fast column access to these bits on an open row. For an 8 bit wide memory chip with a 2,048 bit wide row, accesses to any of the 256 datawords (2048/8) on the row can be very quick, provided no intervening accesses to other rows occur.

The drawback of the older fast column access method was that a new column address had to be sent for each additional dataword on the row. The address bus had to operate at the same frequency as the data bus. A prefetch buffer simplifies this process by allowing a single address request to result in multiple data words.

In a prefetch buffer architecture, when a memory access occurs to a row, the buffer grabs a set of adjacent datawords on the row and reads them out ("bursts" them) in rapid-fire sequence on the IO pins, without the need for individual column address requests. This assumes the CPU wants adjacent datawords in memory, which in practice is very often the case. For instance, when a 64-bit CPU accesses a 16-bit-wide DRAM chip, it will need 4 adjacent 16-bit datawords to make up the full 64 bits. A 4n prefetch buffer would accomplish this exactly ("n" refers to the IO width of the memory chip; it is multiplied by the burst depth "4" to give the size in bits of the full burst sequence). An 8n prefetch buffer on an 8-bit-wide DRAM would also accomplish a 64-bit transfer.

The prefetch buffer depth can also be thought of as the ratio between the core memory frequency and the IO frequency. In an 8n prefetch architecture (such as DDR3), the IOs will operate 8 times faster than the memory core (each memory access results in a burst of 8 datawords on the IOs). Thus a 200 MHz memory core is combined with IOs that each operate eight times faster (1600 megabits/second). If the memory has 16 IOs, the total read bandwidth would be 200 MHz x 8 datawords/access x 16 IOs = 25.6 gigabits/second (Gbps), or 3.2 gigabytes/second (GBps). Modules with multiple DRAM chips can provide correspondingly higher bandwidth.
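The bandwidth arithmetic in that paragraph can be reproduced with a short sketch (the function name is ours):

def read_bandwidth_gbps(core_mhz, prefetch_depth, io_pins):
    # bits per second = core clock x datawords per access x number of IOs
    return core_mhz * 1e6 * prefetch_depth * io_pins / 1e9

gbps = read_bandwidth_gbps(200, 8, 16)
print(gbps)      # 25.6 Gbps, as in the text
print(gbps / 8)  # 3.2 GBps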

Each generation of SDRAM has a different prefetch buffer size:

• DDR SDRAM's prefetch buffer size is 2n (two datawords per memory access)

• DDR2 SDRAM's prefetch buffer size is 4n

• DDR3 SDRAM's prefetch buffer size is 8n (eight datawords per memory access)

Increased Bandwidth

The speed of memory has not historically increased in line with CPU improvements. In order to increase the bandwidth of memory modules, the prefetch buffer reads data from multiple memory chips simultaneously. This is similar to a RAID array in the storage world. It is also similar to the concept of dual-channel memory, but the extra channels are internal to each module. Sequential access bandwidth is markedly improved by prefetch buffers, but random access is mostly unchanged.
