The Memory Hierarchy
Introduction
Terminology
Access time
Time to access a word in memory
Specifies the read or write time
Note these may be different
The memory may be organized as
Bits, bytes, or words
Cycle time
Time from the start of one read until the next
Block size
Number of words in a block
Note this is a logical description
Bandwidth
Word transmission rate
Latency
Time to access the first of a sequence of words
Block access time
Time to access an entire block from the start of a read
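As a rough sketch of how these last three quantities relate (assuming a simple memory in which words after the first arrive at the full bandwidth):

$t_{\text{block}} \approx t_{\text{latency}} + \dfrac{n_{\text{words}} - 1}{\text{bandwidth}}$

For example, with an assumed latency of 50 ns and a transfer rate of one word per 100 ns, a 4-word block takes roughly 50 + 3 × 100 = 350 ns.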
DRAM
Dynamic random access memory
SRAM
Static random access memory
Semi-static RAM
The periphery is clock activated (dynamic) and is thus inactive until clocked
Only one memory cycle is permitted per clock
Periphery circuitry must be allowed to reset after each active memory cycle for a minimum precharge time
No refresh is required.
SDRAM
Synchronous DRAM
SDRAM synchronizes all addresses, data, and control signals to the system clock, allowing much higher data transfer rates than asynchronous transfers
ROM
Read only memory
PROM
Programmable read only memory
EPROM
Erasable programmable read only memory
EEPROM
Electrically erasable programmable read only memory
CAS
Column address strobe
Clock used in dynamic memories to control the input of column addresses
RAS
Row address strobe
Clock used in dynamic memories to control the input of row addresses
Refresh
Technique used in DRAM or SDRAM through which data is retained in memory
Refresh time interval
Time between two refresh operations - determined by the system in which the memory is operating
Memory block on Von Neumann machines
Actually composed of a number of memory components
Arranged
In hierarchical manner
To co-operate with each other
Hierarchical metrics
Speed
Storage capacity
At top
Slowest, largest memories
Also known as secondary memory
Also tend to be least expensive
Size
On order of 10’s to 100’s of gigabits
Latency
On the order of 10’s of ms
Bandwidth
1 MB per sec
Cost
$0.02 per MB
Devices
Tape for archival storage
High density disk drives
Bottom
Smallest, fastest memories
Call this cache memory
Also tend to be most expensive
Size
On order of 100’s to 1000’s of bits
Up to several M in some machines today
Latency
10-20 ns
Bandwidth
8-10 MB per sec
Cost per MB
$500.00
Devices
Registers
High speed cache
Middle
Often called primary memory
Size
On order of 100’s of Mbits to a few Gbits
Latency
50 ns
Bandwidth
1 MB per sec
Cost
$30.00 per MB
Devices
RAM ROM
Some hard drives
Lower speed cache
Motivation
We would prefer programs to execute as quickly as possible
As we’ve seen
Accessing memory takes time
Each access contributes to time required to execute instruction
Static Ram Design
A typical SRAM cell appears as follows
Observe that we have 6 transistors per cell
Two access transistors enable the cell for
Read and write
Write operation
Value written into cell by
• Applying value to bi and !bi through write / sense amplifiers
• Assert word line
• Causes new value to be written into latch
Read operation
Value read from cell by
• Precharge bi and !bi to voltage halfway between 0 and 1
• Assert the word line
• Drives bi and !bi to high and low or low and high
• Values are sensed and amplified by write / sense amplifier
Typical timing is given as
DRAM Design
A typical DRAM cell appears as follows
Observe that we have only one transistor per cell
Read and write operations use single bit line
Write operation
Value written into cell by
• Applying 0 or 1 to bi through write / sense amplifiers
• Assert word line
• Charges the capacitor if a 1 is stored, discharges it if a 0 is stored
Read operation
Value read from cell by
• Precharge bi to voltage halfway between 0 and 1
• Assert the word line
• Gates signal onto bi
• Values are sensed and amplified by write / sense amplifier
• Read operation causes the capacitor to discharge
• Sensed and amplified value placed back on bit line
Called a refresh operation
Typical timing is given as shown
Chip Organization
Independent of type of internal storage
Typical RAM chip configured as shown in following drawing
Making Things Work
Locality of Reference
Goal
Reduce number of accesses
Make each access as short as possible
Utilized to much greater extent in today’s memories
Ideally would like to make all memory as fast as technology allows
Such action has associated cost
Memories near bottom are expensive
Support circuitry for such memories also expensive
Additional circuitry required
Power supplies to support
Almost all programs executed today written using procedural paradigm
If we analyze how such programs
Designed
Execute
Discover interesting phenomenon
Execution generally occurs
Sequentially
Small loops
Number of instructions
Means overall forward progress through the program
Proceeds at a much lower rate than
The access times of the fastest memory would allow
Put another way
With respect to entire program
We are executing within a small window
That moves forward through program
This is shown as the following
Formally such phenomenon called
Locality of reference
We recognize the program executing only a few instructions
Within small window
Benefits
If we can keep those few instructions
In fast memory
Program will appear to be executing out of fast memory
Gain benefits of such speed
Reduced cost
Important point
Approach works provided
Area within which we are executing is in window
Method can easily be defeated with
Large loops
Branches outside window
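To make the idea concrete, here is a small C sketch (illustrative, not from the original notes): both functions sum the same array, but the first walks memory in address order and stays within the moving window of recently touched blocks, while the second strides across blocks on every access and can defeat the cache.

#include <stdio.h>

#define ROWS 1024
#define COLS 1024

static int a[ROWS][COLS];

/* Row-major traversal: consecutive accesses touch adjacent
 * addresses, so each cache block is fully used before moving on. */
long sum_row_major(void) {
    long sum = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major traversal: consecutive accesses are COLS * sizeof(int)
 * bytes apart, so each access may land in a different block,
 * defeating spatial locality. */
long sum_col_major(void) {
    long sum = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}

int main(void) {
    printf("%ld %ld\n", sum_row_major(), sum_col_major());
    return 0;
}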
Architecture
Let’s now look at portion of memory hierarchy
We’ll not consider
Archival storage
ROM or CDROM
Registers
We will focus on
Hard Drive
RAM
Cache
Secondary memory
2^18 pages
Primary memory
2^10 pages
Page = 4 blocks
Block = 1 K words
Word = 4 bytes
Caching and Virtual Memory
Can represent these as
Goal
Operate out of cache memory
When need instruction or data not in cache
Bring in from RAM
Refer to this as caching
When need instruction or data not in RAM
Bring in from hard drive
Use mechanism similar to caching
Call it virtual memory
Performance
How well we do this
Establishes effectiveness of our memory management scheme
Caching
Let’s examine cache memory and caching techniques first
Once again idea
In one sense
Take advantage of locality of reference
Instructions
Data
To minimize access time
Larger sense
Can use caching techniques
Many places to optimize performance
Internet
Bit images cached locally to improve display speed
Network file systems
Temporarily maintain local copy
To avoid having to retransfer
Based upon assumption will be using again in near future
Implementation
Caching requires certain amount of local memory
Size determines how much information can be stored locally
High Level Description
Program begins executing
Encounters needed data or instruction
Check cache
If in cache
Have a cache hit
Use
Else
Have cache miss
Must go get from somewhere else
Bring in new block of data or instructions
If room left in cache
Store block
Else
Must make room
Remove existing block
If block has been modified
Save changes
Else
Discard
Write new block in its place
Important issues
How do we know something is not in cache
Where do we go to find something if not in cache
What if not there
How do we know if room left in cache
How do we know if information in cache modified
How do we select block to replace
Detailed Implementation
Will address each question as we build a cache
Implementation scheme called direct mapped cache
First step
Design cache
Hardware
Collection of memory devices
Memory address register
Memory data register
Words will be 32 bits
Will have 256 k word cache
Architecture
We will logically divide cache into 256 blocks
Each block will be 1 k words long
Note this is a logical division
Further note
Address increments are rounded to make things simpler
Provides reasonable size piece of memory to work with
Cache will now logically appear as
2^8 = 256 blocks, 2^10 ≈ 1000 words per block
A 1 K block requires 10 address bits
To uniquely identify each location
Recall our word is 4 bytes long
Bits A0 - A1
Identify the byte
Bits A2 - A11
Identify a word in cache
Because our cache is logically divided into 256 blocks
Need 8 bits to identify each block
We can use the actual physical address to do this
Thus will use address bits
A12 - A19
These 8 bits will give the required 256 combinations
We’ll call these the index
We do this as follows
Any block of addresses with
A12 - A19 = 0000 0000
Store in Block 0
A12 - A19 = 0000 0001
Store in Block 1
A12 - A19 = 0000 0010
Store in Block 2
etc.
A20 - A31
Not directly used to address cache
Used to distinguish between the memory blocks
That can map to the same cache block
Called a tag
Stored in tag table
Tag Table
Tag table provides last bit of information
Contains
One entry for each block in cache
Ours will contain 256 entries
One for each block
Entry contains
Bit to indicate if word within block modified
Called dirty bit
Address bits A20 - A31 of corresponding block
Bit to indicate block in cache
Summarizing
A0 - A1 Identify Byte within a Word
A2 - A11 Identify Word within a block
A12 - A19 Identify Block within cache
A20 - A31 Distinguish memory blocks that map to the same cache block
Stored in tag table
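A minimal C sketch of this decomposition (field and function names are illustrative; the bit positions follow the summary above):

#include <stdint.h>

/* Decompose a 32-bit physical address into the fields defined above. */
typedef struct {
    uint32_t byte;   /* A0  - A1  : byte within word        */
    uint32_t word;   /* A2  - A11 : word within block       */
    uint32_t index;  /* A12 - A19 : block (index) in cache  */
    uint32_t tag;    /* A20 - A31 : tag stored in tag table */
} cache_addr_t;

static cache_addr_t decode(uint32_t addr) {
    cache_addr_t f;
    f.byte  = addr         & 0x3;    /*  2 bits */
    f.word  = (addr >> 2)  & 0x3FF;  /* 10 bits */
    f.index = (addr >> 12) & 0xFF;   /*  8 bits */
    f.tag   = (addr >> 20) & 0xFFF;  /* 12 bits */
    return f;
}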
Finding a word
To find word in cache execute simple process
Check tag table for bits A20 - A31
If present
Use bits A12 - A19 to index into cache
Use bits A2 - A11 to index into block
Use bits A0 - A1 for byte access
if WRITE operation
Set dirty bit in tag table
Modify word
else return word
else
Get block from primary memory
if block occupied
Check dirty bit
if set
write block to primary memory
write block to cache
set occupied bit
else
write block to cache
set occupied bit
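Continuing the sketch above, the lookup procedure might be written as follows in C (the tag-table layout and the two memory-transfer helpers are assumptions for illustration, not a prescribed implementation; decode() is from the previous sketch):

void write_block_to_memory(uint32_t index, uint32_t tag);  /* assumed helper */
void read_block_from_memory(uint32_t index, uint32_t tag); /* assumed helper */

typedef struct {
    uint32_t tag;       /* A20 - A31 of the resident block */
    int      occupied;  /* block holds valid data          */
    int      dirty;     /* block modified since loaded     */
} tag_entry_t;

static tag_entry_t tag_table[256];  /* one entry per cache block */

/* Returns 1 on a hit, 0 on a miss (after loading the block). */
int cache_access(uint32_t addr, int is_write) {
    cache_addr_t f = decode(addr);
    tag_entry_t *e = &tag_table[f.index];

    if (e->occupied && e->tag == f.tag) {        /* cache hit */
        if (is_write)
            e->dirty = 1;                        /* set dirty bit in tag table */
        return 1;
    }
    if (e->occupied && e->dirty)                 /* make room: save changes */
        write_block_to_memory(f.index, e->tag);
    read_block_from_memory(f.index, f.tag);      /* get block from primary memory */
    e->tag = f.tag;
    e->occupied = 1;                             /* set occupied bit */
    e->dirty = is_write;
    return 0;
}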
Data or instruction
Most contemporary computers
Use two caches
Data
Instruction
Same principles work for both
Only extra work
Deciding which cache to use
Performance
Factors to consider in each case
With and without cache
With cache
With and without miss
Optimizing size
Effect of look-ahead
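One standard way to quantify the with/without-miss comparison above is the usual average-access-time formula (the numbers below are illustrative, not from these notes):

$t_{\text{avg}} = t_{\text{hit}} + p_{\text{miss}} \times t_{\text{penalty}}$

Using the latencies quoted earlier (10 ns cache, 50 ns primary memory) and an assumed 5% miss rate: $t_{\text{avg}} = 10 + 0.05 \times 50 = 12.5$ ns, versus 50 ns with no cache at all.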
Associative Cache
Alternate approach
Let block be placed anywhere in cache
Use associative search to locate
Organization now appears as
Let’s specify the following
Main memory
8 K with 8 byte blocks
Cache
2 K with 256 8 byte blocks
Tag table
256 entries
Main memory block goes anywhere in cache
Entry in tag table is main memory block number
Linear search of tag table not feasible
Let a main memory address be of the form
We find a word as follows
Problem with full associative cache
Long search time
Complexity of underlying logic
Let’s now look at scheme that combines features of
Direct mapping
Associative mapping
Called block set associative
Block Set Associative
Approach combining direct and associative mapping
Main memory organized as collection of groups
Each group comprises number of blocks
Cache memory organized as collection of sets
Containing specified number of blocks
Set number corresponds to main memory group number
Any block from group j can be placed into set j
Set is now searched associatively
Far less complex search
Dealing with smaller search space
Organization now appears as
Let’s specify the following
Main memory
8 K with 8 byte blocks
Cache
2 K with 256 8 byte blocks
Tag table
256 entries
Our addresses have the following association
We can now see how a main memory address is
Mapped to a cache address
Computation of address follows in same manner as
Direct and associative mappings
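A short C sketch of the set lookup (the 4-way organization is an assumption for illustration; with 256 cache blocks it yields 64 sets):

#include <stdint.h>

#define NUM_SETS 64   /* 256 blocks / 4 ways (assumed) */
#define WAYS      4

typedef struct {
    uint32_t tag;
    int      occupied;
} set_entry_t;

static set_entry_t sets[NUM_SETS][WAYS];

/* Direct mapping picks the set; a small associative search
 * (parallel comparators in hardware) checks only WAYS entries. */
int set_lookup(uint32_t block_number, uint32_t tag) {
    uint32_t set = block_number % NUM_SETS;   /* group j maps to set j */
    for (int w = 0; w < WAYS; w++)
        if (sets[set][w].occupied && sets[set][w].tag == tag)
            return w;     /* hit in way w */
    return -1;            /* miss */
}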
Intel Pentium
Implements separate
Data and instruction caches
Each uses a 2-way block set associative scheme
Virtual Memory
Virtual memory is a scheme
Very much like caching
Difference
Caching between primary memory and CPU
Virtual memory works between secondary and primary memories
Translates
From logical address - program
To physical address - primary memory
When information not found in cache
Primary memory checked
When information not found in primary memory
Secondary memory checked
In essence primary memory acts as cache for secondary memory
Purpose is twofold
Take advantage of speed of primary memory
Create the appearance of unlimited primary memory
High Level Analysis
As we saw with cache scheme
Size of primary memory significantly smaller than secondary
Rather than blocks
Primary memory divided into pages
Would like each program to have memory space allocated when loaded
Will assume memory space is contiguous
Would like to be able to place pages anywhere in primary memory
Makes addressing only slightly more complicated
Will store location of program memory in page table
Similar to tag table in cache scheme
General retrieval algorithm similar to what we’ve seen
Program begins executing
Encounters needed data or instruction
Check cache
If in cache
Have a cache hit
Use
Else
Have cache miss
Check primary memory
If in primary
Bring associated block into cache
Else
Have a page fault
Get from secondary memory
Must make room
Remove existing page
If page has been modified
Save changes
Else
Discard
Write new page in its place
Bring associated block into cache
Implementation
Design primary memory
Hardware
Collection of memory devices
Memory address register
Memory data register
Words will be 32 bits
Will have 16 M byte primary memory
Architecture
We will logically divide primary memory into 2^10 pages
Each page will be 4 K words long
Each will hold 4 blocks
Each block will hold 1 K words
Note this is a logical division
Further note
Again address increments are rounded to make simpler
Provides reasonable size piece of memory to work with
Primary memory will now logically appear as
Note j not necessarily = i + 1
We consider a
Virtual address
Primary memory address
Secondary memory address
Assume Virtual Memory has
2^12 pages
Assume Secondary Memory has
2^18 pages = 1 G words or 4 G bytes
Similar to cache
A2 - A11
Identify a word within a block
Assume Main Memory has
2^10 pages ≈ 1000 pages
Identified by bits A23 - A14
Now
A13 - A12
Identify block in page
A11 - A2
Identify word in block
A1 - A0
Byte within a word
Page table
Contains one entry for each possible page in secondary memory
Our design will have 2^18 entries
Your pages can be anywhere in secondary memory
To your program they appear at
0, 1, 2…m-1
Could also have
Page tables i, j, and k etc.
Alternately use
A31 - A24 to identify one of 256 page tables
Potentially allows for up to 256 jobs in memory
When job enters system
Page table included
May have only subset of your pages in memory at any one time
A23 - A14 to identify a page within a page table
From point of view of virtual memory address
Page number represents an offset into the page table
Valid bit to indicate if page in primary memory
Entry contains pointer to location in main memory
If not in main memory
Point to location in secondary memory
Dirty bit to indicate modified data
|VM Page Number |Page Address |Status |Location in MM or SM |
|0 |0-4095 | |0 |
|1 |4096-8191 | |2 |
|2 | | |1 |
| | | |13 |
|127 | | |123 |
| | | |7 |
|2^12 |… | |55 |
Address calculation proceeds as with cache
Once we find the page in primary memory
When program loaded
Primary memory space allocated
Amount depends upon program
Address of allocated space
Stored in page table register
Gives starting address of allocated space
To find pages
Go to page table
Add
Contents of page table register to A31 - A14 of virtual address
Gives index into page table
Address
Physical memory if there
Pointer to secondary memory otherwise
Use A13 - A12 to identify block
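A C sketch of this translation (the table layout, register representation, and fault handler are illustrative assumptions; the bit positions follow the notes, with A13 - A0 together forming the offset within a 16 KB page):

#include <stdint.h>

void handle_page_fault(uint32_t vpage);    /* assumed handler */

typedef struct {
    int      valid;   /* page resident in primary memory       */
    int      dirty;   /* page modified                         */
    uint32_t frame;   /* location in main or secondary memory  */
} page_entry_t;

extern page_entry_t page_table[1u << 18];  /* one entry per possible page */
extern uint32_t     page_table_reg;        /* start of this job's entries */

uint32_t translate(uint32_t vaddr) {
    uint32_t vpage  = vaddr >> 14;         /* A31 - A14: page number          */
    uint32_t offset = vaddr & 0x3FFF;      /* A13 - A12 block, A11 - A2 word,
                                              A1 - A0 byte                    */
    page_entry_t *e = &page_table[page_table_reg + vpage];

    if (!e->valid)
        handle_page_fault(vpage);          /* bring page in from secondary */
    /* Physical address = page frame base + offset within page. */
    return (e->frame << 14) | offset;
}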
Page Replacement
Clearly primary memory of limited size
16 M bytes
We have the ability to address
4 G bytes (1 G words)
Although 16 M bytes seems like a lot
Do not want to restrict program to that size
To satisfy requirements will need to be able to
Load additional pages into memory
As long as space left no problem
If no space
Must remove something
Several schemes available each has advantages and disadvantages
All require checking dirty bit prior to removal
If set
Write operation necessary
Otherwise
Overwrite page
Two most common
Require time stamp on each page
LRU
Remove least recently used page
Related to FIFO, which removes the oldest page
Assumes the page unused for the longest time is least likely to be used in future
MRU
Remove most recently used page
Related to LIFO, which removes the newest page
Assumes the most recently used page is least likely to be used again soon
Random
Select and remove page at random
Easy to implement
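A sketch of the timestamp-based selection in C (frame count and bookkeeping are illustrative): each access stamps its frame with a counter; LRU evicts the minimum stamp, MRU the maximum. Per the notes, the victim's dirty bit must still be checked, and the page written back if set, before the frame is reused.

#include <stdint.h>

#define NUM_FRAMES 1024          /* 2^10 primary-memory pages */

static uint64_t last_used[NUM_FRAMES];
static uint64_t now;             /* advanced on every access */

void touch(int frame) { last_used[frame] = ++now; }

/* LRU: evict the frame with the oldest timestamp. */
int pick_victim_lru(void) {
    int victim = 0;
    for (int f = 1; f < NUM_FRAMES; f++)
        if (last_used[f] < last_used[victim])
            victim = f;          /* oldest stamp = least recently used */
    return victim;
}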
Performance
With and without VM
With VM
With and without page fault
Optimizing size
Speeding things up - TLB
Our virtual memory page table
Contains information for 2^18 pages
Search process and address calculation
Can be very time consuming
Would like to improve response time
Easiest way
Use caching techniques learned earlier
Keep in memory a cache containing
Most recently used page addresses
Called the Translation Lookaside Buffer - TLB
Search for a page begins in the TLB
If not found then check page table as before
Architecture
Choose 256 entries
256 most recently calculated addresses stored
Implement associative search
Keyed on virtual memory page number
Each entry must contain
Valid Bit
Indicates if entry is valid
Dirty Bit
Indicates if entry has been changed
Tag
A31 - A14 of virtual address
Physical Page
Computed address of page in primary memory
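Putting the entry format together, a C sketch of the TLB search (the layout is illustrative; the sequential loop only models what is, in hardware, 256 comparisons performed in parallel):

#include <stdint.h>

typedef struct {
    int      valid;   /* entry is valid                           */
    int      dirty;   /* entry has been changed                   */
    uint32_t tag;     /* A31 - A14 of the virtual address         */
    uint32_t frame;   /* computed page address in primary memory  */
} tlb_entry_t;

static tlb_entry_t tlb[256];   /* 256 most recently used translations */

/* Returns 1 and fills *frame on a TLB hit; on a miss the caller
 * falls back to the full page-table lookup described earlier. */
int tlb_lookup(uint32_t vpage, uint32_t *frame) {
    for (int i = 0; i < 256; i++) {
        if (tlb[i].valid && tlb[i].tag == vpage) {
            *frame = tlb[i].frame;
            return 1;
        }
    }
    return 0;
}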