A. CPI - UAH
The University of Alabama in Huntsville ECE Department CPE 433/513 ?01 Fall 2000
Homework #7 Solution
7.7, 7.8, 7.11, 7.12, 7.18, 7.24, 7.27, 7.32
7.7 Here is a series of address references given as word addresses: 1, 4, 8, 5, 20,
17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17. Assuming a direct-mapped cache with 16
one-word blocks that is initially empty, label each reference in the list as a hit
or a miss and show the final contents of the cache.
16words 1block 1set = 16sets 1word 1block
Block Offset - 0 bits, Index - 4 bits
index
1 00 0001
m
4 00 0100
m
8 00 1000
m
5 00 0101
m
20 01 0100
m
17 01 0001
m
19 01 0011
m
56 11 1000
m
9 00 1001
m
11 00 1011
m
4 00 0100
m
43 10 1011
m
5 00 0101
h
6 00 0110
m
9 00 1001
h
17 01 0001
h
0 1 1, 17 2 3 19 4 4, 20, 4 5 5 6 6 7 8 8, 56 9 9 10 11 11, 43 12 13 14 15
7.8 Using the series of references given in Exercise 7.7, show the hits and misses and final cache contents for a direct-mapped cache with four-word blocks and a total size of 16 words.
16words 1block 1set = 4sets Block Offset - 2 bits, Index - 2 bits 4words 1block
tag, index, block offset 1 00 00 01 m 4 00 01 00 m 8 00 10 00 m 5 00 01 01 h 20 01 01 00 m 17 01 00 01 m 19 01 00 11 h 56 11 10 00 m
9 00 10 01 m
11 00 10 11 h
4 00 0100
m
43 10 10 11 m
5 00 01 01 h
6 00 01 10 h
9 00 10 01 m
17 01 00 01 h
0 0, 16
1, 17
2, 18
3, 19
1 4, 20, 4
5, 21, 5
6, 22, 6
7, 23, 7
2 8, 56, 8, 40, 8 9, 57, 9, 41, 9 10, 58, 10, 42, 10 11, 59, 11, 43, 11
3
Consider a memory hierarchy using one of the three organizations for main memory
shown in Figure 7.13 on page 561. Assume that the cache block size is 16 words, that
the width of organization b of the figure is four words, and that the number of banks in
organization c is four. If the main memory latency for a new access is 10 cycles and
the transfer time is 1 cycle, what are the miss penalties for these organizations?
send address
1 cycle
memory access
10 cycles
bus transfer
1 cycle
a. total = send address + 16(memory access + bus transfer) = 1 + 16(10+1) = 1 + 176 = 177 cycles
b. total = send address + 4(memory access + bus transfer) = 1 + 4 (10+1) = 1 + 44 = 45 cycles
c. total = send address + 4(memory access) + 16(bus transfer) = 1 + 4(10) + 16(1) = 57 cycles
7.11 Suppose a processor with a 16-word block size has an effective miss rate per instruction of 0.5%. Assume that the CPI without cache misses is 1.2. Using the memories described in Figure 7.13 on page 561 and Exercise 7.11, how much faster is this processor when using the wide memory than when using narrow or interleaved memories?
CPIeff = CPIperfect + CPImiss = CPIperfect + miss rate/instruction * miss penalty
a. CPIeff = 1.2 + 0.005*177 = 2.085 b. CPIeff = 1.2 + 0.005*45 = 1.425 c. CPIeff = 1.2 + 0.005*57 = 1.485
Pwide = ETn = IC CPIn CC = 2.085 = 1.46 Pnarrow ETw IC CPIw CC 1.425
Pwide = ETi = IC CPIi CC = 1.485 = 1.04 Pi ETw IC CPIw CC 1.425
7.18 You have been given 18 32 K ? 8-bit SRAMs to build an instruction cache for a processor with a 32-bit address. What is the largest size (i.e., the largest data storage area in bytes) direct-mapped instruction cache that you can build with one-word (32-bit) blocks? Show the breakdown of the address into its cache access components (for an example, see Figure 7.8(and describe how the various SRAM chips will be used. (Hint: You may not need all of them.)
Some chips will store tags, some will store data.
Byte offset is 2 bits, Block offset is 0 bits, Index n bits, Tag 32 ? n ? 2
It takes 4 chips to get a word, and we have increments of 32 Kwords. For 32 Kwords, 15 bits of index are required, leaving 15 bits of tag. This would use up 6 chips, we could double this, but not increase it more than that. So, we have 64 Kwords (256 Kbytes) and the address is broken down as
Byte offset ? 2 bits Index ? 16 bits Tag ? 14 bits
7.24 Suppose a computer's address size is k bits (using byte addressing), the cache size is S bytes, the block size is B bytes and the cache is A-way set-associative. Assume that B is a power of two, so B =2b. Figure out what the following quantities are in terms of S, B, A, B, and k: the number of sets in the cache, the number of index bits in the address and the number of bits needed to implement the cache.
# ofsets =# ofbytes ? 1block ? 1set = S # bytes # blocks BA
#
ofindexbits
=
log
2
S AB
=
log 2
S A
-
b
#
oftagbits
=
k
-
log 2
S A
-
b
-
b
=
k
-
log 2
S A
# bits
=#entries (tag
+
data) =
S BA
k
-
log
2
S A
+
AB
7.27 Consider three machines with different cache configurations: Cache 1: Direct-mapped with one-word blocks Cache 2: Direct-mapped with four-word blocks Cache 3: Two-way set associative with four-word blocks
The following miss rate measurements have been made: Cache 1: Instruction miss rate is 4%, data miss rate is 8% Cache 2: Instruction miss rate is 2%, data miss rate is 5% Cache 3: Instruction miss rate is 2%, data miss rate is 4%
For these machines, one-half of the instructions contain a data reference. Assume that the cache miss penalty is 6 + Block size in words. The CPI for this workload was measured on a machine with cache 1 and was found to be 2.0. Determine which machine spends the most cycles on cache misses.
Miss Cycles = #misses * miss penalty = IC * (instruction miss rate + 0.5 data miss rate) * miss penalty
Miss CyclesCache1 Miss CyclesCache2 Miss CyclesCache3
= IC*(4% + 0.5*8%)*(6+1) = IC*0.56 = IC*(2% + 0.5*5%)*(6+4) = IC*0.45 = IC*(2% + 0.5*4%)*(6+4) = IC*0.4
Cache 1 spends the most time on cache misses.
7.32 Consider a virtual memory system with the following properties: 40-bit virtual byte address 16 Kbyte pages 36-bit physical address
What is the total size of the page table for each process on this machine, assuming that the valid, protection, dirty, and use bits take a total of 4 bits and that all the virtual pages are in use? (Assume that disk addresses are not stored in the page table.
# of virtual pages =
2 40 214
= 226
#bits in physical page address = 36-14 = 22
#total bits = 226? (22+4) = 226? 26
................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
- mc9s12hy64 reference manual covers mc9s12hy ha family
- p1010 qoriq integrated processor hardware specifications
- find the click here
- differences among powerquicc ii devices and revisions nxp
- afm analysis of tribological properties e gnecco infm
- data sheet 6es7212 1be40 0xb0
- epson fx 2170 terminal printer service repair manual
- moeller wiring manual 02 05 klockner moeller
- p1010 qoriq integrated processor hardware
- product data sheet 6es7212 1bd30 0xb0 rs components