A. CPI - UAH

The University of Alabama in Huntsville ECE Department CPE 433/513 ?01 Fall 2000

Homework #7 Solution

7.7, 7.8, 7.11, 7.12, 7.18, 7.24, 7.27, 7.32

7.7 Here is a series of address references given as word addresses: 1, 4, 8, 5, 20,

17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17. Assuming a direct-mapped cache with 16

one-word blocks that is initially empty, label each reference in the list as a hit

or a miss and show the final contents of the cache.

16words 1block 1set = 16sets 1word 1block

Block Offset - 0 bits, Index - 4 bits

index

1 00 0001

m

4 00 0100

m

8 00 1000

m

5 00 0101

m

20 01 0100

m

17 01 0001

m

19 01 0011

m

56 11 1000

m

9 00 1001

m

11 00 1011

m

4 00 0100

m

43 10 1011

m

5 00 0101

h

6 00 0110

m

9 00 1001

h

17 01 0001

h

0 1 1, 17 2 3 19 4 4, 20, 4 5 5 6 6 7 8 8, 56 9 9 10 11 11, 43 12 13 14 15

7.8 Using the series of references given in Exercise 7.7, show the hits and misses and final cache contents for a direct-mapped cache with four-word blocks and a total size of 16 words.

16words 1block 1set = 4sets Block Offset - 2 bits, Index - 2 bits 4words 1block

tag, index, block offset 1 00 00 01 m 4 00 01 00 m 8 00 10 00 m 5 00 01 01 h 20 01 01 00 m 17 01 00 01 m 19 01 00 11 h 56 11 10 00 m

9 00 10 01 m

11 00 10 11 h

4 00 0100

m

43 10 10 11 m

5 00 01 01 h

6 00 01 10 h

9 00 10 01 m

17 01 00 01 h

0 0, 16

1, 17

2, 18

3, 19

1 4, 20, 4

5, 21, 5

6, 22, 6

7, 23, 7

2 8, 56, 8, 40, 8 9, 57, 9, 41, 9 10, 58, 10, 42, 10 11, 59, 11, 43, 11

3

Consider a memory hierarchy using one of the three organizations for main memory

shown in Figure 7.13 on page 561. Assume that the cache block size is 16 words, that

the width of organization b of the figure is four words, and that the number of banks in

organization c is four. If the main memory latency for a new access is 10 cycles and

the transfer time is 1 cycle, what are the miss penalties for these organizations?

send address

1 cycle

memory access

10 cycles

bus transfer

1 cycle

a. total = send address + 16(memory access + bus transfer) = 1 + 16(10+1) = 1 + 176 = 177 cycles

b. total = send address + 4(memory access + bus transfer) = 1 + 4 (10+1) = 1 + 44 = 45 cycles

c. total = send address + 4(memory access) + 16(bus transfer) = 1 + 4(10) + 16(1) = 57 cycles

7.11 Suppose a processor with a 16-word block size has an effective miss rate per instruction of 0.5%. Assume that the CPI without cache misses is 1.2. Using the memories described in Figure 7.13 on page 561 and Exercise 7.11, how much faster is this processor when using the wide memory than when using narrow or interleaved memories?

CPIeff = CPIperfect + CPImiss = CPIperfect + miss rate/instruction * miss penalty

a. CPIeff = 1.2 + 0.005*177 = 2.085 b. CPIeff = 1.2 + 0.005*45 = 1.425 c. CPIeff = 1.2 + 0.005*57 = 1.485

Pwide = ETn = IC CPIn CC = 2.085 = 1.46 Pnarrow ETw IC CPIw CC 1.425

Pwide = ETi = IC CPIi CC = 1.485 = 1.04 Pi ETw IC CPIw CC 1.425

7.18 You have been given 18 32 K ? 8-bit SRAMs to build an instruction cache for a processor with a 32-bit address. What is the largest size (i.e., the largest data storage area in bytes) direct-mapped instruction cache that you can build with one-word (32-bit) blocks? Show the breakdown of the address into its cache access components (for an example, see Figure 7.8(and describe how the various SRAM chips will be used. (Hint: You may not need all of them.)

Some chips will store tags, some will store data.

Byte offset is 2 bits, Block offset is 0 bits, Index n bits, Tag 32 ? n ? 2

It takes 4 chips to get a word, and we have increments of 32 Kwords. For 32 Kwords, 15 bits of index are required, leaving 15 bits of tag. This would use up 6 chips, we could double this, but not increase it more than that. So, we have 64 Kwords (256 Kbytes) and the address is broken down as

Byte offset ? 2 bits Index ? 16 bits Tag ? 14 bits

7.24 Suppose a computer's address size is k bits (using byte addressing), the cache size is S bytes, the block size is B bytes and the cache is A-way set-associative. Assume that B is a power of two, so B =2b. Figure out what the following quantities are in terms of S, B, A, B, and k: the number of sets in the cache, the number of index bits in the address and the number of bits needed to implement the cache.

# ofsets =# ofbytes ? 1block ? 1set = S # bytes # blocks BA

#

ofindexbits

=

log

2

S AB

=

log 2

S A

-

b

#

oftagbits

=

k

-

log 2

S A

-

b

-

b

=

k

-

log 2

S A

# bits

=#entries (tag

+

data) =

S BA

k

-

log

2

S A

+

AB

7.27 Consider three machines with different cache configurations: Cache 1: Direct-mapped with one-word blocks Cache 2: Direct-mapped with four-word blocks Cache 3: Two-way set associative with four-word blocks

The following miss rate measurements have been made: Cache 1: Instruction miss rate is 4%, data miss rate is 8% Cache 2: Instruction miss rate is 2%, data miss rate is 5% Cache 3: Instruction miss rate is 2%, data miss rate is 4%

For these machines, one-half of the instructions contain a data reference. Assume that the cache miss penalty is 6 + Block size in words. The CPI for this workload was measured on a machine with cache 1 and was found to be 2.0. Determine which machine spends the most cycles on cache misses.

Miss Cycles = #misses * miss penalty = IC * (instruction miss rate + 0.5 data miss rate) * miss penalty

Miss CyclesCache1 Miss CyclesCache2 Miss CyclesCache3

= IC*(4% + 0.5*8%)*(6+1) = IC*0.56 = IC*(2% + 0.5*5%)*(6+4) = IC*0.45 = IC*(2% + 0.5*4%)*(6+4) = IC*0.4

Cache 1 spends the most time on cache misses.

7.32 Consider a virtual memory system with the following properties: 40-bit virtual byte address 16 Kbyte pages 36-bit physical address

What is the total size of the page table for each process on this machine, assuming that the valid, protection, dirty, and use bits take a total of 4 bits and that all the virtual pages are in use? (Assume that disk addresses are not stored in the page table.

# of virtual pages =

2 40 214

= 226

#bits in physical page address = 36-14 = 22

#total bits = 226? (22+4) = 226? 26

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download