Review:%SingleNCycle%Processor%

3/5/15

CS 61C: Great Ideas in Computer Architecture (Machine Structures)

Pipelining

Instructors: Krste Asanovic & Vladimir Stojanovic hFp://inst.eecs.berkeley.edu/~cs61c/

Review: Single--Cycle Processor

? Five steps to design a processor:

1. Analyze instrucQon set ?

Processor

datapath requirements

Control

2. Select set of datapath

components & establish

clock methodology

Datapath

Input Memory

Output

3. Assemble datapath meeQng

the requirements

4. Analyze implementaQon of each instrucQon to determine seVng of control points that effects the register transfer.

5. Assemble the control logic

? Formulate Logic EquaQons ? Design Circuits

Adder

PC Ext

4

imm16

Adder

Review: A Single--Cycle Datapath

Inst

Memory

Instruction

Adr

nPC_sel

Rs

Rt

Rd

Imm16

RegDst

Rd

Rt

Equal

1

0

ALUctr

MemtoReg

MemWr

PC

Mux

RegWr Rs

Rt

5

5

5

00

ALU

Extender

busW

Rw

Ra

Rb

busA

32

=

32

32

RegFile

busB

0

0

clk

32

32

WrEn

Adr

clk

imm16

16

1

32

Data In

Data

Memory

1

clk

ExtOp ALUSrc

Single Cycle Performance

? Assume Qme for acQons are

? 100ps for register read or write; 200ps for other events

? Clock period is?

Instr

Instr fetch Register ALU op Memory Register Total time

read

access write

lw

200ps 100 ps 200ps 200ps 100 ps 800ps

sw

200ps 100 ps 200ps 200ps

700ps

R-format 200ps 100 ps 200ps

100 ps 600ps

beq

200ps 100 ps 200ps

500ps

? Clock rate (cycles/second = Hz) = 1/Period (seconds/cycle) ? What can we do to improve clock rate? ? Will this improve performance as well?

Want increased clock rate to mean faster programs

Single Cycle Performance

? Assume Qme for acQons are

? 100ps for register read or write; 200ps for other events

? Clock period is?

Instr

Instr fetch Register ALU op Memory Register Total time

read

access write

lw

200ps 100 ps 200ps 200ps 100 ps 800ps

sw

200ps 100 ps 200ps 200ps

700ps

R-format 200ps 100 ps 200ps

100 ps 600ps

beq

200ps 100 ps 200ps

500ps

? What can we do to improve clock rate? ? Will this improve performance as well?

Want increased clock rate to mean faster programs

GoFa Do Laundry

? Ann, Brian, Cathy, Dave

each have one load of clothes to A B C D wash, dry, fold, and put away

? Washer takes 30 minutes

? Dryer takes 30 minutes

? "Folder" takes 30 minutes

? "Stasher" takes 30 minutes to put clothes into drawers

1

3/5/15

SequenQal Laundry

6 PM 7 8 9 10 11 12 1 2 AM

T

30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30

a A

Time

s

k B

C O r D d e

r ? SequenQal laundry takes

8 hours for 4 loads

Pipelined Laundry

6 PM 7 8 9 10 11 12

T 3030 30 30 30 30 30

Time

a A

s k B

C

O r

D

d

e

r ? Pipelined laundry takes

3.5 hours for 4 loads!

1 2 AM

Pipelining Lessons (1/2)

6 PM 7 8 9 ? Pipelining doesn't help latency

T

Time of single task, it helps throughput of enQre workload

a s k

30 30 30 30 30 30 30 A

B

? MulQple tasks operaQng simultaneously using different resources

O C r

? PotenQal speedup = Number pipe stages

d D

? Time to "fill" pipeline and Qme

e

to "drain" it reduces speedup:

r

2.3X v. 4X in this example

Pipelining Lessons (2/2)

6 PM 7 8 9

? Suppose new Washer

T

Time takes 20 minutes, new

a

30 30 30 30 30 30 30

Stasher takes 20

s A

minutes. How much

k

faster is pipeline?

B O C r

? Pipeline rate limited by slowest pipeline stage

d D

? Unbalanced lengths of

e

pipe stages reduces

r

speedup

Steps in ExecuQng MIPS

1) IFtch: InstrucQon Fetch, Increment PC

2) Dcd: InstrucQon Decode, Read Registers

3) Exec:

Mem--ref:

Calculate Address

Arith--log: Perform OperaQon

4) Mem:

Load: Read Data from Memory

Store: Write Data to Memory

5) WB: Write Data Back to Register

P C instruction

memory registers

Data memory

Single Cycle Datapath

rd

rs rt

+4

imm

ALU

1. Instruction Fetch

2. Decode/ 3. Execute 4. Memory Register Read

5. Write Back

2

Pipeline registers

P C instruction

memory registers

Data memory

rd

rs rt

+4

imm

ALU

1. Instruction Fetch

2. Decode/ 3. Execute 4. Memory Register Read

5. Write Back

? Need registers between stages

? To hold informaQon produced in previous cycle

3/5/15 More Detailed Pipeline

IF for Load, Store, ...

ID for Load, Store, ...

EX for Load

MEM for Load

3

WB for Load ? Oops!

3/5/15 Corrected Datapath for Load

Wrong register number!

Pipelined ExecuQon RepresentaQon

Time

IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB

? Every instrucQon must take same number of steps, so some stages will idle

? e.g. MEM stage for any arithmeQc instrucQon

21

Graphical Pipeline Diagrams

MUX PC instruction memory Data memory

rd rs rt

Register File

ALU

+4

imm

1. InstrucQon 2. Decode/ 3. Execute 4. Memory 5. Write

Fetch

Register Read

Back

? Use datapath figure below to represent pipeline:

IF ID EX Mem WB

I$

Reg

D$

Reg

22

ALU

ALU

ALU

ALU

ALU

ALU

Graphical Pipeline RepresentaQon

? RegFile: left half is write, right half is read

Time (clock cycles)

I

n s Load

I$

Reg

D$

Reg

t r

Add

I$

Reg

D$

Reg

O Store r Sub d e Or r

I$

Reg

D$

Reg

I$

Reg

D$

Reg

I$

Reg

D$

Reg

23

Pipelining Performance (1/3)

? Use Tc ("Qme between compleQon of instrucQons") to measure speedup

?

? Equality only achieved if stages are balanced

(i.e. take the same amount of Qme)

? If not balanced, speedup is reduced ? Speedup due to increased throughput

? Latency for each instrucQon does not decrease

24

4

3/5/15

Pipelining Performance (2/3)

? Assume Qme for stages is

? 100ps for register read or write ? 200ps for other stages

Instr

lw sw R-format beq

Instr fetch 200ps 200ps 200ps 200ps

Register read 100 ps 100 ps 100 ps 100 ps

ALU op

200ps 200ps 200ps 200ps

Memory access 200ps

200ps

Register write 100 ps

100 ps

Total time 800ps 700ps 600ps 500ps

? What is pipelined clock rate?

? Compare pipelined datapath with single--cycle datapath

25

Pipelining Performance (3/3)

Single-cycle Tc = 800 ps f = 1.25GHz

Pipelined Tc = 200 ps

f = 5GHz

26

Clicker/Peer InstrucQon

Which statement is false? ? A: Pipelining increases instrucQon throughput ? B: Pipelining increases instrucQon latency ? C: Pipelining increases clock frequency ? D: Pipelining decreases number of components

27

Administrivia

? Project 1--2 due date now 11:59PM Saturday 3/7 ? HW 4 due date now 11:59PM Tuesday 3/10 ? 10% Extra Credit for each finished by original

deadline

28

Pipelining Hazards

A hazard is a situaQon that prevents starQng the next instrucQon in the next clock cycle

1) Structural hazard

? A required resource is busy (e.g. needed in mulQple stages)

2) Data hazard

? Data dependency between instrucQons ? Need to wait for previous instrucQon to

complete its data read/write

3) Control hazard

? Flow of execuQon depends on previous instrucQon

29

1. Structural Hazards

? Conflict for use of a resource ? MIPS pipeline with a single memory?

? Load/Store requires memory access for data ? InstrucQon fetch would have to stall for that cycle

? Causes a pipeline "bubble"

? Hence, pipelined datapaths require separate instrucQon/data memories

? Separate L1 I$ and L1 D$ take care of this

30

5

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download