Algorithms PART II: Partitioning and Divide & Conquer


HPC Fall 2012 Prof. Robert van Engelen

Overview

Partitioning strategies
Divide and conquer strategies
Further reading


Partitioning Strategies

Data partitioning

Perform domain decomposition to run parallel tasks on subdomains, e.g. block partitioning of a 2D domain

"Scatter-compute-gather": the local computation may require communication, and the scatter/gather steps may themselves involve computation

Task partitioning

Decompose functions into independent subfunctions and execute the subfunctions in parallel:

function f(x,y)
  u = g(x)    ← Thread 1
  v = h(y)    ← Thread 2
  return u+v
end
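A minimal sketch of this task partitioning in Python; g and h are the placeholder subfunctions from the slide, and their bodies here are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def g(x):
    # Placeholder subfunction (body is an assumption)
    return x * x

def h(y):
    # Placeholder subfunction (body is an assumption)
    return y + 1

def f(x, y):
    # Run the two independent subfunctions on separate threads,
    # then combine their results, mirroring the slide's f(x,y).
    with ThreadPoolExecutor(max_workers=2) as pool:
        fu = pool.submit(g, x)   # Thread 1 computes u = g(x)
        fv = pool.submit(h, y)   # Thread 2 computes v = h(y)
        return fu.result() + fv.result()   # u + v
```

Because g(x) and h(y) share no data, the two tasks need no synchronization beyond the final join when the results are combined.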


Partitioning Strategies

Partitioning strategy (data partitioning):

1. Break up a given problem into P subproblems
2. Solve the P subproblems concurrently
3. Collect and combine the P solutions

Embarrassingly parallel

The simplest form of data partitioning: the subproblems are independent, require no initial work, and need no communication between tasks (workers)


Partitioning Example 1: Summation

Summation of n values X = [x1,...,xn]

1. Divide X into P equally-sized sublists Xp, p = 0,...,P-1, and distribute the Xp sublists to the P processors
2. Each processor sums its local part: sp = Σ_{x ∈ Xp} x
3. Combine the local sums: s = Σ_{p=0..P-1} sp

Algorithms:
1. Scatter list X using a scatter tree
2. Serial summation of the parts
3. Reduce the local sums
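The three steps above can be sketched serially in Python, with the P processors simulated by chunking; a real distributed version would use a scatter tree and a reduction (e.g. MPI_Scatter and MPI_Reduce):

```python
def parallel_sum(X, P):
    # Step 1 (scatter): divide X into P roughly equal sublists
    n = len(X)
    chunks = [X[p * n // P:(p + 1) * n // P] for p in range(P)]
    # Step 2 (compute): each "processor" sums its local part sp
    local_sums = [sum(chunk) for chunk in chunks]
    # Step 3 (reduce): combine the local sums into s
    return sum(local_sums)
```

The chunking formula handles n not divisible by P by letting sublist sizes differ by at most one element.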


Partitioning Example 1: Summation

[Figure: binary scatter tree over time, with message sizes n/2, n/4, n/8, ..., followed by local summations and a binary reduce tree of single values]

Scatter (divide): log2(P) steps; total amount of data transferred: (n/2) log2(P)
Local summations: n/P steps
Reduce (combine): log2(P) steps; total amount of data transferred: P-1

Partitioning Example 1: Summation

Communication time

Scatter:

tcomm1 = Σ_{k=1..log2(P)} (tstartup + 2^-k n tdata) = log2(P) tstartup + n(P-1)/P tdata

Reduce:

tcomm2 = log2(P) (tstartup + tdata)

Total:

tcomm = 2 log2(P) tstartup + ( n(P-1)/P + log2(P) ) tdata

Computation time

Local sum:

tcomp1 = n/P

Global sum:

tcomp2 = log2(P)

Total:

tcomp = n/P + log2(P)

Speedup, assuming tstartup = 0

Sequential time: ts = n-1

Parallel time: tP = ( n(P-1)/P + log2(P) ) tdata + n/P + log2(P)

Speedup:

SP = ts/tP = O(n / (n + log(P)))

Best speedup w/o communication:

SP = O(P/log(P))
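The performance model above can be evaluated numerically; the tstartup and tdata values below are arbitrary assumptions chosen only to illustrate how communication limits the speedup:

```python
import math

def parallel_time(n, P, t_startup=0.0, t_data=1.0):
    # Communication: scatter + reduce, per the formulas above
    t_comm = (2 * math.log2(P) * t_startup
              + (n * (P - 1) / P + math.log2(P)) * t_data)
    # Computation: local sums plus the global reduction
    t_comp = n / P + math.log2(P)
    return t_comm + t_comp

def speedup(n, P, **kw):
    t_seq = n - 1                 # sequential: n-1 additions
    return t_seq / parallel_time(n, P, **kw)
```

With tdata > 0 the n(P-1)/P communication term dominates and the speedup stays near 1; setting tdata = 0 recovers the communication-free speedup n / (n/P + log2(P)).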


General M-Ary Partitioning

[Figure: 3-level 4-ary partitioning tree for 4^3 = 64 processors — first, second, and third divisions, then compute, then combine over time]

Example: partitioning an image, e.g. to compute a histogram by parallel reductions (summations to count color pixels)
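A recursive sketch of m-ary partitioning, applied here to summation as in the earlier example; the function name and the use of a list as the "domain" are assumptions for illustration:

```python
def mary_reduce(data, m, levels):
    # Base case: compute directly on the leaf subdomain
    if levels == 0 or len(data) <= m:
        return sum(data)
    # Divide: split the domain into m roughly equal parts
    n = len(data)
    parts = [data[i * n // m:(i + 1) * n // m] for i in range(m)]
    # Conquer each part (in parallel these would run on m^levels
    # processors), then combine the m partial results
    return sum(mary_reduce(p, m, levels - 1) for p in parts)
```

With m = 4 and levels = 3 this matches the figure: 4^3 = 64 leaf subdomains, combined back up the tree in three steps.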
