


Parallel Computing Project

CSC-582-3

Dr. Nagi Mekheil

Bruce Miller

Wing Chan

David Moshier

Nilanjan Roy

12 August 2008

Introduction

Over the last several weeks we delved into parallel programming to establish a practical understanding of the tools, resources, requirements, and limitations involved. Overcoming a variety of hurdles along the way, we established a baseline knowledge of the subject matter and created and followed a series of steps, each intended to expand upon the previous effort, all within a self-defined research, development, test and evaluation (RDT&E) environment. Finally, we progressed into experimentation and practical application.

Project Plan

At a high level, our project takes a three phase approach:

1. Compare and contrast the competing technologies, coming to a conclusion regarding which one we want to address.

2. Define an appropriate and relevant RDT&E environment, including IDE.

3. Execute a development plan that is sensitive to our time constraints, allowing us to achieve intermediate milestones along the way.

Technology Investigation

OpenMP provides a thread programming model at a “high level.” The programmer makes the strategic decisions: how to organize the application to optimize parallelism and, most importantly, how and when to execute sections of the software in parallel. To do this, the programmer inserts parallel directives for the compiler to interpret, such as #pragma omp parallel private(variable). These directives indicate which parts of the code are to be executed in parallel, the methodology for assigning work, and may also define thread-private data. It is the compiler’s responsibility to determine the more granular details, including the creation of threads and the assignment of work to those threads. The most significant benefit to the software developer is that the same code can be compiled for both sequential and parallel execution without modifying the source.
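As an illustration (a minimal sketch of our own, not taken from the course material), the directive below states only that the loop may run in parallel and which data is private; thread creation and work assignment are left entirely to the compiler and runtime:

#include <stdio.h>

int main(void) {
    int i;
    double x[1000];

    /* The pragma describes the parallelism; the compiler decides how many
       threads to create and which iterations each one receives. */
    #pragma omp parallel for private(i) shared(x)
    for (i = 0; i < 1000; i++)
        x[i] = i * 0.5;

    printf("x[999] = %f\n", x[999]);
    return 0;
}

Compiled without the OpenMP flag, the pragma is simply ignored and the same file runs serially, which is the single-source benefit described above.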

However, there are a few notable downsides to this parallel technology. First and foremost, the compiler does not validate the directives indicated in the source code. If a programmer opts for a particular parallel directive, the compiler blindly accepts it as the most appropriate for the instructions that follow. This places a significant burden on the software developer. In spite of the fact that OpenMP has a shallow initial learning curve, optimizing source code requires an in-depth knowledge of the technology. Alternatively, the programmer is forced to take a cyclical approach to software development so that knowledge and understanding gained throughout the project may be applied to sections that were completed early in the effort.

Second, there are limitations inherent in allowing the compiler to distribute work. Synchronization must be explicitly addressed in the source code. Since the compiler is responsible for distributing work within the confines of whatever directives it is given, it is not possible to know the exact distribution of the work. The application must therefore be written with an awareness of when and where data must be synchronized; otherwise it may generate unexpected results.
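A brief sketch of the kind of decision this forces (the accumulator example is our own): without an explicit clause, all threads race on the shared sum, and because we do not know how the compiler split the loop, the result is unpredictable; adding a reduction clause builds the required synchronization into the directive itself.

#include <stdio.h>

int main(void) {
    int i;
    double sum = 0.0;

    /* Racy version (left as a comment): every thread updates `sum` at once
       and the result varies from run to run.

       #pragma omp parallel for shared(sum)
       for (i = 0; i < 1000; i++)
           sum += i;

       Synchronized version: the reduction clause keeps a private partial
       sum per thread and combines them when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < 1000; i++)
        sum += i;

    printf("sum = %f\n", sum);   /* expected: 499500 */
    return 0;
}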

A third challenge to note is the architecture of OpenMP itself. It is fundamentally a tightly knit, shared-memory system. This creates a challenge in that there is no opportunity to peer into the individual threads during execution for troubleshooting, as one would be able to do with MPI. Similarly, the technology does not scale beyond a single system, making it unsuitable for the grid-computing environments that are currently expanding.

The last notable limitation, and the easiest to overcome, is the immaturity of the technology’s tool ecosystem. MPI has an abundance of tools available, for example:

• mpiP – Lightweight, Scalable MPI Profiling[1]

• MQbench – graphical MPI benchmark[2]

• MPIGrid Ping (latency/bandwidth test)[3]

• MPIUnit (unit test framework)[4]

• KDevelop IDE plugin[5]

OpenMP has little to none. Because both are extensions to the C and Fortran languages, however, OpenMP benefits from the same integrated development environments (IDEs) on both the Windows and Linux platforms, with one caveat: the compiler itself must support OpenMP (for example, GCC 4.x on Linux or the Visual C++ compiler in Visual Studio 2008 on Windows).

Comparison of OpenMP and MPI

OpenMP is limited to the shared memory of an individual SMP system, while MPI can be expanded across multiple systems and heterogeneous/distributed computer architectures. This makes MPI a grid-computing-compatible technology, while OpenMP is not natively applicable.

OpenMP uses a fine-grained parallel methodology, while MPI is coarse-grained. With MPI, each node executes its instructions separately and the data is handled via a master-slave model. OpenMP code executes serially until it reaches a parallel directive; the application then maximizes its parallel capability for the designated segment, distributing the data without the programmer’s involvement.

Performance-wise, studies have shown that MPI and OpenMP are roughly equivalent.[6] The key limitation to MPI is the relationship between communications overhead and dataflow requirements. Though OpenMP tends to execute more instructions, it will show superior performance over MPI in situations where the computer architecture relies on fewer but higher-performing processors.

Development Environment

Our development environment changed over time. For the majority of the project, we focused on a Linux-based environment:

• Linux OS: Red Hat Fedora Core 9 with the KDE desktop

• GCC version 4.3

• gedit

• make

The Eclipse IDE was used to create the initial MPI trials, but not during later development in OpenMP. IDEs and editors that we attempted:

• wxDev-C++ (Windows only)

• Eclipse (Windows, Linux)

• VIM (Linux)

• WS (Linux)


In our later effort to integrate OpenMP and MPI, we opted to switch to a more comfortable Windows-based environment:

• Windows Vista

• Visual Studio 2008 Enterprise Edition

• MPICH2

• Boost C++ Library

In a later section you will find details regarding the exact configuration required to make OpenMP and MPI simultaneously operable.

Software Development Plan

Our software development plan builds from the most fundamental of learning applications to experiments and practical applications, each step intended to add to our depth of knowledge on the subject matter.

1. First Steps

a. Hello World

2. Concept Refinement

a. Quicksort

b. Bubblesort

c. Mergesort

d. Matrix Mathematics

3. Experimentation

a. Dynamic parallelization

b. Nested Threading

c. MPI encapsulation

d. MPI emulation

4. Practical Application

a. Ray tracing

First Steps

GCC has supported OpenMP natively since version 4.2 (we used 4.3), enabled with the command option:

$ gcc -fopenmp sourcecode.c -o compiled_name

(In MS Visual Studio one would use the /openmp flag.)

By selecting Red Hat Fedora Core 9 as the operating system, we saved a significant amount of time. However, not wanting to skip the process of manually installing and configuring a system to be compatible with OpenMP, we followed the clear instructions laid out by Joe Landman.[7] This article steadily walks the reader through the process of bringing an OpenMP development environment online and continues on to provide sample source code for testing. The first of these is Hello World.

Hello World

The Hello World trials involved the following basic process:

• Compile and execute hello_world.c

• Identify threads to spawn

• Recompile hello_world.c for parallel execution

• Execute parallel application

Source code for the parallel application is documented below. In this test, we also learned a key environment variable that permits manual manipulation of the OpenMP execution:

export OMP_NUM_THREADS=4
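The listing below is a minimal sketch of a hello_world.c of the kind used in these trials (not the exact appendix listing); the size of the thread team is taken from OMP_NUM_THREADS.

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Each thread in the team prints its own ID; the team size comes from
       OMP_NUM_THREADS (4 in the export above) unless overridden in code. */
    #pragma omp parallel
    {
        printf("Hello World from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}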

The key lessons learned in this simple exercise provide a launching point into a variety of additional exploration areas.

Concept Refinement

It is during this phase of learning that we decide which experiments we will attempt and, if possible, which practical application we will use as an end cap. The first of these trials is meant to continue broadening our understanding of the potential in parallel computing. Because the effort to complete them is easy to document and the functionality involved is clearly definable, we chose to write a few sorting algorithms in parallel. During our efforts to research and build the first few sorting algorithms, we discovered that the work on quicksort had already been performed: the OmpSCR[8] project published a series of demonstration applications in 2004, including complete source code and documentation for the resources provided. To compile the source code, we used the following configuration settings:

[Figure: OmpSCR build configuration settings]

Using the documentation proves helpful, and it also helps us identify the first concept we want to address – dynamic allocation of parallel resources. In spite of the pre-existing code for quicksort, we proceeded to write Quicksort, Bubblesort, and Mergesort algorithms ourselves. In comparing these three sorting algorithms and injecting the most basic of parallelization instructions, the inefficiency of a parallel bubblesort becomes readily apparent. Unlike the Quicksort and Mergesort algorithms we built, a bubble sort must be re-organized to intentionally permit parallel operations before its performance improves under parallel execution (see the source code documented below and the sketch that follows).
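As an illustration of that reorganization (a sketch of our own using an odd-even transposition variant, not our exact listing), the comparisons within each phase touch disjoint pairs of elements, so the inner loop can safely be handed to OpenMP:

#include <stdio.h>

#define N 16

int main(void) {
    int a[N], i, phase, tmp;

    for (i = 0; i < N; i++)            /* fill with descending values */
        a[i] = N - i;

    /* Odd-even transposition sort: N phases, each comparing
       non-overlapping pairs, so the iterations are independent. */
    for (phase = 0; phase < N; phase++) {
        #pragma omp parallel for private(i, tmp) shared(a)
        for (i = phase % 2; i < N - 1; i += 2) {
            if (a[i] > a[i + 1]) {
                tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }

    for (i = 0; i < N; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}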

Experimentation

Dynamic Parallelization & Nested Threading

OpenMP natively supports nested threading, either by setting the environment variable OMP_NESTED to TRUE or by calling omp_set_nested() within the code. In the following example, each active thread in the primary OpenMP application spawns an additional thread. The number of threads is critical in determining the performance of the whole application, and the size of the input data largely determines a program’s runtime. Because the appropriate thread count changes with the size of the data, it is important that we be able to control the threads at runtime.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* Output helper called from outside the parallel region. */
int printt(int j) {
    int k;
    k = j + 1;
    printf("%d\n", j);
    return k;
}

int main() {
    int i, chunk;
    float a[100], b[100], c[100];

    for (i = 0; i < 100; i++) {
        a[i] = b[i] = i * 2;
    }

    chunk = 100;

    /* Split the loop among the threads in chunks of `chunk` iterations. */
    #pragma omp parallel for \
        shared(a, b, c, chunk) private(i) \
        schedule(static, chunk)
    for (i = 0; i < 100; i++) {
        c[i] = a[i] + b[i];
    }

    for (i = 0; i < 100; i++) {
        printt((int)c[i]);   /* float result truncated to int for printing */
    }
    return 0;
}
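For reference, the following is a minimal sketch (our own, with illustrative thread counts) of the runtime controls described above, combining omp_set_nested() with a thread count chosen at run time; each outer thread then opens its own inner team.

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_nested(1);        /* same effect as OMP_NESTED=TRUE */
    omp_set_num_threads(2);   /* outer team size, decided at runtime */

    #pragma omp parallel
    {
        int outer = omp_get_thread_num();

        /* With nesting enabled, each outer thread spawns its own team. */
        #pragma omp parallel num_threads(2)
        {
            printf("outer thread %d, inner thread %d\n",
                   outer, omp_get_thread_num());
        }
    }
    return 0;
}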

While experimenting with nested threads, we stumbled across an important detail that caused us to struggle for a few days. In the process of using printf statements to track an application’s progress, we began having difficulty compiling: sometimes our code would compile, other times it would not. It turns out that OpenMP will generate compile errors if the printf instruction is issued within a parallel segment of code. Though we were unable to confirm it, we have surmised that this peculiarity is a result of OpenMP permitting only the master thread/process to output data; all others are limited to data processing. After some trial and error, we determined that the solution is to use an external function call instead.

Printf Error Code

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main() {
    int i, chunk;
    float a[100], b[100], c[100];

    for (i = 0; i < 100; i++) {
        a[i] = b[i] = i * 2;
    }

    chunk = 100;

    #pragma omp parallel for \
        shared(a, b, c, chunk) private(i) \
        schedule(static, chunk)
    for (i = 0; i < 100; i++) {
        c[i] = a[i] + b[i];
    }

    for (i = 0; i < 100; i++) {
        printf(c[i]); /* problem statement: this line triggers the compile error */
    }
    return 0;
}

Solution

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int printt(int j) { /* solution for the error: output moved into a helper */
    int k;
    k = j + 1;
    printf("%d\n", j);
    return k;
}

int main() {
    int i, chunk;
    float a[100], b[100], c[100];

    for (i = 0; i < 100; i++) {
        a[i] = b[i] = i * 2;
    }

    chunk = 100;

    #pragma omp parallel for \
        shared(a, b, c, chunk) private(i) \
        schedule(static, chunk)
    for (i = 0; i < 100; i++) {
        c[i] = a[i] + b[i];
    }

    for (i = 0; i < 100; i++) {
        printt((int)c[i]);
    }
    return 0;
}

MPI Encapsulation & Emulation

In a conversation about where we saw the benefits and limitations of the two parallel technologies, we conceived of an intriguing concept: why not use both? The concept provides a unique benefit to large systems because it can leverage MPI’s naturally grid-compatible design. At its most exotic, the concept presents an opportunity for modular execution and processing as a service. By taking advantage of MPI’s scalability and inherent communication layer, one could define a need-based spawning of services, using MPI to initialize OpenMP parallelization on a collection of distributed systems. Though it has the potential to be very complex, we propose that it would be possible to realize the benefits of both systems at the same time. Using a decision routine to determine whether to execute a given algorithm in MPI or OpenMP, one may achieve a higher level of parallelism by choosing the optimum parallel technology layer on the fly. To conclude the experimental phase of our project, we chose to determine how tightly we may interweave MPI and OpenMP.

Upon succeeding fairly quickly at having a rudimentary MPI application call an OpenMP executable, we thought it prudent to attempt a tighter integration: wrapping an MPI layer around OpenMP code and compiling them into one executable (a sketch of this structure follows). There are two distinct benefits of having MPI wrap OpenMP. First, the techniques for implementing MPI are more tightly controlled by the programmer; it is possible to use a variety of languages, techniques, and platforms to distribute the MPI application executable. Second, MPI is more scalable.
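The sketch below outlines the structure we were working toward (a simplified, hypothetical hybrid program, not our full application): MPI handles the coarse distribution across machines, and each rank opens its own OpenMP team for the shared-memory work on its node. The mpicc command in the comment is one common way to build such a program under MPICH2 on Linux; a Visual Studio project would instead enable /openmp and link against MPICH2.

/* Build example (Linux/MPICH2): mpicc -fopenmp hybrid.c -o hybrid */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);                /* coarse, cross-machine layer */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Fine-grained, shared-memory layer inside each MPI rank. */
    #pragma omp parallel
    {
        printf("rank %d of %d, thread %d of %d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}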

It is possible to avoid MPI altogether by emulating MPI’s private memory model in OpenMP. Using the threadprivate directive at the point where parallelization is initialized, one may create temporary isolated buffers that are local to each thread.

#pragma omp threadprivate(tempBuffer, leftBuffer)

#pragma omp for schedule(static,1)

for (int i = 0; i < ...; i++) { ... }

if (left >= right) return;

swap(a, left, Random(left, right));

#pragma omp parallel for

for (i = left + 1; i ...
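To make the idea concrete, the listing below is a small self-contained sketch of our own (buffer names follow the fragment above; sizes and values are illustrative): threadprivate gives each thread its own persistent copy of the file-scope buffers, which is what lets OpenMP mimic MPI’s private memory model.

#include <stdio.h>
#include <omp.h>

#define BUF 64

/* File-scope buffers: threadprivate gives every thread its own copy. */
static int tempBuffer[BUF];
static int leftBuffer[BUF];
#pragma omp threadprivate(tempBuffer, leftBuffer)

int main(void) {
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();

        tempBuffer[0] = tid;        /* touches this thread's copy only */
        leftBuffer[0] = tid * 10;

        printf("thread %d: tempBuffer[0]=%d leftBuffer[0]=%d\n",
               tid, tempBuffer[0], leftBuffer[0]);
    }
    return 0;
}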
