CET 479 - Sonoma State University



Linux Cluster Configuration using MPICH-1.2.1

May 12, 2006

The Message Passing Interface (MPI) is a standard set for vendors who create parallel computing software. MPICH is an implementation of the MPI library designed to offer both high performance and portability. MPICH is free, and the current version is MPICH 2. A copy of the MPICH software can be downloaded from the MPICH downloads site listed in the References section. It is possible to mix and match different systems in the same cluster.

Requirements:

➢ Fedora Core 4

➢ MPICH-1.2.1 (Message Passing Interface)

➢ Two nodes (a master and a slave)

➢ Network switch (a crossover cable can be used if you only have two nodes)

➢ Ethernet cables

Next, install a Linux distribution on each computer in your cluster (henceforth, I will call them nodes). I used Fedora Core 4 as my operating system; in my experience it gave fewer problems than the Red Hat distribution.

During the installation process, assign sensible hostnames and unique IP addresses to each node in your cluster. Usually, one node is designated as the master node (where you'll control the cluster, write and run programs, etc.), with all the other nodes used as computational slaves. I named my nodes Master and Slave01 to keep things simple (additional slaves would be Slave02, Slave03, etc.). I used IP address 172.16.1.100 for the master node and incremented it by one for each slave node (172.16.1.101, 172.16.1.102, etc.).

Fedora Core 4 Installation:

Insert the Fedora Core 4 CD into the CD-ROM drive and follow the default installation. I used the workstation option for my cluster. DO NOT INSTALL THE FIREWALL; if you do, you will have problems with security issues later. Trust me! After the installation process is completed, do the following:

a) On the task bar, click Desktop, then System Settings, then Add/Remove Applications.

b) Scroll down to Legacy Network Server, select its checkbox, and then click Details.

c) Under Extra Packages, select rsh-server, telnet-server, and rusers-server; then click Close, then Update, and follow the on-screen instructions.

d) Ping the nodes in your cluster to test for connectivity, e.g. ping 172.16.1.101 or ping 172.16.1.102; a loop for checking all the slaves at once is sketched below.
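The sketch below assumes the IP addresses used in this guide; substitute your own:

# Send three pings to each slave; a reply means the node is reachable
for ip in 172.16.1.101 172.16.1.102; do
    ping -c 3 $ip
done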


1. Create a .rhosts file in the root user's home directory on each node. This file should contain the names of all the nodes in your cluster. I have only two nodes in my cluster, namely Master and Slave01. My .rhosts file for the root user is as follows:

Master root

Slave01 root

2. Next, create (or edit) the hosts file in the /etc directory. Below is my hosts file for the master node.

172.16.1.100   Master
127.0.0.1      localhost
172.16.1.101   Slave01
172.16.1.102   Slave02   (and so on for additional slaves)

Each node in the cluster has a similar hosts file, with the appropriate changes to the first line reflecting the hostname of that node. For example, the Slave01 node would have as its first line:

172.16.1.101   Slave01

The third line would contain the IP address and hostname of Master; all other nodes are configured in the same manner. Do not remove the 127.0.0.1 localhost line. To allow root users to use rsh, add the following entries to the /etc/securetty file, one per line:

rsh
rlogin
pts/0
pts/1

Also, modify the /etc/pam.d/rsh file to look like the following:

#%PAM-1.0

# For root login to succeed here with pam_securetty, "rsh" must be

# listed in /etc/securetty.

auth sufficient /lib/security/pam_nologin.so

auth optional /lib/security/pam_securetty.so

auth sufficient /lib/security/pam_env.so

auth sufficient /lib/security/pam_rhosts_auth.so

account sufficient /lib/security/pam_stack.so service=system-auth

session sufficient /lib/security/pam_stack.so service=system-auth

3. Navigate to the /etc/xinetd.d directory and modify each of the service files (rsh, rlogin, telnet and rexec), changing the disable = yes line to disable = no.

4. Close the editor and issue the following command to restart xinetd and enable rsh, rlogin, etc.:

service xinetd restart
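If you prefer to make the step 3 edits from the command line instead of an editor, here is a minimal sketch; it assumes the four service files named above and GNU sed (which Fedora Core 4 provides):

# Flip "disable = yes" to "disable = no" for each remote-shell service
for svc in rsh rlogin telnet rexec; do
    sed -i 's/disable[[:space:]]*=[[:space:]]*yes/disable = no/' /etc/xinetd.d/$svc
done
# Restart xinetd so the changes take effect
service xinetd restart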

5. Download the MPICH software onto the master node (Master). Untar the file in the root directory (if you want to run the cluster as root). Issue the command tar zxfv mpich.tar.gz (or whatever the name of the tar file is for your version of MPICH), and the mpich-1.2.1 directory will be created with all subdirectories in place. If you are using a later version of MPICH, the last number might be different from mine.

Change to the newly created mpich-1.2.1 directory. Make certain to read the README file (if it exists). At the command prompt, type ./configure, and when the configuration is complete and you are back at the command prompt, type make.

The make may take a few minutes, depending on the speed of your master computer. Once make has finished, add the mpich-1.2.1/bin and mpich-1.2.1/util directories to your PATH in .bash_profile, as in the example below.
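For example, if MPICH was untarred in /root (adjust the paths if you put it elsewhere), the following lines could be appended to /root/.bash_profile:

# Make the MPICH tools (mpirun, mpicc, etc.) available on the command line
PATH=$PATH:/root/mpich-1.2.1/bin:/root/mpich-1.2.1/util
export PATH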

6. Log out and then log back in to enable the modified PATH containing your MPICH directories.

7. From within the mpich-1.2.1 directory, go to the util/machines/ directory and find the machines.LINUX file. This file will contain the hostnames of all the nodes in your cluster. When you first view the file, you'll notice that five copies of the hostname of the computer you are using will be in the file. For the Master node on our cluster, there will be five copies of Master in the machines.LINUX file. If you have only one computer available, leave this file unchanged, and you will be able to run MPI/MPICH programs on a single machine. Otherwise, delete the five lines and add a line for each node hostname in your cluster, with the exception of the node you are using. For my cluster, the machines.LINUX file as viewed from Master looks like this:

Slave01
Slave02
(and so on for additional slave nodes)

8. Next, make all the example files and the MPE graphics files. First, navigate to the mpich-1.2.1/examples/basic directory and type make to build all the basic example files. When this process has finished, go to the mpich-1.2.1/mpe/contrib directory to build some additional MPE example files, especially if you want to view graphics. Within the mpe/contrib directory, you should see several subdirectories; the one we are interested in is the mandel directory. Change to the mandel directory and type make to create the pmandel executable. The build commands are summarized below; once they complete, you are ready to test your cluster.
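Assuming MPICH was untarred in /root as above, the build step can be summarized with the following commands:

# Build the basic example programs (cpilog, etc.)
cd /root/mpich-1.2.1/examples/basic
make

# Build the MPE mandel demo to get the pmandel executable
cd /root/mpich-1.2.1/mpe/contrib/mandel
make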

Running the Test Programs

1. From within the mpich-1.2.1/examples/basic directory, copy the cpilog executable (if this file is not present, try typing make again) to your top-level directory (/root).

2. Then, from your top-level directory, rcp the cpilog file to each node in your cluster, placing the file in the corresponding directory on each node. For example, if I am logged in as root on the master node, I'll type rcp cpilog Slave01:/root to copy cpilog to the /root directory on Slave01; a loop for copying to several slaves at once is sketched below.
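The sketch assumes the hostnames used in this guide and that cpilog is sitting in /root on the master:

# Copy cpilog into /root on every slave node
for node in Slave01 Slave02; do
    rcp /root/cpilog $node:/root
done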

3. Once the files have been copied, type the following from the top-level directory of the master node to test your cluster: mpirun -np 1 cpilog. You should get output similar to the following:

pi is approximately 3.1415926535899406,

Error is 0.0000000000001474

Process 0 is running on node00.

Wall clock time = 0.360909

Figure 1

Try to run this program using more than one node.

Type mpirun -np 2 cpilog. Depending on the number of nodes in your cluster, try using all the available nodes and note the difference in execution time. For example, if I had an eight-node cluster, I would execute mpirun -np 8 cpilog, and my result would look like the following:

pi is approximately 3.1415926535899406,

Error is 0.0000000000001474

Process 0 is running on node00.

Process 1 is running on node01.

Process 2 is running on node02.

Process 3 is running on node03.

Process 4 is running on node04.

Process 5 is running on node05.

Process 6 is running on node06.

Process 7 is running on node07.

wall clock time = 0.0611228

Figure 2

Future Work:

The following tests do not work:

To see some graphics, we must run the pmandel program. Copy the pmandel executable (from the mpich-1.2.1/mpe/contrib/mandel directory) to your top-level directory and then to each node (as you did for cpilog). Then, if X isn't already running, issue a startx command. From a command console, type xhost + to allow any node to use your X display, and then set your DISPLAY variable as follows: DISPLAY=Master:0 (be sure to replace Master with the hostname of your master node if it differs). Setting the DISPLAY variable directs all graphics output to your master node. Run pmandel by typing: mpirun -np 2 pmandel
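Put together, the session on the master node would look roughly like this (a sketch that assumes the master's hostname is Master, as elsewhere in this guide, and that pmandel has been copied to /root):

# Run startx first if X is not already running
# Allow any node in the cluster to draw on the master's X display
xhost +
# Direct all graphics output to the master node's display
DISPLAY=Master:0
export DISPLAY
# pmandel needs at least two processes
mpirun -np 2 pmandel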

The pmandel program requires at least two processors to run correctly. You should see the Mandelbrot set rendered on your master node.

Figure 3: The Mandelbrot Set Rendered on the Master Node

You can use the mouse to draw a box and zoom into the set if you want. Adding more processors (mpirun -np 8 pmandel) should increase the rendering speed dramatically. The Mandelbrot set graphic is partitioned into small rectangles for rendering by the individual nodes, and you can actually see the nodes working as the rectangles are filled in. If one node is a bit slow, the rectangles from that node will be the last to fill in. It's fascinating to watch. We've found no graceful way to exit this program other than pressing Ctrl-C or clicking the close box in the window; you may have to do this several times to kill all the nodes. As an option, copy the cool.points file from the original mandel directory to the same top-level directory (on the master node) as pmandel and run: mpirun -np 8 pmandel -i cool.points

The -i option runs cool.points as a script and puts on a nice Mandelbrot slide show. You could use the cool.points file as a model to create your own display sequence if you like.

References:

Step-by-Step Clustering Using MPICH 1.2.2.3:

MPICH Software downloads:
