


Grid Engine User Guide


Table of Contents

Introduction

Setting Up Your Environment

Grid Engine Script Generating Tool (GEST)

Basics of Using GE

• ARL MSRC Filesystems

• ARL GE Job Policies

• Simple Job script

• Submitting Jobs to GE

• Syntax for GE Job Submission

• Platform Complexes

• CPU Time Complexes for Serial Jobs

• Parallel Environments

• Additional Info on Job Submission for Dedicated Nodes

• Embedding qsub options in a script

• Checking the status of your jobs

More Basics of using GE (optional)

• Abaqus, MSI, Fluent and LS-Dyna License Tracking in GE

• Interactive Jobs

• GE Graphical User Interface: qmon

FAQ

• Why does my job not start?

• How do I launch MPI jobs?

• Credentials error message

• What queues can my job run in?

• poe_file does not exist Error Message?

• tcsh: Permission denied Error Message

Beyond the Basics in Using GE

• More qsub options

• Other useful GE commands

• Tar your files to reduce file unmigration times

• GE Support for Totalview Debugger for MPI Jobs on the IBM

• GE Support for Debug Jobs

• Special Feature for Parametric Study

• Serial Post Processing Work

• CTH Register for stop_now file

Advanced Use of GE


Introduction

Grid Engine (GE) is the queuing system used at ARL on our older machines. Grid Engine is an important interface between users and the HPC machines: users use GE to submit jobs and check on their status. Currently, one IBM SP4 (shelton) and one Linux Cluster (powell) execute under the control of GE. Powell is available only under the HPC reservation system; you can reserve dedicated nodes on powell via the web-based HPC reservation system available from our main web page.

GE's job is to run each job as soon as possible with the resources the job requires. When a user submits a job, the user specifies what it needs in terms of machine type (IBM or Linux), CPU time, memory, etc. GE will determine when and where to run the job, based on the resources requested, as well as other factors such as the priority the project has in the GE share tree, queue configuration, machine load, current memory utilization, etc.

When a user submits a job, GE sometimes has multiple machines from which to choose. Since the user will not know at submittal time which machine the job will run on, the job script must be written so that it can execute on any of the machines that match the resources requested. Since the home filesystem is NFS mounted on all machines, this is easy to do, and Simple Job Script will explain how this is done.

First click on Setting up your environment for GE to see what to add to your initialization files to use GE. Then try our new Script Generator Tool (GEST). For basic information, click on the links under "Basic Use of GE" to see the fundamentals. Once you are comfortable with the first three sections under the "Basics", you are ready to run your batch jobs under GE. If you need to run an interactive job, or you wish to use the GUI (graphical user interface), then click on the appropriate link. When you are ready to learn more about GE, click on the links under "Beyond the Basics in using GE."

Setting up your environment for GE

Execute the following command appropriate for your shell. Copy this into your .cshrc (csh or tcsh users) or .profile (Korn or Bourne shell users) so it is automatically executed each time you login.

For csh:

if (-f /usr/msrc/modules/3.1.6/init/csh ) then

source /usr/msrc/modules/3.1.6/init/csh

module load Master modules

endif

For tcsh:

if (-f /usr/msrc/modules/3.1.6/init/tcsh ) then

source /usr/msrc/modules/3.1.6/init/tcsh

module load Master modules

endif

For sh:

if [ -f /usr/msrc/modules/3.1.6/init/sh ]

then

. /usr/msrc/modules/3.1.6/init/sh

module load Master modules

fi

For ksh:

if [ -f /usr/msrc/modules/3.1.6/init/ksh ]

then

. /usr/msrc/modules/3.1.6/init/ksh

module load Master modules

fi

To avoid problems with terminal settings in a batch job, add the following to your .login file before your 'stty' command, or any command referencing the 'term' variable:

if ( $?JOB_NAME ) then

exit

endif

Once you have sourced the settings file, your PATH and MANPATH environment variables are set up to execute GE commands or to get help on GE commands. To get help with any GE command, issue the command with the -help switch or use man with the command name to see the man page.

command -help

man command

The GE commands are stored in /usr/ge/bin. After the module initialization above has run, execute "which qsub" and verify that you are getting qsub from /usr/ge/bin.
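As a quick sanity check after setting up your environment, the Bourne-shell sketch below tests whether qsub resolves to /usr/ge/bin. The warning text is illustrative, not a real GE message.

```shell
# Sanity check (sh/ksh syntax): confirm that qsub resolves to /usr/ge/bin.
# On a machine without GE installed, this prints the warning branch.
QSUB_PATH=$(command -v qsub || true)
if [ "$QSUB_PATH" = "/usr/ge/bin/qsub" ]
then
    echo "GE environment OK"
else
    echo "warning: expected /usr/ge/bin/qsub, got '$QSUB_PATH'"
fi
```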

ARL MSRC Filesystems

The ARL MSRC has several filesystems that GE users should be aware of. Each user is given a home filesystem on disk, plus an archive filesystem on joice/bob. The archive filesystem is used for storage of files before and after execution of jobs. In the job execution script, the input files should be copied to /usr/var/tmp/user_id/... since the temp directory /usr/var/tmp has the fastest I/O. After execution completes, the output files should be copied to the archive filesystem for safekeeping. See the sample script in these Web pages for an example of how to do this.

|Attribute           |Home Filesystem               |Archive Filesystem                           |Execution Space                              |

|Pathname            |/home                         |/archive                                     |/usr/var/tmp                                 |

|Hardware            |Sun Fileservers               |Sun Fileservers (joice, bob)                 |Local disks                                  |

|HPC Machines Served |IBM, Linux                    |IBM, Linux                                   |IBM, Linux                                   |

|Type                |Shared (NFS)                  |Shared (NFS)                                 |Unique per machine                           |

|Purpose             |Login; small file storage     |Long term storage                            |Job Execution                                |

|Capacity            |1 Gbyte limit per user        |No space limit                               |Several hundred Gbytes                       |

|File Lifetime       |No time limit                 |No time limit                                |14 days                                      |

|File Migration      |No                            |Yes                                          |No                                           |

|Pros                |No delay in access; long term |Unlimited capacity; long term                |Fastest access                               |

|Cons                |Limited capacity              |Delay in file demigration; slow access (NFS) |Limited capacity and lifetime; not backed up |

ARL GE Job Policies

ARL limits each user to 100 running or pending jobs. The GE share tree ensures that no single user can hog system resources at the expense of other users. ARL users may run jobs as follows:

• standard project jobs up to 96 CPU hours/processor

• Challenge jobs

• debug jobs (10 minute per processor CPU limit) that start within minutes

• background jobs up to 48 CPU hours/processor

Simple Job Script

The home filesystem is NFS mounted to all HPC machines for your convenience. Although NFS makes accessing files convenient, it does incur significant overhead, which slows down file accesses. Therefore, users should run their jobs in /usr/var/tmp, which provides much better I/O performance.

While the home filesystem is accessible on all machines, /usr/var/tmp is local to each machine and is distinct (i.e., /usr/var/tmp on one machine is NOT the same as /usr/var/tmp on another machine). GE creates a uniquely named temporary directory in /usr/var/tmp for each job, which can be referenced as $TMP or $TMPDIR. Alternatively, the user can establish one, such as /usr/var/tmp/$LOGNAME/$JOB_ID. Using either of these temporary directories, a user's GE script should do the following:

1. copy input tarfile from $HOME to /usr/var/tmp/...

2. untar tarfile to create input files

3. execute in /usr/var/tmp/...

4. tar output files

5. copy output tarfile back to $HOME

The home filesystem has slower access, but provides a permanent storage place for files. /usr/var/tmp does not support long-term storage, but provides good I/O bandwidth for execution. The directory $TMP created by GE is removed at the end of the job. To avoid this, mkdir your own subdirectory in /usr/var/tmp/$LOGNAME.

A very simple GE script is:

#!/bin/csh

set TMPD=/usr/var/tmp/$LOGNAME/$JOB_ID

echo TMPD is $TMPD

mkdir -p $TMPD

cp input_tarfile $TMPD

cd $TMPD

tar xf input_tarfile

./a.out

tar cf output_tarfile output1 results/output2 results/output3

cp output_tarfile /archive/army/thompson

echo job ended on `hostname` at `date`

When your GE script begins execution, it logs in as you, and executes your initialization files (.cshrc, .profile, .login). This puts your job in your home directory to start with. The mkdir and cp lines of the script above create a unique working directory $TMPD and copy the input tarfile from your home directory into it. Then the job changes to that directory, executes, and finally copies the output tarfile to the archive filesystem for safekeeping. Many other things can be done in a script, but this basic script will run on any of the machines, and will allow for good I/O performance.

The $TMP variable defined by GE includes the GE job number, and is thus unique. Since it is unique, you can run multiple GE jobs, even on one machine, without worry that the files generated by one execution will be affected by the execution of another job. This directory is removed by GE at the end of the job, so be sure to copy output files to your home filesystem at the end of your script. If you define your temporary (work) directory as /usr/var/tmp/$LOGNAME/$JOB_ID, then this directory and the files will remain after the job completes. This is helpful in debugging, but once your script works, it is advisable to use $TMP so as to not leave unneeded files in /usr/var/tmp.
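For users whose login shell is sh or ksh, the same workflow can be sketched in Bourne shell. The block below is a stand-alone demo that can be tried anywhere: a mktemp directory stands in for the GE work directory, and a tr command stands in for your a.out.

```shell
#!/bin/sh
# Stand-alone sketch of the copy-in / untar / run / tar-out pattern from
# the script above. A mktemp directory stands in for the GE work directory
# (/usr/var/tmp/$LOGNAME/$JOB_ID) so the flow can be tried outside GE.
set -e
WORK=$(mktemp -d)                     # stand-in for the GE work directory
mkdir -p demo_home && cd demo_home    # stand-in for the home filesystem
echo "input data" > input1
tar cf input_tarfile input1           # package the input files
cp input_tarfile "$WORK"              # copy input to fast local space
cd "$WORK"
tar xf input_tarfile                  # unpack
tr a-z A-Z < input1 > output1         # stand-in for ./a.out
tar cf output_tarfile output1         # package the results
cp output_tarfile "$OLDPWD"           # copy results back for safekeeping
echo "job ended in $WORK"
```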

Submitting Jobs to GE

Jobs are submitted to GE using the qsub(1) command or via the qmon GUI. The GE Graphical User Interface qmon, which is explained in GE GUI qmon, has the same functionality as qsub.

The job scripts that are submitted must be on the host from which you are submitting the job (the submit host). Upon job submission, the script is saved by GE. So after you qsub a script, you are free to edit it and submit it again as another job. As soon as an execution host that meets the needs of your job is ready to run your job, the script is transferred to that host by GE and executed.

There is a small machine called qmaster that controls job initiation, job accounting, and many other GE functions. All submitted jobs are processed by qmaster, and it determines which execution host will run your job. There is no relationship between the machine you submit a job on and the machine it runs on. GE's qmaster will determine when and where to run your job based on data it receives from each machine about how busy the machine is (load factor), how much memory and swap space is currently used, how many jobs of the same type are already running on that machine (as well as globally), how your priority as a user compares to other pending jobs, etc.

Jobs are submitted in two ways, depending on whether they are serial (single CPU) or parallel. For all jobs, use the -l option to specify a complex, which tells GE what machine type, how much CPU time, and how much memory your job needs. For parallel jobs, use the -pe option to specify a parallel environment, followed by the number of processors:

Syntax for GE Job Submission:

When submitting a GE job, one must specify to GE what the job requires. Normally this includes platform type, and CPU time, though other items may also be specified. For parallel jobs, a PE (parallel environment) must be specified. The platform type (IBM, or Linux) is specified by a platform complex, and the CPU time is specified by a CPU time complex. Each complex is preceded by "-l". The PE is preceded by "-pe" and followed by the quantity of processors requested.

Serial job: qsub -l platform_complex -l cpu_time_complex script

Parallel job: qsub -l platform_complex -pe pe_name num_CPU script

The lists of complexes and PE's are shown below. Most complexes allow jobs up to 4 GBytes of memory, and there are special complexes for jobs that require more.

For example, to submit a 24 hour, 4 GByte, serial IBM job, use the following command:

qsub -l ibmp4 -l 24hr job_script

Also, there are priority queues available on all machines. These queues require special permission. Contact the ARL Helpdesk if you wish to use the priority queues.

Note: The values in the tables below change from time to time as we tune GE. To find the current values for a queue, execute:

qconf -sq queue_name

and look for h_data (memory) and h_cpu (CPU time in seconds).

Platform Complexes

|Complex Name |Description |

|linux |to run on powell |

|ia32 |to run on powell |

|ibmp4 |to run on an IBM SP4 |

CPU Time Complexes for Serial Jobs

|Complex Name |CPU Limit |

|4hr |4 hours |

|12hr |12 hours |

|24hr |24 hours |

|48hr |48 hours |

|96hr |96 hours |

Specification of a PE (parallel environment) is required for parallel jobs. The PE is specified in the qsub command by "-pe PE_name number_proc".

For example:

qsub -pe pe_24hr 12 -l ibmp4 script

to run a shared memory job that requires up to 24 hours/processor and 12 CPUs on shelton.

PE (Parallel Environments)

New PE's (parallel environments) were defined for GE effective April 4, 2001 to cover the IBM and the HPCMO job categories of (1) Urgent, (2) Challenge, (3) Priority, (4) Standard, and (5) Background.

|PE Category |PE Name                  |CPU Limit       |Comments                                                |

|Urgent      |pe_urgent                |custom          |only for projects declared urgent by the HPCMP Director |

|Challenge   |pe_chal_12hr             |12 hour         |only for Challenge projects                             |

|Challenge   |pe_chal_24hr             |24 hour         |only for Challenge projects                             |

|Challenge   |pe_chal_48hr             |48 hour         |only for Challenge projects                             |

|Challenge   |pe_chal_96hr             |96 hour         |only for Challenge projects                             |

|Challenge   |pe_chal_240hr            |240 hour        |only for Challenge projects                             |

|Priority    |pe_pri                   |custom          |requires approval                                       |

|Standard    |pe_4hr                   |4 hour          |shared memory jobs                                      |

|Standard    |pe_12hr                  |12 hour         |shared memory jobs                                      |

|Standard    |pe_24hr                  |24 hour         |shared memory jobs                                      |

|Standard    |pe_48hr                  |48 hour         |shared memory jobs                                      |

|Standard    |pe_96hr                  |96 hour         |shared memory jobs                                      |

|Standard    |mpi_4hr_ibm_p4           |4 hour          |IBM MPI jobs (multinode)                                |

|Standard    |mpi_12hr_ibm_p4          |12 hour         |IBM MPI jobs (multinode)                                |

|Standard    |mpi_24hr_ibm_p4          |24 hour         |IBM MPI jobs (multinode)                                |

|Standard    |mpi_48hr_ibm_p4          |48 hour         |IBM MPI jobs (multinode)                                |

|Standard    |mpi_96hr_ibm_p4          |96 hour         |IBM MPI jobs (multinode)                                |

|Standard    |mpi_4hr_ibm_p4_dnode     |4 hour          |IBM MPI jobs (dedicated nodes)                          |

|Standard    |mpi_12hr_ibm_p4_dnode    |12 hour         |IBM MPI jobs (dedicated nodes)                          |

|Standard    |mpi_24hr_ibm_p4_dnode    |24 hour         |IBM MPI jobs (dedicated nodes)                          |

|Standard    |mpi_48hr_ibm_p4_dnode    |48 hour         |IBM MPI jobs (dedicated nodes)                          |

|Standard    |mpi_96hr_ibm_p4_dnode    |96 hour         |IBM MPI jobs (dedicated nodes)                          |

|Reservation |mpi_resv_glinux          |per reservation |Linux MPI jobs                                          |

|Reservation |mpi_resv_glinux_dnode    |per reservation |Linux MPI jobs (dedicated nodes)                        |

|Reservation |mpi_resv_glinux_gcc      |per reservation |Linux MPI jobs compiled with gcc                        |

|Background  |pe_background            |48 hours        |IBM shared memory jobs                                  |

|Background  |pe_background_mpi_ibm_p4 |24 hours        |IBM SP4 MPI jobs                                        |

|Debug       |pe_debug                 |10 min          |IBM jobs                                                |

|Debug       |mpi_debug_ibm_p4         |10 min          |IBM MPI debug jobs (limit 4 proc)                       |

|Interactive |pe_interactive           |4 or 12 hours   |interactive job                                         |

The execution of qsub will return a message saying that your job has been submitted, and will give you the GE job number assigned to your job. Save this job number, since it will be used to reference your job. By default, the standard output and standard error files generated by your job have this GE job number as part of their names so that it will be easy to associate them with this particular job.

Additional Info on Job Submission for Dedicated Nodes

GE on the IBM or Linux Cluster allows the use of dedicated nodes by requesting one of the dedicated node parallel environments.

IBM SP4:

mpi_4hr_ibm_p4_dnode

mpi_12hr_ibm_p4_dnode

mpi_24hr_ibm_p4_dnode

mpi_48hr_ibm_p4_dnode

mpi_96hr_ibm_p4_dnode

Linux:

mpi_resv_glinux

mpi_resv_glinux_dnode

mpi_resv_glinux_gcc

When requesting dedicated nodes, the user must specify a multiple of the number of processors per node (32 for IBM SP4, or 2 for Linux) as the number of slots when requesting the PE on the qsub command. For example, to submit a job with a 4 hour limit to run on 4 dedicated IBM SP4 nodes, the user would specify:

qsub -pe mpi_4hr_ibm_p4_dnode 64 job_script

GE also supports two special environment variables which can be set when submitting a job to run with fewer MPI tasks than slots.

SGE_TOTAL_MPI_TASKS

the total number of MPI tasks to run in the job. The tasks will be distributed in a round-robin fashion among the dedicated nodes selected by GE for the job.

SGE_MPI_TASKS_PER_NODE

the number of MPI tasks to run on each dedicated node.

These environment variables can be specified on the qsub command line when submitting the job by using the qsub -v option. For example, to run 32 total MPI tasks with 64 total CPUs on 4 dedicated nodes, the user would specify:

qsub -v SGE_TOTAL_MPI_TASKS=32 -pe mpi_4hr_ibm_p4_dnode 64 job_script

To run with 'n' MPI tasks per dedicated node:

qsub -pe mpi_4hr_ibm_p4_dnode N -v SGE_MPI_TASKS_PER_NODE='n' job_script

The MPI tasks per node can also be specified in the script by including the following line in the script:

#$ -v SGE_MPI_TASKS_PER_NODE='n'

To run with 'n' total MPI tasks using N CPUs:

qsub -pe mpi_4hr_ibm_p4_dnode N -v SGE_TOTAL_MPI_TASKS='n' job_script

The total number of MPI tasks can also be specified in the script by including the following line in the script:

#$ -v SGE_TOTAL_MPI_TASKS='n'

Embedding qsub options in a script

GE supports the embedding of qsub options in the job script itself. This is done by using #$ as the first two characters in a line in your job script, followed by a valid qsub option. GE will read these options when the job is submitted and treat them as if you had specified them on the command line or in the GUI. This eliminates the need to type in the options each time a job is submitted, and it also provides a written record.

Below is an example job script which incorporates qsub -l options. If you specify resources in the script, be sure not to specify them again on the qsub command line, because incompatible resource requests will prevent your job from starting.

Example Script:

#!/bin/csh

#

# -- the shell to use --

#

#$ -S /bin/csh

#

# -- the name of the job ---

#

#$ -N my_job_name

#

# -- complex required by the job

#

#$ -l ibmp4

#$ -l 4hr

#

# -- merge standard out and standard error

#

#$ -j y

#

set TMPD=/usr/var/tmp/$LOGNAME/$JOB_ID

echo TMPD is $TMPD

mkdir -p $TMPD

cp input_file $TMPD

cp a.out $TMPD

cd $TMPD

./a.out

cp output_file ~

echo job ended on `hostname` at `date`

More qsub Options will show you how to place qsub options in a file other than your script. This allows one file of qsub options to be shared by many scripts.
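Since GE honors only lines whose first two characters are #$, a quick way to review which options a script embeds is to grep for that prefix. The snippet below first writes a small demo script so the command can be tried stand-alone; the file name is arbitrary.

```shell
# List the embedded qsub options of a script (lines beginning with "#$").
# A tiny demo script is created here so the grep can be tried stand-alone.
cat > demo_script <<'EOF'
#!/bin/csh
#$ -N my_job_name
#$ -l ibmp4
#$ -l 4hr
echo hello
EOF
grep '^#\$' demo_script
```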

Checking the status of your jobs

Users may check the status of GE jobs using the qstat command. The qstat command has several options:

qstat the basic output

qstat -f -r to see most everything

qstat -u user_id to see jobs for one user

The qstat command without any options gives the following information for all running and pending jobs:

job number

job name

user name

job state (r=running, qw=queued and waiting, R=restarted)

submit/start date and time

queue name (if running)

The "qstat -f -r" command provides the following information for all running and pending jobs, as well as information for all queues, even those that are currently empty.

queue name (hostname_xxx.q)

queue type (B=batch, I=interactive, P=parallel, and C=checkpointable)

number of slots used, and total number of slots for the queue

machine load average

queue state (alarmed, suspended, disabled, unavailable)

job number

job name

user name

job state (r=running, qw=queued and waiting, R=restarted)

submit/start date and time

See the man page for qstat or type "qstat -help" for more information.

Abaqus, Fluent, MSI and LS-Dyna License Tracking in GE

GE now tracks licenses for LS-Dyna, Abaqus, MSI (Accelrys), and Fluent. To specify that your job needs to run on a host that has LS-Dyna and that it needs an LS-Dyna license, use:

#$ -l lsdyna

#$ -l lsdyna_lic=1

Use "abaqus" to tell GE that your job needs to run on a host that has Abaqus, and then add abaqus_lic=1 to tell GE that you need an Abaqus license.

#$ -l abaqus

#$ -l abaqus_lic=1

To specify that your job needs an MSI license, add:

#$ -l MSI_TokenR=n

where n is the number of tokens per processor, based on functionality. For example, to run a 48 hour job with 4 processors:

#$ -l MSI_TokenR=n

#$ -pe pe_48hr 4

Set MSI_TokenR to the number of tokens needed per processor, based on the MSI functions you are using. GE multiplies this by the 4 processors requested in the PE line, so the total number of tokens needed is 4*n in this example.
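The token arithmetic above can be checked with a one-liner; the values here are hypothetical (the real n depends on the MSI functionality you use).

```shell
# Token arithmetic from the paragraph above, with hypothetical values:
# n tokens per processor times the processors requested on the -pe line.
TOKENS_PER_PROC=3                     # n (example value; depends on MSI function)
SLOTS=4                               # processors requested via "-pe pe_48hr 4"
TOTAL=$((TOKENS_PER_PROC * SLOTS))
echo "total MSI tokens needed: $TOTAL"
```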

There are two types of Fluent licenses (fluentnw and fluentpar), and each must be specified to GE separately, since both are not always required. One network license (fluentnw) is needed for each Fluent job, regardless of what Fluent component is being used. The second type, fluentpar, is used for parallel Fluent jobs only.

To specify that your job needs to run on a host with Fluent and also needs a fluentnw license, add to your script:

#$ -l fluent

#$ -l fluentnw=y

To specify that your job needs a fluentpar license, add:

#$ -l fluentpar=1

For example, to run a 48 hour job with 32 processors:

#$ -l fluent

#$ -l fluentnw=y

#$ -pe pe_48hr 32

#$ -l fluentpar=1

Always set fluentpar to one ("1"). GE will assume 32 licenses are needed based on the 32 processors requested in the PE line. The 'y' in "fluentnw=y" stands for 'yes'.

Interactive Jobs

Most jobs submitted to a queuing system are batch jobs, meaning that the queuing system will run them as soon as feasible, but that may be only after a wait of hours, days, or even weeks. A second type of job is allowed to run in GE, namely an "interactive" job. By declaring a job to be interactive, the user tells GE that the job must run now, or not at all. Interactive jobs are needed when the user must interact with the running job, such as typing in input, or using a graphical interface. When interactive jobs are submitted, GE will either start the job within about 30 seconds, or will tell the user that the machines are currently too busy and that he should try later.

Not all jobs that appear to be interactive have to be run interactively. If a job requires input, it is usually possible to put that input into a file and direct the job to read the file, which then allows the job to be run as a batch job. Portions of MATLAB and ANSYS, for example, can be run in batch mode, even though they are often used interactively.

When GE does run an interactive job, it will send an interactive xterm window to the user's workstation instead of running a script. The user then runs his script or job in the xterm window. Upon completion, the user exits the xterm window, thus ending the GE interactive job.

Users should actively use the interactive window while it is up, and should exit it when it will not be needed for a while, because an interactive job ties up a GE slot and prevents another job from running. Users should definitely avoid leaving an interactive job open overnight!

To tell the host that runs your job where to send the xterm, set your DISPLAY env variable to point to your workstation. On a GE execution host, execute:

echo $DISPLAY to see if it is already set

setenv DISPLAY workstation_name:0.0 to set it, if not already set

echo $DISPLAY to see if you set it correctly

In addition to specifying where the host is to send your xterm, you must also tell your workstation to accept the xterm that will be sent to it by the host. Since you do not yet know which host will run your interactive job, you must give permission for all candidate machines. Once you know which host is running your job, you can disable this permission for the other machines. To give permission to accept the xterm, on your workstation execute:

xhost + shelton-n01.arl.hpc.mil

Since powell and shelton have multiple nodes, each of which is a separate GE host, and you cannot know in advance which node GE will pick for your job, you need to do an xhost + for all nodes on that system. This must be done on your LOCAL machine, as follows:

For shelton using csh:

foreach i (1 2 3 4)

xhost + shelton-n0${i}

end

For shelton using Bourne/Korn shell:

for i in 1 2 3 4

do

xhost + shelton-n0${i}

done

For powell using csh:

foreach i (0 1)

foreach j (0 1 2 3 4 5 6 7 8 9)

foreach k (0 1 2 3 4 5 6 7 8 9)

xhost + powell-n${i}${j}${k}

end

end

end

For powell using Bourne/Korn shell:

for i in 0 1

do

for j in 0 1 2 3 4 5 6 7 8 9

do

for k in 0 1 2 3 4 5 6 7 8 9

do

xhost + powell-n${i}${j}${k}

done

done

done
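The nested powell loops above generate the names powell-n000 through powell-n199. An equivalent, more compact Bourne-shell loop is sketched below; it prints the names so it can be tried anywhere, and on your workstation you would replace the echo with the actual xhost + call.

```shell
# Compact sh/ksh equivalent of the nested powell loops above, generating
# the node names powell-n000 through powell-n199.
i=0
while [ $i -le 199 ]
do
    node=$(printf 'powell-n%03d' $i)
    echo "$node"                      # on your workstation: xhost + "$node"
    i=$((i + 1))
done
```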

Then, submit the job using qsh (instead of qsub):

qsh -l 4hr to request one CPU

qsh -l 4hr -pe pe_interactive 4 to request four CPUs

or

qsh -l hostname=herman.arl.hpc.mil

The xterm should appear in about 20 seconds. If it does not, execute "qstat -u user_id" and see if the job still exists; if it does not, check $DISPLAY and your xhost settings. Once you get an xterm on one machine, you may revoke access for the other machines with "xhost -". If the requested queue or class of queues (i.e., complex) is overloaded, a message will appear telling the user to try again later, and the job is terminated. The interactive job will not remain pending.

GE Graphical User Interface: qmon

The Graphical User Interface to GE is qmon. Before bringing it up, you must make sure your DISPLAY environment variable is set (so GE knows where to send the window) and execute xhost for the HPC machine that you start the GUI on (so that your workstation knows to accept the window sent to it).

on HPC host: setenv DISPLAY your_workstation_name:0.0

on your workstation: xhost + hostname

Start it by executing "qmon &".

You should see a "Main Control" window pop up on your workstation. It has about 15 icons. The main ones you will use are Submit and Job Control.

1. Click on the Submit icon on the QMON Main Control panel.

2. Select the job script to execute by clicking on the icon next to the "Job Script" field (2nd field from the top left), selecting the job script, and clicking on "OK".

3. Click on the "Show Details ..." button (3rd button from the bottom right).

4. If your job is a parallel job, click on the "Parallel Environment" icon (next to the Parallel Environment fields). On the Parallel Environment sub-dialogue, type in the name of the parallel environment and the number of processors that you want to use, or click on the icon to select one of the available parallel environments. Click on "OK" when complete.

5. After filling in any other fields that you need on the Job Submission dialogue, click on "Submit" to submit the job to GE.

The "Batch/Interactive" toggle button in the upper right hand corner of the Submit GUI can be used to request an interactive job. One still needs to specify the queue or complex via the "Request Resources" button.

The Job Control icon may be selected and used to see the status of user jobs.

Why does my job not start?

There are numerous reasons why a job cannot start. The most basic reason is that the processors needed are not available and the job simply must wait. But sometimes the resources requested for a job cannot be satisfied at all, and the job will never be eligible to start. Examples of this include requesting a PE for one platform type, and one or more complexes for different platforms, meaning that no one machine can satisfy both requests. In addition to incompatible resource requests, a user may ask for resources that do exist, but he may not be eligible to run in such a queue. One example of this is a non-Challenge user trying to run a job in a Challenge queue.

To determine if your job is eligible to run, use the script "qjob". The syntax is simply qjob followed by the job number:

qjob 978123

The output of this command should tell you that your job submittal is OK or else tell you why your job cannot run. If there is a problem, use the command qalter to change the resource requests.

How do I launch MPI jobs?

Use sge_mpirun to launch MPI jobs on IBM or Linux Cluster machines.

echo launch mpicthgen

sge_mpirun mpicthgen i=$input

Credentials Error Message

What does the following error message mean when I submit my job?

get_cred stderr: GSS-API error initializing context: \

Miscellaneous failure

get_cred stderr: GSS-API error initializing context: \

No credentials cache file found

warning: could not get credentials

your job 48913 ("run2") has been submitted

This error message means that your job can run, but will not have valid Kerberos tickets.

GE is fully kerberized, which means it is capable of automatically renewing your Kerberos tickets when your job starts. If you have valid, forwardable, renewable Kerberos tickets at the time you submit your GE job, GE will automatically renew your tickets when it starts your job. If you do NOT have such tickets at submission time, you get the above error message.

If your job does not execute any Kerberized commands (such as kftp), then the lack of Kerberos tickets will have no effect.

To acquire valid Kerberos tickets for your job (and eliminate the above error message), you must request forwardable and renewable tickets when you kinit. See the man page for kinit for the proper syntax.

What queues can my job run in?

To see which queues your job is eligible to run in, based on the resources (PE's and complexes) you requested, run the qselect command. The arguments to qselect are the PE (denoted by -pe pe_name) and complexes (denoted by -l complex). Do not include the number of processors or the script name, since the command only needs the resources.

qselect -l ibmp4 -pe pe_4hr

poe_file does not exist Error Message

What does the following error message mean?

Tue May 29 11:02:12 EDT 2001 norman-node1 grd_poe[385108.1]

ERROR: PE pe_4hr required file

/usr/ge/default/spool/norman-node1/active_jobs/385108.1/poe_file

does not exist

The most likely cause of this error message is that you specified a shared memory PE instead of a distributed memory (MPI) one. GE creates a poe file only for jobs requesting PE's whose names start with "mpi", and this file is required for distributed memory jobs. If you use a shared memory PE such as pe_24hr, the poe file is not generated, causing an MPI job to abort with the above error message.

The solution is to change your PE specification to a distributed memory (MPI) one. Execute "qconf -spl" to see a list of valid PE's.

tcsh: Permission denied Error Message

What does the following error message mean?

tcsh: Permission denied

tcsh: Trying to start from "/home/army/thompson"

tcsh: Trying to start from "/"

The cause of these error messages is that your home directory has permission 700. To eliminate these error messages, chmod your home directory to 755. These error messages will not prevent your job from running.

More qsub Options

The GE qsub command supports numerous options. For a list, execute "qsub -help". Some of the more useful ones are explained below:

1. specify an alternate project

2. submit a background job

3. specify a job name

4. specify a COTS s/w package

5. make one GE job wait until another GE job completes

6. wait until a certain time to start a job

7. execute from same directory job was submitted in

8. redirect standard out, standard error, or merge them

9. to restart, NOT restart, or resume the job if the machine crashes

10. notify job before killing/suspending

11. specify a particular host

12. verify if the resources you request are valid

13. read qsub options from file

14. specify which shell to use

Specify an alternate project

qsub -P project_id script

Most ARL users have only one project, but some users have more than one. These users will have one project listed as their default, and will need to use the -P option anytime they intend a job to be run under another project. To see what your default project is, execute "qconf -suser user_name".

The safest approach is to embed the -P project_id option in the script using the #$ in columns one and two. Do this for all scripts for all projects, and then you will not need to remember which is your default project, or to do anything special about your non-default project.
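A minimal sketch of such a script with the project embedded (my_other_project is a hypothetical project id; substitute one of your own, as shown by "qconf -suser user_name"):

```shell
#!/bin/csh
#$ -P my_other_project   # hypothetical project id; replace with yours
#$ -cwd
a.out
```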

Submit a background job

Serial:

IBM SP4 qsub -l background -l ibmp4 script

Parallel:

shared memory qsub -pe pe_background 16 script

IBM SP4 qsub -pe pe_background_mpi_ibm_p4 16 script

Background jobs will not consume your project allocation, but since they run at lower priority and do not start unless the machine load drops below a certain value, background jobs may wait a long time before starting. Background jobs are limited to 24 hours on shelton.

Specify a job name

qsub -N pick_any_name script

The name specified by the -N is by default part of the names of the standard output and standard error files. The name cannot start with a number. This option can be used for the user's convenience, especially if the user is doing a parametric study and wishes to have a convenient way to uniquely label each run. See Special Technique for Parametric Study.
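For instance, a parametric sweep might label each run by its parameter value (a sketch; sweep.csh and the values are hypothetical, and note each name starts with a letter, as required):

```shell
# Submit one job per parameter value, each with a distinct job name.
foreach v (0.1 0.2 0.5)
    qsub -N run_$v sweep.csh $v
end
```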

Specify a COTS software package

qsub -l xpatch script

The following COTS s/w packages are available only on certain machines due to their licenses: xpatch, mopac and plusfort. To run a GE job that requires one of these COTS packages, specify the name (as listed above) with the resources option (-l). GE will ensure that your job runs on a machine that has that COTS package. The COTS software packages are in /usr/cta. Packages other than these three are available on all machines, and nothing special is required.

Make one GE job wait until another GE job completes

qsub -hold_jid job_id script

This option allows a user to submit one GE job, and then have subsequent job(s) wait until this job completes before they are allowed to start. A job may have multiple jobs listed here, in which case all must complete (successfully) before it is allowed to start.
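For example, a two-step pipeline might be chained as follows (a sketch; solve.csh and post.csh are hypothetical scripts, and the awk step assumes qsub's usual "Your job NNNN ..." submission message, from which the third word is the job id):

```shell
# Submit the solve job, capture its job id, and make the post job wait on it.
set jid = `qsub solve.csh | awk '{print $3}'`
qsub -hold_jid $jid post.csh
```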

Wait until a certain time to start a job

qsub -a date_time script

See man page on qsub for details.

Execute from same directory job was submitted in

qsub -cwd script

This option causes GE to cd into the directory in which the job was submitted before beginning execution. It is useful when a user has distinct directories for distinct jobs. This option causes the standard output and standard error files to be placed in the qsub directory (unless overridden by the -o and -e options).

Redirect standard out, standard error, or merge them

qsub -o out_file -e err_file script

qsub -o out_file -j y script

The -o option redirects standard output, the -e option redirects standard error, and the -j option specifies whether the standard output and standard error files are to be merged (y=yes, n=no). The default is no. The default location of the standard output and standard error files is the user's home directory, unless the -cwd option is used, in which case the default is the directory of job submission. Even though GE supports the capability to redirect these files, the default names used by GE are unique, and include the GE job name and GE job number. This makes associating these files with their particular job very easy, and we recommend that users skip the -e and -o options and allow GE to use the default names.

To restart, NOT restart, or resume the job if the machine crashes

qsub -r n script to not restart

qsub -r y script to restart (default)

By default, our GE queues are set so that any job that is running when the execution host crashes will be restarted. Unless the job is checkpointed, the restarted job begins at the top of the script. This is what most users want, but not all: it has the drawback of overwriting files that existed at the time the machine crashed. If you can manually salvage some data from a partially completed job, you may prefer to specify that GE should not restart your job.

In your job script, the variable $RESTARTED will be set to 1 if the job has been restarted. This information can be used by your script to determine if it should resume processing instead of starting from scratch.

For example, suppose your executable has the ability to resume from temporary files written out during execution. Some codes are written to look for these intermediate files, and use them if they exist. To take advantage of this feature, you must tell GE that if your job restarts after a machine crash, it must restart on the SAME machine (since /usr/var/tmp is distinct on each machine). To do this, include the following line at the beginning of your script:

qalter -q $QUEUE $JOB_ID

This will have GE alter your job's requested resources to ensure that a restart will occur in the same queue, and hence on the same machine (since a queue resides on one machine only).

If you want to omit the copying in of input files in the case of a restart, you may do so with:

#!/bin/csh
#
# GE script to allow resumption from intermediate files
#
# Pin any restart to the same queue (and hence the same machine),
# since /usr/var/tmp is distinct on each machine:
qalter -q $QUEUE $JOB_ID

set TMPD=/usr/var/tmp/$LOGNAME/$JOB_ID
mkdir -p $TMPD

if ( $RESTARTED == 0 ) then
    # first run: stage the input files
    cp input_files $TMPD
else
    # restart: intermediate files are already in $TMPD
    echo job $JOB_ID resuming at `date` on `hostname`
endif

cd $TMPD
a.out
.........

Notify job before killing/suspending

qsub -notify script

This flag causes GE to send "warning" signals to a running job prior to sending the kill signals. See man page of qsub for more details.
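As a sketch of how a job can act on these warning signals (this assumes GE's documented behavior of sending SIGUSR1 before suspension and SIGUSR2 before the kill; the state-saving step here is hypothetical):

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -notify
# With -notify, GE sends SIGUSR1 before suspending and SIGUSR2 shortly
# before killing the job; trap them to record or save state.
trap 'echo "suspend warning received" >> notify.log' USR1
trap 'echo "kill warning received, saving state" >> notify.log; exit 1' USR2
./a.out
```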

Specify a particular host name

qsub -l hostname=xxx.arl.hpc.mil

GE allows a user to specify a particular host, but this is NOT recommended in general. Only for unusual and specific reasons should a user use this option. Specifying a certain host without a good reason eliminates the chance that the job can run on any other machine, which will probably result in the job waiting longer to start.

Verify if the resources you request are valid

qsub -w v -l resource1 -pe pe_name nn -l resource2 script

This should be used anytime a user is uncertain if the resources requested can be satisfied by any queue. This option ("-w v") does NOT run the job, but instead will quickly print out a message telling the user whether the job can run (verification: found suitable queue(s)) or not (verification: no suitable queues) under the current GE configuration.

Read qsub options from file

qsub -@ optionfile

Specify which shell to use

qsub -S /bin/ksh

GE does not automatically use the shell named on the first line of the script, so specify the shell explicitly with -S.

Other Useful GE Commands

GE has commands to get status of executing jobs, delete jobs, and to show the site configuration. Some of these commands are listed below. For additional details on these commands, see the man pages, or execute the command with -help.

To show which queues meet your resource requests:

qselect -l ibmp4 -pe pe_48hr

To show jobs on a queue basis (including the status of the jobs):

qstat -f

To show requested resources of jobs:

qstat -r

To delete a job (either pending or running):

qdel job_id

To show a list of complexes:

qconf -scl

To alter the resources a job needs:

qalter -l new_complex -pe new_pe NN job_number

To show parallel environment list:

qconf -spl

To show attributes for a specific parallel environment:

qconf -sp pe_name

To show default project:

qconf -suser user_id

To show checkpointing environment list:

qconf -sckptl

To show attributes for a specific checkpointing environment:

qconf -sckpt ckpt_name

GE accepts many of the same qsub options as NQS. A comparison of a few of the more common options is shown here:

|Item      |NQS         |GE          |Comment            |
|std out   |-o filename |-o filename |No change from NQS |
|std error |-e filename |-e filename |No change from NQS |
|shell     |-s /bin/sh  |-S /bin/sh  |Make s upper case  |
|name      |-r job_name |-N job_name |Change -r to -N    |

Here is a more complete set of GE qsub options:

usage: qsub [options]

[-a date_time] request a job start time

[-A account_string] use account at host

[-c ckpt_selector] define type of checkpointing for job

[-ckpt ckpt-name] request checkpoint method

[-clear] skip previous definitions for job

[-cwd] use current working directory

[-C directive_prefix] define command prefix for job script

[-dl date_time] request a deadline initiation time

[-e path_list] specify standard error stream path(s)

[-h] place user hold on job

[-hard] consider following requests "hard"

[-help] print this help

[-hold_jid job_id_list] define job dependencies

[-j y|n] merge stdout and stderr stream of job

[-l resource_list] request the given resources

[-m mail_options] define mail notification events

[-notify] notify job before kill/suspend

[-M mail_list] notify these e-mail addresses

[-N name] specify job name

[-o path_list] specify standard output stream path(s)

[-P project_name] set job's project

[-p priority] define job's relative priority

[-pe pe-name slot_range] request slot range for parallel jobs

[-q destin_id_list] bind job to queue(s)

[-qs_args [arg [...]] -qs_end] deliver args to foreign qs

[-r y|n] define job as (not) restartable

[-soft] consider following requests as soft

[-S path_list] command interpreter to be used

[-v variable_list] export these environment variables

[-verify] do not submit just verify

[-V] export all environment variables

[-@ file] read commandline input from file

[{script|-} [script_args]]

account_string account_name

ckpt_selector `n' `s' `m' `x'

date_time [[CC]YY]MMDDhhmm[.SS]

destin_id_list queue[,queue,...]

job_id_list job_id[,job_id,...]

mail_address username[@host]

mail_list mail_address[,mail_address,...]

mail_options `e' `b' `a' `n' `s'

path_list [host:]path[,[host:]path,...]

priority -1024 - 1023

resource_list resource[=value][,resource[=value],...]

slot_range [n[-m]|[-]m] - n,m > 0

variable_list variable[=value][,variable[=value],...]

Tar your files to reduce file unmigration times

Users who have their home filesystems on john/von are subject to DMF (Data Migration Facility) file migration. When preparing a job script for submission to GE, tar the files needed for the job and have your script copy the tarfile to the working directory in /usr/var/tmp. The script can then untar the tarfile to restore the files in their proper relative directory locations. Tarring all your files into one tarfile will save you lots of time in file unmigration because it is more efficient for DMF to retrieve one large file instead of many smaller ones.

Recently we witnessed one job that copied over 800 individual files from john/von to /usr/var/tmp. Because the user had run "cp *", the files were unmigrated one at a time, and the unmigration of these 800 files took almost 11.5 hours! Had the user tarred the files, only one file would have been unmigrated, and it probably would have been done in a few minutes.
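The pattern looks like the following sketch, which uses temporary directories in place of a john/von home directory and /usr/var/tmp so it is self-contained (all paths and file names are illustrative):

```shell
# Tar-before-copy: bundle many input files into one tarfile so DMF
# unmigrates a single large file instead of hundreds of small ones.
set -e
home=$(mktemp -d)      # stands in for your home filesystem on john/von
workdir=$(mktemp -d)   # stands in for /usr/var/tmp/$LOGNAME/$JOB_ID
mkdir -p "$home/inputs"
echo "mesh data"  > "$home/inputs/mesh.dat"
echo "run params" > "$home/inputs/run.in"
# One tarfile in place of many files:
tar cf "$home/job_input.tar" -C "$home" inputs
# In the job script: copy the single tarfile, then untar in place.
cp "$home/job_input.tar" "$workdir"
tar xf "$workdir/job_input.tar" -C "$workdir"
ls "$workdir/inputs"   # files restored in their relative locations
```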

GE Support for Totalview Debugger for MPI Jobs on the IBM

To run the totalview debugger on the IBM for MPI jobs, one needs to run qsh to get an interactive GE window, and then invoke totalview, giving it the GE poe_host_file. ARL has written a script to do this for you. So, perform the following two steps:

qsh -pe mpi_4hr_ibm_p4 N where N = #procs

in the qsh window:

/usr/ge/local/bin/ibm_totalview

Debug Jobs

Normally, GE does not start new jobs on a busy machine, but we have a mechanism that allows GE to run debug jobs with a CPU limit of 10 minutes per processor. Debug jobs may use up to half the processors on a machine. Every ten minutes, one of the machines will start debug jobs. If your job can run on any machine, it should start within ten minutes. If it can only run on certain machines, the wait may be longer, but should not exceed 20 minutes. Background jobs are normally not allowed to run in the debug queues.

To run a parallel debug job:

shared memory : qsub -l ibmp4 -pe pe_debug N script

MPI : qsub -pe mpi_debug_ibm_p4 N script

where N = number of processors, which is limited to 4 on the IBM.

IBM serial debug job: qsub -l ibmp4 -l debug script

Do not request any other CPU time limit (such as 24hr).

Special Feature for Parametric Study

GE has a feature for parametric studies, called the job array feature. The job array option allows a user to submit a single job that spawns multiple jobs, which can execute on multiple machines. Each of these separate jobs is called a task, and is assigned a unique task id via the environment variable SGE_TASK_ID. The job array option is invoked with -t, followed by the range of indices needed. An increment may also be specified.

For example,

qsub -t 3-999:4

will prompt GE to set SGE_TASK_ID to 3,7,11,... up to 999. The -t option can be placed inside the script using #$, like other qsub options.
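The id sequence produced by a given -t range can be previewed with seq (first, increment, last), which is an easy way to sanity-check a range before submitting; for "-t 3-999:4":

```shell
# Preview the task ids GE will generate for -t 3-999:4.
seq 3 4 999 | head -n 4   # first few ids: 3 7 11 15
seq 3 4 999 | wc -l       # 250 task ids in all, ending at 999
```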

Here is an example of how you can use the value of SGE_TASK_ID set by GE in your job array script.

#!/bin/csh
#$ -S /bin/csh
#$ -t 1-4
# csh arrays are 1-indexed, matching the task ids 1 through 4:
set array=(abc xyz efg hij)
set var=$array[$SGE_TASK_ID]
set input_file=xpatch.$var
...

In this example partial script, GE will run 4 tasks, setting SGE_TASK_ID to 1, 2, 3, and 4 respectively. When SGE_TASK_ID is 1, var is abc and input_file is set to xpatch.abc; when SGE_TASK_ID is 2, input_file is set to xpatch.xyz; and so on.

Serial Post Processing Work

To perform post processing in a serial GE job after the parallel portion of your job completes, bracket your existing post processing commands with the qsub line below (placed just before them) and an EOF line (placed right after them).

last parallel execution command

qsub -cwd -N $REQUEST.post -e $SGE_STDERR_PATH -o $SGE_STDOUT_PATH -S $SHELL -hold_jid $JOB_ID -l hostname=`hostname` << EOF
................
................
EOF
