Recently a student presented us with a challenge: they needed to analyze thousands of input files with an application, but found it cumbersome to submit each input file individually.
We started by looking at a "for loop" to run through the input files, combined with the PBS directive "#PBS -v TASKS":
for ((i=0; i < TASKS; i++)); do
    command < input.$i
done
The TASKS variable is exported on the command line with a value equal to the number of inputs to be analyzed. The problem with this "for loop" is that it works through the inputs one at a time rather than executing them all in parallel. To get around this, you add an ampersand (&) to the end of the command being executed and a "wait" after the loop, as seen below.
for ((i=0; i < TASKS; i++)); do
    command < input.$i &
done
wait
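For reference, this parallel loop would live inside an ordinary submission script. The sketch below is only illustrative: the job name, the script name job_parallel.sh, and the resource request (ppn should cover however many processes you fork at once) are assumptions, not taken from our setup.
#!/bin/bash
#PBS -N parallel-loop
#PBS -l nodes=1:ppn=8
#PBS -v TASKS

# Run from the directory the job was submitted from
cd ${PBS_O_WORKDIR}

# Start every input in the background, then wait for all of them to finish
for ((i=0; i < TASKS; i++)); do
    command < input.$i &
done
wait
With "#PBS -v TASKS" in place, the TASKS value is picked up from the submission environment, so the job would be launched with something like " export TASKS=20 " followed by " qsub job_parallel.sh ".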
This approach executes nicely, but it doesn't scale, and you soon hit another stumbling block. Submitting 20 input files will effectively use 20 concurrent CPUs, but submit 1000 input files and you are limited by the number of cores available on the cluster; if the cluster only has 500 cores available, the job will be rejected. Then we discovered that PBS has directives for exactly this case. Enter the PBS directive "#PBS -t" and the "${PBS_ARRAYID}" variable.
#!/bin/bash
#PBS -N inputs
#PBS -l nodes=1:series600:ppn=1
#PBS -q UCTlong

cd /home/username/application/
# Each sub-job in the array reads the input file matching its PBS_ARRAYID
./command < /home/username/application/inputs/input.${PBS_ARRAYID}
The job submission script would look something like the above. To submit it to the cluster you would use " qsub -t 0-100 job.sh ", with 0-100 being the range of input files, named with their respective extensions input.0, input.1, input.2, ... input.100, in a directory. The " -t 0-100 " range is passed to the job through PBS_ARRAYID, and 101 sub-jobs (one per index) will be spawned on the cluster, some running and some queued. To check up on the status of an array job, execute " qstat -t ".
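Putting the last steps together on the command line (assuming the script above is saved as job.sh, as in the qsub example):
# Submit one array job covering input.0 through input.100
qsub -t 0-100 job.sh

# Check the status of every sub-job in the array
qstat -t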