GNU Parallel

GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.

https://www.gnu.org/software/parallel/

You can install GNU Parallel on your Mac desktop or laptop with either MacPorts or Homebrew (MacPorts is known to work: I use it on my MacBook):

  sudo port install parallel   # MacPorts
  brew install parallel        # Homebrew

Usage

Add module

  module load parallel/20170322

Example Usage

  parallel echo ::: 1 2 3 ::: 4 5 6
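With two input sources, `:::` combines every value of the first source with every value of the second, so the example above runs `echo` nine times. The plain-bash sketch below does not run anything in parallel; it only prints the argument pairs that are generated, in input order (the function name `combos` is hypothetical). Parallel itself may print them in a different order, since jobs finish at different times, unless you add `--keep-order` (`-k`).

```shell
#!/usr/bin/env bash
# Print every combination of two space-separated lists, first list varying
# slowest: the same argument pairs that
#   parallel echo ::: 1 2 3 ::: 4 5 6
# generates.
combos() {
  local -a first second
  read -ra first <<<"$1"
  read -ra second <<<"$2"
  local a b
  for a in "${first[@]}"; do
    for b in "${second[@]}"; do
      echo "$a $b"
    done
  done
}

combos "1 2 3" "4 5 6"
```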


Purpose

GNU Parallel is a tool for running a series of jobs, mostly serial jobs, in parallel. It is best suited to running many serial jobs whose run times differ. For example, the simplest way to run a series of serial jobs in parallel on n CPUs within one submit script is as follows:


  ./job_1 &
  ./job_2 &
  ...
  ./job_n &
  wait
  ./job_(n+1) &
  ./job_(n+2) &
  ...
  ./job_2n &
  wait
  ./job_(2n+1) &
  ./job_(2n+2) &
  ...
  ./job_3n &
  wait

This works efficiently only when all jobs have the same, or nearly the same, run time. If run times differ, each batch of n jobs lasts as long as its slowest job: CPUs whose jobs finish early sit idle until the whole batch completes, and only then does the next batch start. This idle time wastes CPU time.


GNU Parallel avoids this problem by first launching n jobs and then starting the next job in the sequence as soon as any running job completes. This keeps the CPUs busy: several short jobs can run one after another on some CPUs while other CPUs work on longer jobs.

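  # run job_1 ... job_3n, at most n at a time (--jobs defaults to the number of CPU cores)
  parallel --jobs $n ./job_{} ::: $(seq 1 $((3 * n)))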
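To make the difference concrete, here is a small bash sketch that only does arithmetic and launches no jobs. Given hypothetical run times and n CPUs, `batch_makespan` computes the total wall time of the `&`/`wait` pattern above, and `greedy_makespan` computes the wall time when each new job starts on whichever CPU frees up first, which is how GNU Parallel schedules jobs (ignoring its own overhead). Both function names and the run times are made up for illustration.

```shell
#!/usr/bin/env bash
# Arithmetic only: no jobs are launched. Ti values are run times in any unit.

# Usage: batch_makespan N T1 T2 ...
# Jobs run in batches of N (the "&" ... "wait" pattern); each batch lasts
# as long as its slowest job.
batch_makespan() {
  local n=$1; shift
  local total=0 batch_max=0 count=0 t
  for t in "$@"; do
    (( t > batch_max )) && batch_max=$t
    (( count += 1 ))
    if (( count == n )); then
      (( total += batch_max ))
      batch_max=0 count=0
    fi
  done
  (( total += batch_max ))    # last, possibly partial, batch
  echo "$total"
}

# Usage: greedy_makespan N T1 T2 ...
# Each job, in input order, starts on whichever CPU becomes free first.
greedy_makespan() {
  local n=$1; shift
  local -a free_at=()
  local i t best max=0
  for (( i = 0; i < n; i++ )); do free_at[i]=0; done
  for t in "$@"; do
    best=0                                # find the CPU that frees up first
    for (( i = 1; i < n; i++ )); do
      (( free_at[i] < free_at[best] )) && best=$i
    done
    (( free_at[best] += t ))              # run this job on that CPU
  done
  for t in "${free_at[@]}"; do (( t > max )) && max=$t; done
  echo "$max"
}

run_times=(8 1 1 1 8 1 1 1)          # hypothetical run times
batch_makespan 4 "${run_times[@]}"   # prints 16
greedy_makespan 4 "${run_times[@]}"  # prints 9
```

With two long jobs in different batches, the batch pattern waits for each long job in turn, while the greedy schedule overlaps them and runs the short jobs on the remaining CPUs.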

Single Node example using a LAMMPS benchmark run

The following example uses an interactive session, but the same commands should also work in a SLURM batch script.

  [2018-03-12 09:19.54] ~
  [alp514.sol](1002): interact -p test -n 36
  [2018-03-12 09:19.57] ~
  [alp514.sol-e601](1001): module load lammps
  [2018-03-12 09:20.01] ~
  [alp514.sol-e601](1002): module load parallel
  [2018-03-12 09:20.07] ~
  [alp514.sol-e601](1003): cd /share/Apps/examples/parallel/
  [2018-03-12 09:20.13] /share/Apps/examples/parallel
  [alp514.sol-e601](1004): time parallel 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: $(seq 5 5 100) ::: $(seq 1 6)
  
  real    4m8.378s
  user    0m1.391s
  sys     0m1.787s
  [2018-03-12 09:24.51] /share/Apps/examples/parallel
  [alp514.sol-e601](1005): time parallel 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: $(seq 100 -5 5) ::: $(seq 6 -1 1)
  
  real    3m47.091s
  user    0m1.391s
  sys     0m1.830s
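The two runs above launch the same 120 jobs (20 values of n times 6 values of x), only in opposite order. As a rough illustration of why the order matters, the plain-bash sketch below (arithmetic only, with made-up run times and a hypothetical helper named `greedy_makespan`) computes the wall time when each job starts on the first free CPU, as GNU Parallel does, for the same jobs given in ascending and in descending order.

```shell
#!/usr/bin/env bash
# Wall time when jobs, in the given order, each start on the first free CPU.
# Usage: greedy_makespan N T1 T2 ...   (N CPUs, Ti = run time of job i)
greedy_makespan() {
  local n=$1; shift
  local -a free_at=()
  local i t best max=0
  for (( i = 0; i < n; i++ )); do free_at[i]=0; done
  for t in "$@"; do
    best=0                                # find the CPU that frees up first
    for (( i = 1; i < n; i++ )); do
      (( free_at[i] < free_at[best] )) && best=$i
    done
    (( free_at[best] += t ))              # run this job on that CPU
  done
  for t in "${free_at[@]}"; do (( t > max )) && max=$t; done
  echo "$max"
}

greedy_makespan 2 1 1 1 1 4   # short jobs first: prints 6
greedy_makespan 2 4 1 1 1 1   # longest job first: prints 4
```

Starting the longest jobs first is the familiar longest-job-first heuristic; it often, though not always, shortens the total run time. GNU Parallel starts jobs in input order, so putting the long jobs first is up to you.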

The difference in run time above is due to the order of the jobs. The input file below runs 100*n steps on a box whose size grows with x, so jobs with larger n and x take longer. In the first run, the longest jobs are at the end of the list, while in the second run the shortest jobs are at the end. In the second case, short jobs fill the CPUs as the long jobs complete, so there is less waiting at the end. The LAMMPS input file, in.lj, is:

  [2018-03-12 09:29.48] /share/Apps/examples/parallel
  [alp514.sol-e601](1006): cat in.lj
  # 3d Lennard-Jones melt  
  
  #variable       x index 3
  #variable       y index 3
  #variable       z index 3
  variable        t equal 100*$n
  
  variable        xx equal 20*$x
  variable        yy equal 1*$x
  variable        zz equal 1*$x
  
  units           lj
  atom_style      atomic
  
  lattice         fcc 0.8442
  region          box block 0 ${xx} 0 ${yy} 0 ${zz}
  create_box      1 box
  create_atoms    1 box
  mass            1 1.0
  
  velocity        all create 1.44 87287 loop geom
  
  pair_style      lj/cut 2.5
  pair_coeff      1 1 1.0 1.0 2.5
  
  neighbor        0.3 bin
  neigh_modify    delay 0 every 20 check no
  
  fix             1 all nve
  
  thermo          1
  
  run             $t
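To see why the jobs differ so much in length, note that in.lj builds an fcc lattice (4 atoms per cubic unit cell) in a box of xx × yy × zz unit cells and runs t = 100*n steps. The small bash sketch below just evaluates those formulas from the input file for given -var n and -var x values; the helper name `job_size` is hypothetical. The cost of each run grows roughly with atoms × steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of atoms and MD steps that in.lj
# produces for given -var n and -var x values.
#   steps = 100*n
#   box   = (20*x) x (1*x) x (1*x) fcc unit cells, 4 atoms per cell
job_size() {
  local n=$1 x=$2
  local steps=$(( 100 * n ))
  local atoms=$(( 4 * (20 * x) * (1 * x) * (1 * x) ))
  echo "n=$n x=$x atoms=$atoms steps=$steps"
}

job_size 5 1     # smallest job in the sweep: atoms=80 steps=500
job_size 100 6   # largest job in the sweep: atoms=17280 steps=10000
```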

To see the commands that parallel actually launches, prefix them with echo (the number of jobs is reduced here for clarity):

  [2018-03-12 09:30.04] /share/Apps/examples/parallel
  [alp514.sol-e601](1007): parallel echo 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: 5 10 15 ::: 1 2 3
  srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 5 -var x 1 -log log.lammps-5-1 -sc none
  srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 5 -var x 2 -log log.lammps-5-2 -sc none
  srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 5 -var x 3 -log log.lammps-5-3 -sc none
  srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 10 -var x 1 -log log.lammps-10-1 -sc none
  srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 10 -var x 2 -log log.lammps-10-2 -sc none
  srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 10 -var x 3 -log log.lammps-10-3 -sc none
  srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 15 -var x 1 -log log.lammps-15-1 -sc none
  srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 15 -var x 2 -log log.lammps-15-2 -sc none
  srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 15 -var x 3 -log log.lammps-15-3 -sc none

The commands can also be read from a file. Below, the LAMMPS run commands are written to a file, //run.sh// (any extension, or none, will do):

  [alp514.sol-e601](1008): parallel echo 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: 5 10 15 ::: 1 2 3 > run.sh
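A plain bash loop can write the same nine commands without calling parallel. This is a sketch: the function name `write_commands` is hypothetical, while the file name, input values and srun command come from the example above. Unlike the parallel echo version, which expands $(which lammps) while writing the file, the single-quoted format string keeps $(which lammps) literal, so it is expanded only when parallel runs each line.

```shell
#!/usr/bin/env bash
# Write one LAMMPS command per (n, x) pair to run.sh, n varying slowest.
# $(which lammps) is kept literal here and expanded later, when
# parallel runs each line in its own shell.
write_commands() {
  local n x
  for n in 5 10 15; do
    for x in 1 2 3; do
      printf 'srun -n 1 $(which lammps) -in in.lj -var n %s -var x %s -log log.lammps-%s-%s -sc none\n' \
        "$n" "$x" "$n" "$x"
    done
  done
}

write_commands > run.sh
```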

To run the jobs in parallel, pass the file to parallel with -a filename. Since no command is given, parallel runs each line of the file as a command:

  [2018-03-12 09:30.54] /share/Apps/examples/parallel
  [alp514.sol-e601](1009): parallel -a run.sh

Alternatively, you can pipe the commands to parallel. The --eta option prints progress information, including an estimate of the time remaining:

  [2018-03-12 09:31.57] /share/Apps/examples/parallel
  [alp514.sol-e601](1011): cat run.sh | parallel --eta
  
  Computers / CPU cores / Max jobs to run
  1:local / 36 / 9
  
  Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
  ETA: 0s Left: 0 AVG: 0.33s  local:0/9/100%/0.4s

Multi Node example using a LAMMPS benchmark run

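If you pass GNU Parallel a file listing nodes (with --sshloginfile), it runs jobs on those nodes over ssh. In the example below, each node runs one job at a time (--jobs 1), so the two jobs in command.txt run on different nodes and each prints the name of its host: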

  [2019-01-11 12:09.53] /share/Apps/examples/parallel/test
  [alp514.sol](645): interact -p engi --ntasks-per-node=36 -N 2
  [2019-01-11 12:09.54] /share/Apps/examples/parallel/test
  [alp514.sol-e608](944): module load parallel
  [2019-01-11 12:10.01] /share/Apps/examples/parallel/test
  [alp514.sol-e608](945): scontrol show hostname > nodelist.txt
  [2019-01-11 12:10.07] /share/Apps/examples/parallel/test
  [alp514.sol-e608](946): parallel --jobs 1 --sshloginfile nodelist.txt --workdir $PWD -a command.txt
  sol-e608.cc.lehigh.edu
  sol-e609.cc.lehigh.edu
  [2019-01-11 12:10.11] /share/Apps/examples/parallel/test
  [alp514.sol-e608](947): cat command.txt
  echo $HOSTNAME
  echo $HOSTNAME


Minimum Requirements

  • --jobs: number of jobs to run at once on each node
  • --sshloginfile: file listing the nodes on which to run jobs
  • --workdir: directory in which to run jobs on the remote nodes
  • --env: environment variable to copy to the remote nodes (see below)

SLURM provides the variable $SLURM_JOB_NODELIST, which lists the allocated nodes in a compressed format (e.g. sol-e[607-608]) that parallel cannot use directly. Use scontrol show hostname to expand it to one hostname per line, and pass the resulting file to parallel with --sshloginfile.
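scontrol show hostname is the reliable way to expand the node list. Purely to illustrate what the compressed format means, the hypothetical bash function below expands the simple form prefix[a-b,c] with a single bracketed group. It does not handle every form SLURM can produce (for example several bracketed groups, or several comma-separated host names outside brackets), so use scontrol in real job scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: expand a SLURM-style host list
# with at most one bracketed group, e.g. sol-e[607-608,610] or sol-e607.
# Zero padding such as node[01-03] is preserved.
expand_hostlist() {
  local list=$1
  if [[ $list != *\[*\]* ]]; then
    echo "$list"                  # no brackets: a single host name
    return
  fi
  local prefix=${list%%\[*}       # text before "["
  local inner=${list#*\[}
  inner=${inner%\]*}              # text between "[" and "]"
  local suffix=${list##*\]}       # text after "]"
  local -a parts
  local part lo hi i width
  IFS=, read -ra parts <<<"$inner"
  for part in "${parts[@]}"; do
    lo=${part%-*} hi=${part#*-}   # "607-608" -> 607, 608; "610" -> 610, 610
    width=${#lo}
    for (( i = 10#$lo; i <= 10#$hi; i++ )); do
      printf '%s%0*d%s\n' "$prefix" "$width" "$i" "$suffix"
    done
  done
}

expand_hostlist 'sol-e[607-608]'   # prints sol-e607 and sol-e608
```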

Loading Modules

Multi-node jobs are a little tricky because the remote nodes do not inherit the environment of the head node, so modules loaded by the SLURM script are not loaded on the remote nodes. Also, the module command is a shell function or alias set up by the login environment, and it may not be available in the non-interactive bash sessions that parallel starts on the remote nodes. One workaround is to set the PARALLEL environment variable, which GNU Parallel reads for default options, in your SLURM script after you have loaded your modules but before you run GNU Parallel. Its --env options copy the listed variables to the remote nodes:

  [2019-01-11 12:14.37] /share/Apps/examples/parallel/test
  [alp514.sol-e608](957): module load parallel
  [2019-01-11 12:14.42] /share/Apps/examples/parallel/test
  [alp514.sol-e608](958): module load lammps/12dec18
  [2019-01-11 12:14.46] /share/Apps/examples/parallel/test
  [alp514.sol-e608](959): export PARALLEL="--workdir . --env PATH --env LD_LIBRARY_PATH --env LOADEDMODULES --env _LMFILES_ --env MODULE_VERSION --env MODULEPATH --env MODULE_VERSION_STACK --env MODULESHOME"
  [2019-01-11 12:14.51] /share/Apps/examples/parallel/test
  [alp514.sol-e608](960): parallel --jobs 1 --sshloginfile nodelist.txt 'mpiexec -n 36 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: $(seq 5 5 15) ::: $(seq 1 3)
  [2019-01-11 12:26.14] /share/Apps/examples/parallel/test
  [alp514.sol-e608](961): egrep -i 'loop time' log*
  log.lammps-10-1:Loop time of 9.16174 on 36 procs for 10000 steps with 32000 atoms
  log.lammps-10-2:Loop time of 69.3107 on 36 procs for 10000 steps with 256000 atoms
  log.lammps-10-3:Loop time of 267.544 on 36 procs for 10000 steps with 864000 atoms
  log.lammps-15-1:Loop time of 13.7018 on 36 procs for 15000 steps with 32000 atoms
  log.lammps-15-2:Loop time of 106.147 on 36 procs for 15000 steps with 256000 atoms
  log.lammps-15-3:Loop time of 387.021 on 36 procs for 15000 steps with 864000 atoms
  log.lammps-5-1:Loop time of 4.54284 on 36 procs for 5000 steps with 32000 atoms
  log.lammps-5-2:Loop time of 34.637 on 36 procs for 5000 steps with 256000 atoms
  log.lammps-5-3:Loop time of 128.408 on 36 procs for 5000 steps with 864000 atoms
  [2019-01-11 12:26.23] /share/Apps/examples/parallel/test
  [alp514.sol-e608](962): ls -ltr log*
  -rw-r--r-- 1 alp514 faculty  377756 Jan 11 12:15 log.lammps-5-1
  -rw-r--r-- 1 alp514 faculty  377753 Jan 11 12:15 log.lammps-5-2
  -rw-r--r-- 1 alp514 faculty  752759 Jan 11 12:16 log.lammps-10-1
  -rw-r--r-- 1 alp514 faculty  752758 Jan 11 12:17 log.lammps-10-2
  -rw-r--r-- 1 alp514 faculty  377758 Jan 11 12:17 log.lammps-5-3
  -rw-r--r-- 1 alp514 faculty 1127759 Jan 11 12:17 log.lammps-15-1
  -rw-r--r-- 1 alp514 faculty 1127758 Jan 11 12:19 log.lammps-15-2
  -rw-r--r-- 1 alp514 faculty  752758 Jan 11 12:21 log.lammps-10-3
  -rw-r--r-- 1 alp514 faculty 1127763 Jan 11 12:26 log.lammps-15-3
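With many runs it can help to tabulate the timings. The sketch below (the function name `summarize_loop_times` is hypothetical) pulls the loop time, step count and atom count out of each log.lammps-* file, relying on the "Loop time of ... on ... procs for ... steps with ... atoms" line format shown above, and sorts the runs by loop time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "logfile seconds steps atoms", sorted by seconds,
# from LAMMPS lines such as
#   Loop time of 9.16174 on 36 procs for 10000 steps with 32000 atoms
summarize_loop_times() {
  grep -H 'Loop time of' "$@" |
    awk -F: '{
      split($2, w, " ")
      # w[4] = seconds, w[9] = steps, w[12] = atoms
      print $1, w[4], w[9], w[12]
    }' |
    sort -k2,2g
}

# Run in the directory that holds the log files.
summarize_loop_times log.lammps-*
```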

If you encounter errors, first check whether your job needs additional environment variables on the remote nodes, and add them to the export statement with further --env options.
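One way to keep the export statement manageable as you add variables is to build it from a list. This is a sketch: the array names and the variable list are illustrative (the module-related names are typical of Environment Modules, so adjust them for your site), and it relies only on GNU Parallel reading default options from the PARALLEL environment variable, as in the example above.

```shell
#!/usr/bin/env bash
# Build the PARALLEL option string from a list of variable names, so that
# passing one more variable to the remote nodes means adding one array entry.
env_vars=(PATH LD_LIBRARY_PATH LOADEDMODULES _LMFILES_
          MODULE_VERSION MODULEPATH MODULE_VERSION_STACK MODULESHOME)

parallel_opts="--workdir ."
for v in "${env_vars[@]}"; do
  parallel_opts+=" --env $v"
done
export PARALLEL="$parallel_opts"
echo "$PARALLEL"
```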

Have an example or tips that you would like to share? Feel free to edit this page.

Links to other examples