GNU Parallel
GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
https://www.gnu.org/software/parallel/
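As a minimal illustration of the pipe mode described above (not part of the cluster examples below; the block size, and therefore the per-chunk counts, depend on GNU Parallel's defaults), the following splits a long stream on stdin into chunks and runs wc -l on each chunk in parallel:

seq 1 1000000 | parallel --pipe wc -l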
Install on your Mac OS X desktop or laptop using either Homebrew or MacPorts (I know MacPorts works since I use it on my MacBook):
sudo port install parallel
brew install parallel
Usage
module load parallel/20170322
parallel echo ::: 1 2 3 ::: 4 5 6
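This prints every combination of the two input sources, one per line (nine lines in total). The output order may vary because each job's output is printed as it finishes; add --keep-order if you need the output in job order. The expected combinations are:

1 4
1 5
1 6
2 4
2 5
2 6
3 4
3 5
3 6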
Purpose
GNU Parallel is a tool for running a series of jobs, mostly serial jobs, in parallel. It is best suited for running many serial jobs that do not all have the same run time. For example, the simplest way to run a series of serial jobs in parallel on n CPUs within one submit script is as follows:
./job_1 & ./job_2 & ... ./job_n &
wait
./job_(n+1) & ./job_(n+2) & ... ./job_2n &
wait
./job_(2n+1) & ./job_(2n+2) & ... ./job_3n &
wait
This works most efficiently when all jobs have the same or nearly the same run time. If the run times are unequal, n jobs still run simultaneously, but the CPUs whose jobs finish early sit idle until all n jobs in the batch complete and the next n jobs can start. This leads to idle time and inefficient consumption of CPU time.
GNU Parallel solves this issue by first launching n jobs and then starting the next job in the sequence as soon as any running job completes. This makes efficient use of CPU time by reducing the wait time and letting many small jobs run while some CPUs work on the longer ones.
parallel ./job_{1} ::: $(seq 1 N)
where N is the total number of jobs to run (3n in the example above).
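By default GNU Parallel runs roughly one job per CPU core; the --jobs option sets the limit explicitly. A minimal sketch, assuming the executables are named job_1 through job_30 (hypothetical names) and you want at most 8 of them running at once:

parallel --jobs 8 ./job_{1} ::: $(seq 1 30)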
Single Node example using a LAMMPS benchmark run
The following example is run in an interactive session; a sketch of an equivalent SLURM batch script is shown at the end of this example.
[2018-03-12 09:19.54] ~ [alp514.sol](1002): interact -p test -n 36
[2018-03-12 09:19.57] ~ [alp514.sol-e601](1001): module load lammps
[2018-03-12 09:20.01] ~ [alp514.sol-e601](1002): module load parallel
[2018-03-12 09:20.07] ~ [alp514.sol-e601](1003): cd /share/Apps/examples/parallel/
[2018-03-12 09:20.13] /share/Apps/examples/parallel [alp514.sol-e601](1004): time parallel 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: $(seq 5 5 100) ::: $(seq 1 6)

real    4m8.378s
user    0m1.391s
sys     0m1.787s
[2018-03-12 09:24.51] /share/Apps/examples/parallel [alp514.sol-e601](1005): time parallel 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: $(seq 100 -5 5) ::: $(seq 6 -1 1)

real    3m47.091s
user    0m1.391s
sys     0m1.830s
The difference in runtime above is due to the ordering of the jobs. In the first example the longer jobs are at the end, while in the second example the shorter jobs are at the end. In the second case, as the longer jobs complete, the remaining shorter jobs fill in and there is less waiting at the end. The actual LAMMPS input file is
[2018-03-12 09:29.48] /share/Apps/examples/parallel [alp514.sol-e601](1006): cat in.lj
# 3d Lennard-Jones melt

#variable x index 3
#variable y index 3
#variable z index 3
variable t equal 100*$n
variable xx equal 20*$x
variable yy equal 1*$x
variable zz equal 1*$x

units lj
atom_style atomic

lattice fcc 0.8442
region box block 0 ${xx} 0 ${yy} 0 ${zz}
create_box 1 box
create_atoms 1 box
mass 1 1.0

velocity all create 1.44 87287 loop geom

pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5

neighbor 0.3 bin
neigh_modify delay 0 every 20 check no

fix 1 all nve

thermo 1
run $t
The actual simulations launched by parallel are shown below (the number of jobs is reduced for clarity).
[2018-03-12 09:30.04] /share/Apps/examples/parallel [alp514.sol-e601](1007): parallel echo 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: 5 10 15 ::: 1 2 3
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 5 -var x 1 -log log.lammps-5-1 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 5 -var x 2 -log log.lammps-5-2 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 5 -var x 3 -log log.lammps-5-3 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 10 -var x 1 -log log.lammps-10-1 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 10 -var x 2 -log log.lammps-10-2 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 10 -var x 3 -log log.lammps-10-3 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 15 -var x 1 -log log.lammps-15-1 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 15 -var x 2 -log log.lammps-15-2 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 15 -var x 3 -log log.lammps-15-3 -sc none
The runs can also be launched from a file. Below, the LAMMPS run commands are written to a file, run.sh (you can use any extension, or none at all):
[alp514.sol-e601](1008): parallel echo 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: 5 10 15 ::: 1 2 3 > run.sh
To run the jobs in parallel, supply the filename to parallel as a command-line argument, -a filename:
[2018-03-12 09:30.54] /share/Apps/examples/parallel [alp514.sol-e601](1009): parallel -a run.sh

Alternatively, you can pipe the commands to parallel (the --eta argument will show a progress bar).

[2018-03-12 09:31.57] /share/Apps/examples/parallel [alp514.sol-e601](1011): cat run.sh | parallel --eta

Computers / CPU cores / Max jobs to run
1:local / 36 / 9

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 0 AVG: 0.33s  local:0/9/100%/0.4s
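For reference, a minimal SLURM batch script equivalent to the interactive session above might look like the sketch below. The partition, walltime, and resource requests are assumptions based on that session and should be adjusted for your site and workload:

#!/bin/bash
#SBATCH --partition=test
#SBATCH --nodes=1
#SBATCH --ntasks=36
#SBATCH --time=01:00:00

module load lammps
module load parallel

cd ${SLURM_SUBMIT_DIR}
# by default GNU Parallel keeps roughly one job per core (36 here) running at a time
parallel 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: $(seq 5 5 100) ::: $(seq 1 6)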
Multi Node example using a LAMMPS benchmark run
If you pass GNU Parallel a file with a list of nodes it will run jobs on each node.
[2019-01-11 12:09.53] /share/Apps/examples/parallel/test [alp514.sol](645): interact -p engi --ntasks-per-node=36 -N 2
[2019-01-11 12:09.54] /share/Apps/examples/parallel/test [alp514.sol-e608](944): module load parallel
[2019-01-11 12:10.01] /share/Apps/examples/parallel/test [alp514.sol-e608](945): scontrol show hostname > nodelist.txt
[2019-01-11 12:10.07] /share/Apps/examples/parallel/test [alp514.sol-e608](946): parallel --jobs 1 --sshloginfile nodelist.txt --workdir $PWD -a command.txt
sol-e608.cc.lehigh.edu
sol-e609.cc.lehigh.edu
[2019-01-11 12:10.11] /share/Apps/examples/parallel/test [alp514.sol-e608](947): cat command.txt
echo $HOSTNAME
echo $HOSTNAME
Minimum Requirements
- --jobs: how many jobs to run per node
- --sshloginfile: name of file containing a list of nodes on which to run jobs
- --workdir: directory in which to run your job on the remote nodes
- --env: environment variable to pass on to remote nodes (see below)
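Putting these options together, a typical multi-node invocation looks like the sketch below; nodelist.txt and command.txt are the files created in the example above, and the variable passed with --env is illustrative:

parallel --jobs 1 --sshloginfile nodelist.txt --workdir $PWD --env PATH -a command.txt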
SLURM provides a variable, $SLURM_JOB_NODELIST, that contains the list of nodes in a compressed format, e.g. sol-e[607-608], which parallel cannot use directly. Use scontrol show hostname to expand it into one hostname per line, and pass the resulting file to parallel.
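For example, inside a two-node job like the one above, the compressed and expanded forms would look roughly like this (the hostnames are illustrative; your allocation will differ):

$ echo $SLURM_JOB_NODELIST
sol-e[608-609]
$ scontrol show hostname $SLURM_JOB_NODELIST
sol-e608
sol-e609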
Loading Modules
Multi-node jobs are a little tricky because the remote nodes do not inherit the environment from the head node, so any modules loaded by the slurm script won’t be present on the remote nodes. Also, the module command is really just a shell alias, and aliases don’t work in the non-interactive bash sessions that are created on the remote nodes. One workaround is to include this environment variable definition in your SLURM script after you have loaded your modules, but before you run GNU Parallel:
[2019-01-11 12:14.37] /share/Apps/examples/parallel/test [alp514.sol-e608](957): module load parallel
[2019-01-11 12:14.42] /share/Apps/examples/parallel/test [alp514.sol-e608](958): module load lammps/12dec18
[2019-01-11 12:14.46] /share/Apps/examples/parallel/test [alp514.sol-e608](959): export PARALLEL="--workdir . --env PATH --env LD_LIBRARY_PATH --env LOADEDMODULES --env _LMFILES_ --env MODULE_VERSION --env MODULEPATH --env MODULEVERSION_STACK --env MODULESHOME"
[2019-01-11 12:14.51] /share/Apps/examples/parallel/test [alp514.sol-e608](960): parallel --jobs 1 --sshloginfile nodelist.txt 'mpiexec -n 36 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: $(seq 5 5 15) ::: $(seq 1 3)
[2019-01-11 12:26.14] /share/Apps/examples/parallel/test [alp514.sol-e608](961): egrep -i 'loop time' log*
log.lammps-10-1:Loop time of 9.16174 on 36 procs for 10000 steps with 32000 atoms
log.lammps-10-2:Loop time of 69.3107 on 36 procs for 10000 steps with 256000 atoms
log.lammps-10-3:Loop time of 267.544 on 36 procs for 10000 steps with 864000 atoms
log.lammps-15-1:Loop time of 13.7018 on 36 procs for 15000 steps with 32000 atoms
log.lammps-15-2:Loop time of 106.147 on 36 procs for 15000 steps with 256000 atoms
log.lammps-15-3:Loop time of 387.021 on 36 procs for 15000 steps with 864000 atoms
log.lammps-5-1:Loop time of 4.54284 on 36 procs for 5000 steps with 32000 atoms
log.lammps-5-2:Loop time of 34.637 on 36 procs for 5000 steps with 256000 atoms
log.lammps-5-3:Loop time of 128.408 on 36 procs for 5000 steps with 864000 atoms
[2019-01-11 12:26.23] /share/Apps/examples/parallel/test [alp514.sol-e608](962): ls -ltr log*
-rw-r--r-- 1 alp514 faculty  377756 Jan 11 12:15 log.lammps-5-1
-rw-r--r-- 1 alp514 faculty  377753 Jan 11 12:15 log.lammps-5-2
-rw-r--r-- 1 alp514 faculty  752759 Jan 11 12:16 log.lammps-10-1
-rw-r--r-- 1 alp514 faculty  752758 Jan 11 12:17 log.lammps-10-2
-rw-r--r-- 1 alp514 faculty  377758 Jan 11 12:17 log.lammps-5-3
-rw-r--r-- 1 alp514 faculty 1127759 Jan 11 12:17 log.lammps-15-1
-rw-r--r-- 1 alp514 faculty 1127758 Jan 11 12:19 log.lammps-15-2
-rw-r--r-- 1 alp514 faculty  752758 Jan 11 12:21 log.lammps-10-3
-rw-r--r-- 1 alp514 faculty 1127763 Jan 11 12:26 log.lammps-15-3
If you encounter errors, first check whether there are additional environment variables that you need to pass and modify the export statement accordingly.
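For example, if your job also depends on a variable such as OMP_NUM_THREADS (a hypothetical addition here), export it and append it to the existing definition before invoking parallel:

export OMP_NUM_THREADS=4
export PARALLEL="$PARALLEL --env OMP_NUM_THREADS"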
Have an example or tips that you would like to share? Feel free to edit this page.
Links to other examples
- https://www.gnu.org/software/parallel/parallel_tutorial.html
- https://www.biostars.org/p/63816/
- http://www.shakthimaan.com/posts/2014/11/27/gnu-parallel/news.html
- https://www.msi.umn.edu/support/faq/how-can-i-use-gnu-parallel-run-lot-commands-parallel
- https://github.com/LangilleLab/microbiome_helper/wiki/Quick-Introduction-to-GNU-Parallel
- https://davetang.org/muse/2013/11/18/using-gnu-parallel/
- https://sites.google.com/a/stanford.edu/rcpedia/parallel-processing/gnu-parallel-examples
- http://phili.pe/posts/free-concurrency-with-gnu-parallel/