GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.
...
Code Block |
---|
|
sudo port install parallel
brew install parallel |
Usage
Code Block |
---|
language | bash |
---|
title | Add module |
---|
|
module load parallel/20170322 |
...
Code Block |
---|
|
[2018-03-12 09:19.54] ~
[alp514.sol](1002): interact -p test -n 36
[2018-03-12 09:19.57] ~
[alp514.sol-e601](1001): module load lammps
[2018-03-12 09:20.01] ~
[alp514.sol-e601](1002): module load parallel
[2018-03-12 09:20.07] ~
[alp514.sol-e601](1003): cd /share/Apps/examples/parallel/
[2018-03-12 09:20.13] /share/Apps/examples/parallel
[alp514.sol-e601](1004): time parallel 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: $(seq 5 5 100) ::: $(seq 1 6)
real 4m8.378s
user 0m1.391s
sys 0m1.787s
[2018-03-12 09:24.51] /share/Apps/examples/parallel
[alp514.sol-e601](1005): time parallel 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: $(seq 100 -5 5) ::: $(seq 6 -1 1)
real 3m47.091s
user 0m1.391s
sys 0m1.830s
|
The difference in runtime above is due to the order in which the jobs run. In the first example the longer jobs are at the end, while in the second example the shorter jobs are at the end. In the second case, as the longer jobs complete, the shorter ones fill the freed slots, so there is less waiting at the end. The actual LAMMPS input file is:
Code Block |
---|
|
[2018-03-12 09:29.48] /share/Apps/examples/parallel
[alp514.sol-e601](1006): cat in.lj
# 3d Lennard-Jones melt
#variable x index 3
#variable y index 3
#variable z index 3
variable t equal 100*$n
variable xx equal 20*$x
variable yy equal 1*$x
variable zz equal 1*$x
units lj
atom_style atomic
lattice fcc 0.8442
region box block 0 ${xx} 0 ${yy} 0 ${zz}
create_box 1 box
create_atoms 1 box
mass 1 1.0
velocity all create 1.44 87287 loop geom
pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5
neighbor 0.3 bin
neigh_modify delay 0 every 20 check no
fix 1 all nve
thermo 1
run $t |
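The effect of job order on the two timed runs can be explored with a rough model. The sketch below assumes, based on the input file above, that a job's cost is proportional to n * x^3 (the number of steps scales with n and the box volume with x^3); this ignores startup overhead and is only an assumption. It also assumes 36 job slots, matching the 36 tasks requested in the interactive session. The `makespan` function is a hypothetical helper, not part of GNU Parallel.

```shell
#!/usr/bin/env bash
# Greedy scheduler model: jobs are taken in the order given, and each one
# starts on whichever slot becomes free first. Prints the finish time of
# the last job (the makespan) in arbitrary cost units.
makespan() {
    local slots=$1
    shift
    local -a free_at=()
    local i cost best
    for ((i = 0; i < slots; i++)); do
        free_at[i]=0
    done
    for cost in "$@"; do
        # Find the slot that becomes free earliest.
        best=0
        for ((i = 1; i < slots; i++)); do
            if ((free_at[i] < free_at[best])); then best=$i; fi
        done
        free_at[best]=$((free_at[best] + cost))
    done
    # The run ends when the busiest slot finishes.
    best=0
    for ((i = 1; i < slots; i++)); do
        if ((free_at[i] > free_at[best])); then best=$i; fi
    done
    echo "${free_at[best]}"
}

# Assumed cost model: n * x^3, generated in the same order as
# "::: $(seq 5 5 100) ::: $(seq 1 6)" (first list varies slowest).
short_first=()
for ((n = 5; n <= 100; n += 5)); do
    for ((x = 1; x <= 6; x++)); do
        short_first+=($((n * x * x * x)))
    done
done

# Same jobs in the order of "::: $(seq 100 -5 5) ::: $(seq 6 -1 1)".
long_first=()
for ((n = 100; n >= 5; n -= 5)); do
    for ((x = 6; x >= 1; x--)); do
        long_first+=($((n * x * x * x)))
    done
done

echo "short jobs first: $(makespan 36 "${short_first[@]}")"
echo "long jobs first:  $(makespan 36 "${long_first[@]}")"
```

The two printed values are in model units rather than seconds, so only their ratio is meaningful when compared with the measured timings.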
The commands that parallel actually launches are shown below (with fewer jobs, for clarity):
Code Block |
---|
|
[2018-03-12 09:30.04] /share/Apps/examples/parallel
[alp514.sol-e601](1007): parallel echo 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: 5 10 15 ::: 1 2 3
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 5 -var x 1 -log log.lammps-5-1 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 5 -var x 2 -log log.lammps-5-2 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 5 -var x 3 -log log.lammps-5-3 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 10 -var x 1 -log log.lammps-10-1 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 10 -var x 2 -log log.lammps-10-2 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 10 -var x 3 -log log.lammps-10-3 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 15 -var x 1 -log log.lammps-15-1 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 15 -var x 2 -log log.lammps-15-2 -sc none
srun -n 1 /share/Apps/lammps/14may16/bin/lammps -in in.lj -var n 15 -var x 3 -log log.lammps-15-3 -sc none |
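The output above shows that the two `:::` lists are combined into every (n, x) pair, with the first list varying slowest. A minimal sketch of the equivalent nested loop in plain bash, without parallel, might look like this. The function name `generate_commands` is hypothetical, and `lammps` is a placeholder for the full path that `$(which lammps)` resolves to on your system.

```shell
#!/usr/bin/env bash
# Print one command line per (n, x) pair, first list varying slowest,
# which mirrors the order of the echoed lines above.
# "lammps" stands in for the full path returned by $(which lammps).
generate_commands() {
    local n x
    for n in 5 10 15; do
        for x in 1 2 3; do
            echo "srun -n 1 lammps -in in.lj -var n $n -var x $x -log log.lammps-$n-$x -sc none"
        done
    done
}

generate_commands
```

Unlike parallel, this loop only prints the commands; it does not run them or limit how many run at once.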
The runs can also be launched from a file. Below, the LAMMPS run commands are written to a file, run.sh (you can use any extension, or none):
Code Block |
---|
|
[alp514.sol-e601](1008): parallel echo 'srun -n 1 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: 5 10 15 ::: 1 2 3 > run.sh |
To run the jobs in parallel, pass the file to parallel with the -a option (-a filename):
Code Block |
---|
|
[2018-03-12 09:30.54] /share/Apps/examples/parallel
[alp514.sol-e601](1009): parallel -a run.sh
# Alternatively, you can pipe the commands to parallel (the --eta argument shows progress and the estimated time remaining)
[2018-03-12 09:31.57] /share/Apps/examples/parallel
[alp514.sol-e601](1011): cat run.sh | parallel --eta
Computers / CPU cores / Max jobs to run
1:local / 36 / 9
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 0 AVG: 0.33s local:0/9/100%/0.4s |
Multi Node example using a LAMMPS benchmark run
If you pass GNU Parallel a file with a list of nodes (via --sshloginfile), it will run jobs on each of those nodes.
Code Block |
---|
|
[2019-01-11 12:09.53] /share/Apps/examples/parallel/test
[alp514.sol](645): interact -p engi --ntasks-per-node=36 -N 2
[2019-01-11 12:09.54] /share/Apps/examples/parallel/test
[alp514.sol-e608](944): module load parallel
[2019-01-11 12:10.01] /share/Apps/examples/parallel/test
[alp514.sol-e608](945): scontrol show hostname > nodelist.txt
[2019-01-11 12:10.07] /share/Apps/examples/parallel/test
[alp514.sol-e608](946): parallel --jobs 1 --sshloginfile nodelist.txt --workdir $PWD -a command.txt
sol-e608.cc.lehigh.edu
sol-e609.cc.lehigh.edu
[2019-01-11 12:10.11] /share/Apps/examples/parallel/test
[alp514.sol-e608](947): cat command.txt
echo $HOSTNAME
echo $HOSTNAME
|
Minimum Requirements
- --jobs: how many jobs to run per node
- --sshloginfile: name of file containing a list of nodes on which to run jobs
- --workdir: directory where to run your job on remote nodes
- --env: environment variable to pass on to remote nodes (see below)
SLURM provides the variable $SLURM_JOB_NODELIST, which holds the allocated nodes in a contracted format, e.g. sol-e[607-608]. GNU Parallel cannot use this format directly. Use scontrol show hostname to expand it into one hostname per line, and pass the resulting file to parallel with --sshloginfile.
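To make the contracted format concrete, the sketch below expands a node list into one hostname per line. It prefers scontrol when that command is available. The fallback function `expand_simple_range` is a hypothetical helper written for illustration only: it understands a single `prefix[first-last]` range and nothing more, so real jobs should rely on scontrol. The default value sol-e[607-608] is the example from the text and is used only when $SLURM_JOB_NODELIST is not set.

```shell
#!/usr/bin/env bash
# Hypothetical fallback: expand "prefix[first-last]" into one hostname per line.
# It does not handle comma-separated lists or multiple bracket groups.
expand_simple_range() {
    local re='^([^][]+)\[([0-9]+)-([0-9]+)\]$'
    local prefix first last i
    if [[ $1 =~ $re ]]; then
        prefix=${BASH_REMATCH[1]}
        first=${BASH_REMATCH[2]}
        last=${BASH_REMATCH[3]}
        for ((i = 10#$first; i <= 10#$last; i++)); do
            # Keep the zero padding of the first number, if any.
            printf '%s%0*d\n' "$prefix" "${#first}" "$i"
        done
    else
        # Not a simple range: pass the name through unchanged.
        printf '%s\n' "$1"
    fi
}

# Use the SLURM-provided list if present, else the example from the text.
nodes=${SLURM_JOB_NODELIST:-'sol-e[607-608]'}
if command -v scontrol >/dev/null 2>&1; then
    scontrol show hostname "$nodes"
else
    expand_simple_range "$nodes"
fi
```

Redirecting the output to a file such as nodelist.txt gives the file that --sshloginfile expects.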
Loading Modules
Multi-node jobs are a little tricky because the remote nodes do not inherit the environment from the head node, so any modules loaded by the SLURM script won’t be present on the remote nodes. Also, the module command is a shell function or alias rather than an executable, so it is typically not defined in the non-interactive bash sessions that are created on the remote nodes. One workaround is to set the PARALLEL environment variable, which holds default options for GNU Parallel, in your SLURM script after you have loaded your modules, but before you run GNU Parallel:
Code Block |
---|
|
[2019-01-11 12:14.37] /share/Apps/examples/parallel/test
[alp514.sol-e608](957): module load parallel
[2019-01-11 12:14.42] /share/Apps/examples/parallel/test
[alp514.sol-e608](958): module load lammps/12dec18
[2019-01-11 12:14.46] /share/Apps/examples/parallel/test
[alp514.sol-e608](959): export PARALLEL="--workdir . --env PATH --env LD_LIBRARY_PATH --env LOADEDMODULES --env _LMFILES_ --env MODULE_VERSION --env MODULEPATH --env MODULEVERSION_STACK --env MODULESHOME"
[2019-01-11 12:14.51] /share/Apps/examples/parallel/test
[alp514.sol-e608](960): parallel --jobs 1 --sshloginfile nodelist.txt 'mpiexec -n 36 $(which lammps) -in in.lj -var n {1} -var x {2} -log log.lammps-{1}-{2} -sc none' ::: $(seq 5 5 15) ::: $(seq 1 3)
[2019-01-11 12:26.14] /share/Apps/examples/parallel/test
[alp514.sol-e608](961): egrep -i 'loop time' log*
log.lammps-10-1:Loop time of 9.16174 on 36 procs for 10000 steps with 32000 atoms
log.lammps-10-2:Loop time of 69.3107 on 36 procs for 10000 steps with 256000 atoms
log.lammps-10-3:Loop time of 267.544 on 36 procs for 10000 steps with 864000 atoms
log.lammps-15-1:Loop time of 13.7018 on 36 procs for 15000 steps with 32000 atoms
log.lammps-15-2:Loop time of 106.147 on 36 procs for 15000 steps with 256000 atoms
log.lammps-15-3:Loop time of 387.021 on 36 procs for 15000 steps with 864000 atoms
log.lammps-5-1:Loop time of 4.54284 on 36 procs for 5000 steps with 32000 atoms
log.lammps-5-2:Loop time of 34.637 on 36 procs for 5000 steps with 256000 atoms
log.lammps-5-3:Loop time of 128.408 on 36 procs for 5000 steps with 864000 atoms
[2019-01-11 12:26.23] /share/Apps/examples/parallel/test
[alp514.sol-e608](962): ls -ltr log*
-rw-r--r-- 1 alp514 faculty 377756 Jan 11 12:15 log.lammps-5-1
-rw-r--r-- 1 alp514 faculty 377753 Jan 11 12:15 log.lammps-5-2
-rw-r--r-- 1 alp514 faculty 752759 Jan 11 12:16 log.lammps-10-1
-rw-r--r-- 1 alp514 faculty 752758 Jan 11 12:17 log.lammps-10-2
-rw-r--r-- 1 alp514 faculty 377758 Jan 11 12:17 log.lammps-5-3
-rw-r--r-- 1 alp514 faculty 1127759 Jan 11 12:17 log.lammps-15-1
-rw-r--r-- 1 alp514 faculty 1127758 Jan 11 12:19 log.lammps-15-2
-rw-r--r-- 1 alp514 faculty 752758 Jan 11 12:21 log.lammps-10-3
-rw-r--r-- 1 alp514 faculty 1127763 Jan 11 12:26 log.lammps-15-3 |
If you encounter errors, first check whether there are additional environment variables that you need to pass, and modify the export statement accordingly.
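One way to keep the export statement easy to extend is to build it from a list of variable names. The following is a minimal sketch, assuming the same variables as the export shown earlier; the function name `build_parallel_opts` is hypothetical. It also warns about names that are not set in the current shell, which can help when tracking down a missing variable.

```shell
#!/usr/bin/env bash
# Build the option string for PARALLEL from a list of variable names.
# Each name becomes "--env NAME"; unset names trigger a warning on stderr.
build_parallel_opts() {
    local opts="--workdir ."
    local name
    for name in "$@"; do
        if [[ -z ${!name+x} ]]; then
            echo "warning: $name is not set in this shell" >&2
        fi
        opts+=" --env $name"
    done
    echo "$opts"
}

# Variable names taken from the export statement above; add more as needed.
env_names=(PATH LD_LIBRARY_PATH LOADEDMODULES _LMFILES_ MODULE_VERSION
           MODULEPATH MODULEVERSION_STACK MODULESHOME)

PARALLEL="$(build_parallel_opts "${env_names[@]}")"
export PARALLEL
echo "$PARALLEL"
```

Adding a variable then only requires appending its name to `env_names`.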
Links to other examples