...
Our goal is to provide a simple introduction to both the hardware and software on this machine so that researchers can start using it as quickly as possible.
Note: we are currently experiencing the following error on some software on the head node:
No Format |
---|
error while loading shared libraries: libhcoll.so.1: cannot open shared object file: No such file or directory |
If you see this error, it’s best to start an interactive job first.
No Format |
---|
salloc -c 4 -p hawkcpu -t 2:0:0 srun --pty bash |
The compute nodes do not have this error.
...
Sol is a highly heterogeneous cluster, meaning that it is composed of many different types of hardware. The hardware on the cluster has three features:
- Architecture a.k.a. instruction set
- High-speed Infiniband (IB) networking available for many nodes
- Specialized graphics processing units (GPUs) available on some nodes
...
Architecture is the most important feature of our hardware, because it determines the set of software that you can use. We have grouped the architectures into three categories. We list them in chronological order and give each of them an Lmod architecture name, explained in the software section below.
- Intel Haswell (2013) uses arch/haswell24v2
- Intel Cascade Lake (2019) uses arch/cascade24v2
- Intel Ice Lake (2020) and higher uses arch/ice24v2
Each architecture provides a distinct instruction set, and all compiled software on our cluster depends on these instructions. The architectures are backwards compatible, meaning that you can always use software compiled for an older architecture on newer hardware. Software compiled for a newer architecture may fail on older hardware.
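To see which of these categories a node falls into, you can inspect its CPU feature flags. The following is a minimal sketch, not an official tool: the function name suggest_arch is hypothetical, and the mapping from flags to Lmod architecture names (avx2 for Haswell, avx512_vnni for Cascade Lake, avx512_vbmi2 for Ice Lake) is our assumption based on the instructions each generation introduced.

```shell
#!/usr/bin/env bash
# suggest_arch is a hypothetical helper. The flag-to-module mapping below is
# an assumption for illustration, not site policy.
suggest_arch() {
  # pad with spaces so each flag can be matched as a whole word
  local flags=" $1 "
  case "$flags" in
    *" avx512_vbmi2 "*) echo "arch/ice24v2" ;;      # introduced with Ice Lake
    *" avx512_vnni "*)  echo "arch/cascade24v2" ;;  # introduced with Cascade Lake
    *" avx2 "*)         echo "arch/haswell24v2" ;;  # introduced with Haswell
    *)                  echo "unknown" ;;
  esac
}

# On a Linux node, read the feature flags of the first CPU.
if [ -r /proc/cpuinfo ]; then
  node_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2 || true)
  echo "suggested module: $(suggest_arch "$node_flags")"
fi
```

The checks run from newest to oldest because newer processors also report the older flags, which is the backwards compatibility described above.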
...
Besides architecture, there are two remaining pieces of specialized hardware that may be relevant to your workflows. First, most of the cluster has access to high-speed Infiniband (IB) networking. This network makes it possible to run massively parallel calculations across multiple nodes.
...
Many partitions suffixed -gpu
We segment the hardware on the cluster by SLURM partitions listed below. SLURM is our scheduler, and it allows each user to carve off a section of the cluster for their exclusive use. Note that the cluster is currently undergoing an upgrade. We report only the upgraded partitions here, but the full guide is available.
...
In the next section, we will explain how to use the Lmod architecture when selecting your software.
There are two kinds of software on our system:
...
Even users with custom codes will use the system-wide compilers and software to build their own software. It is particularly important to use the system-wide compilers and MPI (message-passing interface) implementations on our system to fully leverage the HPC hardware.
Users who log on to the head node at sol.cc.lehigh.edu will have access to the Lmod module command. Our default modules include arch/cascade24v2, which matches the Cascade Lake architecture of the head node and the Hawk partitions, along with a default compiler (gcc) and a default MPI implementation (openmpi).
...
You can string together multiple modules on one command. Version numbers are optional. Lmod will select the default module, typically the highest version.
We reviewed our hardware at the top of this guide because it significantly restricts the types of software that you can use on each of our SLURM partitions. As a result, users should align the following choices when configuring their workflows:
...
The Lmod modules system provides a "software hierarchy" feature that allows us to deliver software for a specific architecture, compiler, and MPI. We exclude
Loading or unloading the arch modules will change the available software so it matches one of two architectures: either Ice Lake or Cascade Lake. Note that the Hawk partition uses the older Cascade Lake architecture, and this is the default for the head node as well.
...
The upshot of this system is that users are encouraged to develop explicit recipes that match their software, architecture, and SLURM partition. If you need to run high-performance codes on the newer nodes, while also using some of the large arch/cascade24v2 software library provided by our modules system, you might want to build module collections using guidance in the next section.
As we explained in the exclusive architectures section above, the arch module will limit the available software to a specific Lmod architecture name, typically either arch/cascade24v2 or arch/ice24v2, corresponding to our late-2024 editions of Cascade Lake or Ice Lake software.
...
This allows you to abstract the software details away from your SLURM scripts, meaning you can upgrade the software, add new modules, etc, without editing many individual SLURM scripts.
...
Users are welcome to extend the modules system with their own custom modules.
Following the January 2025 upgrade we are rebuilding large sets of software for our users. You can expect to see the list of available modules grow in the coming months. In the meantime, we have a transitional period in which the legacy software is still available. This feature is documented on our upgrade page.
...
Users with new software requests or general questions should open a ticket.
Users are welcome to use our web portal to access the cluster. This portal is based on the Open OnDemand project and can be found behind the VPN or on the campus network at hpcportal.cc.lehigh.edu.
...
Users can select the number of cores and time limit (up to 4 hours) using the usual SLURM flags. It is also possible to use other partitions for interactive jobs with longer limits, but we cannot predict your wait times in advance.
We have started to add popular Python packages directly to the Lmod modules system, so that quick calculations require zero extra installation steps. For example:
...
Note that we are using a trick to write a file from the command line using cat…EOF. This command should be copied through the second EOF and executed directly in the terminal. You could just as easily write the text into a new file with your favorite text editor.
...
Code Block | ||||
---|---|---|---|---|
| ||||
module load python
# name your ceph project
CEPH_PROJECT=hpctraining_proj
# go to the shared folder
cd $HOME/$CEPH_PROJECT/shared
# the cat command below writes a spec file
# you should copy the entire multi-line command through the second EOF
cat > venv-spec-project-a01.txt <<EOF
scipy==1.15
seaborn
EOF
python -m venv ./venv-project-a01
source ./venv-project-a01/bin/activate
pip install -r venv-spec-project-a01.txt
|
Later, you can use this environment by using the absolute path to your virtual environment:
No Format |
---|
my_share_folder=/share/ceph/hawk/hpctraining_proj/shared/
source $my_share_folder/venv-project-a01/bin/activate |
This procedure is the best way to add packages to your Python environment when a search with module spider <some_package_name> does not find them.
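In a SLURM script it can help to confirm that the environment is really active before a long calculation starts. The following is a minimal sketch under one assumption: it relies only on the VIRTUAL_ENV variable that the standard venv activate script exports. The function name check_venv is hypothetical and is not part of the cluster software.

```shell
# check_venv is a hypothetical helper, not part of the cluster software.
# The standard venv "activate" script exports VIRTUAL_ENV, so an empty
# or missing value means no environment is active in this shell.
check_venv() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active venv: ${VIRTUAL_ENV}"
  else
    echo "no venv active" >&2
    return 1
  fi
}

# Example use in a job script, after the "source .../bin/activate" line:
# check_venv || exit 1
```

Stopping early this way avoids running a job against the system Python by accident when the activation line was skipped or the path was mistyped.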
If you want to use the Python virtual environment above with less text, you can make a custom modulefile.
No Format |
---|
MY_PROJECT_NAME=project-a01
MY_PROJECT_NAME=project-a01 make_venv_module.py |
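After you run this command, you can load the module with the command below. Note that, unlike activating the virtual environment directly, loading the module does not add the helpful environment name to your terminal prompt.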
No Format |
---|
ml own project-a01 |
Be sure to replace project-a01 throughout the instructions above with a meaningful name. After you create this module, you can add additional software and save a module collection, for example:
No Format |
---|
module load own project-a01
module load intel-oneapi-mkl
module save |
This saves your custom module to the default collection so it is always available without adding any module commands to your scripts. If you have many projects, you could use a collection name:
No Format |
---|
module load own project-a01
module load intel-oneapi-mkl
module save project-a01 |
In this case, you could access this specific project with a single command:
No Format |
---|
module restore project-a01 |
The goal of this method is to leverage the module system, apply it to custom virtual environments, and make your SLURM scripts as simple as possible.
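As an illustration of how short a job script can become, the sketch below writes a batch script that restores the saved collection instead of listing individual modules. It is a minimal example under stated assumptions: the script name job-project-a01.sh and the program my_analysis.py are hypothetical, the resource requests are placeholders copied from the interactive example at the top of this guide, and we assume the module command is available inside batch jobs.

```shell
# write a minimal batch script with the same cat…EOF trick used above
# (the quoted 'EOF' keeps the shell from expanding anything inside)
cat > job-project-a01.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=hawkcpu
#SBATCH --cpus-per-task=4
#SBATCH --time=2:00:00
# restore the saved module collection instead of loading modules one by one
module restore project-a01
# my_analysis.py is a hypothetical placeholder for your own program
python my_analysis.py
EOF

# check the script for shell syntax errors before submitting
bash -n job-project-a01.sh && echo "syntax ok"

# submit on the cluster with: sbatch job-project-a01.sh
```

Because the software stack lives in the collection, you can later add or upgrade modules with module save project-a01 without editing this script.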
We have installed a miniconda3 module so that users can build their own Anaconda environments. Please note that a standard Python virtual environment is the preferred way to build virtual environments unless you need access to the broader set of packages provided by conda.
...
Code Block | ||||
---|---|---|---|---|
| ||||
# instructions for maintaining a shared conda env
module load miniconda3
# name your ceph project
CEPH_PROJECT=hpctraining_proj
# go to the shared folder
cd $HOME/$CEPH_PROJECT/shared
# the cat command below writes a spec file
# you should copy the entire multi-line command through the second EOF
cat > env-conda-myenv-spec.yaml <<EOF
name: stats2
channels:
  - javascript
dependencies:
  - python=3.9
  - bokeh=2.4.2
  - conda-forge::numpy=1.21.*
  - nodejs=16.13.*
  - flask
  - pip
  - pip:
    - Flask-Testing
EOF
conda env update -f env-conda-myenv-spec.yaml -p ./env-conda-myenv
# the following is the path to this environment
echo $PWD/env-conda-myenv
# activate any arbitrary conda environment by using this path
module load miniconda3
conda activate ~/$CEPH_PROJECT/shared/env-conda-myenv
|
...
The following items are under construction as of January 9, 2025. We expect them to be added soon:
- Many pieces of research software were requested for newer architectures, namely the Ice Lake and Sapphire Rapids architectures provided by the arch/ice24v2 module.
- Several other pieces of software requested by individual research groups. These are still in progress. Users will be notified via a response to their tickets when the software has been added.
...
You can add any conda or pip packages to the Anaconda environment file we used above. The conda env update procedure above ensures that you can easily update or reproduce this environment on other systems.
Our documentation is in a transitional state. Besides the upgrade guide and this quickstart, we also maintain tutorial-style notes from the HPC sessions offered as part of the LTS seminar series, which can be found at go.lehigh.edu/rcnotes.