...

But first, let’s review best practices for building a lab notebook.

Recipes and Scripts

Researchers who use an HPC cluster should maintain a lab notebook that provides detailed instructions for repeating their work and reproducing their findings. Using a terminal makes this easy, since almost every step can be captured at the command line and added to a script.

...

Since there is no Linux command "$", you should be able to infer the correct commands from context.

The cat trick

In the spirit of building self-contained, easy-to-read blocks of text to describe our work, we sometimes want to tell a user to "save some text in a file". To avoid unnecessary exposition, it can be convenient to include the file's contents inline with the commands instead of separating them from the rest of the procedure. We can accomplish this by writing a text file directly from the terminal. If we continue from our example above, we can write a file into our shared Ceph space:

...

Once your reader knows about the cat trick, or if they are a Linux expert already, then you can easily share elaborate installation procedures with them. Our group uses it extensively when sharing instructions with our users.
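One detail worth knowing when you use the cat trick is how the shell treats variables inside the inline text. The following is a minimal sketch, not taken from our own instructions: the scratch directory `demo_dir` and the variable `project` are hypothetical names chosen for illustration. It shows that an unquoted delimiter expands variables, while a quoted delimiter writes the text literally.

```shell
# Hypothetical scratch location for this sketch; substitute your own directory.
demo_dir="${TMPDIR:-/tmp}/cat-trick-demo"
mkdir -p "$demo_dir"

# Hypothetical variable, used only to show how expansion behaves.
project="my-analysis"

# Unquoted delimiter: the shell expands $project before writing the file.
cat > "$demo_dir/expanded.txt" <<EOF
project: $project
EOF

# Quoted delimiter: the text is written exactly as typed, with no expansion.
cat > "$demo_dir/literal.txt" <<'EOF'
project: $project
EOF

cat "$demo_dir/expanded.txt"   # prints: project: my-analysis
cat "$demo_dir/literal.txt"    # prints: project: $project
```

The quoted form is usually the safer choice when the file you are writing is itself a script that contains dollar signs.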

Python

While there are many different Python packaging systems (for example pipenv and poetry), we focus exclusively on venv and conda, with a strong preference for using venv whenever possible, because it is the simplest method. Nevertheless, many packages are distributed exclusively on conda channels, so users are encouraged to use their discretion when deciding between these two options.

Use preinstalled packages

In the example below, we will install numpy. If you only need numpy, however, you can avoid building a virtual environment and just use our Lmod modules system:

...

The python packages use the py- prefix (while R packages use the r- prefix).

Python virtual environments

Building a python virtual environment is covered in the standard library documentation for the venv module. First, make sure that you have access to the Lmod python module:

...

No Format
cat > reqs.txt <<EOF
numpy==2.2.2
EOF
python -m pip install -r reqs.txt

Repeatable venv method

We can combine all of the features above into a single, concise set of instructions that uses the cat trick for deploying our software environment:

...

These instructions are portable to other systems, and should be included in any researcher’s lab notebook.
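To illustrate the general shape of such a recipe, here is a minimal sketch under stated assumptions. It is not a copy of our instructions: the location `venv_root` and the helper `build_venv` are hypothetical names, and the sketch assumes that a `python3` with the venv module is already on your path (on a cluster, typically after loading the site's python module). The requirements file is written with the cat trick, and the build step is wrapped in a function so that nothing is installed until you call it.

```shell
# Hypothetical location for this sketch; substitute your own project space.
venv_root="${TMPDIR:-/tmp}/venv-demo"
mkdir -p "$venv_root"

# The cat trick: pin the requirements inline with the rest of the recipe.
cat > "$venv_root/reqs.txt" <<EOF
numpy==2.2.2
EOF

# Hypothetical helper: creates the environment, activates it, and installs
# the pinned packages. It does nothing until it is called.
build_venv() {
    python3 -m venv "$venv_root/venv" || return 1
    . "$venv_root/venv/bin/activate"
    python -m pip install -r "$venv_root/reqs.txt"
}
```

Running `build_venv` in the same shell performs the installation and leaves the environment active. In later sessions you would only need to source `venv/bin/activate` again.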

Conda environments

Some authors distribute their code exclusively on conda channels, for example bioconda. While we typically prefer the bog-standard python and pip installation process, conda channels often provide additional packages. Before providing a concise build method below, we will review some important context.

Requirements for using conda

Our cluster uses the miniconda3 module to provide access to Anaconda environments. Our current understanding of the Anaconda terms of service is that higher-education non-profit organizations can use Anaconda for teaching purposes, while organizations with more than 200 members are subject to licensing fees.

...

Readers should review the python virtual environments section before using the following method, because conda acts as a superset of pip in which we make a few substitutions. Understanding this pattern benefits anyone using an environment.

Repeatable conda environments

We recommend building all conda environments from a YAML-based requirements file written into our instructions below using the cat trick. You can use the following method to customize your own environment as long as you select a location and list all of your dependencies.

...

This method takes inputs from cenv-project-v01-reqs.yaml and builds an environment from them. You can include both conda and pip packages. The export file, cenv-project-v01-export.yaml, records the exact versions that conda found, so you can reproduce this environment on another system if you want.
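For readers who want to see the pattern end to end, the following is a minimal sketch under stated assumptions rather than a copy of our method. The location `cenv_root`, the helper `build_cenv`, the conda-forge channel, and the package versions are all hypothetical choices. Only the two file names come from the text above. The sketch assumes `conda` is already available in your shell, which on a cluster usually means the relevant module is loaded.

```shell
# Hypothetical location for this sketch; substitute your own project space.
cenv_root="${TMPDIR:-/tmp}/cenv-demo"
mkdir -p "$cenv_root"

# The cat trick: list conda and pip dependencies in one requirements file.
# The channel and version pins are illustrative assumptions.
cat > "$cenv_root/cenv-project-v01-reqs.yaml" <<EOF
channels:
  - conda-forge
dependencies:
  - python=3.12
  - pip
  - pip:
      - numpy==2.2.2
EOF

# Hypothetical helper: builds the environment at an explicit prefix, then
# exports the exact versions that conda resolved. It does nothing until called.
build_cenv() {
    conda env create -p "$cenv_root/cenv-project-v01" \
        -f "$cenv_root/cenv-project-v01-reqs.yaml" || return 1
    conda env export -p "$cenv_root/cenv-project-v01" \
        > "$cenv_root/cenv-project-v01-export.yaml"
}
```

After calling `build_cenv`, the environment can be activated by its path with `conda activate "$cenv_root/cenv-project-v01"`. Building at an explicit prefix keeps the environment next to the files that describe it.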

R packages

Before installing an R package, it can be useful to check whether it is already available. Imagine you are looking for ggplot2. You can use the Lmod system to find it:

...