...
But first, let’s review best practices for building a lab notebook.
Researchers who use an HPC cluster should maintain a lab notebook that provides detailed instructions for repeating their work and reproducing their findings. Using a terminal makes this easy, since almost every step can be captured at the command line and added to the script.
...
Since there is no Linux command "$", you should be able to infer the correct commands from context.
In the spirit of building self-contained, easy-to-read blocks of text to describe our work, we sometimes want to tell a user to "save some text in a file". To avoid unnecessary exposition, it can be convenient to include this inline with the commands instead of separating the text of the file from the rest of the procedure. We can accomplish this by writing a text file directly from the terminal. If we continue from our example above, we can write a file into our shared Ceph space:
...
Once your reader knows about the cat trick, or if they are a Linux expert already, then you can easily share elaborate installation procedures with them. Our group uses it extensively when sharing instructions with our users.
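As a minimal sketch of the cat trick, the following block writes two small files with a here-document. The directory `cat-trick-demo`, the file names, and the `project` variable are hypothetical placeholders and not part of our cluster setup; substitute your own project or shared-storage path. The sketch also shows one detail worth knowing when sharing instructions: an unquoted delimiter lets the shell expand variables, while a quoted delimiter writes the text literally.

```shell
# Hypothetical destination; substitute your own project or shared-storage path.
demo_dir="$PWD/cat-trick-demo"
mkdir -p "$demo_dir"

# Hypothetical variable used only to illustrate expansion.
project="demo-project"

# Unquoted delimiter: the shell expands $project before writing the file.
cat > "$demo_dir/expanded.txt" <<EOF
project: $project
EOF

# Quoted delimiter: the text is written literally, which suits scripts
# that should keep their own variables intact.
cat > "$demo_dir/literal.txt" <<'EOF'
project: $project
EOF

cat "$demo_dir/expanded.txt"   # project: demo-project
cat "$demo_dir/literal.txt"    # project: $project
```

The quoted form is usually the safer default when the file being written is itself a shell script.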
While there are many different Python packaging systems (for example pipenv and poetry), we focus exclusively on `venv` and `conda`, with a strong preference for using `venv` whenever possible, because it is the simplest method. Nevertheless, many different packages are exclusively distributed on `conda` channels, so users are encouraged to use their discretion when deciding between these two options.
In the example below, we will install `numpy`. If you only need `numpy`, however, you can avoid building a virtual environment and just use our Lmod modules system:
...
The Python packages use the `py-` prefix (while R packages use the `r-` prefix).
Building a Python virtual environment is covered in the standard library documentation for the `venv` module. First, make sure that you have access to the Lmod `python` module:
...
```
cat > reqs.txt <<EOF
numpy==2.2.2
EOF
python -m pip install -r reqs.txt
```
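Before installing anything, it can help to confirm that the shell is really using the virtual environment. The block below is a minimal sketch under stated assumptions: the directory name `venv-demo` is a hypothetical placeholder, and it assumes a `python3` interpreter with the standard-library `venv` module is already on your path (for example, after loading the Lmod `python` module). It creates an environment, activates it, and checks that `sys.prefix` differs from `sys.base_prefix`, which is how Python itself distinguishes a virtual environment from the base installation.

```shell
# Hypothetical location for the environment; choose your own.
venv_dir="$PWD/venv-demo"

# Create the environment with the standard-library venv module.
python3 -m venv "$venv_dir"

# Activate it in the current shell; "." is the portable spelling of "source".
. "$venv_dir/bin/activate"

# Inside a virtual environment, sys.prefix differs from sys.base_prefix.
python -c 'import sys; print(sys.prefix != sys.base_prefix)'   # True
```

Once this check prints `True`, the `python -m pip install -r reqs.txt` step installs into the environment rather than into the base installation.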
We can combine all of the features above into a single, concise set of instructions that uses the cat trick for deploying our software environment:
...
These instructions are portable to other systems, and should be included in any researcher’s lab notebook.
Some authors distribute their codes exclusively on `conda` channels, for example bioconda. While we typically prefer the bog-standard Python and `pip` installation process, oftentimes `conda` provides additional distributions. Before providing a concise build method below, we will review some important context.
Our cluster uses the `miniconda3` module to provide access to Anaconda environments. Our current understanding of the Anaconda terms of service is that higher-education non-profit organizations can use Anaconda for teaching purposes, while organizations with more than 200 members are subject to licensing fees.
...
Readers should review the Python virtual environments section before using the following method, because `conda` acts as a superset of `pip` in which we make a few substitutions. Understanding this pattern benefits anyone using an environment.
We recommend building all `conda` environments from a YAML-based requirements file written into our instructions below using the cat trick. You can use the following method to customize your own environment as long as you select a location and list all of your dependencies.
...
This method takes inputs from `cenv-project-v01-reqs.yaml` and builds an environment from them. You can include both `conda` and `pip` packages. The export file, `cenv-project-v01-export.yaml`, records the exact versions that `conda` found, so you can reproduce this environment on another system if you want.
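The pattern described above can be sketched as follows. This is an illustration under stated assumptions, not our exact procedure: the folder `cenv-demo`, the `conda-forge` channel, and the listed packages and versions are hypothetical choices, and only the two YAML file names come from the text above. The `conda` commands run only when `conda` is available on the path, so the requirements file is still written on a machine where the `miniconda3` module has not been loaded.

```shell
# Hypothetical project folder; pick any location you can write to.
env_dir="$PWD/cenv-demo"
mkdir -p "$env_dir"

reqs="$env_dir/cenv-project-v01-reqs.yaml"
export_file="$env_dir/cenv-project-v01-export.yaml"

# Write the requirements file with the cat trick.
# The channel and package list below are illustrative assumptions.
cat > "$reqs" <<'EOF'
channels:
  - conda-forge
dependencies:
  - python=3.12
  - pip
  - pip:
      - numpy==2.2.2
EOF

# Build the environment and record the resolved versions, but only if
# conda is on the PATH (for example after loading the miniconda3 module).
if command -v conda >/dev/null 2>&1; then
  conda env create --prefix "$env_dir/env" --file "$reqs"
  conda env export --prefix "$env_dir/env" > "$export_file"
else
  echo "conda not found; wrote $reqs only"
fi
```

Installing with `--prefix` keeps the environment next to its requirements and export files, which makes the whole folder easy to reference from a lab notebook.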
Before installing an R package, it can be useful to check whether it is already available. Imagine you are looking for `ggplot2`. You can use the Lmod system to find it:
...