LLM QA Builder
This example deploys a freely-available large langauge model (LLM) and feeds it a chunk of text for the purposes of building question-answer (QA) pairs from this text.
Background
Building an LLM that can analyze or read custom data typically falls into two categories:
Retrieval augmented generation (RAG) in which a set of documents are divided into chunks, stored in a vector store, searched for relevance to a question, and then delivered to a stock LLM with no fine-tuning so that this LLM can interpret the data, answer questions, and cite sources.
Fine-tuning in which a stock LLM is re-trained on an additional set of data to produce entirely new parameters.
We recommend using both methods in combination when building specialist LLMs intended to draw from customized datasets. In particular, fine-tuning can come in several different forms. The most common requires that you curate a set of question and answer pairs. We can use a stock LLM to do this.
Requirements
A HuggingFace account and associated API token.
A Lehigh HPC account with training to enable you to access our GPU partitions.
A chunk of text.
Method
1. Sign up for HuggingFace
This example uses HuggingFace (hereafter, HF) infrastructure to work with LLMs. Sign up for an account at huggingface.co and then create a new access token.
Next, you must apply for access to Llama3 8B by visiting the repository on HF.
2. Get an interactive session
To make sure we can use the GPUs for this project, we need to get an interactive session on the lake-gpu
partition. Note that this is not a high-availability partition, meaning that you may have to wait a long time to get access. If access is an impediment, the Research Computing team can install this for you upon request.
# make a directory in your ceph space
SPOT=/share/ceph/hawk/lts_proj/rpb222/tmp-llm-qa
mkdir -p $SPOT
cd $SPOT
# select one of the following two commands for an interactive session
salloc -p rapids-express -c 4 -t 60 srun --pty bash
# alternately: salloc -p lake-gpu -c 8 --gres=gpu:1 -t 180 srun --pty bash
# load the new software tee
sol_lake
3. Build an environment
Continuing from the previous step, we build a new Python virtual environment. First, create a file called req-lake.txt
with the lines shown between the EOF
flags. You can also paste this block into the command line to write the file automatically.
cat > reqs-lake.txt <<EOF
transformers>=4.40.0
trl
accelerate
bitsandbytes
peft
EOF
Next, create a virtual environment in the usual way.
module load python
python -m venv ./venv-lake
source ./venv-lake/bin/activate
time pip install -r reqs-lake.txt
4. Download the models
Select a central location on your Ceph space to store HF models. Collect your HF access token and set this as an environment variable along with your HF username.
Next, we download the "Meta Llama 3 3B Instruct" model from the HF hub. This consumes about 15GB space.
5. Select an example
For this example we take an excerpt from an academic paper. We compose the following
6. Build a script
We can connect our data to the model with a simple script. Copy the following text to script-llm-qa-builder.py
:
Next Steps
The following example provides a method for building a database of QA pairs locally without relying on an API provided by a paid, cloud service. To continue this project, we will need to connect this component, which can create entries in a training set, to a more systematic database so that we can format the training dataset for fine-tuning.