LLM QA Builder

This example deploys a freely-available large langauge model (LLM) and feeds it a chunk of text for the purposes of building question-answer (QA) pairs from this text.

Background

Building an LLM that can analyze or read custom data typically falls into two categories:

  1. Retrieval augmented generation (RAG) in which a set of documents are divided into chunks, stored in a vector store, searched for relevance to a question, and then delivered to a stock LLM with no fine-tuning so that this LLM can interpret the data, answer questions, and cite sources.

  2. Fine-tuning in which a stock LLM is re-trained on an additional set of data to produce entirely new parameters.

We recommend using both methods in combination when building specialist LLMs intended to draw from customized datasets. In particular, fine-tuning can come in several different forms. The most common requires that you curate a set of question and answer pairs. We can use a stock LLM to do this.

Requirements

  • A HuggingFace account and associated API token.

  • A Lehigh HPC account with training to enable you to access our GPU partitions.

  • A chunk of text.

Method

1. Sign up for HuggingFace

This example uses HuggingFace (hereafter, HF) infrastructure to work with LLMs. Sign up for an account at huggingface.co and then create a new access token.

Next, you must apply for access to Llama3 8B by visiting the repository on HF.

2. Get an interactive session

To make sure we can use the GPUs for this project, we need to get an interactive session on the lake-gpu partition. Note that this is not a high-availability partition, meaning that you may have to wait a long time to get access. If access is an impediment, the Research Computing team can install this for you upon request.

# make a directory in your ceph space SPOT=/share/ceph/hawk/lts_proj/rpb222/tmp-llm-qa mkdir -p $SPOT cd $SPOT # select one of the following two commands for an interactive session salloc -p rapids-express -c 4 -t 60 srun --pty bash # alternately: salloc -p lake-gpu -c 8 --gres=gpu:1 -t 180 srun --pty bash # load the new software tee sol_lake

3. Build an environment

Continuing from the previous step, we build a new Python virtual environment. First, create a file called req-lake.txt with the lines shown between the EOF flags. You can also paste this block into the command line to write the file automatically.

cat > reqs-lake.txt <<EOF transformers>=4.40.0 trl accelerate bitsandbytes peft EOF

Next, create a virtual environment in the usual way.

module load python python -m venv ./venv-lake source ./venv-lake/bin/activate time pip install -r reqs-lake.txt

4. Download the models

Select a central location on your Ceph space to store HF models. Collect your HF access token and set this as an environment variable along with your HF username.

Next, we download the "Meta Llama 3 3B Instruct" model from the HF hub. This consumes about 15GB space.

5. Select an example

For this example we take an excerpt from an academic paper. We compose the following

6. Build a script

We can connect our data to the model with a simple script. Copy the following text to script-llm-qa-builder.py:

Next Steps

The following example provides a method for building a database of QA pairs locally without relying on an API provided by a paid, cloud service. To continue this project, we will need to connect this component, which can create entries in a training set, to a more systematic database so that we can format the training dataset for fine-tuning.