Finetuning LLMs with Wisp #
Finetuning LLMs usually requires large GPUs, which Wisp is ideal for finding and configuring for you.
In this example, we will finetune a LLM on a Wisp instance, and download the model on completion.
The LLM #
We’ll use mistral-finetune to finetune a Mistral-7B model. If you need to finetune other models, you can also run your own code with Wisp.
The steps in this guide are taken directly from the mistral-finetune Readme.
Create your Project #
We need to follow the instructions from mistral-finetune to set up the project locally before running it remotely. First, clone the repository:
cd $HOME && git clone https://github.com/mistralai/mistral-finetune.git
Edit example/7B.yaml so that data, model_id_or_path, and run_dir look like this:
data:
  instruct_data: "~/data/ultrachat_chunk_train.jsonl"
  eval_instruct_data: "~/data/ultrachat_chunk_eval.jsonl"

model_id_or_path: "~/mistral_models/7B"
run_dir: "~/mistral-finetune"
...
Prepare the Data #
Following the Readme from mistral-finetune, we’ll finetune the model on UltraChat_200k for chatbot functionality.
cd $HOME && mkdir -p data && cd $HOME/data
Install dependencies for dataset modification:
pip install pandas pyarrow
In a Python shell, run:
import pandas as pd

df = pd.read_parquet('https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k/resolve/main/data/test_gen-00000-of-00001-3d4cd8309148a71f.parquet')

# Split into train and eval
df_train = df.sample(frac=0.95, random_state=200)
df_eval = df.drop(df_train.index)

# Save to jsonl
df_train.to_json("ultrachat_chunk_train.jsonl", orient="records", lines=True)
df_eval.to_json("ultrachat_chunk_eval.jsonl", orient="records", lines=True)
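The split above can be sanity-checked on a small synthetic frame before running it on the full dataset; this sketch assumes nothing beyond pandas, and the data is a hypothetical stand-in:

```python
import pandas as pd

# Tiny stand-in for the UltraChat frame (hypothetical data).
df = pd.DataFrame({"messages": [f"turn-{i}" for i in range(100)]})

# Same split as above: 95% train, deterministic via random_state.
df_train = df.sample(frac=0.95, random_state=200)
df_eval = df.drop(df_train.index)

assert len(df_train) == 95
assert len(df_eval) == 5
# Dropping the sampled indices guarantees train and eval are disjoint,
# so no example leaks into both sets.
assert df_train.index.intersection(df_eval.index).empty
```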
Optionally, you can reformat and validate the data using the instructions in the Readme.
Configuration #
If you haven’t done so yet, run wisp init in the cloned repository’s root to create the configuration file. Open wisp-config.yml and enter the following information:
setup:
  project: local
  remote: https://github.com/mistralai/mistral-finetune
  script:
    pip install -r requirements.txt
    mkdir -p ${HOME}/mistral_models
    cd ${HOME} && wget https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-v0.3.tar
    tar -xf mistral-7B-v0.3.tar -C mistral_models
io:
  input:
    volume: "~/data:~/data"
    type: copy
  output:
    volume: "~/mistral-output:~/mistral-output"
    type: copy
run:
  cd $HOME/mistral-finetune
  torchrun --nproc-per-node 8 --master_port $RANDOM -m train example/7B.yaml
resources:
  accelerators:
    name: H100
    count: 8
Let’s go through this configuration from the beginning.
setup
#
Defines that the project is local, meaning the whole working directory (excluding gitignored files) will be copied to the cloud instance when running. When setting up the node, the script phase will install requirements and download the model to the server.
io
#
The input step mounts the local ~/data folder to the remote ~/data folder using copy.
That means the data we downloaded and split up locally will be copied to the same path on the node.
Similarly, the output step will download any outputs from the remote’s ~/mistral-output folder to the local machine once the job is done.
You can also specify a remote data store, for example an S3 bucket, if you don’t want to download the model locally.
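As a sketch only — the exact keys depend on Wisp’s io schema, and the bucket name here is hypothetical — pointing the output volume at a bucket instead of a local path might look like:

```yaml
io:
  output:
    # Hypothetical: remote folder on the left, S3 destination on the right.
    volume: "~/mistral-output:s3://my-bucket/mistral-output"
    type: copy
```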
run
#
This step simply runs the training script.
resources
#
This step sets the resource constraints for the server. Mistral recommends 8xH100 for finetuning (which should take only about 30 minutes), but you can use fewer GPUs, or even smaller ones, if you want.
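If you do opt for fewer or smaller GPUs, keep --nproc-per-node in the run step in sync with the accelerator count. A hypothetical 4xA100 variant (expect a longer training run):

```yaml
run:
  cd $HOME/mistral-finetune
  # nproc-per-node must match resources.accelerators.count
  torchrun --nproc-per-node 4 --master_port $RANDOM -m train example/7B.yaml
resources:
  accelerators:
    name: A100   # hypothetical smaller setup
    count: 4
```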
Teardown #
Once done, you can tear down the instance with
wisp destroy
If you want to keep the persistent data and avoid setting up the instance again next time, simply run
wisp pause
Note that you will still be charged for some resources when pausing instances. The command above will give you a detailed cost breakdown before pausing.
You can see your job, its stats, and a cost overview in the dashboard under Jobs.