How to use Alpaca-LoRA to fine-tune a model like ChatGPT

Low-rank adaptation (LoRA) is a technique for fine-tuning models that has some advantages over previous methods:

It is faster and uses less memory, which means it can run on consumer hardware.
The output is much smaller (megabytes, not gigabytes).
You can combine multiple fine-tuned models together at runtime.
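To make the size claim concrete, here is a minimal sketch of the parameter arithmetic behind LoRA. Rather than learning a full update to a d x k weight matrix, LoRA freezes the original weights and learns two small matrices, B (d x r) and A (r x k), whose product is added to them. The layer size and rank below are illustrative assumptions, not values taken from Alpaca-LoRA.

```python
def full_update_params(d: int, k: int) -> int:
    """Trainable parameters when updating a d x k weight matrix directly."""
    return d * k


def lora_update_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r update B (d x r) times A (r x k)."""
    return r * (d + k)


# Assumed, illustrative sizes: one square 4096 x 4096 layer and rank 8.
d = k = 4096
r = 8

full = full_update_params(d, k)
lora = lora_update_params(d, k, r)

print(f"full update: {full:,} trainable parameters")
print(f"LoRA update: {lora:,} trainable parameters")
print(f"reduction:   {full // lora}x")
```

Only the small matrices need to be saved after training, which is why a LoRA checkpoint can be megabytes rather than gigabytes.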

Last month we blogged about faster fine-tuning of Stable Diffusion with LoRA. Our friend Simon Ryu (aka @cloneofsimo) applied the LoRA technique to Stable Diffusion, allowing people to create custom-trained styles from just a handful of training images, then mix and match those styles at prediction time to create highly customized images.

Fast-forward one month, and we’re seeing LoRA being applied elsewhere. Now it’s being used to fine-tune large language models like LLaMA. Earlier this month, Eric J. Wang released Alpaca-LoRA, a project that contains code for reproducing the Stanford Alpaca results using PEFT, a library that lets you take various transformers-based language models and fine-tune them using LoRA. What’s neat about this is that it allows you to fine-tune models cheaply and efficiently on modest hardware, with smaller (and perhaps composable) outputs.

In this blog post, we’ll show you how to use LoRA to fine-tune LLaMA using Alpaca training data.

Prerequisites

GPU machine. Thanks to LoRA you can do this on low-spec GPUs like an NVIDIA T4 or consumer GPUs like a 4090. If you don’t already have access to a machine with a GPU, check out our guide to getting a GPU machine.
LLaMA weights. The weights for LLaMA have not yet been released publicly. To apply for access, fill out this Meta Research form.

Step 1: Clone the Alpaca-LoRA repo

We’ve created a fork of the original Alpaca-LoRA repo that adds support for Cog. Cog is a tool for packaging machine learning models in containers, and we’re using it to install the dependencies needed to fine-tune and run the model.

Clone the repository using Git:

git clone https://github.com/daanelson/alpaca-lora
cd alpaca-lora

Step 2: Install Cog

sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog

Step 3: Get LLaMA weights

Put your downloaded weights in a folder called unconverted-weights. The folder hierarchy should look something like this:

unconverted-weights
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk
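Before running the conversion, it can help to confirm that the folder matches the layout above. The following is a minimal sketch, assuming the 7B layout shown; the helper name missing_weight_files is hypothetical and not part of the Alpaca-LoRA repo.

```python
from pathlib import Path
from typing import List

# Files expected under the weights folder, taken from the tree above.
EXPECTED_FILES = [
    "7B/checklist.chk",
    "7B/consolidated.00.pth",
    "7B/params.json",
    "tokenizer.model",
    "tokenizer_checklist.chk",
]


def missing_weight_files(root: str = "unconverted-weights") -> List[str]:
    """Return the expected files that are not present under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


missing = missing_weight_files()
if missing:
    print("Missing files:")
    for rel in missing:
        print(f"  {rel}")
else:
    print("All expected files are present.")
```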

Convert the weights from a PyTorch checkpoint to a transformers-compatible format using this command:

cog run python -m transformers.models.llama.convert_llama_weights_to_hf \
  --input_dir unconverted-weights \
  --model_size 7B \
  --output_dir weights

Your final directory structure should look like this:

weights
├── llama-7b
└── tokenizer

Step 4: Fine-tune the model

The fine-tuning script is configured by default to work on less powerful GPUs, but if you have a GPU with more memory, you can increase MICRO_BATCH_SIZE to 32 or 64 in finetune.py.
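A larger micro-batch usually means fewer gradient accumulation steps for each optimizer update. The sketch below illustrates that trade-off under an assumption: that the script keeps an overall batch size fixed (128 here, an assumed value) and derives the number of accumulation steps by integer division. Check finetune.py in your checkout for the actual constants.

```python
def accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    """Micro-batches to accumulate before each optimizer update."""
    if micro_batch_size <= 0 or batch_size % micro_batch_size != 0:
        raise ValueError("batch_size must be a positive multiple of micro_batch_size")
    return batch_size // micro_batch_size


BATCH_SIZE = 128  # assumed overall batch size, not confirmed by this post

for micro in (4, 32, 64):
    steps = accumulation_steps(BATCH_SIZE, micro)
    print(f"MICRO_BATCH_SIZE={micro:>2} -> {steps} accumulation steps per update")
```

Under this assumption the overall batch size stays the same, so raising MICRO_BATCH_SIZE mainly trades GPU memory for fewer, larger forward and backward passes.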

If you have your own instruction tuning dataset, edit DATA_PATH in finetune.py to point to your own dataset. Make sure it has the same format as alpaca_data_cleaned.json.
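As a rough guide to that format, the Alpaca data is a JSON list of records with instruction, input, and output string fields, where input may be empty. The sketch below checks a dataset for that shape before you start a long training run. The helper name validate_records and the sample record are illustrative assumptions.

```python
import json
from typing import Any, List

REQUIRED_KEYS = {"instruction", "input", "output"}


def validate_records(records: Any) -> List[str]:
    """Return a list of human-readable problems; an empty list means the data looks fine."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of records"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            problems.append(f"record {i}: missing keys {sorted(missing)}")
        elif not all(isinstance(rec[key], str) for key in REQUIRED_KEYS):
            problems.append(f"record {i}: all fields must be strings")
    return problems


# Illustrative sample; in practice, load your own file with json.load().
sample = json.loads("""
[
  {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
  {"instruction": "Name a South American camelid.", "input": "", "output": "The alpaca."}
]
""")

print(validate_records(sample) or "dataset looks valid")
```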

Run the fine-tuning script:

cog run python finetune.py

Fine-tuning takes 3.5 hours on a 40GB A100 GPU, and longer on less powerful GPUs.
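During instruction tuning, each dataset record is turned into a single text prompt. The sketch below shows one way to do that, assuming the standard Stanford Alpaca prompt template. The fork's finetune.py may format prompts differently, so treat this as an illustration rather than the script's exact behavior.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Format a record as an Alpaca-style prompt (assumed template)."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


# For training, the record's output is appended after the response marker.
record = {"instruction": "Tell me something about alpacas.", "input": "", "output": "Alpacas are herd animals."}
training_text = build_prompt(record["instruction"], record["input"]) + record["output"]
print(training_text)
```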

Step 5: Run the model with Cog

$ cog predict -i prompt="Tell me something about alpacas."

Alpacas are domesticated animals from South America. They are closely related to llamas and guanacos and have a long, dense, woolly fleece that is used to make textiles. They are herd animals and live in small groups in the Andes mountains. They have a wide variety of sounds, including whistles, snorts, and barks. They are intelligent and social animals and can be trained to perform certain tasks.

Next steps

Here are some ideas for what you could do next:

Bring your own dataset and fine-tune your own LoRA, like Cabrita: A portuguese finetuned instruction LLaMA, or Fine-tune LLaMA to speak like Homer Simpson.
Push the model to Replicate to run it in the cloud. This is handy if you want an API to build interfaces, or to run large-scale evaluation in parallel. You’ll need to keep it private so the weights aren’t public.
Combine LoRAs. It is possible to combine different Stable Diffusion LoRAs to have a fine-tuned style and fine-tuned object in the same image. What could be possible if this was done with language models?
Fine-tune the larger LLaMA models with the Alpaca dataset (or other datasets) and see how they perform. This should be possible with PEFT and LoRA, although it will need larger GPUs.
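To illustrate the idea of combining LoRAs, here is a toy sketch in plain Python. Each adapter is a pair of small matrices B and A plus a scale, and merging adds each scaled product B·A to the same frozen weight matrix. The 2 x 2 sizes, the scales, and the adapter names are made-up values for illustration. Whether merging adapters this way gives good results for language models is an open question.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Adapter = Tuple[Matrix, Matrix, float]  # (B, A, scale)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_adapters(w: Matrix, adapters: Sequence[Adapter]) -> Matrix:
    """Return W + sum(scale * B @ A) without modifying the base weights."""
    merged = [row[:] for row in w]
    for b, a, scale in adapters:
        delta = matmul(b, a)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged


# Toy base weights and two rank-1 adapters (all values are illustrative).
W = [[1.0, 0.0], [0.0, 1.0]]
style_adapter = ([[1.0], [0.0]], [[0.0, 2.0]], 0.5)
subject_adapter = ([[0.0], [1.0]], [[4.0, 0.0]], 0.25)

merged = merge_adapters(W, [style_adapter, subject_adapter])
print(merged)
```

Because the base weights are left untouched, the same model can be paired with different adapter combinations and scales at runtime.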

We can’t wait to see what you build.

Follow us on Twitter to keep up with what we’re working on. We’re going to be posting lots more guides to tinkering with open-source language models.