Low-rank adaptation (LoRA) is a technique for fine-tuning models that has some advantages over previous methods: it's faster and uses less memory, the fine-tuned weights it produces are much smaller (megabytes rather than gigabytes), and multiple fine-tuned adapters can be mixed and matched at prediction time.
Last month we blogged about faster fine-tuning of Stable Diffusion with LoRA. Our friend Simon Ryu (aka @cloneofsimo) applied the LoRA technique to Stable Diffusion, allowing people to create custom trained styles from just a handful of training images, then mix and match those styles at prediction time to create highly customized images.
Fast-forward one month, and we’re seeing LoRA being applied elsewhere. Now it’s being used to fine-tune large language models like LLaMA. Earlier this month, Eric J. Wang released Alpaca-LoRA, a project which contains code for reproducing the Stanford Alpaca results using PEFT, a library that lets you take various transformers-based language models and fine-tune them using LoRA. What’s neat about this is that it allows you to fine-tune models cheaply and efficiently on modest hardware, with smaller (and perhaps composable) outputs.
In this blog post, we’ll show you how to use LoRA to fine-tune LLaMA using Alpaca training data.
We’ve created a fork of the original Alpaca-LoRA repo that adds support for Cog. Cog is a tool to package machine learning models in containers, and we're using it to install the dependencies needed to fine-tune and run the model.
Clone the repository using Git:
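(The URL below is an assumption about where our Cog fork is published; substitute the actual URL of the fork if it lives elsewhere.)

```
# Clone the Cog-enabled fork of Alpaca-LoRA and move into it
# (repository URL assumed; replace it with the fork you're actually using)
git clone https://github.com/daanelson/alpaca-lora
cd alpaca-lora
```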
Put your downloaded weights in a folder called `unconverted-weights`. The folder hierarchy should look something like this:
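(This is a rough sketch of the layout of the 7B weights as Meta distributes them; exact filenames can vary by release.)

```
unconverted-weights
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk
```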
Convert the weights from a PyTorch checkpoint to a transformers-compatible format using this command:
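(This uses the `convert_llama_weights_to_hf` script that ships with the transformers library, run inside the Cog container; the module path and flag names may vary slightly between transformers versions.)

```
# Convert the original PyTorch checkpoint to the Hugging Face transformers format
cog run python -m transformers.models.llama.convert_llama_weights_to_hf \
  --input_dir unconverted-weights \
  --model_size 7B \
  --output_dir weights
```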
Your final directory structure should look something like this (the exact directory names produced by the conversion script depend on your version of transformers):
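```
weights
├── llama-7b
└── tokenizer
```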
The fine-tuning script is configured by default to work on less powerful GPUs, but if you have a GPU with more memory, you can increase `MICRO_BATCH_SIZE` to 32 or 64 in `finetune.py`.
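For reference, these knobs sit near the top of `finetune.py`; a rough sketch of what they look like (names come from the Alpaca-LoRA script, and the defaults shown here are illustrative):

```python
# Excerpt-style sketch of the batch-size settings in finetune.py
# (defaults may differ in the actual script)
MICRO_BATCH_SIZE = 4   # raise to 32 or 64 on a GPU with more memory
BATCH_SIZE = 128       # effective batch size
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
```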
If you have your own instruction tuning dataset, edit `DATA_PATH` in `finetune.py` to point to your own dataset. Make sure it has the same format as `alpaca_data_cleaned.json`.
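That file is a JSON array of records with `instruction`, `input`, and `output` fields (`input` is left empty when the instruction needs no extra context); the records below are made up for illustration:

```json
[
  {
    "instruction": "Summarize the following text in one sentence.",
    "input": "LoRA freezes the base model's weights and trains small low-rank matrices instead, which makes fine-tuning cheaper.",
    "output": "LoRA makes fine-tuning cheaper by freezing the base model and training small low-rank matrices."
  },
  {
    "instruction": "Name three primary colors.",
    "input": "",
    "output": "Red, yellow, and blue."
  }
]
```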
Run the fine-tuning script:
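(Assuming you're using the Cog setup from this fork, the script runs inside the container.)

```
# Run the LoRA fine-tuning script inside the Cog container
cog run python finetune.py
```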
This takes 3.5 hours on a 40GB A100 GPU, and longer on less powerful GPUs.
Here are some ideas for what you could do next:
We can't wait to see what you build.
Follow us on Twitter for updates. We’re going to be posting lots more guides to tinkering with open-source language models.