You can fine-tune language models to make them better at a particular task. With Replicate, you can fine-tune and run your model in the cloud without having to set up any GPUs.
You can train a language model to do things like:
These things are sometimes possible by creating prompts, but you can only pass a limited amount of data in the prompt.
In this guide, we’ll show you how to create a text summarizer. We'll be using Llama 2 7B, an open-source large language model from Meta, and fine-tuning it on a dataset of messenger-like conversations with summaries. When we're done, you'll be able to distill chat transcripts, emails, webpages, and other documents into a brief summary. Short and sweet.
You can fine-tune many language models on Replicate, including:
To see all the language models that currently support fine-tuning, check out our collection of trainable language models.
If you're looking to fine-tune image models, check out our guide to fine-tuning image models.
Your training data should be in a single JSONL text file. JSONL (or "JSON lines") is a file format for storing structured data in a text-based, line-delimited format. Each line in the file is a standalone JSON object.
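As a minimal sketch using only Python's standard json module, the helpers below serialize a list of dicts to JSONL and parse it back. The function names are hypothetical; any tool that writes one JSON object per line produces the same format.

```python
import json
from typing import Iterable, List


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize each dict as one JSON object per line."""
    # json.dumps escapes newlines inside strings, so each record stays on a single line
    return "".join(json.dumps(record) + "\n" for record in records)


def from_jsonl(text: str) -> List[dict]:
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    # Split on "\n" only, the JSONL line delimiter
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def write_jsonl(records: Iterable[dict], path: str) -> None:
    """Write records to a .jsonl file on disk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(records))
```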
If you’re building an instruction-tuned model like a chatbot that answers questions, structure your data using an object with a prompt key and a completion key on each line:
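For instance, a single training example in this shape could be produced as shown here. The question and answer are made-up placeholders, not taken from any dataset mentioned in this guide.

```python
import json


def make_instruction_example(prompt: str, completion: str) -> str:
    """Return one JSONL line containing the prompt and completion keys."""
    return json.dumps({"prompt": prompt, "completion": completion})


# Placeholder example; real data would come from your own question/answer pairs
line = make_instruction_example(
    "What is the capital of France?",
    "The capital of France is Paris.",
)
print(line)
# {"prompt": "What is the capital of France?", "completion": "The capital of France is Paris."}
```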
If you’re building an autocompleting model to do tasks like completing a user’s writing, code completion, finishing lists, few-shotting specific tasks like classification, or if you want more control over the format of your training data, structure each JSON line as a single object with a text key and a string value:
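A similar sketch for the text format, using a made-up few-shot sentiment example: everything the model should learn, including the label, lives inside the one string value. The "Review:"/"Sentiment:" layout is only an illustrative choice.

```python
import json


def make_text_example(text: str) -> str:
    """Return one JSONL line with a single "text" key."""
    return json.dumps({"text": text})


# Hypothetical classification example; json.dumps escapes the embedded newline
sample = "Review: The battery died after a day.\nSentiment: negative"
line = make_text_example(sample)
print(line)
```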
Here are some existing datasets to give you a sense of what real training data looks like:
You don't need to prepare any of the datasets listed above, since they're already uploaded. However, for reference, here's how you can use the Hugging Face Datasets library to download the SAMSum dataset, prepare it, and save it as a JSONL file:
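A minimal sketch of that preparation step follows. It assumes the Hugging Face Hub ID "samsum" and the dialogue and summary column names (check both against the dataset card), and it uses a hypothetical prompt template, not necessarily the one used for the uploaded dataset. The datasets package is imported only inside the download function.

```python
import json

# Hypothetical template for illustration; substitute your own
TEMPLATE = "Conversation:\n{dialogue}\n\nSummary:\n"


def format_example(row: dict) -> dict:
    """Turn one SAMSum row (dialogue + summary) into a {"text": ...} record."""
    prompt = TEMPLATE.format(dialogue=row["dialogue"].strip())
    return {"text": prompt + row["summary"].strip()}


def save_samsum_jsonl(path: str = "samsum.jsonl", split: str = "train") -> int:
    """Download SAMSum with Hugging Face Datasets and write it as JSONL."""
    from datasets import load_dataset  # requires `pip install datasets`

    dataset = load_dataset("samsum", split=split)  # Hub ID is an assumption
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in dataset:  # each row is a dict of column name -> value
            f.write(json.dumps(format_example(row)) + "\n")
            count += 1
    return count
```

Calling save_samsum_jsonl() downloads the training split and returns the number of lines written.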
Note that we chose to prepare our text using a custom prompt template. The template should produce examples that look like this:
When prompting the model at inference time, use the same template you used during training, since that's the format the model understands. For example, to prompt the model to summarize the example above, you would use the following:
The model should then know to fill in the summary section with a summary of the conversation.
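A sketch of building such an inference prompt, assuming a hypothetical template (swap in the exact one you trained with): the prompt is the training format with the summary left blank, so the model's continuation is the summary.

```python
# Hypothetical template; it must match the training template character for character
TEMPLATE = "Conversation:\n{dialogue}\n\nSummary:\n"


def build_inference_prompt(dialogue: str) -> str:
    """Format a new conversation like the training data, minus the summary."""
    return TEMPLATE.format(dialogue=dialogue.strip())


prompt = build_inference_prompt("Neville: Hi there!\nTina: Hi!\n")
```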
Training time varies depending on:
You can set the pack_sequences input to true to speed up your training. This ‘packs together’ your examples to make full use of the available sequence length (~2048 tokens), which reduces the number of steps needed during training.
In this guide, we’ll fine-tune Llama 2 7B, which uses an 8x Nvidia A40 GPU (large) instance for training, costing $0.348 per minute. Training for 3 epochs on the SAMSum dataset should take ~75 minutes, so it will total about $26.
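To illustrate the idea behind packing, here is a conceptual sketch of greedy sequence packing over already-tokenized examples. It is not Replicate's implementation: the function name, the 2048-token default, and the end-of-sequence ID of 2 are assumptions, and a real trainer also has to handle attention masks and labels.

```python
from typing import List


def pack_examples(
    examples: List[List[int]], max_len: int = 2048, eos_id: int = 2
) -> List[List[int]]:
    """Greedily concatenate tokenized examples (each followed by eos_id) into
    sequences of at most max_len tokens; over-long examples are truncated."""
    packed: List[List[int]] = []
    current: List[int] = []
    for tokens in examples:
        item = (tokens + [eos_id])[:max_len]  # new list; the input is not mutated
        if len(current) + len(item) > max_len:
            packed.append(current)  # current sequence is full, start a new one
            current = []
        current.extend(item)
    if current:
        packed.append(current)
    return packed
```

With a 12-token limit, three 6-token items fit into two sequences instead of three, and fewer sequences means fewer training steps per epoch.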
Here are more example training times and costs for some of the datasets mentioned above:
Dataset | Dataset size | Training hardware | Training time | Cost |
---|---|---|---|---|
SAMSum | 11 MB (~3,500,000 tokens) | 8x Nvidia A40 (Large) | 75 minutes (3 epochs) | ~$26 |
Elixir docstrings | 1.5 MB (~450,000 tokens) | 8x Nvidia A40 (Large) | 17 minutes | ~$6 |
ABC notated music | 67 MB (~40,000,000 tokens) | 8x Nvidia A40 (Large) | 12.5 hours | ~$260 |
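The cost column follows from the per-minute price quoted above. A small helper, assuming the $0.348 per minute rate for the 8x A40 (Large) instance applies to every row, reproduces the estimates:

```python
# Rate for the 8x Nvidia A40 (Large) instance, as quoted in this guide
PRICE_PER_MINUTE_USD = 0.348


def training_cost(minutes: float, price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Estimated cost in USD, rounded to cents."""
    return round(minutes * price_per_minute, 2)


for name, minutes in [("SAMSum", 75), ("Elixir docstrings", 17), ("ABC notated music", 12.5 * 60)]:
    print(f"{name}: ${training_cost(minutes):.2f}")
```

Prices can change, so the rate is a parameter rather than a hard-coded value inside the function.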
You need to create an empty model on Replicate for your trained model. When your training finishes, it will be pushed as a new version to this model.
Go to replicate.com/create and create a new model called “llama2-summarizer”. You probably want to make it private to start, and you have the option of making it public later.
Authenticate by setting your token in an environment variable:
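The Replicate Python client reads the token from the REPLICATE_API_TOKEN environment variable. As a small sketch, a script can check for it up front so a missing token fails with a clear message rather than a later authentication error; the helper name is hypothetical.

```python
import os


def get_replicate_token() -> str:
    """Return the API token from REPLICATE_API_TOKEN, failing early if it's missing."""
    token = os.environ.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set REPLICATE_API_TOKEN before calling the Replicate API.")
    return token
```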
If you're using the example SAMSum dataset, you can skip this section. It's already uploaded!
If you've created your own dataset, you'll need to upload it somewhere on the internet that is publicly accessible, like an S3 bucket or a GitHub Pages site.
If you like, you can use our API for uploading files. Run these commands to upload your data.jsonl file:
To find out which models can be trained, check out the trainable language models collection.
Install the Python library:
Then, run this to create a training with meta/llama-2-7b as the base model:
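A minimal sketch with the Replicate Python client follows. The base-model version ID and destination are placeholders, and the input names train_data and num_train_epochs are assumptions; check the model's "Train" tab for the real version and parameters.

```python
def start_training(train_data_url: str, destination: str, base_version: str):
    """Kick off a fine-tuning job and return the client's Training object."""
    import replicate  # pip install replicate; reads REPLICATE_API_TOKEN from the environment

    return replicate.trainings.create(
        version=base_version,  # "meta/llama-2-7b:<VERSION_ID>" (placeholder)
        input={
            "train_data": train_data_url,  # assumed input name: public URL of your JSONL file
            "num_train_epochs": 3,  # assumed input name; 3 epochs as in the estimate above
        },
        destination=destination,  # e.g. "<your-username>/llama2-summarizer"
    )
```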
It takes these arguments:
version: The model to train, in the format USERNAME/MODEL:VERSION.
input: The training data and params to pass to the training process, which are defined by the model. Llama 2's params can be found in the model's "Train" tab.
destination: The model to push the trained version to.
Once you've kicked off your training, visit replicate.com/trainings in your browser to monitor the progress.
If you set a webhook in the previous step, you’ll receive a POST request at your webhook URL when the training completes.
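To receive that POST yourself, the sketch below uses only Python's standard library. It assumes the request body is the training object as JSON, with status and output fields, and it skips signature verification, so treat it as a local-testing aid rather than a production endpoint. The port and function names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_webhook(body: bytes) -> str:
    """Extract the status (and trained version, if present) from a webhook payload."""
    payload = json.loads(body)
    status = payload.get("status")
    version = (payload.get("output") or {}).get("version")
    return f"{status}: {version}" if version else str(status)


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        print(summarize_webhook(self.rfile.read(length)))
        self.send_response(200)  # acknowledge receipt
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen for webhook POSTs until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```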
If you're not using webhooks, you can poll for the status of the training job. When training.status is succeeded, your model has been trained.
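A polling loop can be sketched as follows. The status getter is injected so the loop can be tested without network access; replicate.trainings.get is the Python client's lookup call, while the 30-second interval and the set of terminal statuses are assumptions about sensible defaults.

```python
import time
from typing import Callable

# Statuses after which a training is assumed not to change again
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_training(
    get_status: Callable[[], str],
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal status, sleeping between checks."""
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_s)


def replicate_status_getter(training_id: str) -> Callable[[], str]:
    """Return a getter that fetches the training's current status from Replicate."""
    import replicate  # needs the replicate package and REPLICATE_API_TOKEN

    return lambda: replicate.trainings.get(training_id).status
```

Usage: wait_for_training(replicate_status_getter(training.id)) blocks until the job finishes and returns its final status.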
You can now run your model from the web or with an API. To use your model in the browser, go to your model page.
To use your model with an API, run the version from the training output:
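As a sketch, the trained version (for example training.output["version"] once training has succeeded) can be passed to replicate.run. The input name prompt and the template below are assumptions; use the parameter names shown on your model page and the exact template from training.

```python
# Hypothetical template; must match the one used to build the training data
TEMPLATE = "Conversation:\n{dialogue}\n\nSummary:\n"


def summarize(model_version: str, dialogue: str) -> str:
    """Run the fine-tuned model on one conversation and return the generated text."""
    import replicate  # reads REPLICATE_API_TOKEN from the environment

    output = replicate.run(
        model_version,  # "<your-username>/llama2-summarizer:<VERSION_ID>"
        input={"prompt": TEMPLATE.format(dialogue=dialogue.strip())},  # assumed input name
    )
    # Language models typically return text in chunks; joining also works if a plain string comes back
    return "".join(output)
```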
Let's test our assumption that the fine-tuned model summarizes better than the base model. We'll compare the output of the fine-tuned model with that of the base model on the same prompt (the one we show above in the "Run the model" section).
Model Name | Output |
---|---|
Fine-Tuned on SAMSum | Neville and Tina are on vacation. Neville has a suspicion that Tina is mad at him because it's their wedding anniversary. Neville's wedding anniversary is September 17. |
Base Llama-2-7b | Neville: Hey guys! Does anyone know when we got married? Don: You are kidding right? Neville: Nope, it is really important for some reason. Our anniversary is coming up soon, so I want to make sure I don't miss it. Wyatt: Hold on, let me ask my wife. Don: Ha ha, looks like he's going to get into big trouble :) Wyatt: It was September 17th. Just in case you forgot the year too... |