meta / llama-2-70b-chat

A 70-billion-parameter language model from Meta, fine-tuned for chat completions.

Language model training is a beta feature.
We’re still working out the kinks. If you run into any issues, please hop in our Discord and let us know. Keep in mind that we might make breaking changes to the API as we improve the training experience.

If you haven’t yet trained a model on Replicate, we recommend you read one of the following guides.


Trainings for this model run on 4x Nvidia A100 (80GB) GPU hardware, which costs $0.0056 per second.
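As a rough illustration of what the per-second rate means in practice, the sketch below multiplies a training duration by the quoted price. The function name and the example durations are hypothetical; only the $0.0056 per second rate comes from the text above, and actual billing may differ.

```python
# Rate quoted above for 4x Nvidia A100 (80GB) hardware, in USD per second.
PRICE_PER_SECOND_USD = 0.0056


def estimate_training_cost(duration_seconds: float,
                           price_per_second: float = PRICE_PER_SECOND_USD) -> float:
    """Return an estimated cost in USD for a training of the given duration."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds * price_per_second


# Hypothetical durations, chosen only to illustrate the arithmetic.
for label, seconds in [("10 minutes", 600), ("1 hour", 3600)]:
    print(f"{label}: ${estimate_training_cost(seconds):.2f}")
```

At this rate, one hour of training comes to about $20.16.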

Create a training

Install the Python library:

pip install replicate

Then, run this to create a training with replicate-internal/llama-2-70b-chat-int8-4xa100-80gb-triton:a2814fa5 as the base model:

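import replicate

training = replicate.trainings.create(
  version="replicate-internal/llama-2-70b-chat-int8-4xa100-80gb-triton:a2814fa5c8f04cf965ffad6f03532c151cd1fab6311141b72fc6f17e1ced72bf",
  input={...},  # see "Training inputs" below
  destination="{username}/<destination-model-name>")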

Or, with cURL:

curl -s -X POST \
  -d '{"destination": "{username}/<destination-model-name>", "input": {...}}' \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/models/{owner}/{model}/versions/{version_id}/trainings

The API response will look like this:

{
  "id": "zz4ibbonubfz7carwiefibzgga",
  "version": "a2814fa5c8f04cf965ffad6f03532c151cd1fab6311141b72fc6f17e1ced72bf",
  "status": "starting",
  "input": {
    "data": "..."
  },
  "output": null,
  "error": null,
  "logs": null,
  "started_at": null,
  "created_at": "2023-03-28T21:47:58.566434Z",
  "completed_at": null
}

Note that before you can create a training, you’ll need to create a model and use its name as the value for the destination field.
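The response shown above can be inspected programmatically to decide whether a training is still running. The following is a minimal sketch that works on the raw JSON body. The helper name summarize_training is hypothetical, and the set of terminal statuses is an assumption based on Replicate's usual lifecycle (starting, processing, succeeded, failed, canceled) rather than something stated on this page.

```python
import json

# Assumption: these are the statuses after which a training no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def summarize_training(response_body: str) -> dict:
    """Extract the fields needed to decide whether to keep polling."""
    training = json.loads(response_body)
    status = training["status"]
    return {
        "id": training["id"],
        "status": status,
        "finished": status in TERMINAL_STATUSES,
        "error": training.get("error"),
    }


# Trimmed copy of the example response from this page.
example = '{"id": "zz4ibbonubfz7carwiefibzgga", "status": "starting", "error": null}'
summary = summarize_training(example)
print(summary)
```

If you use the Python library, replicate.trainings.get(training.id) should return a fresh copy of the training whose status you can check in the same way.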

Training inputs

  • train_data (required): URL to a file of training data where each row is a JSON record in the format {"text": ...} or {"prompt": ..., "completion": ...}. Must be JSONL.

  • num_train_epochs (optional, default=3): Number of epochs (iterations over the entire training dataset) to train for.

  • train_batch_size (optional, default=4): Global batch size. This specifies the batch size that will be used to calculate gradients. Optimal batch size is data dependent; larger sizes train faster but may cause out-of-memory (OOM) errors. 8 often works well for this configuration of llama-2-7B.

  • micro_batch_size (optional, default=4): Micro batch size. This specifies the on-device batch size. If it is less than train_batch_size, gradient accumulation will be activated.

  • num_validation_samples (optional, default=50): Number of samples to use for validation. If run_validation is True and validation_data is not specified, this number of samples will be selected from the tail of the training data. If validation_data is specified, this number of samples will be selected from the head of the validation data, up to the size of the validation data.

  • validation_data (optional): URL to a file of eval data where each row is a JSON record in the format {"text": ...} or {"prompt": ..., "completion": ...}. Must be JSONL.

  • validation_batch_size (optional, default=1): Batch size for evaluation. For small validation sets, you should use the default batch size of 1.

  • run_validation (optional, default=True): Whether to run validation during training.

  • validation_prompt (optional, default=None): If provided, this prompt will be used to generate a model response during each validation step. Must be a string formatted prompt. Note: this is not implemented for QLoRA training.

  • learning_rate (optional, default=1e-4): Learning rate to use for training.

  • pack_sequences (optional, default=False): If ‘True’, sequences will be packed into a single sequence up to a given length of chunk_size. This improves computational efficiency.

  • wrap_packed_sequences (optional, default=False): If ‘pack_sequences’ is ‘True’, this will wrap packed sequences across examples, ensuring a constant sequence length but breaking prompt formatting.

  • chunk_size (optional, default=2048): If ‘pack_sequences’ is ‘True’, this will chunk sequences into chunks of this size.

  • peft_method (optional, default=‘lora’): Training method to use. Currently, only ‘lora’ and ‘qlora’ are supported.

  • seed (optional, default=42): Random seed to use for training.

  • lora_rank (optional, default=8): Rank of the LoRA matrices.

  • lora_alpha (optional, default=16): Alpha parameter for scaling LoRA weights; weights are scaled by alpha/rank.

  • lora_dropout (optional, default=0.05): Dropout for LoRA training.
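To make a few of the inputs above more concrete, here is a small sketch of how they relate to each other. All function names are hypothetical and nothing here is taken from the trainer's actual implementation. In particular, rejecting extra keys in a row, requiring train_batch_size to be a multiple of micro_batch_size, and computing accumulation steps as their ratio are assumptions made for illustration.

```python
import json


def validate_record(record: dict) -> None:
    """Check a row against the two documented formats (assumed to be strict)."""
    keys = set(record)
    if keys != {"text"} and keys != {"prompt", "completion"}:
        raise ValueError(f"unexpected keys: {sorted(keys)}")
    if not all(isinstance(value, str) for value in record.values()):
        raise ValueError("all values must be strings")


def to_jsonl(records: list) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    for record in records:
        validate_record(record)
    return "\n".join(json.dumps(record) for record in records) + "\n"


def select_validation_rows(train_rows: list, num_validation_samples: int = 50,
                           validation_data: list = None) -> list:
    """Head of validation_data if given, otherwise the tail of the training data."""
    if num_validation_samples <= 0:
        return []
    if validation_data is not None:
        return validation_data[:num_validation_samples]
    return train_rows[-num_validation_samples:]


def gradient_accumulation_steps(train_batch_size: int = 4,
                                micro_batch_size: int = 4) -> int:
    """Assumed relationship: micro-batches accumulated per optimizer step."""
    if train_batch_size % micro_batch_size != 0:
        raise ValueError("train_batch_size should be a multiple of micro_batch_size")
    return train_batch_size // micro_batch_size


def lora_scaling(lora_alpha: int = 16, lora_rank: int = 8) -> float:
    """Scaling applied to LoRA weights: alpha / rank."""
    return lora_alpha / lora_rank


rows = [
    {"text": "A complete training example."},
    {"prompt": "What is LoRA?", "completion": "A parameter-efficient fine-tuning method."},
]
print(to_jsonl(rows), end="")
print(gradient_accumulation_steps(8, 4), lora_scaling())
```

With the defaults (lora_alpha=16, lora_rank=8), the LoRA weights are scaled by 2. The string produced by to_jsonl would need to be saved to a file and hosted at a URL before it can be passed as train_data.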

Please see for more information about the model, licensing, and acceptable use.