replicate / llama-13b-lora

Transformers implementation of the LLaMA 13B language model

Run time and cost

This model runs on NVIDIA A100 (40GB) GPU hardware. Predictions typically complete within 12 seconds, but prediction time varies significantly with the inputs.

Readme

Model description

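LLaMA is a family of large language models from Meta AI whose weights were released for research use. The LLaMA paper reports that these models are competitive with closed-source models; for example, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks. This is the 13B-parameter version, available for both inference and fine-tuning. Fine-tuning for this model is done with LoRA (low-rank adaptation).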

Note: LLaMA is for research purposes only. It is not intended for commercial use.

Fine-tuning

If you have access to the training beta, you can fine-tune this model.

Here’s an example using replicate-python:

import replicate

# Start a fine-tuning job; `destination` is the model that receives the trained version.
training = replicate.trainings.create(
    version="replicate/llama-13b-lora:455d66312a66299fba685548fe24f66880f093007b927abd19f4356295f8577c",
    input={
        "train_data": "https://storage.googleapis.com/dan-scratch-public/fine-tuning/70k_samples.jsonl",
    },
    destination="my-username/my-model",
)

Training takes these input parameters:

  • train_data (required): URL to a file where each row is a JSON record in the format {"prompt": ..., "completion": ...}. The file can be JSONL or a single JSON list.
  • train_batch_size (optional, default=1): Train batch size. For llama-13B, we recommend keeping the batch size small and increasing gradient_accumulation_steps instead.
  • gradient_accumulation_steps (optional, default=8): Number of batches (each of size train_batch_size) over which gradients are accumulated before performing an optimizer step.
  • learning_rate (optional, default=2e-5): Learning rate used by the optimizer.
  • num_train_epochs (optional, default=1): Number of epochs (iterations over the entire training dataset) to train for.
  • warmup_ratio (optional, default=0.03): Fraction of all training steps used for a linear learning-rate warmup.
  • logging_steps (optional, default=1): Logs the loss and other training info every logging_steps steps.
  • max_steps (optional, default=-1): Maximum number of training steps. Unlimited if max_steps=-1.
  • lora_rank (optional, default=8): Rank of the LoRA matrices.
  • lora_alpha (optional, default=16): Alpha parameter for scaling LoRA weights; weights are scaled by alpha/rank.
  • lora_dropout (optional, default=0.1): Dropout for LoRA training.
  • lora_target_modules (optional, default='q_proj,v_proj'): Comma-separated list of target modules to fine-tune with LoRA.
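
To make the train_data format concrete, here is a minimal sketch that writes a small JSONL file and checks that every row has the two expected fields. The example records, the helper names write_jsonl and load_and_check, and the requirement that both values are strings are assumptions made for illustration; the source only specifies the {"prompt": ..., "completion": ...} shape.

```python
import json
import os
import tempfile

# Hypothetical records; the prompt/completion text is invented for illustration.
records = [
    {"prompt": "Translate to French: Hello", "completion": "Bonjour"},
    {"prompt": "Translate to French: Thank you", "completion": "Merci"},
]


def write_jsonl(rows, path):
    """Write one JSON object per line (the JSONL layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def load_and_check(path):
    """Read a JSONL file and confirm each row has 'prompt' and 'completion'.

    Assumption: both values should be strings.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            row = json.loads(line)
            for key in ("prompt", "completion"):
                if not isinstance(row.get(key), str):
                    raise ValueError(
                        f"line {line_number}: missing or non-string {key!r}"
                    )
            rows.append(row)
    return rows


path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
write_jsonl(records, path)
checked = load_and_check(path)
print(f"{len(checked)} valid records written to {path}")
```

The resulting file still has to be uploaded somewhere reachable by URL before it can be passed as train_data; that step is not shown here.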

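Several of the parameters above interact, so a rough sketch of the arithmetic may help when choosing values. The functions training_schedule and lora_scale are hypothetical helpers, not part of this model or of replicate-python, and the rounding rules and the dataset size of 70,000 examples are assumptions; the actual trainer may count steps differently. max_steps is left out on purpose.

```python
import math


def training_schedule(num_examples, train_batch_size=1,
                      gradient_accumulation_steps=8, num_train_epochs=1,
                      warmup_ratio=0.03):
    """Estimate step counts from the documented default hyperparameters."""
    # Examples consumed per optimizer step.
    effective_batch_size = train_batch_size * gradient_accumulation_steps
    # Assumption: a final partial batch still counts as one optimizer step.
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    optimizer_steps = steps_per_epoch * num_train_epochs
    # Assumption: the warmup length is rounded up to a whole step.
    warmup_steps = math.ceil(optimizer_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch_size,
        "steps_per_epoch": steps_per_epoch,
        "optimizer_steps": optimizer_steps,
        "warmup_steps": warmup_steps,
    }


def lora_scale(lora_alpha=16, lora_rank=8):
    """Scaling factor applied to the LoRA update: alpha / rank."""
    return lora_alpha / lora_rank


# Hypothetical dataset size, chosen only to make the numbers concrete.
schedule = training_schedule(num_examples=70_000)
print(schedule)
print("LoRA scale:", lora_scale())
```

Under these assumptions, the defaults give an effective batch size of 8, 8750 optimizer steps for one epoch over 70,000 examples, 263 warmup steps, and a LoRA scale of 2.0. Raising gradient_accumulation_steps increases the effective batch size without increasing per-step memory, which is why it is the recommended knob for this model size.
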
Citation

@misc{touvron2023llama,
      title={LLaMA: Open and Efficient Foundation Language Models}, 
      author={Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie-Anne Lachaux and Timothée Lacroix and Baptiste Rozière and Naman Goyal and Eric Hambro and Faisal Azhar and Aurelien Rodriguez and Armand Joulin and Edouard Grave and Guillaume Lample},
      year={2023},
      eprint={2302.13971},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}