replicate / llama-7b

Transformers implementation of the LLaMA language model

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 17 seconds, though prediction time varies significantly with the inputs.
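
A minimal sketch of calling the model with replicate-python (the single prompt input is an assumption about the schema; the version id is copied from the fine-tuning example below):

import replicate

# Assumed input schema: a single "prompt" string.
output = replicate.run(
    "replicate/llama-7b:ac808388e2e9d8ed35a5bf2eaa7d83f0ad53f9e3df31a42e4eb0a0c3249b3165",
    input={"prompt": "Explain gravity to a five-year-old:"},
)
print("".join(output))  # output comes back as chunks of generated text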

Model description

LLaMA is a family of open-source large language models from Meta AI that perform competitively with closed-source models. This is the 7B parameter version, available for both inference and fine-tuning.

Note: LLaMA is for research purposes only. It is not intended for commercial use.

Fine-tuning

If you have access to the training beta, you can fine-tune this model.

Here’s an example using replicate-python:

import replicate

# Kick off a fine-tuning job; the result is pushed to the destination model.
training = replicate.trainings.create(
    version="replicate/llama-7b:ac808388e2e9d8ed35a5bf2eaa7d83f0ad53f9e3df31a42e4eb0a0c3249b3165",
    input={
        "train_data": "https://storage.googleapis.com/dan-scratch-public/fine-tuning/70k_samples.jsonl",
    },
    destination="my-username/my-model",
)
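
Once the job finishes, the fine-tuned weights are pushed to the destination model. Here is one hedged way to poll for completion and run the result (the <version> placeholder stands in for the new version id shown on the destination model page):

import time

import replicate

# Poll until the training reaches a terminal state.
while training.status not in {"succeeded", "failed", "canceled"}:
    time.sleep(30)
    training = replicate.trainings.get(training.id)

if training.status == "succeeded":
    output = replicate.run(
        "my-username/my-model:<version>",  # placeholder version id
        input={"prompt": "Hello:"},
    )
    print("".join(output))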

Training takes these input parameters:

  • train_data (required): URL to a file where each row is a JSON record in the format {"prompt": ..., "completion": ...}. The file can be JSONL or a single JSON list (see the sample records after this list).
  • train_batch_size (optional, default=4): Batch size for training. Larger batches speed up training, but batches that are too large can exhaust GPU memory.
  • gradient_accumulation_steps (optional, default=8): Number of batches (each of size train_batch_size) to accumulate gradients over before performing an optimizer update.
  • learning_rate (optional, default=2e-5): Learning rate for the optimizer.
  • num_train_epochs (optional, default=1): Number of epochs (iterations over the entire training dataset) to train for.
  • warmup_ratio (optional, default=0.03): Fraction of total training steps used for linear learning-rate warmup.
  • logging_steps (optional, default=1): Print loss and other logging info every logging_steps steps.
  • max_steps (optional, default=-1): Maximum number of training steps. Unlimited if max_steps=-1.
  • lora_rank (optional, default=8): Rank of the LoRA matrices.
  • lora_alpha (optional, default=16): Alpha parameter for scaling LoRA weights; weights are scaled by alpha/rank.
  • lora_dropout (optional, default=0.1): Dropout for LoRA training.
  • lora_target_modules (optional, default='q_proj,v_proj'): Comma-separated list of target modules to fine-tune with LoRA.
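
For reference, here are two made-up records in the JSONL format that train_data expects:

{"prompt": "What is the capital of France?", "completion": "Paris is the capital of France."}
{"prompt": "Write a haiku about the ocean.", "completion": "Waves crest and break white / salt wind carries gulls inland / the tide keeps its time."}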

As the input parameters suggest, this model is fine-tuned with LoRA using the peft library.
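
For orientation, the LoRA defaults above map onto a peft configuration roughly like this (a sketch, not the model's actual training code):

from peft import LoraConfig

# Assumed mapping from the training inputs above to peft's LoraConfig.
lora_config = LoraConfig(
    r=8,                                  # lora_rank
    lora_alpha=16,                        # weights scaled by alpha/rank
    lora_dropout=0.1,                     # lora_dropout
    target_modules=["q_proj", "v_proj"],  # lora_target_modules
    task_type="CAUSAL_LM",
)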

Citation

@misc{touvron2023llama,
      title={LLaMA: Open and Efficient Foundation Language Models}, 
      author={Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie-Anne Lachaux and Timothée Lacroix and Baptiste Rozière and Naman Goyal and Eric Hambro and Faisal Azhar and Aurelien Rodriguez and Armand Joulin and Edouard Grave and Guillaume Lample},
      year={2023},
      eprint={2302.13971},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}