replicate / llama-7b

Transformers implementation of the LLaMA language model

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 17 seconds, though prediction time varies significantly with the inputs.
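
A minimal sketch of calling the model with replicate-python (the single prompt input is an assumption about the schema; the version id is copied from the fine-tuning example below):

import replicate

# Assumed input schema: a single "prompt" string.
output = replicate.run(
    "replicate/llama-7b:ac808388e2e9d8ed35a5bf2eaa7d83f0ad53f9e3df31a42e4eb0a0c3249b3165",
    input={"prompt": "Explain gravity to a five-year-old:"},
)
print("".join(output))  # output comes back as chunks of generated text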

Model description

LLaMA is a family of open-source large language models from Meta AI that perform competitively with closed-source models. This is the 7B parameter version, available for both inference and fine-tuning.

Note: LLaMA is for research purposes only. It is not intended for commercial use.

Fine-tuning

If you have access to the training beta, you can fine-tune this model.

Here’s an example using replicate-python:

import replicate

# Kick off a fine-tuning job; the result is pushed to the destination model.
training = replicate.trainings.create(
    version="replicate/llama-7b:ac808388e2e9d8ed35a5bf2eaa7d83f0ad53f9e3df31a42e4eb0a0c3249b3165",
    input={
        "train_data": "https://storage.googleapis.com/dan-scratch-public/fine-tuning/70k_samples.jsonl",
    },
    destination="my-username/my-model",
)
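
Once the job finishes, the fine-tuned weights are pushed to the destination model. Here is one hedged way to poll for completion and run the result (the <version> placeholder stands in for the new version id shown on the destination model page):

import time

import replicate

# Poll until the training reaches a terminal state.
while training.status not in {"succeeded", "failed", "canceled"}:
    time.sleep(30)
    training = replicate.trainings.get(training.id)

if training.status == "succeeded":
    output = replicate.run(
        "my-username/my-model:<version>",  # placeholder version id
        input={"prompt": "Hello:"},
    )
    print("".join(output))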

Training takes these input parameters:

  • train_data (required): URL to a file where each row is a JSON record in the format {"prompt": ..., "completion": ...}. The file can be JSONL or a single JSON list (see the sample records after this list).
  • train_batch_size (optional, default=4): Batch size for training. Larger batches speed up training, but batches that are too large can exhaust GPU memory.
  • gradient_accumulation_steps (optional, default=8): Number of batches (each of size train_batch_size) to accumulate gradients over before performing an optimizer update.
  • learning_rate (optional, default=2e-5): Learning rate for the optimizer.
  • num_train_epochs (optional, default=1): Number of epochs (iterations over the entire training dataset) to train for.
  • warmup_ratio (optional, default=0.03): Fraction of total training steps used for linear learning-rate warmup.
  • logging_steps (optional, default=1): Print loss and other logging info every logging_steps steps.
  • max_steps (optional, default=-1): Maximum number of training steps. Unlimited if max_steps=-1.
  • lora_rank (optional, default=8): Rank of the LoRA matrices.
  • lora_alpha (optional, default=16): Alpha parameter for scaling LoRA weights; weights are scaled by alpha/rank.
  • lora_dropout (optional, default=0.1): Dropout for LoRA training.
  • lora_target_modules (optional, default='q_proj,v_proj'): Comma-separated list of target modules to fine-tune with LoRA.
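
For reference, here are two made-up records in the JSONL format that train_data expects:

{"prompt": "What is the capital of France?", "completion": "Paris is the capital of France."}
{"prompt": "Write a haiku about the ocean.", "completion": "Waves crest and break white / salt wind carries gulls inland / the tide keeps its time."}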

As the input parameters suggest, this model is fine-tuned with LoRA using the peft library.
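
For orientation, the LoRA defaults above map onto a peft configuration roughly like this (a sketch, not the model's actual training code):

from peft import LoraConfig

# Assumed mapping from the training inputs above to peft's LoraConfig.
lora_config = LoraConfig(
    r=8,                                  # lora_rank
    lora_alpha=16,                        # weights scaled by alpha/rank
    lora_dropout=0.1,                     # lora_dropout
    target_modules=["q_proj", "v_proj"],  # lora_target_modules
    task_type="CAUSAL_LM",
)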

Citation

@misc{touvron2023llama,
      title={LLaMA: Open and Efficient Foundation Language Models}, 
      author={Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie-Anne Lachaux and Timothée Lacroix and Baptiste Rozière and Naman Goyal and Eric Hambro and Faisal Azhar and Aurelien Rodriguez and Armand Joulin and Edouard Grave and Guillaume Lample},
      year={2023},
      eprint={2302.13971},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}