replicate / llama-7b

Transformers implementation of the LLaMA language model

  • Public
  • 99.1K runs
  • A100 (80GB)
  • GitHub
  • Paper
  • License

Input

  • string (required): Text to prefix with 'hello '
  • number (default: 0.95): Top p value
  • number (default: 0.8): Temperature
  • integer (default: 256): Max generation length
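For a sketch of how these inputs map to an API call, the dictionary below uses assumed parameter names (prompt, top_p, temperature, max_length) inferred from the field descriptions above, not confirmed by this page; check the model's API schema before relying on them:

```python
# Assumed parameter names -- inferred from the input descriptions, not
# confirmed by this page. Verify against the model's API schema.
inputs = {
    "prompt": "hello world",   # text to prefix with 'hello '
    "top_p": 0.95,             # default
    "temperature": 0.8,        # default
    "max_length": 256,         # default
}
# With the replicate-python client this would be passed as, e.g.:
#   import replicate
#   output = replicate.run("replicate/llama-7b", input=inputs)
```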

Output

all inertial observers are equivalent. To put it another way, if an object moves at the same speed as something else, and in the same direction as that something else, then from an inertial perspective, both objects will have the same mass, and both will be moving at the same speed. The only difference is that the object in question will have a different time dilation factor. In the case of the two trains, they both have different time dilation factors. This is because the train is moving faster than the ground, and thus has a higher time dilation factor. The train moving slower than the ground has a lower time dilation factor. There are, of course, a few caveats. One is that we cannot have relative velocities greater than the speed of light. The other is that the time dilation factor also depends on the direction of travel. That is, if the train is moving towards the ground, then its time dilation factor will be lower than if the train is moving away from the ground. But these caveats do not have any real implications for the example provided in the question. The two trains are traveling in the same direction, and thus have a time dilation factor of $\gamma = 0.97$ and $\gamma = 0.96$ respectively. They both have the same mass, but we can ignore this for the sake of simplicity. The only thing that we can say is that the two trains are moving at a relative velocity of $0.96c$. The question itself states that the two trains are traveling at $0.98c$. However, we know that the two trains will experience time dilation factors of $\gamma = 0.97$ and $\gamma = 0.96$ respectively. The difference between the two trains' velocity is only $0.02c$. Since $0.02c < 0.05c$, then there is no way that this difference in speed will cause one of the two trains to be stationary. That is, inertia will not allow that to happen. The only difference between the two trains is their time dilation factors. 
The only difference between the two trains is their time dilation factor, and the time dilation factor of the ground. The only

This output was created using a different version of the model, replicate/llama-7b:455d6631.

Run time and cost

This model costs approximately $0.0086 to run on Replicate, or 116 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 7 seconds.
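The quoted pricing figures are consistent with each other, as a quick check shows:

```python
# Quick check of the pricing arithmetic quoted above:
# roughly $0.0086 per run should come out to about 116 runs per $1.
cost_per_run = 0.0086               # dollars per run (approximate)
runs_per_dollar = 1 / cost_per_run  # ~116.3, which rounds to 116
```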

Readme

Model description

LLaMA is a family of open-source large language models from Meta AI that perform as well as closed-source models. This is the 7B parameter version, available for both inference and fine-tuning.

Note: LLaMA is for research purposes only. It is not intended for commercial use.

Fine-tuning

If you have access to the training beta, you can fine-tune this model.

Here’s an example using replicate-python:

import replicate

training = replicate.trainings.create(
    version="replicate/llama-7b:ac808388e2e9d8ed35a5bf2eaa7d83f0ad53f9e3df31a42e4eb0a0c3249b3165",
    input={
        "train_data": "https://storage.googleapis.com/dan-scratch-public/fine-tuning/70k_samples.jsonl",
    },
    destination="my-username/my-model"
)

Training takes these input parameters:

  • train_data (required): URL to a file where each row is a JSON record in the format {"prompt": ..., "completion": ...}. Can be JSONL or one JSON list.
  • train_batch_size (optional, default=4): Train batch size. Large batches = faster training, too large and you may run out of GPU memory.
  • gradient_accumulation_steps (optional, default=8): Number of batches (each of train_batch_size) to accumulate gradients over before performing an optimizer step.
  • learning_rate (optional, default=2e-5): Learning rate for the optimizer.
  • num_train_epochs (optional, default=1): Number of epochs (iterations over the entire training dataset) to train for.
  • warmup_ratio (optional, default=0.03): Percentage of all training steps used for a linear LR warmup.
  • logging_steps (optional, default=1): Prints loss & other logging info every logging_steps.
  • max_steps (optional, default=-1): Maximum number of training steps. Unlimited if max_steps=-1.
  • lora_rank (optional, default=8): Rank of the LoRA matrices.
  • lora_alpha (optional, default=16): Alpha parameter for scaling LoRA weights; weights are scaled by alpha/rank.
  • lora_dropout (optional, default=0.1): Dropout for LoRA training.
  • lora_target_modules (optional, default='q_proj,v_proj'): Comma-separated list of target modules to fine-tune with LoRA.
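The train_data format described above can be produced with a few lines of Python. A minimal sketch (filename and records are illustrative):

```python
import json

# Build a train_data file in the required format: one JSON record per line,
# each with "prompt" and "completion" keys (records here are placeholders).
records = [
    {"prompt": "hello ", "completion": "world"},
    {"prompt": "2 + 2 = ", "completion": "4"},
]
with open("train.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```

Upload the resulting file somewhere publicly reachable and pass its URL as train_data.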

As the input parameters suggest, this model is fine-tuned with LoRA using the peft library.
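The alpha/rank scaling of the LoRA update can be sketched in plain Python. This is a toy illustration of the rule, not peft's actual implementation:

```python
# Sketch of LoRA's update rule: the effective weight is W + (alpha / rank) * (B @ A).
# Toy values below; the defaults above (lora_alpha=16, lora_rank=8) give the
# same scale, 16 / 8 = 2.0.
alpha, rank = 2, 1
scale = alpha / rank                      # 2.0
B = [[1.0], [2.0]]                        # d x r low-rank factor (d=2, r=1)
A = [[3.0, 4.0]]                          # r x d low-rank factor
# delta = scale * (B @ A), written out for plain lists
delta = [[scale * sum(B[i][k] * A[k][j] for k in range(rank))
          for j in range(2)] for i in range(2)]
```

Because the update is scaled by alpha/rank, doubling lora_rank while holding lora_alpha fixed halves the contribution of each low-rank direction.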

Citation

@misc{touvron2023llama,
      title={LLaMA: Open and Efficient Foundation Language Models}, 
      author={Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie-Anne Lachaux and Timothée Lacroix and Baptiste Rozière and Naman Goyal and Eric Hambro and Faisal Azhar and Aurelien Rodriguez and Armand Joulin and Edouard Grave and Guillaume Lample},
      year={2023},
      eprint={2302.13971},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}