What’s the difference between Llama 2 7B, 13B, and 70B?

Posted by @zeke

Llama 2 is a new open-source language model from Meta AI that outperforms other open-source language models on many benchmarks, including reasoning, coding, proficiency, and knowledge tests. We built an app called LLM Boxing that pits Llama 2 against OpenAI’s GPT-3.5 Turbo in a human-evaluated competition, and Llama 2 is winning by a significant margin.

You can run Llama 2 on Replicate with one line of code, but which Llama model should you use? Llama 2 was released just a few weeks ago, but there’s already a proliferation of derivative models. In this post, we’ll break down the differences between the models and help you choose the right one for your use case.

Chat models

The most popular Llama models released by Meta AI are the so-called “chat models”. These models are fine-tuned on publicly available instruction datasets and over 1 million human annotations.

The following chat models are supported and maintained by Replicate:

  • meta/llama-2-70b-chat: 70 billion parameter model fine-tuned on chat completions. If you want to build a chatbot with the best accuracy, this is the one to use.
  • meta/llama-2-13b-chat: 13 billion parameter model fine-tuned on chat completions. Use this if you’re building a chatbot and would prefer it to be faster and cheaper at the expense of accuracy.
  • meta/llama-2-7b-chat: 7 billion parameter model fine-tuned on chat completions. This is the smallest and fastest of the three.
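To make the trade-off in the list above concrete, here is a minimal sketch in Python. The model identifiers come from the list; the priority labels ("accuracy", "balanced", "speed") and both helper functions are made up for illustration. The `run_chat` helper assumes you have installed the Replicate Python client (`pip install replicate`), set a `REPLICATE_API_TOKEN` environment variable, and are using a client version that accepts a model name without a version hash.

```python
# Model identifiers are from the list above; the priority labels are hypothetical.
CHAT_MODELS = {
    "accuracy": "meta/llama-2-70b-chat",  # largest model, best accuracy
    "balanced": "meta/llama-2-13b-chat",  # faster and cheaper, less accurate
    "speed": "meta/llama-2-7b-chat",      # smallest and fastest
}


def choose_chat_model(priority: str = "accuracy") -> str:
    """Return the chat model identifier for a given priority label."""
    try:
        return CHAT_MODELS[priority]
    except KeyError:
        options = ", ".join(sorted(CHAT_MODELS))
        raise ValueError(f"unknown priority {priority!r}; expected one of: {options}") from None


def run_chat(prompt: str, priority: str = "accuracy") -> str:
    """Send a prompt to the chosen model through the Replicate Python client.

    Assumption: the `replicate` package is installed and REPLICATE_API_TOKEN is set.
    """
    import replicate  # imported lazily so choose_chat_model works without the package

    output = replicate.run(choose_chat_model(priority), input={"prompt": prompt})
    # Language models on Replicate typically return text in chunks; join them.
    return "".join(output)
```

For example, `run_chat("Write a haiku about llamas", priority="speed")` would send the prompt to the 7B chat model. Starting with the small model and moving up only when the answers aren't good enough is one reasonable way to keep costs down.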

What’s a16z-infra, you ask? Andreessen Horowitz (also known by their a16z numeronym) is a venture capital firm that invests in technology companies, including Replicate. a16z is also one of Meta AI’s global partners and supporters. They were quick to recognize the potential of Llama 2, and instrumental in helping us bring Llama 2 models to Replicate.

You can compare the performance and outputs of the Llama chat models using Vercel’s AI Playground.

Base models

In addition to the instruction-tuned chat models, Meta AI also released a set of base models. Use these models if you want to do other kinds of language tasks, like completing a user’s writing, code completion, finishing lists, or few-shotting specific tasks like classification.
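If "few-shotting" is new to you, the idea is to show a base model a handful of worked examples and let it complete the next one. Here is a minimal sketch in Python of how such a prompt might be assembled for classification. The function name, the "Review:" / "Sentiment:" layout, and the sample reviews are all made up for illustration; nothing here is a required format.

```python
def build_few_shot_prompt(examples, query,
                          instruction="Classify the sentiment of each review."):
    """Build a completion-style prompt from (text, label) pairs and a new query.

    The prompt ends right after "Sentiment:" so that a base model's most
    natural continuation is the label for the final review.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")  # blank line between examples
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical labeled examples, purely for illustration.
examples = [
    ("The battery lasts all day and the screen is gorgeous.", "positive"),
    ("It stopped working after a week.", "negative"),
]

prompt = build_few_shot_prompt(examples, "Setup was painless and it just works.")
print(prompt)
```

The resulting string is what you would pass as the prompt to a base model. Because base models continue text rather than follow instructions, the examples do most of the work, so it's worth keeping their format consistent.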

What’s next?

Llama 2 innovation is picking up speed, and we expect to release and support more Llama-based models in the coming weeks.

Happy hacking! 🦙