Llama 2 is a new open-source language model from Meta AI that outperforms other open-source language models on many benchmarks, including reasoning, coding, proficiency, and knowledge tests. We built an app called LLM Boxing that pits Llama 2 against OpenAI's GPT-3.5 Turbo in a human-evaluated competition, and Llama 2 is winning by a significant margin.
You can run Llama 2 on Replicate with one line of code, but which Llama model should you use? Llama 2 was released just a few weeks ago, but there's already a proliferation of derivative models. In this post, we'll break down the differences between the models and help you choose the right one for your use case.
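To make "one line of code" concrete, here's a minimal sketch using the `replicate` Python client. It assumes you've installed the client (`pip install replicate`) and set `REPLICATE_API_TOKEN` in your environment; the model slug shown is a placeholder, so check replicate.com for the exact identifier and version you want:

```python
def build_input(prompt, temperature=0.75, max_new_tokens=500):
    """Assemble the input payload for a Llama 2 prediction.

    Parameter names here mirror common Llama 2 model inputs on
    Replicate, but each model's page lists its exact schema.
    """
    return {
        "prompt": prompt,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }

if __name__ == "__main__":
    import replicate  # pip install replicate; needs REPLICATE_API_TOKEN set

    # "replicate/llama-2-70b-chat" is a placeholder slug -- browse
    # replicate.com to find the model (and pinned version) you want.
    output = replicate.run(
        "replicate/llama-2-70b-chat",
        input=build_input("Tell me a llama fact in one sentence."),
    )
    # The client streams output as an iterator of string chunks.
    print("".join(output))
```

The payload-building helper is just for illustration; in practice the whole call really can be a single `replicate.run(...)` line.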
The most popular Llama models released by Meta AI are the so-called "chat models". These models are fine-tuned on publicly available instruction datasets and over 1 million human annotations.
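One practical detail about the chat models: they were fine-tuned with a specific prompt template that uses `[INST]`/`[/INST]` instruction markers and an optional `<<SYS>>` system block, and prompts that skip the template tend to get worse responses. A minimal sketch of formatting a single-turn prompt (see Meta's reference code for the full multi-turn template):

```python
def format_chat_prompt(user_message, system_prompt="You are a helpful assistant."):
    """Wrap a single-turn message in the Llama 2 chat template.

    The chat fine-tunes expect [INST]/[/INST] markers around the
    instruction and a <<SYS>> block for the system prompt.
    """
    return (
        f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = format_chat_prompt("Why do llamas hum?")
```

Many hosted versions of the chat models apply this template for you behind a plain `prompt` input, so check the model's documentation before formatting prompts yourself.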
The following chat models are supported and maintained by Replicate:
What's a16z-infra, you ask? Andreessen Horowitz (also known by the numeronym a16z) is a venture capital firm that invests in technology companies, including Replicate. a16z is also one of Meta AI's global partners and supporters. They were quick to recognize the potential of Llama 2, and instrumental in helping us bring Llama 2 models to Replicate.
You can compare the performance and outputs of the Llama chat models using Vercel's AI Playground:
In addition to the instruction-tuned chat models, Meta AI also released a set of base models. Use these models if you want to do other kinds of language tasks, like completing a user’s writing, code completion, finishing lists, or few-shotting specific tasks like classification:
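Because the base models are plain text-completion models, you steer them by example rather than by instruction. A minimal sketch of a few-shot classification prompt (the sentiment labels and reviews are made up for illustration):

```python
def few_shot_prompt(examples, query):
    """Build a completion-style few-shot prompt for a base model.

    Each example is a (text, label) pair. The prompt ends mid-pattern,
    so a base model naturally continues it by emitting a label.
    """
    lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("I loved this movie!", "positive"),
    ("What a waste of time.", "negative"),
]
prompt = few_shot_prompt(examples, "An instant classic.")
```

The same pattern works for list completion or any task where a couple of demonstrations establish the format you want the model to continue.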
Innovation around Llama 2 is picking up speed, and we expect to release and support more Llama-based models in the coming weeks.
Happy hacking! 🦙