Once you know your use case, it's time to choose a model. The number of available models is large and ever-growing, and many are lightweight fine-tunes of the same few foundation models. This page will help you navigate the landscape and choose the right model for your needs.
The most well-known models are the proprietary models from the big research labs. These models are highly capable generalists and can be used for a wide variety of tasks. They're usually accessed through an API or a web interface. Models at this scale are typically safety-tuned with reinforcement learning from human feedback, and their base models are not released to the public.
The Llama family of models by Meta are popular foundation models and are the basis for many of the fine-tunes available today. The current generation, Llama 2, comes in three sizes (7, 13, and 70 billion parameters) and has a context window of 4,096 tokens. The models perform well on reasoning and coding tasks. Meta has also released a chat version, though many users have found it to be overly safety-tuned.
They are released under a custom license that requires potential users with "greater than 700 million monthly active users in the preceding calendar month" to request special permission from Meta.
The Mistral 7B model is a small but powerful model, outperforming other models of up to 13 billion parameters in standard English and code benchmarks. It has an 8K context window and is licensed under Apache 2.0. Mistral 7B is a great choice for local inference and other compute-limited tasks.
Google's Flan-T5 is a versatile model trained on instruction data, available in five different sizes. It’s particularly effective in tasks requiring comprehension and response to instructions, like classification, translation, and summarization. Flan-T5 is a good choice for fine-tuning on specific tasks.
Phind CodeLlama, a 34B parameter model, specializes in programming-related tasks, boasting a 73.8% pass rate on HumanEval. Its multilingual capabilities in programming languages make it an exceptional tool for code generation and understanding.
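HumanEval results like this are conventionally reported as pass@k: the probability that at least one of k sampled completions passes a problem's unit tests. A minimal sketch of the standard unbiased estimator, assuming n generations per problem of which c pass:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    completions, drawn without replacement from n samples of which c
    are correct, passes the tests."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging this quantity over all problems in the benchmark gives the headline score; a figure like 73.8% is typically pass@1.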
Mixtral 8x7B is a Sparse Mixture of Experts model praised for its speed and adaptability to a wide range of tasks. It matches or outperforms Llama 2 70B and GPT-3.5 on a variety of tasks while being six times faster. It has a 32K-token context window and is also Apache 2.0 licensed.
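The speed comes from the sparse routing itself: a gating network scores all experts for each token, but only the top-k experts (two, in Mixtral's case) actually run, so far fewer parameters are active per token than the total parameter count suggests. A simplified sketch of that routing, with plain linear layers standing in for the real feed-forward experts:

```python
import numpy as np

def moe_layer(x, gate_w, expert_ws, k=2):
    """Sparse mixture-of-experts sketch: score every expert per token,
    but compute only the top-k experts' outputs, combined with softmax
    weights over their gate scores. (Real Mixtral experts are SwiGLU
    feed-forward blocks; linear maps are used here for brevity.)"""
    scores = x @ gate_w                         # (tokens, n_experts)
    top = np.argsort(scores, axis=-1)[:, -k:]   # top-k expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = scores[t, top[t]]
        weights = np.exp(sel - sel.max())
        weights /= weights.sum()                # softmax over selected experts only
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ expert_ws[e])  # only k expert computations per token
    return out
```

With 8 experts and k=2, each token touches roughly a quarter of the expert parameters, which is why an "8x7B" model can infer much faster than a dense model of comparable total size.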
LLaVA is a multimodal model that combines a vision encoder with LLaMA, fine-tuned on GPT-4-generated visual instruction data. Its combined vision and language capabilities near GPT-4 level in some domains.
Nous Hermes, a 13B parameter model, is fine-tuned on over 300,000 synthetic instructions generated by GPT-4. It is known for its longer responses and low hallucination rates. Comparable to GPT-3.5-turbo, it's suitable for complex language tasks and applications where a balance between efficiency and performance is essential. The recently released Nous Hermes Mistral 7B brings this instruction tuning to the Mistral base model.