Popular open source language models and their use cases

Once you know your use case, it’s time to choose a model. There are a large and ever-growing number of models available, many of which are small modifications fine-tuned from the same foundation. This page will help you navigate the landscape of models and choose the right one for your needs.

The most well-known models are the proprietary models from the big research labs. These models are highly capable generalists and can be used for a wide variety of tasks. They’re usually accessed through an API or a web interface. Models of this scale are mostly RL-tuned for safety, and base models are not exposed to the public.

  • GPT-4, by OpenAI, is the most powerful model currently available. It is better at coding than the other models in this category, and can use tool functions through the API. OpenAI hosts ChatGPT, a web interface that also integrates a code interpreter, web search, and image generator. GPT-4 is only available to paid subscribers.
  • GPT-3.5, the free version of ChatGPT, is faster and cheaper than GPT-4 but much less capable. Open source models have caught up to GPT-3.5 in many areas, and may soon surpass it.
  • Claude, by Anthropic, is a powerful model which is especially good at writing prose. It is free in open beta through a web interface, but the API is still in limited release to a small number of users.
  • Bard, by Google, is a web interface that uses PALM and Gemini models under the hood. It is not as strong as GPT-4 or Claude (yet), but it is free to use and can integrate with Google services.

The Llama family of models by Meta are popular foundation models and are the basis for many of the fine-tunes available today. The current generation, Llama 2, come in three sizes: 7, 13, and 70 billion parameters, and have a context window of 4,000 tokens. They perform well in reasoning and coding tasks. Meta has also released a chat version of the model, though many users have found it to be overly safety-tuned.

They are released under a custom license that requires potential users with “greater than 700 million monthly active users in the preceding calendar month” to request special permission from Meta.

The Mistral 7B model is a small but powerful model, outperforming other models of up to 13 billion parameters in standard English and code benchmarks. It has an 8K context window and is licensed under Apache 2.0. Mistral 7B is a great choice for local inference and other compute-limited tasks.

Google’s Flan-T5 is a versatile model trained on instruction data, available in five different sizes. It’s particularly effective in tasks requiring comprehension and response to instructions, like classification, translation, and summarization. Flan-T5 is a good choice for fine-tuning on specific tasks.

Phind CodeLlama, a 34B parameter model, specializes in programming-related tasks, boasting a 73.8% pass rate on HumanEval. Its multilingual capabilities in programming languages make it an exceptional tool for code generation and understanding.

Mixtral 8x7B is a Sparse Mixture of Experts model praised for its speed and adaptability to a wide range of tasks. It matches or outperforms Llama 2 70B and GPT 3.5 on a variety of tasks, while being six times faster. It has a 32K token window and is also Apache 2.0 licensed.

LLaVA is a multimodal model built on top of LLaMA and GPT-4 generated visual instruction tuning data. It combines vision and language capabilities, nearing GPT-4 level in some domains.

Nous Hermes, a 13B parameter model, is fine-tuned on over 300,000 synthetic instructions generated by GPT-4. It is known for its longer responses and low hallucination rates. Comparable to GPT-3.5-turbo, it’s suitable for complex language tasks and applications where balance between efficiency and performance is essential. The recently released Nous Hermes Mistral 7B brings this instruction tuning to the Mistral base model.