Large language models (LLMs) are deep learning models trained on text. They are used to predict, classify, and generate text, but they can do a lot more: language models provide the "text" side of text-to-image models, and some modern LLMs can use multimodal inputs like images.
In this guide we'll cover the basic concepts behind LLMs, how to use them, how to find the right model for your needs, and some advanced techniques for prompt engineering.
Language modeling is not a brand-new discipline. Researchers and engineers have been modeling language for decades with statistical methods, hand-coded grammars, and even regular expressions. But over the last few years we've seen the Bitter Lesson play out: the most effective improvements in AI come from general methods that scale well with computation.
The "large" in Large Language Model is not a specific amount of compute or data. It refers to a qualitative shift that comes when you train a model with enough layers, enough parameters, and enough data: bigger suddenly becomes better. Language models have gone from auto-suggesting the next word to writing essays, code, and poetry, powered by the application of data and compute.
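To make "auto-suggesting the next word" concrete, here is a minimal sketch of one of the older statistical methods: a bigram model that counts which word follows which. The function names and the tiny corpus are made up for illustration. An LLM is trained on the same kind of next-word objective, but with a neural network and vastly more data in place of a lookup table.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def suggest_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for this example.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)

print(suggest_next(model, "the"))  # "cat" follows "the" most often in this corpus
print(suggest_next(model, "dog"))  # None: "dog" never appears in the corpus
```

A model like this can only repeat word pairs it has already seen, which is why it never gets past simple suggestions.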
These models are extremely expensive to train. The biggest companies in the world are competing to have the biggest GPU cluster and train models with the most parameters. But that doesn't mean you can't use them!
Companies like Meta, Google, and Mistral have released not just the code, but the trained weights of their models for the community to build on. Training on web-scale datasets is the expensive part of the process. Using a trained model to make predictions, known as inference, requires much less computation. You can even run open models on your local computer, if you have the hardware and/or patience.
You can also call open models through Replicate's API. Run models like Meta's Llama, Google's Flan-T5, or Mistral's Mixtral in seconds with just an API key.
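As a rough sketch of what that looks like, the snippet below uses Replicate's Python client (`pip install replicate`), which reads your key from the `REPLICATE_API_TOKEN` environment variable. The model identifier and the `prompt` input key are assumptions for illustration; check the model's page for its current name and input schema. The `build_input` and `generate` helpers are our own names, not part of the client.

```python
import os

# Assumed model identifier; confirm the current name on the model's page.
MODEL = "meta/meta-llama-3-8b-instruct"


def build_input(prompt):
    """Build the input payload. The "prompt" key is an assumption about the model's schema."""
    return {"prompt": prompt}


def generate(prompt):
    """Run the model through the API and return the generated text as one string."""
    import replicate  # third-party client, imported here so the helpers above work without it

    output = replicate.run(MODEL, input=build_input(prompt))
    # Language models usually return text in chunks; join them into a single string.
    return "".join(output)


if os.environ.get("REPLICATE_API_TOKEN"):
    print(generate("Write a haiku about llamas."))
else:
    print("Set REPLICATE_API_TOKEN to run this example.")
```

Swapping in a different model should only require changing `MODEL`, though each model may expect different input keys.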