Language model roundup, April 2023
Posted by @joehoover, @mattt, and @zeke
One month ago, we blogged about innovation around LLaMA, an open-source language model from Meta. Readers told us they wanted to see more posts like it.
So here we are, a month later, with another roundup of recent developments in the world of open-source language models.
Large language models are hot. Here’s what came out this week:
- StableLM – A new set of language models from Stability AI, the folks behind the Stable Diffusion image generation model. These models are trained on a new dataset that’s 3x the size of The Pile.
- Vicuna – An open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
- GPT4All – Demo, data, and code to train an open-source, assistant-style large language model based on GPT-J and LLaMA.
These new models join existing language models on Replicate like FLAN-T5, GPT-J, and LLaMA. As we publish more, we’ll keep adding them to our language models collection.
You can fine-tune language models to make them better at a particular task:
- Fine-tune a language model with Replicate - With Replicate, you can fine-tune and run your model in the cloud without having to set up any GPUs. This guide shows you how to get started.
- Trainable language models - A collection of models on Replicate that you can fine-tune using our training API.
- MediaWiki to training data – A Node.js tool to convert Wikipedia articles into training data for fine-tuning language models on Replicate. This repository is a great example of how to use the new training API.
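To make the fine-tuning flow above concrete, here’s a minimal sketch in Python. It’s a hedged illustration, not Replicate’s exact API: the JSONL prompt/completion format matches the style used in fine-tuning guides at the time, but field names like `train_data` and the exact request schema for `POST /v1/trainings` are assumptions — check the current docs before using them.

```python
import json

# Fine-tuning data is typically a JSONL file: one prompt/completion pair per line.
examples = [
    {"prompt": "What is LLaMA?", "completion": "A family of language models from Meta."},
    {"prompt": "What is The Pile?", "completion": "A large open dataset for training language models."},
]

with open("train_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# A training request to Replicate's HTTP API carries the base model version,
# the training inputs, and a destination model to push the result to.
# Names and values here are illustrative placeholders, not real identifiers.
payload = {
    "version": "replicate/llama-7b:VERSION_ID",              # hypothetical version pin
    "input": {"train_data": "https://example.com/train_data.jsonl"},
    "destination": "your-username/llama-finetuned",
}
print(json.dumps(payload, indent=2))
```

From here, you would upload the JSONL file somewhere publicly reachable and send the payload (with your API token in an `Authorization` header) to kick off a training job.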
People are building tools to compare these language models:
- OpenPlayground - An LLM playground from GitHub’s former CEO that you can run on your laptop.
- AI Playground - Compare and tune AI language models side-by-side, share your results, and auto-generate code snippets for Next.js.
- ShareGPT – A lot of experimentation with language models happens in ChatGPT. ShareGPT makes it easy to share your wildest ChatGPT conversations with a single click.
Is the singularity near? Hard to say. But when we let these large language models talk to themselves and interact with external systems, they sure do start to resemble something like AGI. Here are a few projects that have emerged in recent weeks:
- Auto-GPT: An experiment in making GPT-4 autonomous. Just a month in, and this project already has 100K stars, thousands of commits, and hundreds of contributors — both a testament to the explosion of interest in this space, and a reminder of how quickly things can escalate in this new age.
- BabyAGI: An AI-powered task management system using OpenAI models as well as LLaMA.
- Teenage-AGI: Another autonomous system like Auto-GPT and BabyAGI that takes inspiration from the paper “Generative Agents: Interactive Simulacra of Human Behavior”.
Lately we’ve been at a loss for words to describe the feeling of this moment. So it feels appropriately ironic to ask GPT-4 to come up with a few words for us, in the form of a haiku.
Swift thoughts intertwine,
Silicon minds now awake,
Boundless growth ignites.