Machine learning needs better tools
Posted by @bfirsh
Machine learning used to be an academic pursuit. If you wanted to work on it, you probably needed to be part of a lab or have a PhD.
In early 2021, there was a shift. Advadnoun released a Colab notebook called The Big Sleep. RiversHaveWings followed up with the VQGAN+CLIP notebook. These notebooks turned text descriptions into images by guiding a GAN with CLIP.
These projects weren’t increasing some accuracy metric by a fraction of a percent. They weren’t published as academic papers. They were just pieces of software that did something neat.
People made copies of these notebooks and built upon them. We got pixray, dalle-mini, Disco Diffusion. It felt like something new was happening every week.
These people were not affiliated with a lab. They were often self-taught, just tinkering in their spare time. Or they were software engineers, cobbling together bits of machine learning code.
Part of what made this all possible was pre-trained foundation models. Instead of having to train a model from scratch at great expense, individuals could pick models off the shelf and combine them in interesting ways. Kind of like how you can import a few packages from npm and plug them together to make something new.
It all felt a lot more like open-source software than machine learning research.
Then, Stable Diffusion came along.
DALL-E 2, a text-to-image model of similar quality, was released a few months earlier but it was closed source and behind a private beta. We’re used to advances in normal software being open source and on GitHub. DALL-E was the most extraordinary piece of software that had been seen for years, and software engineers couldn’t use it.
That’s what created the fertile ground for Stable Diffusion. Stable Diffusion was open source, and its output was of much higher quality than the previous generation of open-source text-to-image models.
It caught the imagination of hackers and there was an explosion of forks: inpainting, animation, texture generation, fine-tuning, and so on.
These open-source image generation models were suddenly good enough to be useful. People were building a ton of stuff. There were mobile apps to generate images from text, apps for creating avatars of yourself from just a few training images, and procedurally generated games.
There’s a catch, though.
Machine learning is too hard to use
If you try to actually build something with these machine learning models, you find that none of it really works. You spend all day battling with messy Python scripts, broken Colab notebooks, perplexing CUDA errors, misshapen tensors. It’s a mess.
Normal software used to be like this. If you wanted to build a website 20 years ago it felt like trying to use machine learning today. You had to build web servers, authentication systems, user interface components. You were concatenating HTML and SQL by hand, hoping you didn’t get owned. To deploy, you uploaded files to an FTP server and waited and hoped for the best.
But then we got Ruby on Rails. And Django. And React. And Heroku. And Next.js. And Vercel. And all the rest. Tools that made software development easier and more fun.
In his Software 2.0 essay, Andrej Karpathy introduced the idea that deep learning is becoming sufficiently advanced that it is replacing large parts of normal software. But he also said it’s a weird kind of software that requires a new set of tools to use. He made the case that we need a new tooling stack:
The lens through which we view trends matters. If you recognize Software 2.0 as a new and emerging programming paradigm instead of simply treating neural networks as a pretty good classifier in the class of machine learning techniques, the extrapolations become more obvious, and it’s clear that there is much more work to do. … Who is going to develop the first Software 2.0 IDEs, which help with all of the workflows in accumulating, visualizing, cleaning, labeling, and sourcing datasets? … Traditional package managers and related serving infrastructure like pip, conda, docker, etc. help us more easily deploy and compose binaries. How do we effectively deploy, share, import and work with Software 2.0 binaries?
Machine learning is hard to use, but not because it’s inherently hard. We just don’t have good tools and abstractions yet. You shouldn’t have to understand GPUs to use machine learning, in the same way you don’t have to understand TCP/IP to build a website.
Machine learning was hard at Spotify
Andreas is an old friend of mine. He was a machine learning engineer at Spotify. He was a kinda weird machine learning engineer in that he did everything from fundamental research all the way down to building one of the main deep learning pipelines at Spotify. Everything from math, to products, to servers.
My background is in developer tools, most recently at Docker where I created Docker Compose. I also founded a few now-defunct developer tools startups, accidentally wrote a book about command-line interfaces, and was a misuser of web browsers a long time ago.
Andreas was telling me about a cluster of related problems at Spotify:
- It was hard to run open-source machine learning models. All these advances were locked up inside prose in PDFs, scraps of code on GitHub, weights on Google Drive (if you were lucky!). If you wanted to build upon this research, or apply it to real-world problems, you had to implement it all from scratch.
- It was hard to deploy machine learning models to production. Typically a researcher would have to sit down with an engineer to decide on an API, get a server written, package up dependencies, battle CUDA, get it running efficiently, and so on and so forth. It would take weeks to get something running in production.
Through my Docker eyes, these things Andreas was describing were Docker-shaped problems.
If only we could define a standard box for machine learning models, then researchers could put their models inside it. They could be shared with other people. They could be deployed to production. They would run anywhere, and always keep on running.
Docker for machine learning
We’ve created a standard box. Cog is Docker for machine learning. It makes it easy to package a machine learning model inside a container so that you can share it and deploy it to production.
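To make the "standard box" idea concrete, here is a minimal sketch in plain Python. It is not Cog's actual API: the `Model` protocol, the `EchoModel` toy class, and the `run_model` helper are hypothetical names, used only to show how a uniform setup/predict shape lets one piece of serving code run any model.

```python
from typing import Any, Protocol


class Model(Protocol):
    """Hypothetical 'standard box': every model exposes the same two methods."""

    def setup(self) -> None:
        """Load weights or other expensive state once, before serving."""
        ...

    def predict(self, **inputs: Any) -> Any:
        """Run one prediction from named inputs."""
        ...


class EchoModel:
    """Toy stand-in for a real model; it just upper-cases the prompt."""

    def setup(self) -> None:
        self.ready = True

    def predict(self, **inputs: Any) -> Any:
        return str(inputs["prompt"]).upper()


def run_model(model: Model, **inputs: Any) -> Any:
    """Generic runner: works for any object that fits the Model shape."""
    model.setup()
    return model.predict(**inputs)


result = run_model(EchoModel(), prompt="an astronaut riding on a horse")
print(result)
```

The runner never needs to know what is inside the box. Swapping `EchoModel` for an image generator or an audio transcriber would not change `run_model` at all, which is the property that makes sharing and deployment tractable.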
Replicate is a place to put these models. We have an expansive library of open-source models that researchers and hackers have pushed.
If you want to build things with these models, you can run them in the cloud with a few lines of code:
import replicate
replicate.run(
    "stability-ai/stable-diffusion:db21e45d3f7023abc2a46ee38a23973f6dce16bb082a930b0c49861f96d1e5bf",
    input={"prompt": "an astronaut riding on a horse"},
)
They’re also all available as Docker images with a standard API.
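As a rough sketch of what calling one of those images locally could look like, the snippet below builds a prediction request using only the Python standard library. It assumes the container is already running and serves Cog's HTTP interface at `http://localhost:5000` with a `/predictions` endpoint that accepts a JSON body of the form `{"input": {...}}`. Treat the host, port, path, and the `build_prediction_request` helper name as assumptions to check against the image you actually run.

```python
import json
import urllib.request


def build_prediction_request(
    prompt: str, base_url: str = "http://localhost:5000"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one prediction."""
    # Assumed body shape: model inputs nested under an "input" key.
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predictions",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request("an astronaut riding on a horse")
print(request.get_method(), request.full_url)

# To actually send it, a model container must be listening at base_url:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Because every image speaks the same request shape, the only things that change from model to model are the input names and the image you start.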
If you’re building your own machine learning model, you can also deploy it on Replicate. We’ll automatically generate an API server and deploy it on a big cluster of GPUs.
Machine learning is just software
Lots of people want to build things with machine learning, but they don’t have the expertise to use it. It’s not technology holding back adoption of machine learning; it’s the fact that you need all this specialist knowledge to use it.
There are roughly two orders of magnitude more software engineers than there are machine learning engineers (~30 million vs. ~500,000). By building good tools, we think it is possible for software engineers to use machine learning in the same way they can use normal software.
You should be able to import an audio transcriber the same way you can import an npm package. You should be able to fine-tune GPT as easily as you can fork something on GitHub.
Twenty years ago people might have said they were going to build an “internet application”. We don’t say that any longer. The internet is just the way that things are done.
Soon enough, that’s what machine learning will be like. It’ll just be how software is done. There’ll be a sprinkling of intelligence in everything.
And it’s going to be built by software engineers.