AI video is having its Stable Diffusion moment

Posted December 16, 2024

Not long ago, AI video wasn't very good:

Will Smith eating spaghetti, u/chaindrop, March 2023

Then, 10 months later, OpenAI announced Sora:

Creating video from text, OpenAI, February 2024

Sora reset expectations about what a video model could be. The output was high resolution, smooth, and coherent. The examples looked like real video. It felt like we’d jumped into the future.

The problem was, nobody could use it! It was just a preview.

This was like when OpenAI announced the DALL-E image generation model back in 2021. It was one of the most extraordinary pieces of software anyone had seen in years, but nobody could use it.

That created pent-up demand, which led to Stable Diffusion, a moment we wrote about last year.

Now the same thing is happening with video: Sora showed everyone what's possible.

There are lots of models that are as good as Sora now

Some are high-quality, some are fast, some focus on realism, and others focus on style and creativity.

Some are open source, and the community is modifying, optimizing, and building upon them. You can fine-tune them with new styles, objects and characters, and more.

| Model | ELO score | Speed | Duration | Resolution | Open source |
|---|---|---|---|---|---|
| OpenAI Sora | 1147 | 40s | 5s | 720p | No |
| Minimax Video-01 | 1101 | 3min | 5s | 720p | No |
| Tencent Hunyuan Video | 1071 | 8min | 5s | 720p | Yes |
| Genmo Mochi 1 | 1064 | 4min | 5s | 848 × 480 | Yes |
| Runway Gen3 | 1048 | 20s | 5s | 720p | No |
| Haiper 2.0 | 1037 | 5min | 4 or 6s | 720p | No |
| Luma Ray | 1029 | 40s | 5s | 720p | No |
| Lightricks LTX-Video | 680 | 10s | 3s | 864 × 480 | Yes |

ELO ratings are from Artificial Analysis. Speeds are the time to generate a five-second 720p video, unless otherwise specified.

Most of these models are on Replicate. You can try them out in your browser and build with them using APIs.
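Here's a minimal sketch of what an API call looks like with Replicate's Python client. The `prompt` input is typical for these models, but each model defines its own schema, so treat the exact parameters as assumptions and check the model page:

```python
# pip install replicate
# Expects the REPLICATE_API_TOKEN environment variable to be set.
import replicate

# Text-to-video: generate a short clip from a prompt. The "prompt"
# input is common across these models, but each model defines its own
# schema -- check the model page for the exact parameters.
output = replicate.run(
    "lightricks/ltx-video",
    input={"prompt": "a corgi surfing a small wave at golden hour"},
)
print(output)  # URL(s) of the generated video file
```

Here are the ones you should try: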

Minimax Video-01

View Minimax Video-01 on Replicate

Video-01 (also known as Hailuo) is the best at realism and coherence. It is, in many ways, Sora quality: just as smooth, with coherent subjects and high-resolution output. It handles out-of-distribution subjects well, though it doesn't have all the features Sora has.

You can generate five-second 720p videos with it, using a text description or an image as the starting frame. It's closed-source, and each video takes about three minutes to generate.

Run it on Replicate: minimax/video-01
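As a sketch, animating a still image with the Python client might look like this; the `first_frame_image` input name is an assumption, so check the model's schema on Replicate:

```python
import replicate

# Image-to-video: animate a still image, guided by a text prompt.
# The "first_frame_image" input name is an assumption -- check
# minimax/video-01's schema on Replicate for the exact parameters.
with open("starting_frame.jpg", "rb") as image:
    output = replicate.run(
        "minimax/video-01",
        input={
            "prompt": "the camera slowly pulls back to reveal a city skyline",
            "first_frame_image": image,
        },
    )
print(output)
```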

Tencent Hunyuan Video

View Tencent Hunyuan Video on Replicate

HunyuanVideo is up there with Sora and Minimax's Video-01, and it's open-source!

Because it's open-source, you can do a lot more with it. You can fine-tune it, the community has already built video-to-video workflows, and it's much more configurable: resolution, duration, steps, guidance scale, and lots more. It can make five-second 720p videos, as well as smaller, faster 540p ones, and you can reduce the steps and resolution to iterate quickly.

The downside is that it's slower than Video-01, but we're working on making it faster. We'll open-source the optimizations, of course.

Run it on Replicate: tencent/hunyuan-video
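To give a feel for that configurability, here's a hedged sketch of a quick low-resolution draft. The input names (`width`, `height`, `video_length`, `infer_steps`, `guidance_scale`) are assumptions, so check the schema on Replicate for the real ones:

```python
import replicate

# A fast, low-resolution draft: smaller frames and fewer steps.
# All input names below are assumptions -- consult
# tencent/hunyuan-video's schema on Replicate for the exact parameters.
output = replicate.run(
    "tencent/hunyuan-video",
    input={
        "prompt": "a paper boat drifting down a rainy street",
        "width": 960,
        "height": 544,
        "video_length": 65,   # number of frames (assumption)
        "infer_steps": 30,    # fewer steps = faster, rougher output
        "guidance_scale": 6.0,
    },
)
print(output)
```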

Luma Ray

View Luma Ray on Replicate

Luma Ray (also known as Dream Machine) is not as realistic as Minimax Video-01 or Hunyuan Video, but it's much faster and more creative. Released in June, it was one of the first of this new generation of capable video models.

It takes 40 seconds to generate a five-second video at 720p. It's got more tools for controlling the output than some of the other models (see the sketch after this list):

  • Start and end frames
  • Interpolation between start and end videos
  • Looped videos
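For example, a looped generation might look like this sketch with the Python client; the `loop` input name is an assumption, so check luma/ray's schema on Replicate:

```python
import replicate

# Generate a clip that loops seamlessly. The "loop" input name is an
# assumption -- check luma/ray's schema on Replicate for the real flags.
output = replicate.run(
    "luma/ray",
    input={
        "prompt": "ocean waves rolling onto a black sand beach",
        "loop": True,
    },
)
print(output)
```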

Ray 2 is coming soon.

Run it on Replicate: luma/ray

Haiper 2.0

View Haiper 2.0 on Replicate

Haiper 2.0 was released in October. It can generate four- and six-second 720p videos. Six-second videos take about five minutes to generate. You can use text or images to generate videos at a variety of aspect ratios.

A 4K version is coming soon.

Run it on Replicate: haiper-ai/haiper-video-2
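Picking a duration and aspect ratio might look like this sketch; the `duration` and `aspect_ratio` input names are assumptions, so check the model's schema on Replicate:

```python
import replicate

# A six-second vertical video. The "duration" and "aspect_ratio" input
# names are assumptions -- check haiper-ai/haiper-video-2's schema.
output = replicate.run(
    "haiper-ai/haiper-video-2",
    input={
        "prompt": "neon koi fish swimming through a night market",
        "duration": 6,
        "aspect_ratio": "9:16",
    },
)
print(output)
```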

Genmo Mochi 1

View Genmo Mochi 1 on Replicate

Mochi 1 was the first high-quality open-source video model to be released. At launch it needed 4× H100s to run, but the community swiftly optimized it to run on a single 4090.

Run it on Replicate: genmoai/mochi-1

You can also fine-tune Mochi 1 on Replicate. Use genmoai/mochi-1-lora-trainer to train it and genmoai/mochi-1-lora to run your trained models.
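As a rough sketch, kicking off a fine-tune from the Python client looks something like this; the version ID, input names, and destination below are placeholders, so check the trainer's page on Replicate for the real schema:

```python
import replicate

# Start a LoRA fine-tune of Mochi 1. The version ID, input names, and
# destination are placeholders -- see genmoai/mochi-1-lora-trainer's
# page on Replicate for the actual schema.
training = replicate.trainings.create(
    version="genmoai/mochi-1-lora-trainer:<version-id>",
    input={
        "input_videos": "https://example.com/training-clips.zip",
        "trigger_word": "MYSTYLE",
    },
    destination="your-username/mochi-1-mystyle",
)
print(training.status)
```

Once training finishes, you run the trained weights with genmoai/mochi-1-lora just like any other model.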

Lightricks LTX-Video

View Lightricks LTX-Video on Replicate

LTX-Video is a low-memory open-source video model. It's so fast: it makes three-second videos in just 10 seconds on an L40S GPU (compared with minutes on an H100 for other models).

While it's super fast, expect lower quality than the other models.

Run it on Replicate: lightricks/ltx-video

There's more

There are a few more excellent models that aren't on Replicate yet.

And of course, we're all still waiting for Black Forest Labs (creators of FLUX) to release their hotly anticipated video model.

Follow us on X to stay up to speed.