Make smooth AI-generated videos with AnimateDiff and an interpolator

Posted by @fofr and @zsxkib

In this blog post we’ll show you how to combine AnimateDiff and the ST-MFNet frame interpolator to create smooth and realistic videos from a text prompt. You can also specify camera movements using new controls.

You’ll go from a text prompt to a video, to a high-framerate video.

Create animations with AnimateDiff

AnimateDiff is a model that enhances existing text-to-image models by adding a motion modeling module. The motion module is trained on video clips to capture realistic motion dynamics. It allows Stable Diffusion text-to-image models to create animated outputs, ranging from anime to realistic photographs.

You can try AnimateDiff on Replicate.
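
If you'd rather start from code, here's a minimal sketch of calling the hosted model with the Python client (the version hash is the one used in the workflow example later in this post; the prompt is just an illustration, and other inputs are left at their defaults):

import replicate

# The client reads your API token from the REPLICATE_API_TOKEN environment variable
output = replicate.run(
    "zsxkib/animate-diff:269a616c8b0c2bbc12fc15fd51bb202b11e94ff0f7786c026aa905305c4ed9fb",
    input={"prompt": "a pirate ship sailing through a storm, anime style"}
)
print(output[0])  # URL of the generated video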

Control camera movement

LoRAs are an efficient way to fine-tune large models without using much memory. They are best known as lightweight extensions that teach a Stable Diffusion model a particular style or subject, and the same concept can be applied to an AnimateDiff motion module.

The original AnimateDiff authors have trained 8 new LoRAs for specific camera movements:

  • Pan up
  • Pan down
  • Pan left
  • Pan right
  • Zoom in
  • Zoom out
  • Rotate clockwise
  • Rotate anti-clockwise

With the Replicate-hosted model you can use all of these and choose how strong their effect is (between 0 and 1). You can also combine multiple camera movements and strengths to create specific effects.

In this example we used the ‘toonyou_beta3’ model with a zoom-in strength of 1 (view and tweak these settings):
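
As a rough sketch, a call like the one below would set up that kind of animation with the Python client. The base-model and camera-movement input names here are assumptions for illustration only; check the model page on Replicate for the exact inputs the hosted version exposes:

import replicate

output = replicate.run(
    "zsxkib/animate-diff:269a616c8b0c2bbc12fc15fd51bb202b11e94ff0f7786c026aa905305c4ed9fb",
    input={
        "prompt": "closeup portrait of a girl, cherry blossoms, anime style",
        "base_model": "toonyou_beta3",      # hypothetical name for the model selector
        "zoom_in_motion_strength": 1.0,     # hypothetical camera LoRA input, 0 to 1
        "pan_up_motion_strength": 0.5,      # combine movements for custom effects
    },
)
print(output[0])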

Interpolate videos with ST-MFNet

Interpolation adds extra frames to a video. This increases the frame rate and makes the video smoother.

ST-MFNet is a ‘spatio-temporal multi-flow network for frame interpolation’, which is a fancy way of saying it’s a machine learning model that generates extra frames for a video. It does this by studying the changes in space (position of objects) and time (from one frame to another). The “multi-flow” part means it’s considering multiple ways things can move or change from one frame to the next. ST-MFNet works very well with AnimateDiff videos.

You can take a 2-second, 16 frames per second (fps) AnimateDiff video and increase it to 32 fps or 64 fps using ST-MFNet:

You can also turn it into a slow-motion 4-second video:
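
Here's a sketch of both options with the Python client, assuming you already have an AnimateDiff video URL. The mp4, framerate_multiplier and keep_original_duration inputs match the workflow example below; our reading is that keeping the original duration gives you a higher frame rate, while dropping it plays the extra frames at the original rate for a slow-motion clip, so double-check this on the model page:

import replicate

animatediff_video = "https://.../your-animatediff-output.mp4"  # placeholder URL

# 16 fps -> 64 fps, same 2-second duration
smooth = replicate.run(
    "zsxkib/st-mfnet:2ccdad61a6039a3733d1644d1b71ebf7d03531906007590b8cdd4b051e3fbcd7",
    input={"mp4": animatediff_video, "framerate_multiplier": 4, "keep_original_duration": True},
)

# Double the frames but keep the original frame rate: a 4-second slow-motion video
# (our assumption of what keep_original_duration=False does)
slow_motion = replicate.run(
    "zsxkib/st-mfnet:2ccdad61a6039a3733d1644d1b71ebf7d03531906007590b8cdd4b051e3fbcd7",
    input={"mp4": animatediff_video, "framerate_multiplier": 2, "keep_original_duration": False},
)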

In this video we used the ‘realisticVisionV20_v20’ model with a landscape prompt. We kept the prompt and seed the same but changed the camera movement each time, then interpolated the videos:
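
If you want to reproduce that kind of comparison, a sketch like this loops over camera movements while holding the prompt and seed fixed, then interpolates each result. The seed, base-model and camera-movement input names are assumptions for illustration; the verified end-to-end code is in the next section:

import replicate

prompt = "a sweeping mountain landscape at sunrise, photorealistic"

# Hypothetical camera-movement input names; check the model page for the real ones
for movement in ["pan_left_motion_strength", "pan_right_motion_strength", "zoom_in_motion_strength"]:
    animation = replicate.run(
        "zsxkib/animate-diff:269a616c8b0c2bbc12fc15fd51bb202b11e94ff0f7786c026aa905305c4ed9fb",
        input={
            "prompt": prompt,
            "seed": 42,                              # "seed" is assumed to be supported
            "base_model": "realisticVisionV20_v20",  # hypothetical name for the model selector
            movement: 1.0,
        },
    )
    smooth = replicate.run(
        "zsxkib/st-mfnet:2ccdad61a6039a3733d1644d1b71ebf7d03531906007590b8cdd4b051e3fbcd7",
        input={"mp4": animation[0], "framerate_multiplier": 4, "keep_original_duration": True},
    )
    print(movement, list(smooth)[-1])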

Use the API to create a workflow

You can use the Replicate API to combine multiple models into a workflow, taking the output of one model and using it as input to another model.

Python

import replicate

# The client reads your API token from the REPLICATE_API_TOKEN environment variable
# (export REPLICATE_API_TOKEN="...")

print("Using AnimateDiff to generate a video")
output = replicate.run(
    "zsxkib/animate-diff:269a616c8b0c2bbc12fc15fd51bb202b11e94ff0f7786c026aa905305c4ed9fb",
    input={"prompt": "a medium shot of a vibrant coral reef with a variety of marine life"}
)
video = output[0]
print(video)
# https://pbxt.replicate.delivery/HnKtEcfWIoTIby5mGUufWwrXfHZ5VLpAnIHERSrNuiVAzfqGB/0-amediumshotofa.mp4

print("Using ST-MFNet to interpolate the video")
videos = replicate.run(
    "zsxkib/st-mfnet:2ccdad61a6039a3733d1644d1b71ebf7d03531906007590b8cdd4b051e3fbcd7",
    input={"mp4": video, "keep_original_duration": True, "framerate_multiplier": 4},
)
video = list(videos)[-1]  # take the final (interpolated) video from the output
print(video)
# https://pbxt.replicate.delivery/VgwJdbh4NTZKEZpAaDhbzni1DGxzXOrHrCz5clFXIIGXOyaE/tmpaz7xlcls0-amediumshotofa_2.mp4

Javascript

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

console.log("Using AnimateDiff to generate a video");
const output = await replicate.run(
  "zsxkib/animate-diff:269a616c8b0c2bbc12fc15fd51bb202b11e94ff0f7786c026aa905305c4ed9fb",
  { input: { prompt: "a medium shot of a vibrant coral reef with a variety of marine life" } }
);

const video = output[0];
console.log(video);
// https://pbxt.replicate.delivery/HnKtEcfWIoTIby5mGUufWwrXfHZ5VLpAnIHERSrNuiVAzfqGB/0-amediumshotofa.mp4

console.log("Using ST-MFNet to interpolate the video");
const videos = await replicate.run(
  "zsxkib/st-mfnet:2ccdad61a6039a3733d1644d1b71ebf7d03531906007590b8cdd4b051e3fbcd7",
  {
    input: {
      mp4: video,
      keep_original_duration: true,
      framerate_multiplier: 4
    }
  }
);
console.log(videos[videos.length - 1]); // take the final (interpolated) video from the output
// https://pbxt.replicate.delivery/VgwJdbh4NTZKEZpAaDhbzni1DGxzXOrHrCz5clFXIIGXOyaE/tmpaz7xlcls0-amediumshotofa_2.mp4

CLI

You can also use the Replicate CLI to create a workflow:

export REPLICATE_API_TOKEN="..."

replicate run zsxkib/st-mfnet --web \
    keep_original_duration=true \
    framerate_multiplier=4 \
    mp4="$(replicate run zsxkib/animate-diff \
                prompt="a medium shot of a vibrant coral reef with a variety of marine life" | \
             jq -r '.output | join("")')"
# Opens https://replicate.com/p/p2j74vlbv464cojdne6sol6gq4

Wrapping up

Have you used AnimateDiff and ST-MFNet to make a video? Great! We’d love to see it.

Share your videos with us on Discord or tweet them @replicate. Let’s see what you’ve got!