In this blog post we’ll show you how to combine AnimateDiff and the ST-MFNet frame interpolator to create smooth and realistic videos from a text prompt. You can also specify camera movements using new controls.
You’ll go from a text prompt to a video, to a high-framerate video.
AnimateDiff is a model that enhances existing text-to-image models by adding a motion modeling module. The motion module is trained on video clips to capture realistic motion dynamics. It allows Stable Diffusion text-to-image models to create animated outputs, ranging from anime to realistic photographs.
You can try AnimateDiff on Replicate.
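If you'd rather script it than use the web form, here's a minimal sketch using the Replicate Python client. The model identifier and input names below are placeholders, so copy the exact version string and parameters from the model's API page:

```python
import replicate

# Run an AnimateDiff model hosted on Replicate.
# "<owner>/<animatediff-model>" is a placeholder; grab the real identifier
# and version from the model's API tab on Replicate.
output = replicate.run(
    "<owner>/<animatediff-model>",
    input={
        "prompt": "a corgi running through a field of flowers, golden hour",
        # Parameter names vary between AnimateDiff forks; these are typical examples.
        "negative_prompt": "blurry, low quality",
        "num_frames": 16,
    },
)
print(output)  # URL of the generated video
```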
LoRAs are an efficient way to fine-tune large models quickly and without using much memory. They are best known for Stable Diffusion models, where they act as lightweight extensions that teach a model a particular style or subject. The same concept can be applied to an AnimateDiff motion module.
The original AnimateDiff authors have trained 8 new LoRAs for specific camera movements:

- Zoom in
- Zoom out
- Pan left
- Pan right
- Tilt up
- Tilt down
- Rolling clockwise
- Rolling anticlockwise
The Replicate hosted model supports all of these, and you can choose how strong their effect will be (between 0 and 1). You can also combine multiple camera movements and strengths to create specific effects.
In this example we used the 'toonyou_beta3' model with a zoom-in strength of 1 (view and tweak these settings):
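To reproduce something like that from the API, the call looks roughly like the sketch below. The model identifier and the names of the base-model and motion-strength inputs are illustrative, so check the model's API schema for the exact fields:

```python
import replicate

# Zoom-in camera movement at full strength, using the toonyou_beta3 base model.
# Identifier and parameter names are placeholders; see the model's API tab.
output = replicate.run(
    "<owner>/<animatediff-model>",
    input={
        "prompt": "masterpiece, best quality, 1girl walking through a neon city at night",
        "base_model": "toonyou_beta3",
        "zoom_in_motion_strength": 1.0,  # 0 disables the movement, 1 is full strength
    },
)
print(output)
```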
Interpolation adds extra frames to a video. This increases the frame rate and makes the video smoother.
ST-MFNet is a 'spatio-temporal multi-flow network for frame interpolation', which is a fancy way of saying it's a machine learning model that generates extra frames for a video. It does this by studying how things change in space (the position of objects) and time (from one frame to the next). The 'multi-flow' part means it considers multiple ways things can move or change between frames. ST-MFNet works very well with AnimateDiff videos.
You can take a 2 second, 16 frames-per-second (fps) AnimateDiff video and increase it to 32 or 64 fps using ST-MFNet:
You can also turn it into a slow-motion 4 second video:
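If you want to run the interpolation step yourself, here's a minimal sketch with the Replicate Python client. The model identifier and input names are placeholders; check the ST-MFNet model page for the exact schema:

```python
import replicate

# Interpolate an existing AnimateDiff video with ST-MFNet.
# Identifier and parameters are illustrative; see the model's API tab for specifics.
smooth_video = replicate.run(
    "<owner>/<st-mfnet-model>",
    input={
        "mp4": "https://example.com/animatediff-output.mp4",  # video to interpolate
        "framerate_multiplier": 2,        # 16 fps -> 32 fps; use 4 for 64 fps
        "keep_original_duration": True,   # set False to get the slow-motion effect
    },
)
print(smooth_video)
```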
In this video we used the 'realisticVisionV20_v20' model with a landscape prompt. We kept the prompt and seed the same but changed the camera movement each time, then interpolated the videos:
You can use the Replicate API to combine multiple models into a workflow, taking the output of one model and using it as input to another model.
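Here's a sketch of what that pipeline might look like in Python, feeding the AnimateDiff output straight into ST-MFNet. As before, the model identifiers and input names are placeholders rather than exact values:

```python
import replicate

# Step 1: generate a video with AnimateDiff (placeholder identifier and inputs).
animation = replicate.run(
    "<owner>/<animatediff-model>",
    input={
        "prompt": "a mountain lake at sunrise, mist rolling over the water",
        "base_model": "realisticVisionV20_v20",
        "zoom_in_motion_strength": 0.75,
    },
)

# Step 2: interpolate the result with ST-MFNet to double the frame rate.
smooth = replicate.run(
    "<owner>/<st-mfnet-model>",
    input={
        "mp4": animation,            # output from step 1 becomes the input of step 2
        "framerate_multiplier": 2,
    },
)

print(smooth)
```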
You can also use the CLI for Replicate to create a workflow:
Have you used AnimateDiff and ST-MFNet to make a video? Great! We'd love to see it.
Share your videos with us on Discord or tweet them @replicate. Let's see what you've got!