WAN family of models

If you've been following the AI video space lately, you've probably noticed that it's exploding. New models are coming out every week with better outputs, higher resolution, and faster generation speeds.

WAN 2.2 is the newest and most capable open-source video model. It was released this week, and it's topping the leaderboards.

There's a lot to like about WAN 2.2:

  • It's fast on Replicate. A 5-second video takes about 39 seconds at 480p, or about 150 seconds at 720p.
  • It's open source, both the model weights and the code. The community is already building tools to enhance it.
  • It produces stunning videos with real-world accuracy.
  • It's small enough to run on consumer GPUs.

Frequently asked questions

Which WAN models are the fastest for generating video?

If you need quick results, wan-video/wan-2.2-i2v-fast (image-to-video) and wan-video/wan-2.2-t2v-fast (text-to-video) are the speed-optimized models in the WAN collection.
For example, WAN 2.2 can generate a 5-second clip at 480p in about 39 seconds, or around 150 seconds at 720p. Runtime depends on resolution and clip length.

Which models balance quality and speed the best?

wan-video/wan-2.5-i2v-fast and wan-video/wan-2.5-t2v-fast are good middle-ground choices. They offer more complex motion, support up to 1080p output, and can include background audio or lip-sync, while still being relatively fast.
If you’re aiming for higher fidelity or more cinematic effects, these are better picks than the 2.2 fast variants.

What works best for animating still images?

For turning a single image into a short video, start with wan-video/wan-2.2-i2v-fast. It’s quick and designed for smooth camera motion around a static subject.
If you want higher resolution, added audio, or more cinematic movement, wan-video/wan-2.5-i2v-fast provides more control and polish.
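As a rough sketch, an image-to-video call can be made against Replicate's HTTP API with nothing but the standard library. The input field names here ("image", "prompt") are assumptions based on the description above — check the model's API page for the exact schema.

```python
import json
import os
import urllib.request

# Hypothetical input payload for wan-video/wan-2.2-i2v-fast.
# Field names are assumptions; verify them on the model page.
payload = {
    "input": {
        "image": "https://example.com/portrait.png",  # publicly reachable image URL
        "prompt": "slow orbit around the subject, soft lighting",
    }
}

token = os.environ.get("REPLICATE_API_TOKEN")
if token:  # only hit the API when a token is configured
    req = urllib.request.Request(
        "https://api.replicate.com/v1/models/wan-video/wan-2.2-i2v-fast/predictions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        prediction = json.load(resp)
        print(prediction.get("output"))
```

The response includes an output field that points at the generated clip once the prediction finishes.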

What should I pick when generating video from text prompts?

For text-to-video generation, wan-video/wan-2.5-t2v-fast produces visually richer results than the 2.2 models.
Including motion cues in your prompt (like “overhead crane shot” or “slow dolly zoom”) helps guide the animation. If speed matters more than fidelity, wan-video/wan-2.2-t2v-fast is a great starting point.
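One lightweight way to apply this tip is to keep a small list of camera cues and prepend one to each scene description. The cue list below is illustrative, not an official vocabulary:

```python
# Illustrative camera-motion cues; WAN has no fixed official vocabulary,
# so treat these as prompt-writing suggestions only.
CAMERA_CUES = ["overhead crane shot", "slow dolly zoom", "handheld tracking shot"]

def build_t2v_prompt(scene: str, cue: str) -> str:
    """Prepend a camera-motion cue to a scene description."""
    return f"{cue} of {scene}"

prompt = build_t2v_prompt("a lighthouse in a storm", CAMERA_CUES[1])
print(prompt)  # slow dolly zoom of a lighthouse in a storm
```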

How do the main WAN subtypes differ?

  • Text-to-video (T2V): Generates a video entirely from your text prompt.
  • Image-to-video (I2V): Animates a single image, optionally with an additional text prompt.
  • Fast vs standard: Fast variants prioritize quick turnaround; higher-tier models (like WAN 2.5) add more features, including audio.
  • Resolution tiers: WAN 2.2 supports 480p and 720p; WAN 2.5 supports up to 1080p.

What kinds of outputs can I expect from WAN models?

Most WAN models produce short MP4 clips (typically around 5–10 seconds).

  • I2V models animate your input image.
  • T2V models generate a scene from text alone.
  • Some WAN 2.5 variants support background audio and lip-sync.

You can adjust clip length and resolution using the model’s input fields.

How can I publish my own WAN-style model on Replicate?

If you’ve fine-tuned or trained your own video model, you can package it with Cog and push it to Replicate.
You’ll define inputs like image, prompt, num_frames, and resolution, and choose how to share it or price usage.
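As a rough sketch, a Cog package pairs a cog.yaml build file with a predict.py that declares those inputs. The package names and versions below are placeholders, not a tested configuration:

```yaml
# Hypothetical cog.yaml for a WAN-style video model.
# Dependencies and versions are placeholders -- adjust for your model.
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    - torch
    - diffusers
predict: "predict.py:Predictor"
```

The predict.py referenced on the last line is where you declare inputs such as image, prompt, num_frames, and resolution, and return the rendered video file.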

Can I use WAN models for commercial work?

Some WAN models are open source, but not all. Always check the license on each model page before using the outputs in commercial or distributed projects.

How do I run a WAN model on Replicate?

  1. Select a model from the WAN collection.
  2. For T2V: enter your text prompt. For I2V: upload an image and optionally add a prompt.
  3. Set resolution (480p, 720p, or 1080p for some variants) and number of frames.
  4. Run the model and wait for the video to generate.
  5. Download the clip and use it in your project.
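The steps above can be sketched in code against Replicate's HTTP API. The input field names ("prompt", "resolution", "num_frames") are assumptions for illustration — the model's API page lists the exact schema and accepted values.

```python
import json
import os
import urllib.request

# Hypothetical text-to-video payload for wan-video/wan-2.2-t2v-fast.
# Field names and values are assumptions; check the model's API page.
payload = {
    "input": {
        "prompt": "overhead crane shot of a coastal village at dawn",
        "resolution": "480p",
        "num_frames": 81,
    }
}

token = os.environ.get("REPLICATE_API_TOKEN")
if token:  # skip the network call when no token is set
    req = urllib.request.Request(
        "https://api.replicate.com/v1/models/wan-video/wan-2.2-t2v-fast/predictions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",  # ask the API to hold the connection until done
        },
    )
    with urllib.request.urlopen(req) as resp:
        prediction = json.load(resp)
        print(prediction.get("output"))  # link to the finished MP4 clip
```

For image-to-video models, the same shape applies with an added "image" field in the input.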

What should I keep in mind when using WAN models?

  • Use a clean, clear image for I2V — the input strongly affects the output.
  • Adding camera or movement cues to your prompt can improve results.
  • Longer clips or higher resolution will increase processing time.
  • These models are designed for short, visually rich animations — not long-form video.
  • If you need audio, pick a WAN 2.5 variant that supports it or add sound in post.