WAN family of models

The Wan video family from Alibaba is one of the strongest open-source video model lineups available. The models are fast, capable, and competitive with many proprietary options.

Wan 2.7 — Latest generation

The newest Wan models use a 27 billion parameter Mixture-of-Experts architecture. They support text-to-video, image-to-video, video editing, and reference-to-video — all with native audio generation.

  • Wan 2.7 T2V — text-to-video with audio, up to 1080p, 2-15 seconds
  • Wan 2.7 I2V — image-to-video with first-and-last-frame control, clip continuation, and audio sync
  • Wan 2.7 R2V — reference-to-video for character consistency across scenes
  • Wan 2.7 VideoEdit — edit videos with natural language while preserving motion
  • Wan 2.7 Image Pro — text-to-image and multi-image editing with 4K output and thinking mode
  • Wan 2.7 Image — image generation and editing up to 2K

Wan 2.5 — Audio-visual generation

Wan 2.5 models generate video with synchronized audio in a single pass — dialogue, sound effects, and background music all at once.

Wan 2.2 — Fast and cheap

The 2.2 models are optimized by PrunaAI for speed and cost. A 5-second video takes about 39 seconds at 480p or 150 seconds at 720p.

Why Wan?

  • Open source — both the model weights and code are publicly available
  • Fast on Replicate — optimized for quick generation
  • Versatile — covers T2V, I2V, video editing, image generation, and reference-based generation
  • Native audio — Wan 2.5 and 2.7 generate synchronized audio without separate models
  • Competitive quality — produces detailed videos with real-world accuracy

Frequently asked questions

Which WAN models are the fastest for generating video?

If you need quick results, wan-video/wan-2.2-i2v-fast (image-to-video) and wan-video/wan-2.2-t2v-fast (text-to-video) are the speed-optimized models in the WAN collection.
For example, WAN 2.2 can generate a 5-second clip at 480p in about 39 seconds, or around 150 seconds at 720p. Runtime depends on resolution and clip length.

Which models balance quality and speed the best?

wan-video/wan-2.5-i2v-fast and wan-video/wan-2.5-t2v-fast are good middle-ground choices. They offer more complex motion, support up to 1080p output, and can include background audio or lip-sync, while still being relatively fast.
If you’re aiming for higher fidelity or more cinematic effects, these are better picks than the 2.2 fast variants.

What works best for animating still images?

For turning a single image into a short video, start with wan-video/wan-2.2-i2v-fast. It’s quick and designed for smooth camera motion around a static subject.
If you want higher resolution, added audio, or more cinematic movement, wan-video/wan-2.5-i2v-fast provides more control and polish.
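As a rough sketch, an image-to-video run with the Replicate Python client might look like the code below. The exact input field names (`image`, `prompt`, `resolution`) are assumptions here — different WAN models expose slightly different inputs, so confirm them on the model page’s API tab before running.

```python
import os

# Hypothetical input payload for wan-video/wan-2.2-i2v-fast.
# Field names ("image", "prompt", "resolution") are assumptions --
# check the model's API schema on Replicate before relying on them.
def build_i2v_input(image_url: str, prompt: str = "", resolution: str = "480p") -> dict:
    """Assemble an input payload for an image-to-video run."""
    payload = {"image": image_url, "resolution": resolution}
    if prompt:  # a text prompt is optional for I2V models
        payload["prompt"] = prompt
    return payload

if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # pip install replicate

    output = replicate.run(
        "wan-video/wan-2.2-i2v-fast",
        input=build_i2v_input(
            "https://example.com/portrait.jpg",
            prompt="slow dolly zoom toward the subject",
        ),
    )
    print(output)  # reference to the generated MP4
```

The API call only fires when a `REPLICATE_API_TOKEN` is set, so the payload-building logic can be tried out offline.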

What should I pick when generating video from text prompts?

For text-to-video generation, wan-video/wan-2.5-t2v-fast produces visually richer results than the 2.2 models.
Including motion cues in your prompt (like “overhead crane shot” or “slow dolly zoom”) helps guide the animation. If speed matters more than fidelity, wan-video/wan-2.2-t2v-fast is a great starting point.
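One way to keep motion cues consistent is to append them to the scene description programmatically. The sketch below assumes the standard `prompt` input field; the helper function and the example prompt text are illustrative, not part of any WAN API.

```python
import os

# Combine a scene description with a camera-motion cue, so prompts
# like "overhead crane shot" or "slow dolly zoom" are applied uniformly.
def prompt_with_motion(subject: str, motion_cue: str) -> str:
    """Return a T2V prompt with a motion cue appended."""
    return f"{subject}, {motion_cue}"

if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # pip install replicate

    output = replicate.run(
        "wan-video/wan-2.5-t2v-fast",
        input={"prompt": prompt_with_motion(
            "a lighthouse on a stormy coast at dusk",
            "overhead crane shot",
        )},
    )
    print(output)
```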

How do the main WAN subtypes differ?

  • Text-to-video (T2V): Generates a video entirely from your text prompt.
  • Image-to-video (I2V): Animates a single image, optionally with an additional text prompt.
  • Fast vs standard: Fast variants are tuned for quick turnaround, while standard variants trade speed for quality. Higher-tier models (like WAN 2.5) add features such as audio.
  • Resolution tiers: WAN 2.2 supports 480p and 720p; WAN 2.5 supports up to 1080p.

What kinds of outputs can I expect from WAN models?

Most WAN models produce short MP4 clips (typically around 5–10 seconds).

  • I2V models animate your input image.
  • T2V models generate a scene from text alone.
  • Some WAN 2.5 variants support background audio and lip-sync.

You can adjust clip length and resolution using the model’s input fields.

How can I publish my own WAN-style model on Replicate?

If you’ve fine-tuned or trained your own video model, you can package it with Cog and push it to Replicate.
You’ll define inputs like image, prompt, num_frames, and resolution, and choose how to share it or price usage.
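A minimal `cog.yaml` for a video model might look like the fragment below. The package list and versions are placeholders — match them to whatever your model actually needs — and `predict.py:Predictor` points at a `Predictor` class you write that declares the `image`, `prompt`, `num_frames`, and `resolution` inputs.

```yaml
# Hypothetical cog.yaml for a WAN-style video model.
# Adjust Python/CUDA versions and packages to your environment.
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    - torch
    - diffusers
predict: "predict.py:Predictor"
```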

Can I use WAN models for commercial work?

Some WAN models are open source, but not all. Always check the license on each model page before using the outputs in commercial or distributed projects.

How do I run a WAN model on Replicate?

  1. Select a model from the WAN collection.
  2. For T2V: enter your text prompt. For I2V: upload an image and optionally add a prompt.
  3. Set resolution (480p, 720p, or 1080p for some variants) and number of frames.
  4. Run the model and wait for the video to generate.
  5. Download the clip and use it in your project.
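The steps above can be sketched end to end with the Replicate Python client: run a T2V model, then download the resulting clip. The prompt text is illustrative, and the assumption here is that the output can be converted to a URL with `str()` (newer client versions return a file-like output object).

```python
import os
import urllib.request

# Download a generated video URL to a local MP4 file (step 5 above).
def save_video(url: str, path: str = "clip.mp4") -> str:
    """Fetch the clip at `url` and write it to `path`."""
    urllib.request.urlretrieve(url, path)
    return path

if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # pip install replicate

    # Steps 1-4: pick a model, set the prompt, run, and wait.
    output = replicate.run(
        "wan-video/wan-2.2-t2v-fast",
        input={"prompt": "a paper boat drifting down a rain-soaked street"},
    )
    print(save_video(str(output)))  # local path to the MP4
```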

What should I keep in mind when using WAN models?

  • Use a clean, clear image for I2V — the input strongly affects the output.
  • Adding camera or movement cues to your prompt can improve results.
  • Longer clips or higher resolution will increase processing time.
  • These models are designed for short, visually rich animations — not long-form video.
  • If you need audio, pick a WAN 2.5 variant that supports it or add sound in post.