WAN family of models

If you've been following the AI video space lately, you've probably noticed that it's exploding. New models are coming out every week with better outputs, higher resolution, and faster generation speeds.

WAN 2.2 is the newest and most capable open-source video model. It was released this week, and it's topping the leaderboards.

There's a lot to like about WAN 2.2:

  • It's fast on Replicate. A 5-second video takes about 39 seconds at 480p, or about 150 seconds at 720p.
  • It's open source, both the model weights and the code. The community is already building tools to enhance it.
  • It produces stunning videos with real-world accuracy.
  • It's small enough to run on consumer GPUs.

Frequently asked questions

Which WAN models are the fastest for generating video?

If you need quick results, wan-video/wan-2.2-i2v-fast (image-to-video) and wan-video/wan-2.2-t2v-fast (text-to-video) are the speed-optimized models in the WAN collection.
For example, WAN 2.2 can generate a 5-second clip at 480p in about 39 seconds, or around 150 seconds at 720p. Runtime depends on resolution and clip length.

Which models balance quality and speed the best?

wan-video/wan-2.5-i2v-fast and wan-video/wan-2.5-t2v-fast are good middle-ground choices. They offer more complex motion, support up to 1080p output, and can include background audio or lip-sync, while still being relatively fast.
If you’re aiming for higher fidelity or more cinematic effects, these are better picks than the 2.2 fast variants.

What works best for animating still images?

For turning a single image into a short video, start with wan-video/wan-2.2-i2v-fast. It’s quick and designed for smooth camera motion around a static subject.
If you want higher resolution, added audio, or more cinematic movement, wan-video/wan-2.5-i2v-fast provides more control and polish.
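As a rough sketch, an image-to-video call can be made against Replicate's HTTP API with nothing but the standard library. The input field names here ("image", "prompt") are assumptions based on the description above — check the model's API page for the exact schema.

```python
import json
import os
import urllib.request

# Hypothetical input payload for wan-video/wan-2.2-i2v-fast.
# Field names are assumptions; verify them on the model page.
payload = {
    "input": {
        "image": "https://example.com/portrait.png",  # publicly reachable image URL
        "prompt": "slow orbit around the subject, soft lighting",
    }
}

token = os.environ.get("REPLICATE_API_TOKEN")
if token:  # only hit the API when a token is configured
    req = urllib.request.Request(
        "https://api.replicate.com/v1/models/wan-video/wan-2.2-i2v-fast/predictions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        prediction = json.load(resp)
        print(prediction.get("output"))
```

The response includes an output field that points at the generated clip once the prediction finishes.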

What should I pick when generating video from text prompts?

For text-to-video generation, wan-video/wan-2.5-t2v-fast produces visually richer results than the 2.2 models.
Including motion cues in your prompt (like “overhead crane shot” or “slow dolly zoom”) helps guide the animation. If speed matters more than fidelity, wan-video/wan-2.2-t2v-fast is a great starting point.
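One lightweight way to apply this tip is to keep a small list of camera cues and prepend one to each scene description. The cue list below is illustrative, not an official vocabulary:

```python
# Illustrative camera-motion cues; WAN has no fixed official vocabulary,
# so treat these as prompt-writing suggestions only.
CAMERA_CUES = ["overhead crane shot", "slow dolly zoom", "handheld tracking shot"]

def build_t2v_prompt(scene: str, cue: str) -> str:
    """Prepend a camera-motion cue to a scene description."""
    return f"{cue} of {scene}"

prompt = build_t2v_prompt("a lighthouse in a storm", CAMERA_CUES[1])
print(prompt)  # slow dolly zoom of a lighthouse in a storm
```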

How do the main WAN subtypes differ?

  • Text-to-video (T2V): Generates a video entirely from your text prompt.
  • Image-to-video (I2V): Animates a single image, optionally with an additional text prompt.
  • Fast vs standard: Fast variants prioritize quick turnaround; higher-tier models (like WAN 2.5) add more features, including audio.
  • Resolution tiers: WAN 2.2 supports 480p and 720p; WAN 2.5 supports up to 1080p.

What kinds of outputs can I expect from WAN models?

Most WAN models produce short MP4 clips (typically around 5–10 seconds).

  • I2V models animate your input image.
  • T2V models generate a scene from text alone.
  • Some WAN 2.5 variants support background audio and lip-sync.

You can adjust clip length and resolution using the model’s input fields.

How can I publish my own WAN-style model on Replicate?

If you’ve fine-tuned or trained your own video model, you can package it with Cog and push it to Replicate.
You’ll define inputs like image, prompt, num_frames, and resolution, and choose how to share it or price usage.
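As a rough sketch, a Cog package pairs a cog.yaml build file with a predict.py that declares those inputs. The package names and versions below are placeholders, not a tested configuration:

```yaml
# Hypothetical cog.yaml for a WAN-style video model.
# Dependencies and versions are placeholders -- adjust for your model.
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    - torch
    - diffusers
predict: "predict.py:Predictor"
```

The predict.py referenced on the last line is where you declare inputs such as image, prompt, num_frames, and resolution, and return the rendered video file.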

Can I use WAN models for commercial work?

Some WAN models are open source, but not all. Always check the license on each model page before using the outputs in commercial or distributed projects.

How do I run a WAN model on Replicate?

  1. Select a model from the WAN collection.
  2. For T2V: enter your text prompt. For I2V: upload an image and optionally add a prompt.
  3. Set resolution (480p, 720p, or 1080p for some variants) and number of frames.
  4. Run the model and wait for the video to generate.
  5. Download the clip and use it in your project.
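The steps above can be sketched in code against Replicate's HTTP API. The input field names ("prompt", "resolution", "num_frames") are assumptions for illustration — the model's API page lists the exact schema and accepted values.

```python
import json
import os
import urllib.request

# Hypothetical text-to-video payload for wan-video/wan-2.2-t2v-fast.
# Field names and values are assumptions; check the model's API page.
payload = {
    "input": {
        "prompt": "overhead crane shot of a coastal village at dawn",
        "resolution": "480p",
        "num_frames": 81,
    }
}

token = os.environ.get("REPLICATE_API_TOKEN")
if token:  # skip the network call when no token is set
    req = urllib.request.Request(
        "https://api.replicate.com/v1/models/wan-video/wan-2.2-t2v-fast/predictions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",  # ask the API to hold the connection until done
        },
    )
    with urllib.request.urlopen(req) as resp:
        prediction = json.load(resp)
        print(prediction.get("output"))  # link to the finished MP4 clip
```

For image-to-video models, the same shape applies with an added "image" field in the input.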

What should I keep in mind when using WAN models?

  • Use a clean, clear image for I2V — the input strongly affects the output.
  • Adding camera or movement cues to your prompt can improve results.
  • Longer clips or higher resolution will increase processing time.
  • These models are designed for short, visually rich animations — not long-form video.
  • If you need audio, pick a WAN 2.5 variant that supports it or add sound in post.