Wan 2.7 Image-to-Video
Wan 2.7 I2V is an image-to-video generation model from Alibaba’s Wan family. Give it a still image and a text prompt, and it generates a video that brings the image to life — with camera motion, character animation, environmental effects, and more.
How it works
The model takes a starting image and uses it as the first frame of a generated video. A text prompt guides the motion and action. You can also provide a last frame to control where the video ends up, continue from an existing video clip, or supply an audio track for synchronized generation.
This makes it useful for animating illustrations, product shots, concept art, or any still image where you want to add realistic motion without creating footage from scratch.
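As a sketch, a request payload for the basic image-to-video case might look like this (parameter names are taken from the Inputs section below; the file path and prompt are illustrative):

```python
# Illustrative image-to-video request payload. Parameter names come from
# the Inputs section of this readme; values are examples only.
payload = {
    "first_frame": "product-shot.png",  # jpg/png/bmp/webp, <= 20 MB
    "prompt": "the camera slowly pushes in as steam rises from the cup",
    "resolution": "1080p",              # "720p" or "1080p"
    "duration": 5,                      # seconds, 2-15
    "enable_prompt_expansion": True,
}

# With the Replicate Python client, a dict like this would be passed as
# the `input` argument, e.g.:
#   import replicate
#   output = replicate.run("<model-slug-from-the-model-page>", input=payload)
```

The prompt here follows the "describe the motion" advice below: it says what should move, not just what the scene contains.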
Generation modes
- Image to video — provide a first frame image and a prompt to animate it
- First-and-last-frame — provide both a first and last frame to control the start and end of the video
- Clip continuation — provide an existing video clip to extend it with new generated frames
- Audio-synchronized — provide an audio file to generate video that matches the audio
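The mode is selected implicitly by which inputs you provide. A small sketch of that selection logic, following the combination rules listed under Inputs (the helper itself is hypothetical, not part of the model's API):

```python
def generation_mode(first_frame=None, last_frame=None, first_clip=None, audio=None):
    """Illustrative helper: infer which generation mode a set of inputs
    selects, per the combination rules in this readme."""
    if first_clip is not None and first_frame is not None:
        raise ValueError("first_clip cannot be combined with first_frame")
    if last_frame is not None and first_frame is None:
        raise ValueError("last_frame requires first_frame")
    if first_clip is not None:
        mode = "clip continuation"
    elif last_frame is not None:
        mode = "first-and-last-frame"
    elif first_frame is not None:
        mode = "image to video"
    else:
        raise ValueError("provide either first_frame or first_clip")
    # Audio can be layered on top of any mode; without it, the model
    # auto-generates matching audio.
    if audio is not None:
        mode += " (audio-synchronized)"
    return mode
```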
Inputs
- first_frame — Starting image to animate into video (jpg/png/bmp/webp, ≤20MB)
- last_frame — Optional ending image for first-and-last-frame generation. Requires first_frame
- first_clip — Optional video clip to continue from (mp4/mov, 2–10s, ≤100MB). Cannot be combined with first_frame
- audio — Optional audio file (wav/mp3, 3–30s, ≤15MB) for voice/music synchronization. If not provided, the model auto-generates matching audio
- prompt — Text description of the desired motion and action
- negative_prompt — Text description of content that should not appear in the video
- resolution — Output resolution: 720p or 1080p (default: 1080p)
- duration — Output duration in seconds (2–15, default: 5)
- enable_prompt_expansion — Automatically expand short prompts for better results. Improves quality but increases latency (default: true)
- seed — Random seed for reproducible results
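Requests that violate the limits above will be rejected, so it can help to check them client-side first. A minimal pre-flight sketch (the limits are from this readme; the helper name and signature are my own):

```python
def check_inputs(resolution, duration, first_frame_mb=None, audio_seconds=None):
    """Hypothetical pre-flight check mirroring the input limits listed
    in this readme. Raises ValueError on the first violated constraint."""
    if resolution not in ("720p", "1080p"):
        raise ValueError("resolution must be 720p or 1080p")
    if not 2 <= duration <= 15:
        raise ValueError("duration must be 2-15 seconds")
    if first_frame_mb is not None and first_frame_mb > 20:
        raise ValueError("first_frame must be <= 20 MB")
    if audio_seconds is not None and not 3 <= audio_seconds <= 30:
        raise ValueError("audio must be 3-30 seconds")
    return True
```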
Tips for good results
- Describe the motion. Focus your prompt on what should move and how — “a woman turns her head and smiles” works better than just describing the scene.
- Use high-quality source images. Sharp, well-lit images with clear subjects produce the best animations.
- Keep it short. 2–5 second clips tend to have the most coherent motion.
- Use first-and-last-frame mode when you need precise control over where the animation ends up.
- Enable prompt expansion for short prompts — it helps the model understand your intent better.
Limitations
- Complex multi-character interactions may produce inconsistent results
- Very long durations (10+ seconds) can show motion degradation toward the end
- Fine details like text or small patterns in the source image may not be perfectly preserved
- Physics-defying motions described in the prompt may not render correctly
Try it out on the Replicate playground.