Wan 2.7 Image-to-Video
Wan 2.7 I2V is an image-to-video generation model from Alibaba’s Wan family. Give it a still image and a text prompt, and it generates a video that brings the image to life — with camera motion, character animation, environmental effects, and more.
How it works
The model takes a starting image and uses it as the first frame of a generated video. A text prompt guides the motion and action. You can also provide a last frame to control where the video ends up, continue from an existing video clip, or supply an audio track for synchronized generation.
This makes it useful for animating illustrations, product shots, concept art, or any still image where you want to add realistic motion without creating footage from scratch.
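As a sketch, a request payload for the basic image-to-video case might look like this (parameter names are taken from the Inputs section below; the file path and prompt are illustrative):

```python
# Illustrative image-to-video request payload. Parameter names come from
# the Inputs section of this readme; values are examples only.
payload = {
    "first_frame": "product-shot.png",  # jpg/png/bmp/webp, <= 20 MB
    "prompt": "the camera slowly pushes in as steam rises from the cup",
    "resolution": "1080p",              # "720p" or "1080p"
    "duration": 5,                      # seconds, 2-15
    "enable_prompt_expansion": True,
}

# With the Replicate Python client, a dict like this would be passed as
# the `input` argument, e.g.:
#   import replicate
#   output = replicate.run("<model-slug-from-the-model-page>", input=payload)
```

The prompt here follows the "describe the motion" advice below: it says what should move, not just what the scene contains.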
Generation modes
- Image to video — provide a first frame image and a prompt to animate it
- First-and-last-frame — provide both a first and last frame to control the start and end of the video
- Clip continuation — provide an existing video clip to extend it with new generated frames
- Audio-synchronized — provide an audio file to generate video that matches the audio
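The mode is selected implicitly by which inputs you provide. A small sketch of that selection logic, following the combination rules listed under Inputs (the helper itself is hypothetical, not part of the model's API):

```python
def generation_mode(first_frame=None, last_frame=None, first_clip=None, audio=None):
    """Illustrative helper: infer which generation mode a set of inputs
    selects, per the combination rules in this readme."""
    if first_clip is not None and first_frame is not None:
        raise ValueError("first_clip cannot be combined with first_frame")
    if last_frame is not None and first_frame is None:
        raise ValueError("last_frame requires first_frame")
    if first_clip is not None:
        mode = "clip continuation"
    elif last_frame is not None:
        mode = "first-and-last-frame"
    elif first_frame is not None:
        mode = "image to video"
    else:
        raise ValueError("provide either first_frame or first_clip")
    # Audio can be layered on top of any mode; without it, the model
    # auto-generates matching audio.
    if audio is not None:
        mode += " (audio-synchronized)"
    return mode
```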
Inputs
- first_frame — Starting image to animate into video (jpg/png/bmp/webp, ≤20MB)
- last_frame — Optional ending image for first-and-last-frame generation. Requires first_frame
- first_clip — Optional video clip to continue from (mp4/mov, 2–10s, ≤100MB). Cannot be combined with first_frame
- audio — Optional audio file (wav/mp3, 3–30s, ≤15MB) for voice/music synchronization. If not provided, the model auto-generates matching audio
- prompt — Text description of the desired motion and action
- negative_prompt — Text description of content that should not appear in the video
- resolution — Output resolution: 720p or 1080p (default: 1080p)
- duration — Output duration in seconds (2–15, default: 5)
- enable_prompt_expansion — Automatically expand short prompts for better results. Improves quality but increases latency (default: true)
- seed — Random seed for reproducible results
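Requests that violate the limits above will be rejected, so it can help to check them client-side first. A minimal pre-flight sketch (the limits are from this readme; the helper name and signature are my own):

```python
def check_inputs(resolution, duration, first_frame_mb=None, audio_seconds=None):
    """Hypothetical pre-flight check mirroring the input limits listed
    in this readme. Raises ValueError on the first violated constraint."""
    if resolution not in ("720p", "1080p"):
        raise ValueError("resolution must be 720p or 1080p")
    if not 2 <= duration <= 15:
        raise ValueError("duration must be 2-15 seconds")
    if first_frame_mb is not None and first_frame_mb > 20:
        raise ValueError("first_frame must be <= 20 MB")
    if audio_seconds is not None and not 3 <= audio_seconds <= 30:
        raise ValueError("audio must be 3-30 seconds")
    return True
```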
Tips for good results
- Describe the motion. Focus your prompt on what should move and how — “a woman turns her head and smiles” works better than just describing the scene.
- Use high-quality source images. Sharp, well-lit images with clear subjects produce the best animations.
- Keep it short. 2–5 second clips tend to have the most coherent motion.
- Use first-and-last-frame mode when you need precise control over where the animation ends up.
- Enable prompt expansion for short prompts — it helps the model understand your intent better.
Limitations
- Complex multi-character interactions may produce inconsistent results
- Very long durations (10+ seconds) can show motion degradation toward the end
- Fine details like text or small patterns in the source image may not be perfectly preserved
- Physics-defying motions described in the prompt may not render correctly
Try it out on the Replicate playground.