Wan 2.7 Reference-to-Video
Wan 2.7 R2V is a reference-to-video generation model from Alibaba’s Wan family. Give it one or more reference images or clips plus a text prompt, and it generates a new video that keeps the character, object, or visual identity of your references while following the motion and scene direction in the prompt.
How it works
Unlike text-to-video generation, reference-to-video starts from example visuals. The model uses your reference images or videos as identity anchors, then creates a new clip that matches your prompt while preserving recognizable appearance, styling, and subject details.
This makes it useful for character consistency, product shots, brand assets, mascot animation, and any workflow where you want the output to stay visually tied to a specific subject.
Inputs
- prompt — Text description of the action, camera movement, and scene you want to generate
- reference_images — Optional reference images of the subject or object to preserve (jpg/png/bmp/webp)
- reference_videos — Optional reference clips of the subject or object to preserve (mp4/mov)
- negative_prompt — Describes content that should not appear in the video
- resolution — Output resolution: 720p or 1080p (default: 1080p)
- aspect_ratio — Output aspect ratio: 16:9, 9:16, 1:1, 4:3, or 3:4 (default: 16:9)
- duration — Output duration in seconds (2-10, default: 5)
- shot_type — Shot structure: `single` for one continuous shot or `multi` for multi-shot generation
- seed — Random seed for reproducible results
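As a sketch, the inputs above map onto a request payload like the one below. This assumes the Replicate Python client and a hypothetical model slug (`wan-video/wan-2.7-r2v`); check the model page for the exact identifier, and note the reference URLs are placeholders.

```python
import os

# Illustrative payload mirroring the parameters documented above.
payload = {
    "prompt": "The mascot waves at the camera, then jumps as the camera pans left",
    "reference_images": [
        "https://example.com/mascot-front.png",  # placeholder reference URL
    ],
    "negative_prompt": "blurry, distorted limbs, text artifacts",
    "resolution": "1080p",   # 720p or 1080p
    "aspect_ratio": "16:9",  # 16:9, 9:16, 1:1, 4:3, or 3:4
    "duration": 5,           # seconds, 2-10
    "shot_type": "single",   # single continuous shot
    "seed": 42,              # fixed seed for reproducible results
}

# Only call the API when a token is configured.
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # third-party client: pip install replicate
    output = replicate.run("wan-video/wan-2.7-r2v", input=payload)
    print(output)
```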
Tips for good results
- Use clear references. Sharp images or uncluttered clips with a well-defined subject give the model a stronger identity anchor.
- Describe motion, not just appearance. Your references define who or what to preserve; your prompt should focus on what happens in the video.
- Keep clips short. 2-5 second outputs tend to stay most coherent.
- Use multiple references carefully. Add more than one image or clip only when they all show the same subject consistently.
- Use negative prompts to suppress unwanted artifacts or style drift.
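The tips above can be sketched as concrete input choices. The values below are illustrative, not prescriptive, and the reference URL is a placeholder.

```python
# An input that follows the tips: the reference carries identity,
# so the prompt focuses on motion, camera work, and scene.
good_input = {
    "prompt": "The character runs across a rain-soaked street, camera tracking from the side",
    "reference_images": ["https://example.com/character.png"],  # one clear, sharp reference
    "duration": 4,  # short outputs (2-5 s) tend to stay most coherent
    "negative_prompt": "style drift, extra limbs, flickering",  # suppress unwanted artifacts
}

# By contrast, a prompt that restates appearance instead of describing motion
# is redundant with the references and gives the model little scene direction.
weak_prompt = "A character with red hair and a blue jacket"
```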
Limitations
- Identity can drift in complex scenes with multiple moving subjects
- Fine details like text, logos, or tiny accessories may not stay perfectly consistent
- Very long or highly choreographed actions may reduce resemblance to the references
- Mixed or conflicting reference inputs can confuse the model
Try it out on the Replicate playground.