Wan 2.7 Reference-to-Video
Wan 2.7 R2V is a reference-to-video generation model from Alibaba’s Wan family. Give it one or more reference images or clips plus a text prompt, and it generates a new video that keeps the character, object, or visual identity of your references while following the motion and scene direction in the prompt.
How it works
Unlike text-to-video generation, reference-to-video starts from example visuals. The model uses your reference images or videos as identity anchors, then creates a new clip that matches your prompt while preserving recognizable appearance, styling, and subject details.
This makes it useful for character consistency, product shots, brand assets, mascot animation, and any workflow where you want the output to stay visually tied to a specific subject.
Inputs
- prompt — Text description of the action, camera movement, and scene you want to generate
- reference_images — Optional reference images of the subject or object to preserve (jpg/png/bmp/webp)
- reference_videos — Optional reference clips of the subject or object to preserve (mp4/mov)
- negative_prompt — Describes content that should not appear in the video
- resolution — Output resolution: 720p or 1080p (default: 1080p)
- aspect_ratio — Output aspect ratio: 16:9, 9:16, 1:1, 4:3, or 3:4 (default: 16:9)
- duration — Output duration in seconds (2-10, default: 5)
- shot_type — Shot structure: `single` for one continuous shot or `multi` for multi-shot generation
- seed — Random seed for reproducible results
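As a sketch, the inputs above map onto a request payload like the one below. This assumes the Replicate Python client and a hypothetical model slug (`wan-video/wan-2.7-r2v`); check the model page for the exact identifier, and note the reference URLs are placeholders.

```python
import os

# Illustrative payload mirroring the parameters documented above.
payload = {
    "prompt": "The mascot waves at the camera, then jumps as the camera pans left",
    "reference_images": [
        "https://example.com/mascot-front.png",  # placeholder reference URL
    ],
    "negative_prompt": "blurry, distorted limbs, text artifacts",
    "resolution": "1080p",   # 720p or 1080p
    "aspect_ratio": "16:9",  # 16:9, 9:16, 1:1, 4:3, or 3:4
    "duration": 5,           # seconds, 2-10
    "shot_type": "single",   # single continuous shot
    "seed": 42,              # fixed seed for reproducible results
}

# Only call the API when a token is configured.
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # third-party client: pip install replicate
    output = replicate.run("wan-video/wan-2.7-r2v", input=payload)
    print(output)
```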
Tips for good results
- Use clear references. Sharp images or uncluttered clips with a well-defined subject give the model a stronger identity anchor.
- Describe motion, not just appearance. Your references define who or what to preserve; your prompt should focus on what happens in the video.
- Keep clips short. 2-5 second outputs tend to stay most coherent.
- Use multiple references carefully. Add more than one image or clip only when they all show the same subject consistently.
- Use negative prompts to suppress unwanted artifacts or style drift.
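The tips above can be sketched as concrete input choices. The values below are illustrative, not prescriptive, and the reference URL is a placeholder.

```python
# An input that follows the tips: the reference carries identity,
# so the prompt focuses on motion, camera work, and scene.
good_input = {
    "prompt": "The character runs across a rain-soaked street, camera tracking from the side",
    "reference_images": ["https://example.com/character.png"],  # one clear, sharp reference
    "duration": 4,  # short outputs (2-5 s) tend to stay most coherent
    "negative_prompt": "style drift, extra limbs, flickering",  # suppress unwanted artifacts
}

# By contrast, a prompt that restates appearance instead of describing motion
# is redundant with the references and gives the model little scene direction.
weak_prompt = "A character with red hair and a blue jacket"
```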
Limitations
- Identity can drift in complex scenes with multiple moving subjects
- Fine details like text, logos, or tiny accessories may not stay perfectly consistent
- Very long or highly choreographed actions may reduce resemblance to the references
- Mixed or conflicting reference inputs can confuse the model
Try it out on the Replicate playground.