## Basic model info

Model name: wan-video/wan-2.7-r2v
Model description: Generate videos from reference images or clips while preserving subject identity, using Alibaba's Wan 2.7 reference-to-video model.


## Model inputs

- prompt (required): Text prompt for video generation (string)
- reference_images (optional): Reference images of the character/object to feature in the video (jpg/png/bmp/webp) (array)
- reference_videos (optional): Reference videos of the character/object to feature in the video (mp4/mov) (array)
- negative_prompt (optional): Negative prompt — describes content that should not appear in the video (string)
- resolution (optional): Output video resolution (string)
- aspect_ratio (optional): Aspect ratio of the generated video (string)
- duration (optional): Duration of the generated video in seconds (integer)
- shot_type (optional): Shot structure of the generated video (string)
- seed (optional): Random seed for reproducible generation. Range: 0-2147483647 (integer)
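
The parameter constraints above (and the allowed values listed in the readme below) can be checked client-side before submitting a request. This is an illustrative sketch, not part of any official client library; the helper name and structure are assumptions.

```python
# Illustrative helper: validate a wan-2.7-r2v input payload against the
# documented parameter constraints before submitting it to the API.

VALID_RESOLUTIONS = {"720p", "1080p"}
VALID_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
VALID_SHOT_TYPES = {"single", "multi"}

def validate_input(payload: dict) -> dict:
    """Return the payload unchanged if it satisfies the documented constraints."""
    if not payload.get("prompt"):
        raise ValueError("prompt is required")
    if payload.get("resolution", "1080p") not in VALID_RESOLUTIONS:
        raise ValueError("resolution must be 720p or 1080p")
    if payload.get("aspect_ratio", "16:9") not in VALID_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(VALID_ASPECT_RATIOS)}")
    if not 2 <= payload.get("duration", 5) <= 10:
        raise ValueError("duration must be between 2 and 10 seconds")
    if payload.get("shot_type", "single") not in VALID_SHOT_TYPES:
        raise ValueError("shot_type must be 'single' or 'multi'")
    seed = payload.get("seed")
    if seed is not None and not 0 <= seed <= 2147483647:
        raise ValueError("seed must be in the range 0-2147483647")
    return payload

# Payload mirroring the first example below
example = validate_input({
    "prompt": "A curious cat sits on a windowsill watching rain fall outside",
    "duration": 5,
    "resolution": "1080p",
    "aspect_ratio": "9:16",
    "seed": 456,
})
```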


## Model output schema

{
  "type": "string",
  "title": "Output",
  "format": "uri"
}

If an input or output schema specifies a format of `uri`, the value refers to a file.
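
Since the output here is a URI pointing at a generated `.mp4`, a typical consumer derives a local filename from it and downloads the file. A minimal sketch (the download call is commented out because it requires network access; the helper name is an assumption):

```python
# The model returns a URI string pointing at the generated .mp4 file.
from pathlib import PurePosixPath
from urllib.parse import urlparse
# from urllib.request import urlretrieve  # uncomment to actually download

def local_name(output_uri: str) -> str:
    """Extract the file name component from the model's output URI."""
    return PurePosixPath(urlparse(output_uri).path).name

uri = "https://replicate.delivery/xezq/z1w2D2Em9AbnDBjTTY4tlGHbdzwPnpZq2cKUdQuDJYmx21lF/tmpmo7drbqt.mp4"
filename = local_name(uri)  # "tmpmo7drbqt.mp4"
# urlretrieve(uri, filename)  # saves the video locally
```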


## Example inputs and outputs

Use these examples to better understand the types of inputs the model accepts and the outputs it returns:

### Example (https://replicate.com/p/1znq2hhpysrmr0cxae6trnrfem)

#### Input

```json
{
  "seed": 456,
  "prompt": "A curious cat sits on a windowsill watching rain fall outside, cozy indoor lighting, the cat's tail gently swaying, raindrops streaking down the glass, peaceful and contemplative mood",
  "duration": 5,
  "resolution": "1080p",
  "aspect_ratio": "9:16",
  "reference_images": [
    "https://multimedia-example-files.replicate.dev/cat-domestic.5935x3898.jpg"
  ]
}
```

#### Output

```json
"https://replicate.delivery/xezq/z1w2D2Em9AbnDBjTTY4tlGHbdzwPnpZq2cKUdQuDJYmx21lF/tmpmo7drbqt.mp4"
```


### Example (https://replicate.com/p/vb3whydcwsrmy0cxaf09kk7cr8)

#### Input

```json
{
  "seed": 7777,
  "prompt": "The sneakers rotate slowly on a reflective black surface, dramatic studio lighting with blue and orange color gels casting highlights, dust particles floating in beams of light, premium product commercial aesthetic.",
  "duration": 5,
  "shot_type": "single",
  "resolution": "1080p",
  "aspect_ratio": "16:9",
  "reference_images": [
    "https://images.pexels.com/photos/2529148/pexels-photo-2529148.jpeg?w=1920"
  ]
}
```

#### Output

```json
"https://replicate.delivery/xezq/Weqer2ARU8h5cUvWR9EK0pdfqNIGAm32Xn4vNGGh6Q7mMwusA/tmpifncfet4.mp4"
```


## Model readme

> # Wan 2.7 Reference-to-Video
> 
> Wan 2.7 R2V is a reference-to-video generation model from Alibaba's Wan family. Give it one or more reference images or clips plus a text prompt, and it generates a new video that keeps the character, object, or visual identity of your references while following the motion and scene direction in the prompt.
> 
> ## How it works
> 
> Unlike text-to-video generation, reference-to-video starts from example visuals. The model uses your reference images or videos as identity anchors, then creates a new clip that matches your prompt while preserving recognizable appearance, styling, and subject details.
> 
> This makes it useful for character consistency, product shots, brand assets, mascot animation, and any workflow where you want the output to stay visually tied to a specific subject.
> 
> ## Inputs
> 
> - **prompt** — Text description of the action, camera movement, and scene you want to generate
> - **reference_images** — Optional reference images of the subject or object to preserve (jpg/png/bmp/webp)
> - **reference_videos** — Optional reference clips of the subject or object to preserve (mp4/mov)
> - **negative_prompt** — Describes content that should not appear in the video
> - **resolution** — Output resolution: 720p or 1080p (default: 1080p)
> - **aspect_ratio** — Output aspect ratio: 16:9, 9:16, 1:1, 4:3, or 3:4 (default: 16:9)
> - **duration** — Output duration in seconds (2-10, default: 5)
> - **shot_type** — Shot structure: `single` for one continuous shot or `multi` for multi-shot generation
> - **seed** — Random seed for reproducible results
> 
> ## Tips for good results
> 
> - **Use clear references.** Sharp images or uncluttered clips with a well-defined subject give the model a stronger identity anchor.
> - **Describe motion, not just appearance.** Your references define who or what to preserve; your prompt should focus on what happens in the video.
> - **Keep clips short.** 2-5 second outputs tend to stay most coherent.
> - **Use multiple references carefully.** Add more than one image or clip only when they all show the same subject consistently.
> - **Use negative prompts** to suppress unwanted artifacts or style drift.
> 
> ## Limitations
> 
> - Identity can drift in complex scenes with multiple moving subjects
> - Fine details like text, logos, or tiny accessories may not stay perfectly consistent
> - Very long or highly choreographed actions may reduce resemblance to the references
> - Mixed or conflicting reference inputs can confuse the model
> 
> Try it out on the [Replicate playground](https://replicate.com/playground).

