## Basic model info

Model name: xai/grok-imagine-r2v
Model description: Generate videos guided by reference images using xAI's Grok Imagine Video model


## Model inputs

- prompt (required): Text prompt describing the video to generate (string)
- reference_images (required): Reference images to guide the video generation (1-7 images). These are used as style and content references, not as starting frames. (array)
- duration (optional): Duration of the video in seconds (1-10). (integer)
- aspect_ratio (optional): Aspect ratio of the generated video: 16:9, 9:16, 1:1, 4:3, 3:4, 3:2, or 2:3. (string)
- resolution (optional): Resolution of the generated video: 480p or 720p. (string)
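
The constraints above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the model's API: `validate_inputs` and the constant names are hypothetical, and the allowed aspect ratios, resolutions, and prompt length are taken from the "Technical details" list in the model readme.

```python
from typing import Any

# Limits taken from the input list and the readme's "Technical details".
MAX_REFERENCE_IMAGES = 7
MAX_DURATION_SECONDS = 10
MAX_PROMPT_CHARS = 4096
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"}
RESOLUTIONS = {"480p", "720p"}


def validate_inputs(inputs: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems: list[str] = []

    prompt = inputs.get("prompt")
    if not isinstance(prompt, str):
        problems.append("prompt is required and must be a string")
    elif len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")

    images = inputs.get("reference_images")
    if not isinstance(images, list) or not 1 <= len(images) <= MAX_REFERENCE_IMAGES:
        problems.append(f"reference_images must be a list of 1-{MAX_REFERENCE_IMAGES} images")

    duration = inputs.get("duration")
    if duration is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(duration, int) and not isinstance(duration, bool)
        if not is_int or not 1 <= duration <= MAX_DURATION_SECONDS:
            problems.append(f"duration must be an integer from 1 to {MAX_DURATION_SECONDS}")

    aspect_ratio = inputs.get("aspect_ratio")
    if aspect_ratio is not None:
        if not isinstance(aspect_ratio, str) or aspect_ratio not in ASPECT_RATIOS:
            problems.append(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")

    resolution = inputs.get("resolution")
    if resolution is not None:
        if not isinstance(resolution, str) or resolution not in RESOLUTIONS:
            problems.append(f"resolution must be one of {sorted(RESOLUTIONS)}")

    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once. The service remains the final authority on what it accepts.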


## Model output schema

{
  "type": "string",
  "title": "Output",
  "format": "uri"
}

When a field in the input or output schema has the format `uri`, its value is a URL that points to a file.
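
Because the output is a URI pointing to a file, a caller usually needs to download it. The sketch below uses only the Python standard library and assumes the output arrives as a plain URL string; depending on the client version it may instead be an object wrapping that URL. `filename_from_uri`, `save_output`, and the `output.mp4` fallback name are hypothetical choices made for illustration.

```python
import posixpath
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def filename_from_uri(uri: str) -> str:
    """Use the last path segment of the URI as the local file name."""
    name = posixpath.basename(urlparse(uri).path)
    # Fall back to an arbitrary name when the URI has no file segment.
    return name or "output.mp4"


def save_output(uri: str, directory: str = ".") -> Path:
    """Stream the file behind `uri` into `directory` and return the local path."""
    target = Path(directory) / filename_from_uri(uri)
    with urllib.request.urlopen(uri) as response, open(target, "wb") as fh:
        # Copy in chunks so the whole video is never held in memory.
        shutil.copyfileobj(response, fh)
    return target
```

For instance, `save_output(output_url, "videos")` would write the generated `.mp4` into an existing `videos` directory.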


## Example inputs and outputs

Use these examples to see what kinds of inputs the model accepts and what it returns:

### Example (https://replicate.com/p/4wvkgt90d1rmy0cx3gg884kxx4)

#### Input

```json
{
  "prompt": "A breathtaking cinematic aerial shot sweeping over the pyramids at golden hour, with a monarch butterfly gliding through the warm desert air in the foreground, dust particles catching the light, epic scale",
  "duration": 10,
  "resolution": "720p",
  "aspect_ratio": "16:9",
  "reference_images": [
    "https://multimedia-example-files.replicate.dev/pyramids-giza.4372x2906.jpg",
    "https://multimedia-example-files.replicate.dev/monarch-butterfly.3008x2000.jpg"
  ]
}
```

#### Output

```json
"https://replicate.delivery/xezq/4eaW41RfXToq2EfmtoXuYTg85BX43ye5Sz7Rxm0sJUkZ9QPZB/tmp37j6aeoo.mp4"
```


### Example (https://replicate.com/p/2w76fj9b79rmy0cx3gg989j7k4)

#### Input

```json
{
  "prompt": "The Earth slowly rotates in the vast emptiness of space, clouds swirling over continents, city lights twinkling on the night side, gentle camera drift, IMAX documentary style, awe-inspiring",
  "duration": 10,
  "resolution": "720p",
  "aspect_ratio": "1:1",
  "reference_images": [
    "https://multimedia-example-files.replicate.dev/earth-apollo-17.3000x3002.jpg"
  ]
}
```

#### Output

```json
"https://replicate.delivery/xezq/YNQMArMoThIvDBin1UUtk44LoeSAoCUqM4zr3fEefGxM6QPZB/tmpecm_cs9s.mp4"
```


### Example (https://replicate.com/p/3qbj2ahtrnrmy0cx3gg82enzf4)

#### Input

```json
{
  "prompt": "A dramatic time-lapse of clouds rushing over the snow-capped Himalayan peaks, sunlight breaking through gaps to create god rays across the valleys, sweeping drone shot, epic nature documentary style",
  "duration": 10,
  "resolution": "720p",
  "aspect_ratio": "16:9",
  "reference_images": [
    "https://multimedia-example-files.replicate.dev/mount-everest.2971x1615.jpg"
  ]
}
```

#### Output

```json
"https://replicate.delivery/xezq/ctuql1TpafRyTaLqUKi0lIETqmc6OTO4qkyfDKtBeJGweQPZB/tmpb8mi167d.mp4"
```


### Example (https://replicate.com/p/cpnpjfvb0nrmw0cx3gyap0zwv0)

#### Input

```json
{
  "prompt": "A grand museum gallery comes to life at night: the portrait of Kepler gazes at a rotating globe of Earth, while a butterfly specimen escapes its glass case and flutters past ancient temple artifacts. Warm museum lighting, slow tracking shot down the gallery corridor, Night at the Museum style, magical and cinematic",
  "duration": 10,
  "resolution": "720p",
  "aspect_ratio": "16:9",
  "reference_images": [
    "https://multimedia-example-files.replicate.dev/johannes-kepler-portrait.620x628.webp",
    "https://multimedia-example-files.replicate.dev/earth-apollo-17.3000x3002.jpg",
    "https://multimedia-example-files.replicate.dev/monarch-butterfly.3008x2000.jpg"
  ]
}
```

#### Output

```json
"https://replicate.delivery/xezq/aB81bfzJeEtaXUNMLAZfMfzeoesgx5zmefzomHEDoDfMexSPZB/tmp1p2iafh5.mp4"
```


### Example (https://replicate.com/p/htkw4fxt6drmt0cx3hbs7jp024)

#### Input

```json
{
  "prompt": "Four friends sitting together at a sun-drenched outdoor restaurant table, laughing and waving at the camera. Warm golden hour light, Mediterranean terrace setting with climbing vines and the sea in the background. Slow cinematic camera push-in, joyful and candid atmosphere",
  "duration": 10,
  "resolution": "720p",
  "aspect_ratio": "16:9",
  "reference_images": [
    "https://replicate.delivery/xezq/bC3bk0u7T6bGKVoFaM3s7u6zVf1ecc5ZResjP0JJXMxcEqnsA/tmp5pp1qmec.jpg",
    "https://replicate.delivery/xezq/RroWzQAvTTJzJJKXzeeQEX1EpFEMDwqK417ZeLCChXYfIUPZB/tmpc6m8q83n.jpg",
    "https://replicate.delivery/xezq/QjM2OpqPNRqeYSoZqcKtVV4ejY16SYYV6bM1igobkNrQC1TWA/tmpt5xetoov.jpg",
    "https://replicate.delivery/xezq/h5enEMBIpkT7GiZJixOf6VNGl4sU8d8nIXRKkaw66CpRC1TWA/tmpihuiwgtw.jpg"
  ]
}
```

#### Output

```json
"https://replicate.delivery/xezq/9vPw5BP1VmYPEV1J584bD2gIY4XX339R9uLMuH3X33zGS9kF/tmpftqwcfzi.mp4"
```
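
All of the examples above share one shape: a prompt, a list of reference images, and a few optional settings. The sketch below shows one way to build that payload in Python so that unset optional fields are simply left out. `R2VInput` and `to_payload` are hypothetical names introduced here, not part of any client library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class R2VInput:
    """Hypothetical helper mirroring the input fields used in the examples."""

    prompt: str
    reference_images: list[str]
    duration: Optional[int] = None
    aspect_ratio: Optional[str] = None
    resolution: Optional[str] = None

    def to_payload(self) -> dict:
        """Build the input dict, leaving out optional fields that were not set."""
        payload: dict = {
            "prompt": self.prompt,
            "reference_images": list(self.reference_images),
        }
        for key in ("duration", "aspect_ratio", "resolution"):
            value = getattr(self, key)
            if value is not None:
                payload[key] = value
        return payload


# The Earth example above, with the prompt shortened for brevity.
earth = R2VInput(
    prompt="The Earth slowly rotates in the vast emptiness of space, gentle camera drift",
    reference_images=[
        "https://multimedia-example-files.replicate.dev/earth-apollo-17.3000x3002.jpg"
    ],
    duration=10,
    aspect_ratio="1:1",
    resolution="720p",
)
payload = earth.to_payload()
```

The resulting `payload` dict has the same keys as the JSON inputs shown in the examples and can be passed as the `input` argument of `replicate.run`.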


## Model readme

> # Grok Imagine R2V
> 
> Generate videos guided by reference images using xAI's Grok Imagine Video model.
> 
> Reference-to-Video (R2V) takes one or more images and uses them as style and content references to guide video generation. Unlike image-to-video (where the image becomes the first frame), R2V treats your images as creative direction — the model draws on their visual style, subjects, and composition to produce something new.
> 
> ## What it does
> 
> Provide up to 7 reference images along with a text prompt, and the model generates a video that reflects the visual characteristics of your references. This is useful for:
> 
> - **Character consistency**: Use photos of a character from different angles, then generate video of them in a new scene
> - **Style transfer**: Feed in images with a specific aesthetic (watercolor, noir, retro film) and the model carries that style into the video
> - **Multi-subject scenes**: Combine references of different subjects — a butterfly and a landscape, two characters, a product and a setting — and bring them together in motion
> - **Creative remixing**: Give the model a painting, a photo, and a sketch, and let it synthesize something that blends all three
> 
> ## How to use it
> 
> The key difference from image-to-video is the `reference_images` input. Pass a list of image URLs or uploaded files:
> 
> ```python
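> import replicate
> 
> # Requires the REPLICATE_API_TOKEN environment variable to be set.
> output = replicate.run(
>     "xai/grok-imagine-r2v",
>     input={
>         # Describe motion and camera work; the references supply the look.
>         "prompt": "A monarch butterfly gliding over ancient pyramids at golden hour, cinematic aerial shot",
>         # 1-7 images, used as style and content references, not as starting frames.
>         "reference_images": [
>             "https://example.com/butterfly.jpg",
>             "https://example.com/pyramids.jpg"
>         ],
>         "duration": 8,  # seconds, 1-10
>         "aspect_ratio": "16:9",
>         "resolution": "720p"  # 480p or 720p
>     }
> )
> # The output is a URI pointing to the generated video file.
> print(output)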
> ```
> 
> ## Prompt tips for R2V
> 
> Since the model already has visual references, your prompt should focus on **what happens** rather than what things look like:
> 
> - **Describe the action and motion**: "The cat stretches lazily and pounces toward the camera" rather than "a fluffy orange cat"
> - **Specify camera movement**: "slow push-in," "sweeping drone shot," "handheld tracking"
> - **Set the mood**: "warm afternoon light," "dramatic storm clouds," "ethereal glow"
> - **Be specific about how references combine**: "The butterfly flies through the foreground while the pyramids fill the background"
> 
> ## Technical details
> 
> - **Reference images**: 1–7 images (jpg, jpeg, png, webp)
> - **Video duration**: 1–10 seconds
> - **Resolution**: 480p or 720p
> - **Aspect ratios**: 16:9, 9:16, 1:1, 4:3, 3:4, 3:2, 2:3
> - **Prompt length**: Up to 4,096 characters
> 
> ## Limitations
> 
> - R2V cannot be combined with image-to-video (`image` input) or video editing (`video` input) — it's a separate generation mode
> - Very large reference images may hit payload limits. Resize images to reasonable dimensions (under ~4000px on the longest side) before uploading
> - Maximum duration for R2V is 10 seconds, shorter than the 15-second limit for text-to-video