tiger-ai-lab / anyv2v

Tuning-free framework to achieve high appearance and temporal consistency in video editing

  • Public
  • 975 runs
  • L40S
  • GitHub
  • Paper
  • License

Input

file (required)

Input video

file

Provide the edited first frame of the input video. This is optional; leave it blank and provide the prompt below to use the default pipeline, which edits the first frame with instruct-pix2pix.

string

The first step involves using timbrooks/instruct-pix2pix to edit the first frame. Specify the prompt for editing the first frame. This will be ignored if edited_first_frame above is provided.

Default: "turn man into robot"

string

Describe the input video

Default: "a man doing exercises for the body and mind"

string

Things you do not want to see in the edited video

Default: "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"

integer
(minimum: 1, maximum: 500)

Number of denoising steps

Default: 50

number
(minimum: 1, maximum: 20)

Scale for classifier-free guidance

Default: 9
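
For reference, classifier-free guidance combines the unconditional and prompt-conditioned noise predictions at each denoising step roughly as sketched below; this is a schematic, not the model's exact code:

```python
# Schematic classifier-free guidance step. With guidance_scale = 1 the prompt
# carries no extra weight; larger values push the sample harder toward the prompt.
def apply_cfg(noise_uncond, noise_text, guidance_scale=9.0):
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)
```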

number
(minimum: 0, maximum: 1)

Specifies the proportion of time steps in the DDIM sampling process where the convolutional injection is applied. A higher value improves motion consistency. 1.0 indicates injection at every time step

Default: 1

number
(minimum: 0, maximum: 1)

Specifies the proportion of time steps in the DDIM sampling process where the spatial attention injection is applied. A higher value improves motion consistency. 1.0 indicates injection at every time step

Default: 1

number
(minimum: 0, maximum: 1)

Specifies the proportion of time steps in the DDIM sampling process where the temporal attention injection is applied. A higher value improves motion consistency. 1.0 indicates injection at every time step

Default: 1
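
The three proportions above are typically interpreted as "inject during the first fraction of the sampling steps". A schematic of how such a threshold could map onto a 50-step schedule (the exact schedule inside AnyV2V may differ):

```python
# Schematic: a proportion p in [0, 1] enables injection for the first
# round(p * num_steps) of num_steps DDIM sampling steps.
def injection_mask(proportion, num_steps=50):
    n_inject = round(proportion * num_steps)
    return [step < n_inject for step in range(num_steps)]

print(sum(injection_mask(1.0)))  # 50 -> injection at every step (the default)
print(sum(injection_mask(0.5)))  # 25 -> injection only during the first half
```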

integer
(minimum: 0)

This parameter determines the time step index at which to begin sampling from the initial DDIM-inverted latents, with a range of [0, num_inference_steps-1]. In a DDIM sampling process with 50 sampling steps, the scheduler progresses through the time steps in the sequence [981, 961, 941, ..., 1]. Setting ddim_init_latents_t_idx to 0 therefore initiates sampling from t=981, whereas setting it to 1 starts the process at t=961. A higher index enhances motion consistency with the source video but may lead to flickering and cause the edited video to diverge from the edited first frame.

Default: 0
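
As a worked example of the index-to-timestep mapping described above (assuming the usual 1000-step training schedule and 50 DDIM sampling steps):

```python
# Reconstruct the timestep sequence [981, 961, ..., 1] mentioned above.
num_train_timesteps = 1000   # assumed training schedule length
num_inference_steps = 50
stride = num_train_timesteps // num_inference_steps  # 20
timesteps = [num_train_timesteps - stride * (i + 1) + 1
             for i in range(num_inference_steps)]

print(timesteps[:3])   # [981, 961, 941]
print(timesteps[0])    # ddim_init_latents_t_idx = 0 -> sampling starts at t=981
print(timesteps[1])    # ddim_init_latents_t_idx = 1 -> sampling starts at t=961
```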

integer

Number of DDIM inversion steps

Default: 100

integer

Random seed. Leave blank to randomize the seed
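
Putting the inputs together, a call through the Replicate Python client might look like the sketch below. Only edited_first_frame, num_inference_steps, and ddim_init_latents_t_idx are named explicitly in the descriptions above; the other input keys here are assumptions, so check the model's API schema for the exact field names:

```python
import replicate

output = replicate.run(
    "tiger-ai-lab/anyv2v",
    input={
        "video": open("input.mp4", "rb"),  # assumed key for the input video
        "prompt": "turn man into robot",   # assumed key for the first-frame edit prompt
        "video_description": "a man doing exercises for the body and mind",  # assumed key
        "num_inference_steps": 50,
        "ddim_init_latents_t_idx": 0,
    },
)
print(output)
```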


Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

AnyV2V

Introduction

AnyV2V is a tuning-free framework to achieve high appearance and temporal consistency in video editing.

  • Can seamlessly build on top of advanced image editing methods to perform diverse types of editing
  • Robust performance on four tasks:
      • prompt-based editing
      • reference-based style transfer
      • subject-driven editing
      • identity manipulation

🖊️ Citation

Please kindly cite our paper if you use our code, data, models or results:

```bibtex
@article{ku2024anyv2v,
  title={AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks},
  author={Ku, Max and Wei, Cong and Ren, Weiming and Yang, Huan and Chen, Wenhu},
  journal={arXiv preprint arXiv:2403.14468},
  year={2024}
}
```