chenxwh / depth-any-video

Depth Any Video with Scalable Synthetic Data

  • Public
  • 162 runs
  • A100 (80GB)
  • GitHub
  • Paper
  • License

Input

file (required)
Input image or video.

boolean
Specify whether the input is a video.
Default: true

integer
Number of denoising steps; 1-3 steps work fine.
Default: 3

integer
Number of frames to infer per forward pass; this should be an even number.
Default: 32

integer
Number of frames to decode per forward pass.
Default: 16

integer
Number of frames for inpaint inference.
Default: 16

integer
Number of frames to overlap between consecutive windows.
Default: 6

integer
Maximum resolution for inference.
Default: 1024

integer
Random seed. Leave blank to randomize the seed.
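
This page lists only the types of the inputs, so as a sketch only, here is how a run might look through Replicate's Python client. Every input key below (video, is_video, denoise_steps, num_frames, decode_chunk_size, num_interp_frames, num_overlap_frames, max_resolution, seed) is a hypothetical name matched to the descriptions above, not taken from this page; check the playground or the model's API schema for the actual names.

import replicate

# Minimal sketch. NOTE: the input key names below are hypothetical,
# matched to the parameter descriptions above; verify them against
# the model's API schema before use.
output = replicate.run(
    "chenxwh/depth-any-video",
    input={
        "video": open("input.mp4", "rb"),  # input image or video (required)
        "is_video": True,                  # whether the input is a video
        "denoise_steps": 3,                # 1-3 steps work fine
        "num_frames": 32,                  # frames per forward pass; even number
        "decode_chunk_size": 16,           # frames to decode per forward pass
        "num_interp_frames": 16,           # frames for inpaint inference
        "num_overlap_frames": 6,           # frames shared between windows
        "max_resolution": 1024,            # maximum inference resolution
        "seed": 42,                        # omit to randomize
    },
)
print(output)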

Run time and cost

This model runs on NVIDIA A100 (80GB) GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Depth Any Video with Scalable Synthetic Data

Depth Any Video introduces a scalable synthetic data pipeline that captures 40,000 video clips from diverse games, and it leverages the strong priors of generative video diffusion models to advance video depth estimation. By incorporating rotary position encoding, flow matching, and a mixed-duration training strategy, it robustly handles videos of varying lengths and frame rates. A novel depth interpolation method additionally enables high-resolution depth inference, achieving better spatial accuracy and temporal consistency than previous models.
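
To make the windowed inference that the num_frames and num_overlap_frames inputs control more concrete, the sketch below is an illustrative toy, not the authors' implementation: it splits a long clip into fixed-size windows that share a few frames and averages the predictions on the shared frames when stitching the windows back together. The model callable and array layout are assumptions for illustration.

import numpy as np

def window_starts(total, num_frames, num_overlap):
    """Start indices so consecutive windows share `num_overlap` frames."""
    stride = num_frames - num_overlap
    starts = list(range(0, max(total - num_frames, 0) + 1, stride))
    if starts[-1] + num_frames < total:  # add one last window to cover the tail
        starts.append(total - num_frames)
    return starts

def infer_depth(frames, model, num_frames=32, num_overlap=6):
    """Overlapped sliding-window depth inference (illustrative only).

    frames: ndarray of shape (T, H, W, 3); model: placeholder callable
    mapping (t, H, W, 3) frames to (t, H, W) depth maps.
    """
    total = len(frames)
    depth_sum = np.zeros((total, *frames.shape[1:-1]), dtype=np.float32)
    counts = np.zeros(total, dtype=np.float32)
    for s in window_starts(total, num_frames, num_overlap):
        window = frames[s : s + num_frames]
        depth_sum[s : s + num_frames] += model(window)  # per-window depth maps
        counts[s : s + num_frames] += 1.0
    return depth_sum / counts[:, None, None]  # average predictions on overlaps

Averaging the overlapping frames is one simple way to smooth seams between windows; the actual model may blend or align windows differently.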

Citation

If you find our work useful, please cite:

@article{yang2024depthanyvideo,
  author    = {Honghui Yang and Di Huang and Wei Yin and Chunhua Shen and Haifeng Liu and Xiaofei He and Binbin Lin and Wanli Ouyang and Tong He},
  title     = {Depth Any Video with Scalable Synthetic Data},
  journal   = {arXiv preprint arXiv:2410.10815},
  year      = {2024}
}