lucataco/depth-anything-video | Run with an API on Replicate

Depth Anything on full video files

Input

Run this model in Node.js with one line of code:

npx create-replicate --model=lucataco/depth-anything-video

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/depth-anything-video using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/depth-anything-video:9143691405afc64c7952499a1e81e3f779535a8916c8da7154a9995f145d5e6d",
  {
    input: {
      video: "https://replicate.delivery/pbxt/KNKNfiWzJU5YxaSIsrJKBllp1TgjtvQ9urLNlN1biczFBPpe/dolphins.mp4",
      encoder: "vits"
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (this model returns an MP4 video):
await fs.promises.writeFile("output.mp4", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run lucataco/depth-anything-video using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/depth-anything-video:9143691405afc64c7952499a1e81e3f779535a8916c8da7154a9995f145d5e6d",
    input={
        "video": "https://replicate.delivery/pbxt/KNKNfiWzJU5YxaSIsrJKBllp1TgjtvQ9urLNlN1biczFBPpe/dolphins.mp4",
        "encoder": "vits"
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
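The Python example above only prints the result. A minimal sketch of saving it to disk follows. It assumes the output is either a file-like object with a `read()` method (as newer client releases are understood to return) or a plain URL string; `save_output` is a hypothetical helper name, not part of the client library.

```python
import urllib.request
from pathlib import Path


def save_output(output, path):
    """Write a prediction output to `path` and return the number of bytes written.

    Hypothetical helper: handles a file-like output or a plain URL string.
    """
    if hasattr(output, "read"):
        # Assumed case 1: file-like output exposing read().
        data = output.read()
    else:
        # Assumed case 2: the output is a URL string, so download it.
        with urllib.request.urlopen(str(output)) as response:
            data = response.read()
    return Path(path).write_bytes(data)
```

With the `output` value from the call above, usage would look like `save_output(output, "output.mp4")`.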

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run lucataco/depth-anything-video using Replicate’s HTTP API with curl. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/depth-anything-video:9143691405afc64c7952499a1e81e3f779535a8916c8da7154a9995f145d5e6d",
    "input": {
      "video": "https://replicate.delivery/pbxt/KNKNfiWzJU5YxaSIsrJKBllp1TgjtvQ9urLNlN1biczFBPpe/dolphins.mp4",
      "encoder": "vits"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
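For readers who prefer to issue the same HTTP request from Python without the client library, the sketch below builds an equivalent request using only the standard library. The endpoint, headers, and body mirror the curl command above; `build_prediction_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(token, version, video_url, encoder="vits"):
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "version": version,
        "input": {"video": video_url, "encoder": encoder},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Same header as the curl example: ask the API to hold the
            # connection open while the prediction runs.
            "Prefer": "wait",
        },
        method="POST",
    )


# Sending the request (requires network access and a valid token):
#   with urllib.request.urlopen(request) as response:
#       prediction = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any network call is made.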

Output

{
  "completed_at": "2024-02-09T19:32:37.056153Z",
  "created_at": "2024-02-09T19:32:19.403468Z",
  "data_removed": false,
  "error": null,
  "id": "p7dd25rbq6envjxwd4pvynqsna",
  "input": {
    "video": "https://replicate.delivery/pbxt/KNKNfiWzJU5YxaSIsrJKBllp1TgjtvQ9urLNlN1biczFBPpe/dolphins.mp4",
    "encoder": "vits"
  },
  "logs": "Processing /tmp/tmp8libbdowdolphins.mp4\n/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/transformers/pipelines/base.py:1123: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset\nwarnings.warn(\nffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers\nbuilt with gcc 11 (Ubuntu 11.2.0-19ubuntu1)\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\nlibavutil      56. 70.100 / 56. 70.100\nlibavcodec     58.134.100 / 58.134.100\nlibavformat    58. 76.100 / 58. 76.100\nlibavdevice    58. 13.100 / 58. 13.100\nlibavfilter     7.110.100 /  7.110.100\nlibswscale      5.  9.100 /  5.  9.100\nlibswresample   3.  9.100 /  3.  9.100\nlibpostproc    55.  9.100 / 55.  9.100\nInput #0, image2, from '/tmp/frames/depth-%d.png':\nDuration: 00:00:02.36, start: 0.000000, bitrate: N/A\nStream #0:0: Video: png, rgb24(pc), 960x540, 25 fps, 25 tbr, 25 tbn, 25 tbc\nStream mapping:\nStream #0:0 -> #0:0 (png (native) -> h264 (libx264))\nPress [q] to stop, [?] for help\n[libx264 @ 0x5a7ad75f8400] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n[libx264 @ 0x5a7ad75f8400] profile High, level 3.1, 4:2:0, 8-bit\n[libx264 @ 0x5a7ad75f8400] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=24 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=25.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\nOutput #0, mp4, to '/tmp/output.mp4':\nMetadata:\nencoder         : Lavf58.76.100\nStream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p(tv, progressive), 960x540, q=2-31, 24 fps, 12288 tbn\nMetadata:\nencoder         : Lavc58.134.100 libx264\nSide data:\ncpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\nframe=    1 fps=0.0 q=0.0 size=       0kB time=00:00:00.00 bitrate=N/A speed=   0x\nframe=   27 fps=0.0 q=0.0 size=       0kB time=00:00:00.00 bitrate=N/A speed=   0x\nframe=   59 fps= 51 q=30.0 size=       0kB time=00:00:00.00 bitrate=4740.7kbits/s speed=6.98e-05x\nframe=   59 fps= 35 q=-1.0 Lsize=     332kB time=00:00:02.33 bitrate=1167.3kbits/s speed=1.39x\nvideo:331kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.455267%\n[libx264 @ 0x5a7ad75f8400] frame I:1     Avg QP:24.65  size: 11663\n[libx264 @ 0x5a7ad75f8400] frame P:20    Avg QP:27.02  size:  7248\n[libx264 @ 0x5a7ad75f8400] frame B:38    Avg QP:28.37  size:  4779\n[libx264 @ 0x5a7ad75f8400] consecutive B-frames:  8.5% 13.6% 10.2% 67.8%\n[libx264 @ 0x5a7ad75f8400] mb I  I16..4: 30.2% 59.3% 10.5%\n[libx264 @ 0x5a7ad75f8400] mb P  I16..4: 18.4% 17.8%  2.9%  P16..4: 20.2%  5.7%  1.5%  0.0%  0.0%    skip:33.5%\n[libx264 @ 0x5a7ad75f8400] mb B  I16..4:  5.5%  4.6%  0.6%  B16..8: 29.4%  7.3%  1.1%  direct: 1.6%  skip:50.0%  L0:50.4% L1:45.0% BI: 4.6%\n[libx264 @ 0x5a7ad75f8400] 8x8 transform intra:45.7% inter:73.0%\n[libx264 @ 0x5a7ad75f8400] coded y,uvDC,uvAC intra: 13.0% 54.6% 17.0% inter: 7.8% 14.0% 1.9%\n[libx264 @ 0x5a7ad75f8400] i16 v,h,dc,p:  9% 44%  5% 42%\n[libx264 @ 0x5a7ad75f8400] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 16% 29% 38%  3%  3%  2%  5%  1%  3%\n[libx264 @ 0x5a7ad75f8400] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 14% 27% 38%  3%  3%  2%  6%  2%  4%\n[libx264 @ 0x5a7ad75f8400] i8c dc,h,v,p: 35% 41%  9% 15%\n[libx264 @ 0x5a7ad75f8400] Weighted P-Frames: Y:0.0% UV:0.0%\n[libx264 @ 0x5a7ad75f8400] ref P L0: 56.7%  7.0% 24.7% 11.7%\n[libx264 @ 0x5a7ad75f8400] ref B L0: 80.3% 16.2%  3.5%\n[libx264 @ 0x5a7ad75f8400] ref B L1: 92.4%  7.6%\n[libx264 @ 0x5a7ad75f8400] kb/s:1100.69",
  "metrics": {
    "predict_time": 17.636859,
    "total_time": 17.652685
  },
  "output": "https://replicate.delivery/pbxt/cbA6jk39GnJ5F9a70DTRhdFc30ZFNT89iBIREfMuBgWqfCVSA/output.mp4",
  "started_at": "2024-02-09T19:32:19.419294Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/p7dd25rbq6envjxwd4pvynqsna",
    "cancel": "https://api.replicate.com/v1/predictions/p7dd25rbq6envjxwd4pvynqsna/cancel"
  },
  "version": "9143691405afc64c7952499a1e81e3f779535a8916c8da7154a9995f145d5e6d"
}
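The prediction object above carries both timestamps and a `metrics` block. As a rough sketch of how a caller might read it, the hypothetical helper below extracts the status and output URL and derives the queue and total durations from `created_at`, `started_at`, and `completed_at`. The field names come from the JSON above; the function names are illustrative only.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as '2024-02-09T19:32:19.403468Z'."""
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_prediction(prediction):
    """Return status, output URL, and derived durations in seconds."""
    created = parse_timestamp(prediction["created_at"])
    started = parse_timestamp(prediction["started_at"])
    completed = parse_timestamp(prediction["completed_at"])
    return {
        "status": prediction["status"],
        "output": prediction["output"],
        "queue_seconds": (started - created).total_seconds(),
        "total_seconds": (completed - created).total_seconds(),
    }
```

For the example prediction, the derived total (completed minus created) works out to about 17.65 seconds, which agrees with the reported `total_time`.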

Run time and cost

This model costs approximately $0.051 to run on Replicate, or 19 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
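The "19 runs per $1" figure follows from the approximate per-run cost by floor division. A small sketch of that arithmetic, using the quoted $0.051 as an approximation rather than a fixed price:

```python
from decimal import Decimal

# Approximate per-run cost quoted above; actual cost varies with inputs.
COST_PER_RUN = Decimal("0.051")


def runs_per_dollar(cost_per_run=COST_PER_RUN):
    """Whole runs that fit in one dollar (1 / 0.051 is about 19.6, floored)."""
    return int(Decimal(1) // cost_per_run)


def estimated_cost(runs, cost_per_run=COST_PER_RUN):
    """Rough budget in dollars for a given number of runs."""
    return runs * cost_per_run
```

Under that assumption, `estimated_cost(100)` comes to about $5.10. `Decimal` is used here only to avoid floating-point rounding noise in the estimate.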

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 4 minutes. The predict time for this model varies significantly based on the inputs.
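Because run time varies with the input, a caller may want to poll a prediction with a timeout instead of relying on a single blocking request. The sketch below is one way to do that. `fetch` is a hypothetical callable that returns the current prediction as a dict (for example, by requesting the `urls.get` address shown in the output above), and the 240-second default is an assumption taken from the "within 4 minutes" figure, not a documented limit.

```python
import time

# Statuses after which a prediction is assumed not to change any further.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(fetch, timeout_s=240.0, interval_s=2.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until the prediction reaches a terminal status.

    Raises TimeoutError if no terminal status is seen within timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        prediction = fetch()
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        if clock() >= deadline:
            raise TimeoutError("prediction did not finish within the timeout")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise without real delays.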

Readme

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

This work presents Depth Anything, a highly practical solution for robust monocular depth estimation by training on a combination of 1.5M labeled images and 62M+ unlabeled images.

Features of Depth Anything

Relative depth estimation: Our foundation models listed here can provide relative depth estimation for any given image robustly.

Metric depth estimation: We fine-tune our Depth Anything model with metric depth information from NYUv2 or KITTI. It offers strong capabilities of both in-domain and zero-shot metric depth estimation.

Better depth-conditioned ControlNet: We re-train a better depth-conditioned ControlNet based on Depth Anything. It offers more precise synthesis than the previous MiDaS-based ControlNet.

Downstream high-level scene understanding: The Depth Anything encoder can be fine-tuned to downstream high-level perception tasks, e.g., semantic segmentation, where it reaches 86.2 mIoU on Cityscapes and 59.4 mIoU on ADE20K.
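To illustrate what "relative" depth means in the first feature above: the values order pixels by depth but carry no absolute units, so they are commonly min-max normalized before being rendered as a grayscale frame. The sketch below shows that normalization on a plain 2D list. It is an illustration only and not necessarily how this model writes its depth frames; `normalize_depth` is a hypothetical name.

```python
def normalize_depth(depth):
    """Min-max scale a 2D list of relative depth values to integers in 0..255."""
    flat = [value for row in depth for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map has no depth variation; render it as all zeros.
        return [[0 for _ in row] for row in depth]
    scale = 255.0 / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in depth]
```

Rescaling or shifting every input value by the same amount leaves the normalized result unchanged, which is why a relative depth map can be visualized without knowing metric distances.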

Citation

If you find this project useful, please consider citing:

@article{depthanything,
      title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
      author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
      journal={arXiv:2401.10891},
      year={2024}
}