cjwbw / depth-anything

A highly practical solution for robust monocular depth estimation, trained on a combination of 1.5M labeled images and 62M+ unlabeled images

Run time and cost

This model costs approximately $0.060 per run on Replicate, or about 16 runs per $1, though this varies depending on your inputs. It is also open source, and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 62 seconds, although prediction time varies significantly based on the inputs.
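
If you use the hosted version, the snippet below is a minimal sketch of calling it through the Replicate Python client. The input field name ("image") and the <version> placeholder are assumptions; check the model's API page on Replicate for the exact input schema before running.

import replicate

# Sketch only: "image" as the input key and "<version>" are assumptions;
# replace <version> with the current version id from the model's API page.
output = replicate.run(
    "cjwbw/depth-anything:<version>",
    input={"image": open("photo.jpg", "rb")},
)
print(output)  # typically a URL pointing to the predicted depth map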

Readme

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

This work presents Depth Anything, a highly practical solution for robust monocular depth estimation by training on a combination of 1.5M labeled images and 62M+ unlabeled images.

Features of Depth Anything

  • Relative depth estimation

    Our foundation models provide robust relative depth estimation for any given image (see the sketch after this list for a usage example).

  • Metric depth estimation

    We fine-tune our Depth Anything model with metric depth information from NYUv2 or KITTI. It offers strong capabilities in both in-domain and zero-shot metric depth estimation.

  • Better depth-conditioned ControlNet

    We re-train a better depth-conditioned ControlNet based on Depth Anything. It offers more precise synthesis than the previous MiDaS-based ControlNet.

  • Downstream high-level scene understanding

    The Depth Anything encoder can be fine-tuned for downstream high-level perception tasks such as semantic segmentation, achieving 86.2 mIoU on Cityscapes and 59.4 mIoU on ADE20K.
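
As a rough sketch of the relative depth estimation use case, the snippet below runs a Depth Anything checkpoint through the Hugging Face transformers depth-estimation pipeline. The checkpoint name ("LiheYoung/depth-anything-small-hf") is an assumption based on the released Hugging Face ports; substitute whichever released Depth Anything weights you actually use.

from PIL import Image
from transformers import pipeline

# Sketch only: the checkpoint name below is an assumed Hugging Face port of the
# small Depth Anything model; swap in the weights you use.
depth_estimator = pipeline(
    task="depth-estimation",
    model="LiheYoung/depth-anything-small-hf",
)

image = Image.open("photo.jpg")
result = depth_estimator(image)

# "depth" is a PIL image of the relative depth map; "predicted_depth" is the raw tensor.
result["depth"].save("photo_depth.png")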

Citation

If you find this project useful, please consider citing:

@article{depthanything,
      title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
      author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
      journal={arXiv:2401.10891},
      year={2024}
}