# Depth Anything V3 — Metric Large
Estimate real-world depth in meters from a single image. Returns a 16-bit PNG depth map with metric scale metadata, ready for 3D reconstruction, parallax effects, robotics, or any application that needs actual distances (not just relative ordering).
Based on Depth Anything 3 (DA3METRIC-LARGE) by ByteDance. Apache 2.0 license.
## Why this model?
| Feature | This model | Typical DA3 wrappers |
|---|---|---|
| Output precision | 16-bit PNG (65,536 levels) | 8-bit (256 levels) |
| Metric depth | Real distances in meters with decode formula | Relative depth only |
| Inference resolution | Up to 1120px (reduces ViT grid artifacts) | Default 504px (visible grid) |
| EXIF handling | Auto-normalizes orientation before inference | Often ignored (rotated outputs) |
| Raw float32 output | Optional NPZ with full-precision metric depth | Not available |
| Stable API contract | Versioned output schema (`da3-metric/v1`) | Unversioned |
## Quick start

### Python

```python
import replicate

output = replicate.run(
    "wolfire/depth-anything-v3-metric-large",
    input={"image": "https://example.com/photo.jpg"},
)

# Output includes metric metadata
print(f"Depth range: {output['depth_min_m']:.2f}m — {output['depth_max_m']:.2f}m")
print(f"Depth map URL: {output['depth_png']}")
```
### JavaScript

```javascript
import Replicate from "replicate";

const replicate = new Replicate();

const output = await replicate.run("wolfire/depth-anything-v3-metric-large", {
  input: { image: "https://example.com/photo.jpg" },
});

console.log(`Depth range: ${output.depth_min_m}m — ${output.depth_max_m}m`);
console.log(`Depth map: ${output.depth_png}`);
```
### cURL

```bash
curl -sS -X POST "https://api.replicate.com/v1/models/wolfire/depth-anything-v3-metric-large/predictions" \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d '{"input": {"image": "https://example.com/photo.jpg"}}'
```

The `Prefer: wait` header holds the connection open until the prediction completes, so the response contains the output directly instead of a pending prediction you'd have to poll.
## Decoding the 16-bit depth map

The output `depth_png` is a 16-bit grayscale PNG. Convert pixel values back to meters using the `scale_m_per_unit` and `offset_m` fields from the output:

```python
import numpy as np
from PIL import Image

# Linear decode: meters = u16_value * scale_m_per_unit + offset_m
arr = np.array(Image.open("depth.png"), dtype=np.uint16)
depth_meters = arr.astype(np.float32) * output["scale_m_per_unit"] + output["offset_m"]
```

The `depth_min_m` and `depth_max_m` fields give the 1st/99th percentile depth range used for encoding, so you know the effective metric bounds without decoding.
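With `include_base64=true` (the default), you can skip the download and decode the inline copy instead. A minimal sketch, assuming `depth_png_base64` holds the base64 of the same PNG bytes:

```python
import base64
import io

import numpy as np
from PIL import Image

# The base64 field carries the same 16-bit PNG inline; decode it in memory.
png_bytes = base64.b64decode(output["depth_png_base64"])
arr = np.array(Image.open(io.BytesIO(png_bytes)), dtype=np.uint16)
depth_meters = arr.astype(np.float32) * output["scale_m_per_unit"] + output["offset_m"]
```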
## Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| `image` | file/URL | (required) | Input image |
| `focal_length_px` | float | 0 | Focal length in pixels. 0 = auto-estimate using a 60-degree HFOV heuristic |
| `max_process_res` | int | 0 | Cap on inference resolution (0 = server default of 1120 px). Higher = sharper depth but more VRAM |
| `return_raw_depth` | bool | false | Include `depth_npz` with float32 metric depth (no quantization loss) |
| `include_base64` | bool | true | Include `depth_png_base64` for inline transport (no extra download) |
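If you know your camera's horizontal field of view, you can compute `focal_length_px` yourself with the standard pinhole relation; presumably the auto-estimate applies the same relation with a fixed 60-degree HFOV:

```python
import math

def focal_length_px(image_width_px: int, hfov_deg: float = 60.0) -> float:
    """Pinhole relation: f = (W / 2) / tan(HFOV / 2)."""
    return (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)

print(focal_length_px(1920))  # ~1662.8 px for a 60-degree HFOV
```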
## Outputs

| Field | Description |
|---|---|
| `depth_png` | 16-bit grayscale PNG depth map (hosted URL) |
| `image` | Alias of `depth_png` (for Replicate UI preview cards) |
| `depth_npz` | Float32 metric depth in NPZ format (when `return_raw_depth=true`; loading sketch below) |
| `depth_png_base64` | Base64-encoded 16-bit PNG (when `include_base64=true`) |
| `depth_min_m` | Near depth bound in meters (1st percentile) |
| `depth_max_m` | Far depth bound in meters (99th percentile) |
| `scale_m_per_unit` | Multiply a u16 pixel value by this to get meters |
| `offset_m` | Add this after scaling to get meters |
| `focal_length_used` | Focal length used for metric conversion (pixels) |
| `process_res_used` | Actual inference resolution used |
| `contract_version` | Output schema version (`da3-metric/v1`) |
| `model_name` | `DA3METRIC-LARGE` |
| `model_ref` | HuggingFace model reference |
| `model_commit` | Pinned DA3 source commit for reproducibility |
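When `return_raw_depth=true`, `depth_npz` carries the unquantized metric depth. A minimal loading sketch; the array name inside the archive isn't part of the documented contract, so inspect `f.files` first:

```python
import numpy as np

# List the archive contents before indexing: the stored array name
# is not documented here.
with np.load("depth.npz") as f:
    print(f.files)            # names of the stored array(s)
    depth = f[f.files[0]]     # float32 metric depth in meters
```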
## Use cases
- 3D photo effects / parallax — generate depth-based parallax from a single photo
- Robotics & SLAM — metric depth for obstacle avoidance and mapping
- AR/VR content — depth-aware compositing and occlusion
- Visual effects — depth-of-field, fog, and atmospheric perspective
- Point cloud generation — combine with camera intrinsics for 3D reconstruction (see the sketch below)
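As a sketch of that last item: a minimal pinhole unprojection, assuming the principal point sits at the image center and reusing `focal_length_used` from the output together with the decoded `depth_meters` array (real cameras may need full intrinsics, including distortion):

```python
import numpy as np

# Unproject each pixel (u, v) with metric depth z into camera-space (x, y, z).
f = output["focal_length_used"]                  # focal length in pixels
h, w = depth_meters.shape
u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinate grids
z = depth_meters
x = (u - w / 2) * z / f
y = (v - h / 2) * z / f
points = np.stack([x, y, z], axis=-1).reshape(-1, 3)  # N x 3 point cloud
```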
## License
Apache 2.0 — the underlying DA3 model and this wrapper are both Apache-licensed.
## Citation

```bibtex
@article{depth_anything_3,
  title={Depth Anything 3: Recovering the Visual Space from Any Views},
  author={Lin, Haotong and Chen, Yilun and Liew, Jun Hao and Luo, Jiashi and others},
  journal={arXiv preprint arXiv:2511.10647},
  year={2025}
}
```