VGGT: Visual Geometry Grounded Transformer

Visual Geometry Group, University of Oxford; Meta AI

Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, David Novotny

@inproceedings{wang2025vggt,
  title={VGGT: Visual Geometry Grounded Transformer},
  author={Wang, Jianyuan and Chen, Minghao and Karaev, Nikita and Vedaldi, Andrea and Rupprecht, Christian and Novotny, David},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}

Note: This model uses VGGT-1B-Commercial weights.

Description

This version of the model omits the point map head, which is redundant with the depth head.
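The point map head is redundant because a per-pixel 3D point can be recovered from the depth map together with the camera parameters. The following is a minimal sketch of that unprojection, not the model's own code. It assumes a pinhole intrinsic matrix K and a world-to-camera extrinsic [R|t] have already been decoded from pose_enc (the decoding is not shown, since the layout of pose_enc is not described here), and that depth is measured along the optical axis. The function name unproject_depth is hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def unproject_depth(depth, K, extrinsic):
    """Lift a depth map to world-space 3D points (hypothetical helper).

    depth:     (H, W) array of z-depths along the optical axis (assumed).
    K:         (3, 3) pinhole intrinsics (assumed already decoded).
    extrinsic: (3, 4) world-to-camera matrix [R|t] (assumed convention).
    Returns an (H, W, 3) array of points in world coordinates.
    """
    h, w = depth.shape
    # u is the column (x) index and v is the row (y) index, both (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Back-project each pixel into the camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    cam_points = np.stack([x, y, depth], axis=-1)

    # Invert X_cam = R @ X_world + t, giving X_world = R^T @ (X_cam - t).
    # With points stored as row vectors, right-multiplying by R applies R^T.
    R, t = extrinsic[:, :3], extrinsic[:, 3]
    return (cam_points - t) @ R
```

If the extrinsic returned by the model follows a camera-to-world convention instead, the last step becomes a direct rotation and translation rather than an inverse.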

Model output

The model returns an object with the following attributes:

point_cloud (optional): a URL to a GLB file containing the point cloud and meshes that represent the cameras.
depth_images (optional): a list of URLs to depth images.
data: a list of URLs to JSON files, each containing the raw model output for one image. Attributes: image, pose_enc, depth, depth_conf, original_image (with width and height attributes), and an optional mask.
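As an illustration of consuming one of the per-image JSON files listed under data, here is a minimal sketch that parses a record and checks that the documented attributes are present. The names FrameRecord and parse_frame_record are hypothetical. The field types are not specified above, so they are left as Any. The sketch also assumes that original_image is a JSON object with width and height keys and that mask, when present, is a top-level key.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional

# Attributes documented as always present in each per-image JSON file.
REQUIRED_KEYS = ("image", "pose_enc", "depth", "depth_conf", "original_image")


@dataclass
class FrameRecord:
    """One per-image record (hypothetical container; field types assumed)."""
    image: Any
    pose_enc: Any
    depth: Any
    depth_conf: Any
    original_width: int
    original_height: int
    mask: Optional[Any] = None


def parse_frame_record(text: str) -> FrameRecord:
    """Parse the text of one JSON file into a FrameRecord."""
    raw = json.loads(text)

    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise KeyError(f"missing attributes: {missing}")

    # Assumption: original_image is an object with width and height keys.
    original = raw["original_image"]
    return FrameRecord(
        image=raw["image"],
        pose_enc=raw["pose_enc"],
        depth=raw["depth"],
        depth_conf=raw["depth_conf"],
        original_width=original["width"],
        original_height=original["height"],
        mask=raw.get("mask"),  # optional, so None when absent
    )
```

Downloading the files from the returned URLs is left out on purpose; any HTTP client can fetch the text that is then passed to parse_frame_record.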

Model created 4 months, 3 weeks ago

Model updated 1 month, 2 weeks ago