vufinder/vggt-1b

Feed-forward neural network that directly infers all key 3D attributes of a scene.

Public
12.1K runs

Run time and cost

This model costs approximately $0.043 to run on Replicate, or 23 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 45 seconds. The predict time for this model varies significantly based on the inputs.

Readme

VGGT: Visual Geometry Grounded Transformer

Visual Geometry Group, University of Oxford; Meta AI

Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, David Novotny

Project page

@inproceedings{wang2025vggt,
  title={VGGT: Visual Geometry Grounded Transformer},
  author={Wang, Jianyuan and Chen, Minghao and Karaev, Nikita and Vedaldi, Andrea and Rupprecht, Christian and Novotny, David},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}

Note: This model uses VGGT-1B-Commercial weights.

Model output

The model returns an object with the following attributes:

  • point_cloud (optional): a URL to a GLB file containing the point cloud and meshes representing the camera positions.

  • data: a list of URLs to JSON files containing the raw model output per image, with attributes: image, pose_enc, depth, depth_conf, world_points, world_points_conf, and original_image (with width and height attributes and an optional mask).
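The output schema above can be consumed with a small helper. This is a minimal sketch, not an official client snippet: the field names follow the Readme, but the sample URLs and the `summarize_output` helper are hypothetical, and it assumes the prediction output has already been fetched into a plain dict.

```python
def summarize_output(output: dict) -> dict:
    """Extract the optional point-cloud URL and the per-image JSON URLs
    from a prediction output shaped like the schema documented above."""
    return {
        "point_cloud": output.get("point_cloud"),   # optional GLB URL
        "num_images": len(output.get("data", [])),  # one JSON file per image
        "data_urls": output.get("data", []),
    }

# Hypothetical output matching the documented structure (URLs are made up)
sample = {
    "point_cloud": "https://example.com/scene.glb",
    "data": [
        "https://example.com/frame_000.json",
        "https://example.com/frame_001.json",
    ],
}

summary = summarize_output(sample)
print(summary["num_images"])  # 2
```

Each JSON file in `data` would then be downloaded separately to access the per-image attributes (pose_enc, depth, world_points, and so on).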
