VGGT: Visual Geometry Grounded Transformer
Visual Geometry Group, University of Oxford; Meta AI
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, David Novotny
@inproceedings{wang2025vggt,
title={VGGT: Visual Geometry Grounded Transformer},
author={Wang, Jianyuan and Chen, Minghao and Karaev, Nikita and Vedaldi, Andrea and Rupprecht, Christian and Novotny, David},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
Note: This model uses VGGT-1B-Commercial weights.
Description
This model version omits the point map head, which is redundant with the depth head: point maps can be recovered from the predicted depth maps and camera parameters.
Model output
The model returns an object with the following attributes:
- point_cloud (optional): a URL to a GLB file that contains the point cloud and meshes representing the cameras.
- depth_images (optional): a list of URLs to depth images.
- data: a list of URLs to JSON files, one per image, that contain the raw model output. Each file has the attributes image, pose_enc, depth, and depth_conf.
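The output object described above can be handled with a small helper. The following is a minimal sketch and not part of any official client: it assumes the output has already been obtained as a plain Python dict, and the names `VGGTOutput`, `parse_output`, and `missing_keys` are hypothetical. Only the attribute names (`point_cloud`, `depth_images`, `data`, `image`, `pose_enc`, `depth`, `depth_conf`) come from the description above. The value types inside each JSON file are not specified here, so the sketch checks only that the keys are present.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional

# Attribute names that each per-image JSON file is documented to contain.
PER_IMAGE_KEYS = ("image", "pose_enc", "depth", "depth_conf")


@dataclass
class VGGTOutput:
    """Hypothetical container mirroring the documented output object."""
    data: List[str]                      # required: URLs to per-image JSON files
    point_cloud: Optional[str] = None    # optional: URL to a GLB file
    depth_images: List[str] = field(default_factory=list)  # optional: depth image URLs


def parse_output(raw: Mapping[str, Any]) -> VGGTOutput:
    """Build a VGGTOutput from a dict, tolerating absent optional fields."""
    if "data" not in raw:
        raise KeyError("'data' is required in the model output")
    return VGGTOutput(
        data=list(raw["data"]),
        point_cloud=raw.get("point_cloud"),
        depth_images=list(raw.get("depth_images") or []),
    )


def missing_keys(record: Mapping[str, Any]) -> List[str]:
    """Return the documented per-image attributes absent from a decoded JSON record."""
    return [key for key in PER_IMAGE_KEYS if key not in record]
```

After downloading and decoding one of the `data` JSON files (for example with the standard `json` module), calling `missing_keys` on the result gives a quick check that the record has the expected attributes before any further processing.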