kfarr/tencent-hy-world-2.0

Tencent WorldMirror 2.0: feed-forward 3D reconstruction from multi-view images or video

Run kfarr/tencent-hy-world-2.0 with an API

Use one of our client libraries to get started quickly. Clicking on a library will take you to the Playground tab where you can tweak different inputs, see the results, and copy the corresponding code to use in your own project.
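As a rough illustration of what a client-library call might look like, here is a minimal Python sketch. It assumes the third-party `replicate` package is installed and that a `REPLICATE_API_TOKEN` environment variable is set; the helper names `build_input` and `run_reconstruction` and the example URL are hypothetical, and depending on the client version the model reference may need a `:version` suffix that this page does not list.

```python
MODEL_REF = "kfarr/tencent-hy-world-2.0"


def build_input(input_file, **overrides):
    """Return the input dict: the required input_file plus explicit overrides.

    Fields left out of the dict fall back to the defaults in the input schema.
    """
    payload = {"input_file": input_file}
    payload.update(overrides)
    return payload


def run_reconstruction(input_file, **overrides):
    """Submit one run and return whatever the client returns for the output."""
    # Imported lazily so build_input stays usable without the client installed.
    import replicate  # third-party; assumed to read REPLICATE_API_TOKEN

    return replicate.run(MODEL_REF, input=build_input(input_file, **overrides))


# Example call (hypothetical URL, needs network access and an API token):
# output = run_reconstruction("https://example.com/scene.mp4", fps=2)
```

Keeping the payload construction separate from the network call makes it easy to inspect or log the exact input before spending a run on it.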

Input schema

The fields you can set when running this model with an API. Any field you omit takes its default value; input_file has no default and is required.

Field	Type	Default	Range	Description
input_file	string (URI)	(required)	n/a	A video file (mp4/mov/etc.) or a .zip archive of multi-view images. With a video, frames are extracted at the given fps.
target_size	integer	952	224 to 1568	Maximum resolution (longest edge). Images are resized and center-cropped to the nearest multiple of 14.
fps	integer	1	1 to 30	Frames-per-second to extract from a video input.
video_max_frames	integer	32	2 to 128	Maximum number of frames to use from a video input.
save_gaussians	boolean	true	n/a	Save 3D Gaussian splats (gaussians.ply).
save_points	boolean	true	n/a	Save dense point cloud (points.ply).
save_depth	boolean	true	n/a	Save per-view depth maps (PNG previews + .npy).
save_normal	boolean	true	n/a	Save per-view surface-normal maps.
save_camera	boolean	true	n/a	Save predicted camera parameters (camera_params.json).
apply_sky_mask	boolean	true	n/a	Mask out the sky region before reconstruction.
apply_edge_mask	boolean	true	n/a	Mask out unreliable depth/normal discontinuities.
compress_gs_max_points	integer	5000000	100000 to 20000000	Max number of gaussians to retain in the output PLY.
compress_pts_max_points	integer	2000000	100000 to 10000000	Max number of points to retain in points.ply.
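
The fps, video_max_frames, and target_size descriptions imply some arithmetic that can be estimated before submitting a run. The sketch below is a guess at that arithmetic, not the model's actual preprocessing: it assumes frames are sampled at `fps` and then capped at `video_max_frames`, and that the longest edge is scaled to `target_size` before each edge is cropped down to a multiple of 14. Both function names are hypothetical.

```python
import math

PATCH = 14  # sizes snap to a multiple of 14 per the target_size description


def estimate_frame_count(duration_s, fps=1, video_max_frames=32):
    """Approximate frames used from a video: sample at fps, then apply the cap.

    Assumption: one frame per 1/fps seconds; how surplus frames are dropped
    is not documented, so only the count is estimated.
    """
    sampled = max(1, math.floor(duration_s * fps))
    return min(sampled, video_max_frames)


def estimate_working_size(width, height, target_size=952):
    """Approximate the resolution the model sees for one frame.

    Assumption: the longest edge is scaled to target_size (aspect ratio kept),
    then each edge is cropped DOWN to a multiple of 14.
    """
    longest = max(width, height)
    # Integer arithmetic avoids floating-point drift at exact multiples.
    new_w = width * target_size // longest
    new_h = height * target_size // longest
    snap = lambda edge: max(PATCH, (edge // PATCH) * PATCH)
    return snap(new_w), snap(new_h)
```

Under these assumptions a 1920x1080 frame at the default target_size works out to 952x532, and a two-minute clip at the default fps is capped at 32 frames.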

{
  "type": "object",
  "title": "Input",
  "required": [
    "input_file"
  ],
  "properties": {
    "fps": {
      "type": "integer",
      "title": "Fps",
      "default": 1,
      "maximum": 30,
      "minimum": 1,
      "x-order": 2,
      "description": "Frames-per-second to extract from a video input."
    },
    "input_file": {
      "type": "string",
      "title": "Input File",
      "format": "uri",
      "x-order": 0,
      "description": "A video file (mp4/mov/etc.) or a .zip archive of multi-view images. With a video, frames are extracted at the given fps."
    },
    "save_depth": {
      "type": "boolean",
      "title": "Save Depth",
      "default": true,
      "x-order": 6,
      "description": "Save per-view depth maps (PNG previews + .npy)."
    },
    "save_camera": {
      "type": "boolean",
      "title": "Save Camera",
      "default": true,
      "x-order": 8,
      "description": "Save predicted camera parameters (camera_params.json)."
    },
    "save_normal": {
      "type": "boolean",
      "title": "Save Normal",
      "default": true,
      "x-order": 7,
      "description": "Save per-view surface-normal maps."
    },
    "save_points": {
      "type": "boolean",
      "title": "Save Points",
      "default": true,
      "x-order": 5,
      "description": "Save dense point cloud (points.ply)."
    },
    "target_size": {
      "type": "integer",
      "title": "Target Size",
      "default": 952,
      "maximum": 1568,
      "minimum": 224,
      "x-order": 1,
      "description": "Maximum resolution (longest edge). Images are resized and center-cropped to the nearest multiple of 14."
    },
    "apply_sky_mask": {
      "type": "boolean",
      "title": "Apply Sky Mask",
      "default": true,
      "x-order": 9,
      "description": "Mask out the sky region before reconstruction."
    },
    "save_gaussians": {
      "type": "boolean",
      "title": "Save Gaussians",
      "default": true,
      "x-order": 4,
      "description": "Save 3D Gaussian splats (gaussians.ply)."
    },
    "apply_edge_mask": {
      "type": "boolean",
      "title": "Apply Edge Mask",
      "default": true,
      "x-order": 10,
      "description": "Mask out unreliable depth/normal discontinuities."
    },
    "video_max_frames": {
      "type": "integer",
      "title": "Video Max Frames",
      "default": 32,
      "maximum": 128,
      "minimum": 2,
      "x-order": 3,
      "description": "Maximum number of frames to use from a video input."
    },
    "compress_gs_max_points": {
      "type": "integer",
      "title": "Compress Gs Max Points",
      "default": 5000000,
      "maximum": 20000000,
      "minimum": 100000,
      "x-order": 11,
      "description": "Max number of gaussians to retain in the output PLY."
    },
    "compress_pts_max_points": {
      "type": "integer",
      "title": "Compress Pts Max Points",
      "default": 2000000,
      "maximum": 10000000,
      "minimum": 100000,
      "x-order": 12,
      "description": "Max number of points to retain in points.ply."
    }
  }
}
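
Because the schema carries `default`, `minimum`, `maximum`, and `x-order` for each property, a payload can be checked locally before it is sent. The following is a minimal sketch using only the standard library; `SCHEMA_EXCERPT` copies three of the thirteen properties from the schema above (descriptions and titles omitted), and the helpers `defaults_in_order` and `validate_input` are hypothetical names, not part of any client library. It covers only the keywords used in this schema and is not a general JSON Schema validator.

```python
import json

# Three of the thirteen properties, copied from the input schema.
SCHEMA_EXCERPT = json.loads("""
{
  "type": "object",
  "required": ["input_file"],
  "properties": {
    "fps": {"type": "integer", "default": 1, "maximum": 30, "minimum": 1, "x-order": 2},
    "input_file": {"type": "string", "format": "uri", "x-order": 0},
    "target_size": {"type": "integer", "default": 952, "maximum": 1568, "minimum": 224, "x-order": 1}
  }
}
""")

PY_TYPES = {"integer": int, "boolean": bool, "string": str}


def defaults_in_order(schema):
    """Return {field: default} sorted by x-order, skipping fields with no default."""
    ordered = sorted(schema["properties"].items(), key=lambda kv: kv[1].get("x-order", 0))
    return {name: spec["default"] for name, spec in ordered if "default" in spec}


def validate_input(payload, schema):
    """Return a list of human-readable problems; an empty list means the payload passes."""
    errors = []
    props = schema["properties"]
    for name in schema.get("required", []):
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name, value in payload.items():
        spec = props.get(name)
        if spec is None:
            errors.append(f"unknown field: {name}")
            continue
        expected = PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"{name}: expected {spec['type']}")
            continue
        if "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{name}: {value} is below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            errors.append(f"{name}: {value} is above maximum {spec['maximum']}")
    return errors
```

Passing the full schema instead of the excerpt should work the same way, since every property in it uses one of the three types handled here.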

Output schema

The shape of the response you’ll get when you run this model with an API.

{
  "type": "string",
  "title": "Output",
  "format": "uri"
}
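
Since the output is a single URI string, the usual next step is to fetch the file it points to. The sketch below does that with the standard library only; `download_output` is a hypothetical helper, and the schema does not say what kind of file the URI points to, so the code makes no assumption about its format. Some client versions may wrap the output in a file-like object instead of a plain string, in which case converting it to a string first is assumed to yield the URI.

```python
import os
import shutil
import urllib.parse
import urllib.request


def download_output(uri, dest_dir="."):
    """Stream the file at `uri` into dest_dir and return the local path.

    The local filename is taken from the last path segment of the URI,
    falling back to "output" when the URI has no usable segment.
    """
    name = os.path.basename(urllib.parse.urlparse(uri).path) or "output"
    dest = os.path.join(dest_dir, name)
    # copyfileobj streams in chunks, so large results are not held in memory.
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest
```

Streaming matters here because a run with the default settings can retain millions of points and Gaussians, so the result may be large.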