visionaix/sam3-1

SAM 3.1 (Segment Anything 3.1): Unified promptable segmentation for images. Supports text, point, and box prompts. Detects and segments 270K+ concepts. By Meta FAIR.

SAM 3.1 on Replicate

Segment Anything Model 3.1 — Unified Promptable Segmentation

Paper: arXiv 2511.16719 | Authors: Meta FAIR (Carion, Gustafson, Hu, et al.) | Code: github.com/facebookresearch/sam3 | Model: huggingface.co/facebook/sam3.1 | License: Meta SAM License

SAM 3.1 segments any object in an image using text prompts, point clicks, or bounding boxes. It detects 270K+ visual concepts, roughly 50x more than prior benchmarks cover. SAM 3.1 also adds Object Multiplex for ~7x faster multi-object tracking.

Works on both indoor and outdoor scenes.


Default Example

import replicate

output = replicate.run("visionaix/sam3-1", input={
    "image": "https://cdn.sanity.io/images/k55su7ch/production2/d9e35a73891d43ccb0bc665bf2e0d5d9d6f1ea2b-4200x2363.jpg?w=1920&q=75&auto=format",
    "text_prompt": "couch",
})
# output.masked_image - highlighted segmentation
# output.masks_overlay - overlay with scores and labels
# output.calibration_json - metadata

Prompt Types

Text Prompt (open-vocabulary, 270K+ concepts)

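# A local file must be passed as an open file handle (or use a URL, as in the default example).
output = replicate.run("visionaix/sam3-1", input={
    "image": open("photo.jpg", "rb"),
    "text_prompt": "yellow school bus",
    "confidence_threshold": 0.5,  # minimum confidence for text detections
})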

Point Prompt (click foreground/background)

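output = replicate.run("visionaix/sam3-1", input={
    "image": open("photo.jpg", "rb"),  # local file passed as a file handle
    "point_coords": "[[520, 375]]",    # JSON array of [x, y] pixel coords
    "point_labels": "[1]",             # 1 = foreground, 0 = background
})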
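Because point_coords and point_labels are JSON strings rather than lists, it can be less error-prone to build them from Python data. The sketch below uses a hypothetical helper, encode_points, which is not part of the model or the Replicate client. It assumes each click is given as an (x, y, label) tuple and rejects any label other than 0 or 1.

```python
import json


def encode_points(points):
    """Hypothetical helper: turn [(x, y, label), ...] into the two JSON strings.

    label is 1 for foreground and 0 for background, matching point_labels.
    """
    coords, labels = [], []
    for x, y, label in points:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        coords.append([int(x), int(y)])
        labels.append(int(label))
    return {
        "point_coords": json.dumps(coords),
        "point_labels": json.dumps(labels),
    }


# One foreground click on the object and one background click to exclude a region.
point_inputs = encode_points([(520, 375, 1), (80, 60, 0)])
```

The returned dict can be merged into the model input, for example input={"image": open("photo.jpg", "rb"), **point_inputs}.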

Box Prompt (bounding box)

output = replicate.run("visionaix/sam3-1", input={
    "image": open("photo.jpg", "rb"),      # local file passed as a file handle
    "box_prompt": "[100, 200, 400, 500]",  # JSON [x_min, y_min, x_max, y_max]
})
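Many annotation tools store boxes as a top-left corner plus width and height, while box_prompt expects corner coordinates. The following is a minimal sketch of that conversion. The helper name xywh_to_box_prompt is an assumption made for illustration, not something the model provides.

```python
import json


def xywh_to_box_prompt(x, y, width, height):
    """Hypothetical helper: convert a top-left/size box to the JSON corner format."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # box_prompt is [x_min, y_min, x_max, y_max]
    return json.dumps([x, y, x + width, y + height])


# A 300x300 box whose top-left corner is at (100, 200).
box_prompt = xywh_to_box_prompt(100, 200, 300, 300)
```

The resulting string has the same form as the box_prompt value in the example above.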

Inputs

Parameter             Type     Default   Description
image                 File     required  Input image
text_prompt           String   ""        Text describing what to segment
point_coords          String   ""        JSON array of [x, y] pixel coordinates
point_labels          String   ""        JSON array: 1 = foreground, 0 = background
box_prompt            String   ""        JSON [x_min, y_min, x_max, y_max]
confidence_threshold  Float    0.5       Minimum confidence for text detections
multimask_output      Boolean  false     Return 3 candidate masks (point/box)
return_raw_masks      Boolean  false     Return raw masks as .npy

Outputs

  • masked_image — Original image with segmented regions highlighted
  • masks_overlay — Overlay with contours, scores, and labels
  • calibration_json — Metadata (scores, boxes, timing)
  • raw_masks — Binary mask array as .npy (optional)
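To keep the results, the returned items can be written to disk. The sketch below rests on several assumptions: that output behaves like a mapping keyed by the names above, that file outputs are file-like objects with a read() method (as recent versions of the Replicate Python client return), and that the images are PNG. The helper save_outputs and the extension table are illustrative only. If the client returns a different shape, the lookup would need adapting.

```python
import json
from pathlib import Path

# Assumed file extensions; the README does not state the image format.
EXTENSIONS = {
    "masked_image": ".png",
    "masks_overlay": ".png",
    "calibration_json": ".json",
    "raw_masks": ".npy",
}


def save_outputs(output, dest_dir):
    """Hypothetical helper: write each returned item to dest_dir, return the paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = {}
    for name, ext in EXTENSIONS.items():
        value = output.get(name)
        if value is None:  # e.g. raw_masks when return_raw_masks is false
            continue
        if hasattr(value, "read"):  # file-like object
            data = value.read()
        elif isinstance(value, (bytes, bytearray)):
            data = bytes(value)
        elif isinstance(value, str):
            data = value.encode("utf-8")
        else:  # already-parsed JSON metadata
            data = json.dumps(value).encode("utf-8")
        path = dest / f"{name}{ext}"
        path.write_bytes(data)
        saved[name] = path
    return saved
```

A call such as save_outputs(output, "results") would then leave one file per returned item in the results directory.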

Citation

@misc{carion2025sam3,
  title={SAM 3: Segment Anything with Concepts},
  author={Carion, Nicolas and Gustafson, Laura and Hu, Yuan-Ting and others},
  year={2025},
  eprint={2511.16719},
  archivePrefix={arXiv},
}

License

Meta SAM License. See LICENSE.
