lucataco / segment-anything-2

Segment Anything 2 (SAM2) by Meta - Automatic mask generation

  • Public
  • 19.4K runs
  • L40S
  • GitHub
  • Paper
  • License

Input

  • image (file, required): Input image.
  • integer: Maximum number of masks to return. If -1 or None, all masks will be returned. Note: the masks are sorted by predicted_iou. Default: -1
  • integer: The number of points to be sampled along one side of the image. Default: 64
  • integer: The number of points run simultaneously by the model. Default: 128
  • number: A filtering threshold in [0,1], using the model's predicted mask quality. Default: 0.7
  • number: A filtering threshold in [0,1], using the stability of the mask under changes to the cutoff used to binarize the model's mask predictions. Default: 0.92
  • number: The amount to shift the cutoff when calculating the stability score. Default: 0.7
  • integer: If >0, mask prediction will be run again on crops of the image. Default: 1
  • number: The box IoU cutoff used by non-maximal suppression to filter duplicate masks. Default: 0.7
  • integer: The number of points-per-side sampled in layer n is scaled down by crop_n_points_downscale_factor**n. Default: 2
  • number: If >0, postprocessing will be applied to remove disconnected regions and holes in masks with area smaller than min_mask_region_area. Default: 25
  • boolean: Whether to add a one-step refinement using previous mask predictions. Default: true
  • boolean: Whether to output multiple masks at each point of the grid. Default: false
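
For reference, here is a minimal sketch of invoking this model from Python with the Replicate client. It assumes the replicate package is installed and REPLICATE_API_TOKEN is set; photo.jpg is a placeholder, only the image input (the one named above) is passed, and depending on your client version you may need to pin an explicit model version.

import replicate

# Open the input image; Replicate file inputs accept a local file handle or a URL.
with open("photo.jpg", "rb") as image_file:
    output = replicate.run(
        "lucataco/segment-anything-2",  # you may need to append ":<version>" to pin a version
        input={"image": image_file},    # the other tuning inputs listed above can be added here
    )

print(output)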

Output

Run time and cost

This model costs approximately $0.032 to run on Replicate, or 31 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 33 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Note: At this time, this model only supports image inputs (not video) and runs the large variant of the model.

SAM 2: Segment Anything in Images and Videos

About

Implementation of SAM 2, a model for segmenting objects in images and videos using various prompts.

Limitations

  • Performance may vary depending on image/video quality and complexity.
  • Very fast or complex motions in videos might be challenging.
  • Higher resolutions provide more detail but require more processing time.

SAM 2 is a 🔥 model developed by Meta AI Research. It excels at segmenting objects in both images and videos with various types of prompts.

Core Model

Figure: An overview of the SAM 2 framework.

SAM 2 uses a transformer architecture with streaming memory for real-time video processing. It builds on the original SAM model, extending its capabilities to video.

For more technical details, check out the Research paper.
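
If you prefer to run segmentation locally, a rough sketch using the open-source sam2 package (https://github.com/facebookresearch/sam2) is shown below. The config and checkpoint paths are placeholders for the large variant, and the generator keyword arguments are the library options that appear to correspond to the tuning inputs listed above.

import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

# Placeholder config/checkpoint paths for the large model; adjust to your local setup.
model = build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt")

# Values below mirror this model's input defaults (assumed to map onto the library options).
mask_generator = SAM2AutomaticMaskGenerator(
    model,
    points_per_side=64,
    points_per_batch=128,
    pred_iou_thresh=0.7,
    stability_score_thresh=0.92,
)

image = np.array(Image.open("photo.jpg").convert("RGB"))
masks = mask_generator.generate(image)  # list of dicts with "segmentation", "predicted_iou", etc.
print(len(masks))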

Safety

⚠️ Users should be aware of potential ethical implications:

  • Ensure you have the right to use input images and videos, especially those featuring identifiable individuals.
  • Be responsible about generated content to avoid potential misuse.
  • Be cautious about using copyrighted material as inputs without permission.

Support

All credit goes to the Meta AI Research team.

Citation

@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint},
  year={2024}
}