These models distinguish objects in images and videos. You can use them to detect what is in a scene and where each object is located. You can also cut objects out of the scene, or create masks for inpainting and other tasks.
To find specific things in an image, we recommend adirik/grounding-dino. You can input any number of text labels and get back bounding boxes for each of the objects you're looking for. It's cheap and takes less than a second to run.
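Bounding boxes are often only the first step. As a minimal sketch, the snippet below turns detections into a rectangular mask that an inpainting model could consume. The detection layout (a list of dicts with `label`, `confidence` and `bbox` as `[x0, y0, x1, y1]` pixel coordinates) is an assumption for illustration, not the documented output of any model here; check the model's output schema before relying on it.

```python
from typing import List, Sequence


def boxes_to_mask(width: int, height: int,
                  boxes: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return a height x width grid: 255 inside any box, 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Convert to integer pixel indices and clamp to the image bounds.
        left, top = max(0, int(x0)), max(0, int(y0))
        right, bottom = min(width, int(x1)), min(height, int(y1))
        for y in range(top, bottom):
            for x in range(left, right):
                mask[y][x] = 255
    return mask


# Hypothetical detections; the field names are assumptions.
detections = [
    {"label": "cat", "confidence": 0.91, "bbox": [1, 1, 4, 3]},
    {"label": "dog", "confidence": 0.32, "bbox": [0, 0, 6, 4]},
]

# Keep only confident detections before building the mask.
confident = [d["bbox"] for d in detections if d["confidence"] >= 0.5]
mask = boxes_to_mask(6, 4, confident)
```

A rectangular mask is coarse. If you need a mask that follows the object outline, a segmentation model is a better fit.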
Use this model to find and track things in videos from text labels. You'll get back bounding boxes for each object by frame.
You can also use zsxkib/yolo-world for images. It's similar in performance to the above, but sometimes one or the other will work better for a given use case.
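One way to compare two detection models on your own images is to measure how much their boxes overlap for the same object. The sketch below computes intersection over union (IoU), assuming both models return boxes as `[x0, y0, x1, y1]`; that layout is an assumption, so convert first if a model uses a different format.

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two [x0, y0, x1, y1] boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes for the same object from two different models.
box_from_model_a = [0, 0, 2, 2]
box_from_model_b = [1, 1, 3, 3]
overlap = iou(box_from_model_a, box_from_model_b)
```

An IoU near 1 means the two models agree closely on that object. A low value suggests inspecting both results by eye to see which fits your use case.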
Meta's Segment Anything Model (SAM) is a great way to extract things from images and videos, or to create masks for inpainting. It requires a little more preparation than the bounding box models: you'll need to send the coordinates of click points for the objects you want to segment.
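If you already have bounding boxes from a detection model, one simple way to get click points is to use the center of each box. This is a sketch under assumptions: boxes are `[x0, y0, x1, y1]` in pixels, and the helper names are hypothetical. How a particular model expects click coordinates to be encoded is not covered here, so check its input schema.

```python
from typing import List, Sequence, Tuple


def box_center(box: Sequence[float]) -> Tuple[int, int]:
    """Return the (x, y) pixel at the center of an [x0, y0, x1, y1] box."""
    x0, y0, x1, y1 = box
    return (round((x0 + x1) / 2), round((y0 + y1) / 2))


def boxes_to_click_points(boxes: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """One click point per box, in the same order as the input boxes."""
    return [box_center(b) for b in boxes]


# Hypothetical boxes from a detection model.
points = boxes_to_click_points([[10, 20, 50, 80], [0, 0, 8, 4]])
```

The center of a box does not always land on the object (think of a ring or a person with arms spread), so it is worth spot-checking the points on a few images.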
If you want to segment objects with text labels, try schananas/grounded_sam. Send a text prompt with object names and you'll get back a mask for the collection of objects you've described.
Input a video and the coordinates for an object, and this specialized version of SAM will track the object across frames.
This model will label every pixel in an image with a class. It's great for creating training data and masks for inpainting.
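Once every pixel has a class, a mask for one class is a simple lookup. The sketch below assumes the per-pixel labels are available as a 2D grid of integer class IDs; the IDs, the `CLASS_NAMES` mapping and the helper name are hypothetical, and a real model may return its labels in a different encoding.

```python
from collections import Counter
from typing import List

# Hypothetical class IDs for illustration only.
CLASS_NAMES = {0: "sky", 1: "road", 2: "car"}


def class_mask(label_map: List[List[int]], target_class: int) -> List[List[int]]:
    """Return a grid with 255 where the pixel belongs to target_class, else 0."""
    return [[255 if c == target_class else 0 for c in row] for row in label_map]


# A tiny 3x3 "image" of per-pixel class IDs.
label_map = [
    [0, 0, 0],
    [0, 2, 2],
    [1, 2, 2],
]

car_mask = class_mask(label_map, 2)

# Pixel counts per class, handy as a sanity check on training data.
pixel_counts = Counter(c for row in label_map for c in row)
```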
Featured models
zsxkib/samurai
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Updated 10 months, 3 weeks ago
226 runs
meta/sam-2-video
SAM 2: Segment Anything v2 (for videos)
Updated 1 year, 2 months ago
46.2K runs
meta/sam-2
SAM 2: Segment Anything v2 (for Images)
Updated 1 year, 2 months ago
25.7K runs
zsxkib/yolo-world
Real-Time Open-Vocabulary Object Detection
Updated 1 year, 8 months ago
12.3K runs
schananas/grounded_sam
Mask prompting based on Grounding DINO & Segment Anything | Integral cog of doiwear.it
Updated 1 year, 11 months ago
846.4K runs
adirik/grounding-dino
Detect everything with language!
Updated 1 year, 11 months ago
18.2M runs
cjwbw/semantic-segment-anything
Adding semantic labels for segment anything
Updated 2 years, 6 months ago
36.1K runs
Recommended Models
jweek/mask_maker
Uses DINO to detect regions and further refines them with SAM. Returns masking data as RLE encoded JSON.
Updated 3 months, 3 weeks ago
499 runs
lucataco/florence-2-large
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Updated 1 year, 3 months ago
467.6K runs
ahmdyassr/mask-clothing
Super fast clothing (and face) segmentation and masking with erosion and dilation capability, made for https://outfit.fm
Updated 1 year, 4 months ago
36.7K runs
hadilq/hair-segment
This is an ML model to segment hairs in pictures.
Updated 1 year, 5 months ago
506 runs
swook/inspyrenet
Segment foreground objects with high resolution and matting, using InSPyReNet
Updated 1 year, 5 months ago
695.1K runs
falcons-ai/nsfw_image_detection
Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification
Updated 1 year, 11 months ago
64.9M runs
chigozienri/mediapipe-face
batch or individual face detection with mediapipe
Updated 1 year, 11 months ago
94.6K runs
adirik/owlvit-base-patch32
Zero-shot / open vocabulary object detection
Updated 2 years ago
24.4K runs
hassamdevsy/mask2former
Facebook Mask2Former trained on ADE 20k Dataset
Updated 2 years, 3 months ago
57.9K runs
idea-research/ram-grounded-sam
A Strong Image Tagging Model with Segment Anything
Updated 2 years, 4 months ago
1.5M runs
naklecha/clothing-segmentation
This model can detect clothing using a custom state of the art clothing segmentation algorithm.
Updated 2 years, 4 months ago
3.7K runs
daanelson/yolox
High performance and lightweight object detection models
Updated 2 years, 8 months ago
49.4K runs