Detect objects

These models detect and segment objects in images and videos. Use them to find out what is in a scene and where each object is located. You can also cut objects out of a scene, or create masks for inpainting and other tasks.

Best for detecting objects in images: adirik/grounding-dino

To find specific things in an image, we recommend adirik/grounding-dino. You can input any number of text labels and get back bounding boxes for each of the objects you’re looking for. It’s cheap and takes less than a second to run.
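As a rough sketch of what you might do with the results, the snippet below drops low-confidence detections and groups the remaining boxes by label. The dictionary keys (`label`, `confidence`, `bbox`), the `[x1, y1, x2, y2]` pixel layout, and the `min_confidence` default are assumptions made for illustration, not the model's documented output schema; check the model's API page for the real shape.

```python
from collections import defaultdict

# Hypothetical detection format (an assumption, not the documented schema):
# {"label": str, "confidence": float, "bbox": [x1, y1, x2, y2]} in pixels.


def group_detections(detections, min_confidence=0.3):
    """Drop low-confidence detections and group the remaining boxes by label."""
    grouped = defaultdict(list)
    for det in detections:
        if det["confidence"] >= min_confidence:
            grouped[det["label"]].append(det["bbox"])
    return dict(grouped)


# Made-up sample data, only to show the shape of the input and output.
sample = [
    {"label": "dog", "confidence": 0.82, "bbox": [34, 50, 210, 300]},
    {"label": "dog", "confidence": 0.12, "bbox": [400, 20, 450, 60]},
    {"label": "ball", "confidence": 0.55, "bbox": [220, 260, 270, 310]},
]
boxes_by_label = group_detections(sample)
```

Grouping by label makes it easy to answer questions such as "how many dogs were found?" with `len(boxes_by_label.get("dog", []))`.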

Best for detecting objects in videos: zsxkib/yolo-world

Use this model to find and track things in videos from text labels. You’ll get back bounding boxes for each object by frame.

You can also use zsxkib/yolo-world for images. Its performance is similar to adirik/grounding-dino, but one or the other may work better for a given use case.
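Per-frame bounding boxes are often easier to work with once they are indexed by label. The sketch below assumes the results have already been parsed into a dictionary that maps a frame index to a list of detections, each with a `label` and a `bbox`. That structure and the function name are hypothetical, so adapt them to whatever the model actually returns.

```python
# Hypothetical per-frame structure (an assumption for illustration):
# {frame_index: [{"label": str, "bbox": [x1, y1, x2, y2]}, ...], ...}


def frames_by_label(frame_detections):
    """Map each label to the sorted list of frame indices where it appears."""
    seen = {}
    for frame_index, detections in frame_detections.items():
        for det in detections:
            seen.setdefault(det["label"], set()).add(frame_index)
    return {label: sorted(frames) for label, frames in seen.items()}


# Made-up sample data: a car in frames 0 and 1, a person in frame 1 only.
sample_frames = {
    0: [{"label": "car", "bbox": [10, 10, 60, 40]}],
    1: [
        {"label": "car", "bbox": [14, 10, 64, 40]},
        {"label": "person", "bbox": [80, 5, 100, 70]},
    ],
    2: [],
}
appearances = frames_by_label(sample_frames)
```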

Best for segmentation: meta/sam-2 and meta/sam-2-video

Meta’s Segment Anything Model (SAM) is a great way to extract things from images and videos, or to create masks for inpainting. The SAM models require a little more preparation than the bounding box models: you’ll need to send the coordinates of click points for the objects you want to segment.
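One way to get click points without picking them by hand is to take the center of a bounding box produced by a detection model. The snippet below is a minimal sketch that assumes `[x1, y1, x2, y2]` pixel boxes; the helper names are hypothetical. How the points are passed to the segmentation models (parameter names and formatting) is not shown here and should be taken from each model's API page.

```python
def box_center(bbox):
    """Return the integer (x, y) center of an [x1, y1, x2, y2] pixel box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) // 2, (y1 + y2) // 2)


def click_points(bboxes):
    """Turn a list of boxes into one candidate click point per box."""
    return [box_center(bbox) for bbox in bboxes]


# Made-up boxes, only to show the conversion.
points = click_points([[34, 50, 210, 300], [220, 260, 270, 310]])
```

The center of a box is only a heuristic: for ring-shaped or thin diagonal objects it can land on the background, so it may be worth checking the points visually before segmenting.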

If you want to segment objects with text labels, try schananas/grounded_sam. Send a text prompt with object names and you’ll get back a mask for the collection of objects you’ve described.

Best for tracking objects in videos: zsxkib/samurai

Input a video and the coordinates for an object, and this specialized version of SAM will track the object across frames.

Best for labeling whole scenes: cjwbw/semantic-segment-anything

This model will label every pixel in an image with a class. It’s great for creating training data and masks for inpainting.
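To illustrate how a per-pixel labeling can become an inpainting mask, the sketch below assumes the result has been decoded into a 2D list of integer class IDs, one per pixel. That representation is an assumption for illustration (the model's actual output format may differ), and both helper names are hypothetical.

```python
from collections import Counter


def class_mask(label_map, class_id):
    """Build a binary mask: 255 where a pixel has class_id, 0 elsewhere."""
    return [[255 if pixel == class_id else 0 for pixel in row] for row in label_map]


def class_coverage(label_map):
    """Return the fraction of pixels assigned to each class ID."""
    counts = Counter(pixel for row in label_map for pixel in row)
    total = sum(counts.values())
    return {cls: count / total for cls, count in counts.items()}


# Made-up 2x3 label map with three classes (0, 1, 2).
label_map = [
    [0, 0, 1],
    [0, 2, 1],
]
mask_for_class_1 = class_mask(label_map, 1)
coverage = class_coverage(label_map)
```

The coverage figures are a quick sanity check when building training data, since a class that covers almost no pixels may be too rare to learn from.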

Recommended models

jweek / mask_maker

Uses DINO to detect regions and refines them with SAM. Returns masking data as RLE-encoded JSON.

Updated 3 weeks, 2 days ago

261 runs

lucataco / florence-2-large

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Updated 11 months, 1 week ago

137.7K runs

ahmdyassr / mask-clothing

Super fast clothing (and face) segmentation and masking with erosion and dilation capability, made for https://outfit.fm

Updated 11 months, 3 weeks ago

16.8K runs

hadilq / hair-segment

An ML model that segments hair in pictures.

Updated 1 year ago

352 runs

swook / inspyrenet

Segment foreground objects with high resolution and matting, using InSPyReNet

Updated 1 year ago

691.8K runs

falcons-ai / nsfw_image_detection

Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification

Updated 1 year, 6 months ago

61.1M runs

chigozienri / mediapipe-face

Batch or individual face detection with MediaPipe

Updated 1 year, 6 months ago

85.1K runs

adirik / owlvit-base-patch32

Zero-shot / open vocabulary object detection

Updated 1 year, 7 months ago

24.2K runs

hassamdevsy / mask2former

Facebook Mask2Former trained on the ADE20K dataset

Updated 1 year, 10 months ago

56.6K runs

idea-research / ram-grounded-sam

A Strong Image Tagging Model with Segment Anything

Updated 1 year, 11 months ago

1.5M runs

naklecha / clothing-segmentation

This model detects clothing using a custom state-of-the-art clothing segmentation algorithm.

Updated 2 years ago

3.6K runs

daanelson / yolox

High performance and lightweight object detection models

Updated 2 years, 3 months ago

22.2K runs