# SAMURAI Object Tracker

An easy-to-use model that tracks objects in video, built on SAM 2. Point it at the object you want to track in the first frame, and it follows that object through the rest of the video.
## How It Works

You provide (see the invocation sketch below):

- A video file or a folder of frames
- The starting position (x, y) and size (width, height) of the object you want to track

The model gives you:

- A video showing the tracked object with a red highlight
- Frame-by-frame tracking data in COCO RLE format (run-length encoding, a space-efficient way to store mask information)
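If you are running the model through Replicate’s Python client, an invocation might look like the sketch below. The model reference and input field names are illustrative assumptions, not something this README specifies, so check the model page for the exact input schema.

```python
import replicate

# Hypothetical model reference and input names -- adjust to the
# actual schema on the model page.
output = replicate.run(
    "owner/samurai-object-tracker:version-id",
    input={
        "video": open("clip.mp4", "rb"),  # video to track through
        "x": 120,       # starting x coordinate of the object
        "y": 80,        # starting y coordinate of the object
        "width": 64,    # object width in pixels
        "height": 96,   # object height in pixels
    },
)
print(output)  # highlighted video plus the COCO RLE tracking data
```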
## Output Format

The tracking data is a dictionary keyed by frame number:

```python
{
    frame_number: [{
        "size": [height, width],     # Size of the video frame
        "counts": "encoded_string",  # Mask data in COCO RLE format
        "object_id": 0               # ID of the tracked object
    }]
}
```
## Credits

This model is powered by:

- SAMURAI by Yang et al. from the University of Washington’s Information Processing Lab
- SAM 2 (Segment Anything Model 2) by Meta FAIR
- Original paper: “SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory”
## License
Apache-2.0
Follow me on Twitter/X