perceptron-ai-inc/isaac-0.1

an open-source, 2B-parameter model built for real-world applications

Isaac 0.1

A vision-language model that understands and reasons about the physical world.

What is this?

Isaac 0.1 is Perceptron AI’s two-billion-parameter model built to answer questions about images with grounded, spatially aware responses. Instead of just describing what it sees, Isaac can point to specific objects, read text in cluttered scenes, and adapt to new visual tasks from just a few examples.

The model matches or exceeds the performance of models over 50 times its size on perception tasks, making it practical to run in real-world applications where speed and efficiency matter.

What can it do?

Visual question answering

Ask questions about images and get detailed answers. The model handles complex visual reasoning about objects, their relationships, and the context around them.

Grounded spatial reasoning

Ask “what’s broken in this machine?” or “where’s the safety hazard?” and get answers that point to specific regions in the image. The model understands occlusions, spatial relationships, and how objects interact with each other.

Reading text in images

Extract text from signs, labels, documents, or any other text in your images. Isaac handles different resolutions and can read small text even in dense, cluttered layouts.

Learning from examples

Show the model a few annotated examples of what you’re looking for—like specific defects, safety conditions, or object types—and it adapts immediately. No fine-tuning needed.
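In practice, in-context adaptation means sending a handful of annotated reference images alongside the query. The sketch below only illustrates that request-assembly pattern; the field names (`examples`, `annotation`, `image`, `prompt`) are assumptions, not the model’s actual schema, so check the API documentation for the real interface.

```python
def build_few_shot_request(examples, query_image, question):
    """Assemble a hypothetical few-shot payload.

    examples: list of (image_url, annotation) pairs showing the model
    what to look for, e.g. ("unit_07.jpg", "scratch on housing").
    All field names here are illustrative, not the real API schema.
    """
    shots = [{"image": img, "annotation": note} for img, note in examples]
    return {"examples": shots, "image": query_image, "prompt": question}
```

The key point is that adaptation happens per request: the annotated examples travel with the query, so no fine-tuning run or model redeployment is involved.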

Conversational pointing

The model provides responses where every claim is visually grounded and cited. This reduces hallucinations and makes it clear exactly what the model is seeing and reasoning about.

Use cases

Manufacturing and quality control

Detect defects in products, identify missing components, or spot anomalies on production lines. The model can learn your specific quality standards from a handful of examples.

Safety and security

Identify missing personal protective equipment, unsafe equipment use, or security concerns like unauthorized access or abandoned objects. The model highlights exactly where issues are located.

Robotics and automation

Help robots understand their environment with precise object localization and spatial reasoning. The model works well for navigation, manipulation, and interaction tasks.

Document processing

Extract information from forms, invoices, receipts, or any document with text. The model handles various layouts and can read text at different scales.

Repair and maintenance

Build applications that provide visual guidance for repairs, identify parts that need replacement, or help diagnose equipment problems.

Model details

Isaac 0.1 was built by the team at Perceptron AI, the same people behind Meta’s Chameleon multimodal models. The model uses a straightforward training approach and is designed to run efficiently at the edge or in latency-sensitive applications.

For technical details and benchmark results, see Perceptron’s announcement and the model on Hugging Face.

Try it out

You can experiment with Isaac 0.1 on the Replicate playground.
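Beyond the playground, the model can be called programmatically with the Replicate Python client. This is a minimal sketch: `replicate.run` is the client’s standard entry point, but the input field names (`image`, `prompt`) are assumptions here — consult the model’s API schema on Replicate for the exact parameters.

```python
def build_input(image_url, question):
    # Assemble the request payload; field names are assumed, not confirmed.
    return {"image": image_url, "prompt": question}


def ask_isaac(image_url, question):
    # Requires `pip install replicate` and a REPLICATE_API_TOKEN in the
    # environment; imported here so payload assembly stays dependency-free.
    import replicate

    output = replicate.run(
        "perceptron-ai-inc/isaac-0.1",
        input=build_input(image_url, question),
    )
    # Some Replicate models stream output as an iterator of string chunks;
    # join defensively so callers always get one string back.
    return output if isinstance(output, str) else "".join(output)
```

For example, `ask_isaac("https://example.com/line.jpg", "Where is the safety hazard?")` would return a grounded answer referencing regions of the image.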