adirik/grounding-dino | Run with an API on Replicate

Input

Run this model in Node.js with one line of code:

npx create-replicate --model=adirik/grounding-dino

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run adirik/grounding-dino using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "adirik/grounding-dino:efd10a8ddc57ea28773327e881ce95e20cc1d734c589f7dd01d2036921ed78aa",
  {
    input: {
      image: "https://replicate.delivery/pbxt/JlgUQIQCDemKg7bnfn5zKMqLgAPrZdpMMHzkXgHX5HUlbw9z/mugs.webp",
      query: "pink mug",
      box_threshold: 0.2,
      text_threshold: 0.2,
      show_visualisation: true
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run adirik/grounding-dino using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "adirik/grounding-dino:efd10a8ddc57ea28773327e881ce95e20cc1d734c589f7dd01d2036921ed78aa",
    input={
        "image": "https://replicate.delivery/pbxt/JlgUQIQCDemKg7bnfn5zKMqLgAPrZdpMMHzkXgHX5HUlbw9z/mugs.webp",
        "query": "pink mug",
        "box_threshold": 0.2,
        "text_threshold": 0.2,
        "show_visualisation": True
    }
)
print(output)
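To query several objects in one run, the Readme below says queries are separated with commas. The helper below is a minimal sketch that assembles the `input` dictionary from a list of object names. The function name `build_input` is hypothetical, joining with ", " is an assumed separator style, and the [0, 1] range check assumes the thresholds are similarity scores.

```python
def build_input(image_url, objects, box_threshold=0.2, text_threshold=0.2,
                show_visualisation=True):
    """Build the `input` dict for adirik/grounding-dino (hypothetical helper)."""
    if not objects:
        raise ValueError("objects must contain at least one description")
    for name, value in (("box_threshold", box_threshold),
                        ("text_threshold", text_threshold)):
        # Assumption: both thresholds are similarity scores in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Multiple queries are comma-separated, as the Readme describes.
    query = ", ".join(obj.strip() for obj in objects)
    return {
        "image": image_url,
        "query": query,
        "box_threshold": box_threshold,
        "text_threshold": text_threshold,
        "show_visualisation": show_visualisation,
    }

# Usage with the client shown above (requires REPLICATE_API_TOKEN):
# output = replicate.run(
#     "adirik/grounding-dino:efd10a8ddc57ea28773327e881ce95e20cc1d734c589f7dd01d2036921ed78aa",
#     input=build_input(image_url, ["pink mug", "spoon"]),
# )
```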

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run adirik/grounding-dino using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "adirik/grounding-dino:efd10a8ddc57ea28773327e881ce95e20cc1d734c589f7dd01d2036921ed78aa",
    "input": {
      "image": "https://replicate.delivery/pbxt/JlgUQIQCDemKg7bnfn5zKMqLgAPrZdpMMHzkXgHX5HUlbw9z/mugs.webp",
      "query": "pink mug",
      "box_threshold": 0.2,
      "text_threshold": 0.2,
      "show_visualisation": true
    }
  }' \
  https://api.replicate.com/v1/predictions
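If you want to make the same HTTP call from Python without the client library, the sketch below builds an equivalent request using only the standard library. The helper names `build_prediction_request` and `create_prediction` are hypothetical. Because the wait requested by the `Prefer: wait` header is not unlimited, check the `status` field of the returned prediction before reading its `output`.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = ("adirik/grounding-dino:"
           "efd10a8ddc57ea28773327e881ce95e20cc1d734c589f7dd01d2036921ed78aa")

def build_prediction_request(token, model_input):
    """Build the same POST request as the curl example (does not send it)."""
    body = json.dumps({"version": VERSION, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Ask the API to wait for the prediction to finish before responding.
            "Prefer": "wait",
        },
        method="POST",
    )

def create_prediction(model_input):
    """Send the request and return the decoded JSON prediction object."""
    req = build_prediction_request(os.environ["REPLICATE_API_TOKEN"], model_input)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```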

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

detections

{ "bbox": [ 19, 204, 408, 563 ], "label": "pink mug", "confidence": 0.8077122569084167 }

{ "bbox": [ 545, 263, 952, 650 ], "label": "pink mug", "confidence": 0.7644544839859009 }

{ "bbox": [ 416, 60, 764, 380 ], "label": "pink mug", "confidence": 0.4754282832145691 }

{ "bbox": [ 909, 161, 1078, 487 ], "label": "pink mug", "confidence": 0.43150201439857483 }

result_image

Generated in 1.7 seconds
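The detections above can be post-processed in a few lines. This sketch copies the sample detections into a list; with the Python client they are likely available as `output["detections"]`, a key name taken from the output above. The `[x_min, y_min, x_max, y_max]` pixel layout for `bbox` is an assumption inferred from the sample values, and `summarize` is a hypothetical helper.

```python
# Sample detections copied from the output above.
detections = [
    {"bbox": [19, 204, 408, 563], "label": "pink mug", "confidence": 0.8077122569084167},
    {"bbox": [545, 263, 952, 650], "label": "pink mug", "confidence": 0.7644544839859009},
    {"bbox": [416, 60, 764, 380], "label": "pink mug", "confidence": 0.4754282832145691},
    {"bbox": [909, 161, 1078, 487], "label": "pink mug", "confidence": 0.43150201439857483},
]

def summarize(detections, min_confidence=0.5):
    """Keep confident detections, sorted best-first, with box sizes.

    Assumes bbox is [x_min, y_min, x_max, y_max] in pixels.
    """
    kept = []
    for det in detections:
        if det["confidence"] < min_confidence:
            continue
        x_min, y_min, x_max, y_max = det["bbox"]
        kept.append({
            "label": det["label"],
            "confidence": round(det["confidence"], 3),
            "width": x_max - x_min,
            "height": y_max - y_min,
        })
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)

for row in summarize(detections):
    print(row)
```

With the default cut-off of 0.5, only the first two mugs are kept.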



Run time and cost

This model costs approximately $0.00098 to run on Replicate, or about 1,020 runs per $1, though the exact cost varies with your inputs. It is also open source, so you can run it on your own computer with Docker.
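The per-run price and the runs-per-dollar figure are reciprocals: 1 / 0.00098 ≈ 1020.4, which rounds down to the quoted 1,020 runs. A quick budgeting sketch, assuming every run costs the approximate average (real cost depends on inputs and run time); the helper names are hypothetical:

```python
COST_PER_RUN_USD = 0.00098  # approximate figure quoted above; varies with inputs

def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Whole runs affordable for $1 at the given per-run cost."""
    return int(1 / cost_per_run)

def estimated_cost(num_runs, cost_per_run=COST_PER_RUN_USD):
    """Approximate total cost in USD, rounded to cents."""
    return round(num_runs * cost_per_run, 2)

print(runs_per_dollar())       # 1020
print(estimated_cost(10_000))  # 9.8
```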

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 1 second.

Readme

Grounding DINO

Grounding DINO detects arbitrary objects from free-form text inputs such as category names or referring expressions. Its architecture combines the Transformer-based detector DINO with grounded pre-training to achieve open-set, text-guided object detection. See the paper and original repository for details.

Using the API

You can use Grounding DINO to query images with text descriptions of any object. Upload an image and enter comma-separated text descriptions of the objects you want to find. The expected input arguments are:

image: your input image
query: text describing the objects you want to detect; separate multiple queries with commas
box_threshold: keeps only predicted boxes whose highest text-similarity score is above box_threshold
text_threshold: uses the query words whose similarity to a box is above text_threshold as that box's predicted label
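How the two thresholds interact can be sketched as follows. This is a simplified illustration in the spirit of Grounding DINO's post-processing, not the model's or this Replicate wrapper's actual code: the real model scores subword tokens rather than whole words, and the similarity values and the `apply_thresholds` name below are made up for the example.

```python
def apply_thresholds(similarities, tokens, box_threshold, text_threshold):
    """Simplified Grounding DINO-style filtering (illustrative sketch).

    similarities[i][j] is the similarity between predicted box i and query
    word j, assumed to lie in [0, 1]. Returns (box_index, label, score).
    """
    results = []
    for i, row in enumerate(similarities):
        score = max(row)
        # box_threshold: keep a box only if its best word similarity exceeds it.
        if score <= box_threshold:
            continue
        # text_threshold: the words scoring above it form the predicted label.
        words = [tok for tok, s in zip(tokens, row) if s > text_threshold]
        results.append((i, " ".join(words), score))
    return results

tokens = ["pink", "mug"]
similarities = [
    [0.81, 0.78],  # both words pass: label "pink mug"
    [0.35, 0.15],  # box kept, but only "pink" passes text_threshold
    [0.10, 0.12],  # dropped by box_threshold
]
print(apply_thresholds(similarities, tokens, box_threshold=0.2, text_threshold=0.2))
# [(0, 'pink mug', 0.81), (1, 'pink', 0.35)]
```

Raising box_threshold removes weak boxes entirely, while raising text_threshold keeps the boxes but can shorten their labels.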

References

@article{liu2023grounding,
  title={Grounding {DINO}: Marrying {DINO} with Grounded Pre-Training for Open-Set Object Detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}