idea-research / ram-grounded-sam

A Strong Image Tagging Model with Segment Anything

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 8 seconds, though prediction time varies significantly with the inputs.
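
As a rough sketch, the model can be called through the Replicate Python client. The input field name below ("input_image") is an assumption; check the API schema on this page for the exact parameter names and output format.

```python
# Minimal sketch of calling this model via the Replicate Python client.
# Requires REPLICATE_API_TOKEN in the environment; "input_image" is an
# assumed field name, not necessarily the model's real input parameter.
import replicate

output = replicate.run(
    "idea-research/ram-grounded-sam",  # optionally pin a specific version hash
    input={"input_image": open("example.jpg", "rb")},
)
print(output)
```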

Readme

Recognize Anything with Grounded-Segment-Anything

Recognize Anything Model (RAM) is an image tagging model that can recognize any common category with high accuracy.

Highlights of RAM

- Strong and general. RAM exhibits exceptional image tagging capabilities with powerful zero-shot generalization:
  - RAM showcases impressive zero-shot performance, significantly outperforming CLIP and BLIP.
  - RAM even surpasses fully supervised models (ML-Decoder).
  - RAM exhibits performance competitive with the Google tagging API.
- Reproducible and affordable. RAM has a low reproduction cost thanks to its open-source, annotation-free dataset.
- Flexible and versatile. RAM offers remarkable flexibility, catering to various application scenarios.

Building on the Tag2Text framework, RAM significantly improves tagging ability:

- Accuracy. RAM utilizes a data engine to generate additional annotations and clean incorrect ones, achieving higher accuracy than Tag2Text.
- Scope. RAM expands the number of fixed tags from 3,400+ to 6,400+ (4,500+ distinct semantic tags after merging synonyms), covering more valuable categories.
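
For readers who want to see how the pieces fit together, here is a minimal sketch of the usual RAM → Grounding DINO → SAM pipeline that "Grounded-Segment-Anything" refers to: RAM produces open-vocabulary tags, Grounding DINO grounds those tags as boxes, and SAM segments each box. The function names, checkpoint files, and thresholds are assumptions based on the respective open-source repositories and may differ from the exact code behind this Replicate model.

```python
# Sketch of the common RAM -> Grounding DINO -> SAM pipeline (assumed APIs
# from the recognize-anything, GroundingDINO and segment-anything repos;
# checkpoint/config paths are placeholders).
import torch
from PIL import Image

from ram import get_transform, inference_ram
from ram.models import ram
from groundingdino.util.inference import load_image, load_model, predict
from segment_anything import SamPredictor, sam_model_registry

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. RAM: tag the image over its open set of ~4,500 semantic tags.
ram_model = ram(pretrained="ram_swin_large_14m.pth",
                image_size=384, vit="swin_l").eval().to(device)
transform = get_transform(image_size=384)
pil_image = Image.open("example.jpg").convert("RGB")
tags = inference_ram(transform(pil_image).unsqueeze(0).to(device), ram_model)[0]
# RAM returns the English tags as a " | "-separated string, e.g. "dog | grass".

# 2. Grounding DINO: ground the tags as bounding boxes (open-set detection).
dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth",
                  device=device)
image_source, image = load_image("example.jpg")
boxes, logits, phrases = predict(
    model=dino, image=image, caption=tags.replace(" | ", ". "),
    box_threshold=0.25, text_threshold=0.2,
)

# 3. SAM: segment each detected box.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
predictor = SamPredictor(sam)
predictor.set_image(image_source)
h, w, _ = image_source.shape
# Grounding DINO boxes are normalized cxcywh; SAM expects absolute xyxy.
boxes_xyxy = boxes * torch.tensor([w, h, w, h])
boxes_xyxy[:, :2] -= boxes_xyxy[:, 2:] / 2
boxes_xyxy[:, 2:] += boxes_xyxy[:, :2]
masks = [predictor.predict(box=b.numpy(), multimask_output=False)[0]
         for b in boxes_xyxy]
print(phrases, [m.shape for m in masks])
```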

Citation

If you find our work useful for your research, please consider citing:

@article{zhang2023recognize,
  title={Recognize Anything: A Strong Image Tagging Model},
  author={Zhang, Youcai and Huang, Xinyu and Ma, Jinyu and Li, Zhaoyang and Luo, Zhaochuan and Xie, Yanchun and Qin, Yuzhuo and Luo, Tong and Li, Yaqian and Liu, Shilong and others},
  journal={arXiv preprint arXiv:2306.03514},
  year={2023}
}

@article{liu2023grounding,
  title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}