nvidia / prismer

A Vision-Language Model with An Ensemble of Experts

Run with an API

Input

Run this model in Node.js with one line of code:

npx create-replicate --model=nvidia/prismer

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run nvidia/prismer using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "nvidia/prismer:e604611dc43bfabc4eb5cda01eab65a491d74910cf5545da2a189718320873b1",
  {
    input: {
      task: "caption",
      question: "",
      model_size: "base",
      input_image: "https://replicate.delivery/pbxt/ISSa1VolSjpqROlBZm9FSrkC3PL0mJjwIQfeYNLYO8GowGuP/1.jpeg",
      use_experts: true,
      output_expert_labels: false
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

To run the model from Python instead, install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run nvidia/prismer using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "nvidia/prismer:e604611dc43bfabc4eb5cda01eab65a491d74910cf5545da2a189718320873b1",
    input={
        "task": "caption",
        "question": "",
        "model_size": "base",
        "input_image": "https://replicate.delivery/pbxt/ISSa1VolSjpqROlBZm9FSrkC3PL0mJjwIQfeYNLYO8GowGuP/1.jpeg",
        "use_experts": True,
        "output_expert_labels": False
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
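The input fields used in the Python example can be assembled by a small helper before the call to replicate.run. This is a minimal sketch, not part of the Replicate client: the name build_prismer_input is hypothetical, and the "vqa" task value and the rule that any non-caption task needs a question are assumptions. Only "caption" with an empty question appears on this page, so check the model's schema for the values it accepts.

```python
from typing import Any, Dict

# Version string copied from the example on this page.
PRISMER_VERSION = (
    "nvidia/prismer:"
    "e604611dc43bfabc4eb5cda01eab65a491d74910cf5545da2a189718320873b1"
)


def build_prismer_input(
    image_url: str,
    task: str = "caption",
    question: str = "",
    model_size: str = "base",
    use_experts: bool = True,
    output_expert_labels: bool = False,
) -> Dict[str, Any]:
    """Return an input dict with the same keys as the example on this page.

    Hypothetical helper. Requiring a question for every task other than
    "caption" is an assumption, not documented behaviour.
    """
    if not image_url:
        raise ValueError("image_url must not be empty")
    if task != "caption" and not question.strip():
        raise ValueError(f"task {task!r} is assumed to need a question")
    return {
        "task": task,
        "question": question,
        "model_size": model_size,
        "input_image": image_url,
        "use_experts": use_experts,
        "output_expert_labels": output_expert_labels,
    }


# Usage (needs the replicate package and REPLICATE_API_TOKEN):
#   import replicate
#   output = replicate.run(PRISMER_VERSION, input=build_prismer_input(url))
```

Keeping the defaults identical to the example means a call with only an image URL reproduces the captioning request shown above.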

To call the HTTP API directly with cURL, set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run nvidia/prismer using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "nvidia/prismer:e604611dc43bfabc4eb5cda01eab65a491d74910cf5545da2a189718320873b1",
    "input": {
      "task": "caption",
      "question": "",
      "model_size": "base",
      "input_image": "https://replicate.delivery/pbxt/ISSa1VolSjpqROlBZm9FSrkC3PL0mJjwIQfeYNLYO8GowGuP/1.jpeg",
      "use_experts": true,
      "output_expert_labels": false
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
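The Prefer: wait header asks the API to hold the request open until the prediction finishes, but, as far as we understand Replicate's HTTP docs, only up to a time limit of about a minute. A run that takes longer may come back still in progress, in which case the prediction's "get" URL can be polled. The sketch below shows one way to do that with the standard library. The function names, the 2-second interval, and the 600-second timeout are illustrative choices, and the set of terminal statuses is our reading of the API rather than something stated on this page.

```python
import json
import os
import time
import urllib.request
from typing import Any, Callable, Dict

# Statuses after which a prediction is assumed not to change any more.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def fetch_prediction(get_url: str) -> Dict[str, Any]:
    """GET a prediction, authenticating with REPLICATE_API_TOKEN."""
    request = urllib.request.Request(
        get_url,
        headers={"Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_prediction(
    get_url: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_prediction,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_url until the prediction reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch(get_url)
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction still running after {timeout} s")
        sleep(interval)


# Usage: pass the "urls.get" value from the response to the POST request, e.g.
#   final = wait_for_prediction(created["urls"]["get"])
```

The fetch and sleep parameters are injectable so the loop can be exercised without network access or real delays.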

Output

a man riding a skateboard across a cross walk

{
  "completed_at": "2023-03-13T20:33:53.752602Z",
  "created_at": "2023-03-13T20:32:10.928556Z",
  "data_removed": false,
  "error": null,
  "id": "clyqudkpy5fzric3mbmcnyyysa",
  "input": {
    "task": "caption",
    "question": "",
    "model_size": "base",
    "input_image": "https://replicate.delivery/pbxt/ISSa1VolSjpqROlBZm9FSrkC3PL0mJjwIQfeYNLYO8GowGuP/1.jpeg",
    "use_experts": true,
    "output_expert_labels": false
  },
  "logs": "***** Generating edge *****\n  0%|          | 0/1 [00:00<?, ?it/s]\n100%|██████████| 1/1 [00:02<00:00,  2.46s/it]\n100%|██████████| 1/1 [00:02<00:00,  2.46s/it]\n***** Generating depth *****\n  0%|          | 0/1 [00:00<?, ?it/s]\n100%|██████████| 1/1 [00:02<00:00,  2.32s/it]\n100%|██████████| 1/1 [00:02<00:00,  2.32s/it]\n***** Generating normal *****\nUsing cache found in /root/.cache/torch/hub/rwightman_gen-efficientnet-pytorch_master\nLoading base model ()...Done.\nRemoving last two layers (global_pool & classifier).\n  0%|          | 0/1 [00:00<?, ?it/s]\n100%|██████████| 1/1 [00:02<00:00,  2.61s/it]\n100%|██████████| 1/1 [00:02<00:00,  2.62s/it]\n***** Generating objdet *****\nLoading config experts/obj_detection/configs/Base-CRCNN-COCO.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.\n  0%|          | 0/1 [00:00<?, ?it/s]/root/.pyenv/versions/3.10.10/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)\nreturn _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]\n100%|██████████| 1/1 [00:02<00:00,  2.73s/it]\n100%|██████████| 1/1 [00:02<00:00,  2.73s/it]\n***** Generating ocrdet *****\n  0%|          | 0/1 [00:00<?, ?it/s]\n100%|██████████| 1/1 [00:02<00:00,  2.80s/it]\n100%|██████████| 1/1 [00:02<00:00,  2.81s/it]\n***** Generating segmentation *****\n/root/.pyenv/versions/3.10.10/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)\nreturn _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]\nWeight format of MultiScaleMaskedTransformerDecoder have changed! Please upgrade your models. Applying automatic conversion now ...\n  0%|          | 0/1 [00:00<?, ?it/s]\n100%|██████████| 1/1 [00:02<00:00,  2.58s/it]\n100%|██████████| 1/1 [00:02<00:00,  2.58s/it]",
  "metrics": {
    "predict_time": 102.749858,
    "total_time": 102.824046
  },
  "output": {
    "answer": "a man riding a skateboard across a cross walk"
  },
  "started_at": "2023-03-13T20:32:11.002744Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/clyqudkpy5fzric3mbmcnyyysa",
    "cancel": "https://api.replicate.com/v1/predictions/clyqudkpy5fzric3mbmcnyyysa/cancel"
  },
  "version": "569a81a5da2233401dc05a37b2f0d17855eb953623c648956c823136e7f6c3ab"
}
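The fields of this response can be read with the standard library alone. The sketch below uses a trimmed copy of the response (the omitted keys are not needed here) and the hypothetical helper names parse_timestamp and summarize. It extracts the answer and recomputes the two metrics from the timestamps. In this sample, predict_time matches completed_at minus started_at and total_time matches completed_at minus created_at, though that relationship is an observation about this one response, not a documented guarantee.

```python
import json
from datetime import datetime
from typing import Any, Dict

# Trimmed copy of the response shown above.
PREDICTION_JSON = """
{
  "completed_at": "2023-03-13T20:33:53.752602Z",
  "created_at": "2023-03-13T20:32:10.928556Z",
  "started_at": "2023-03-13T20:32:11.002744Z",
  "status": "succeeded",
  "metrics": {"predict_time": 102.749858, "total_time": 102.824046},
  "output": {"answer": "a man riding a skateboard across a cross walk"}
}
"""


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp that ends in "Z" (UTC)."""
    # Older Python versions reject the trailing "Z", so swap in an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(prediction: Dict[str, Any]) -> Dict[str, Any]:
    """Return the answer plus durations recomputed from the timestamps."""
    created = parse_timestamp(prediction["created_at"])
    started = parse_timestamp(prediction["started_at"])
    completed = parse_timestamp(prediction["completed_at"])
    return {
        "answer": prediction["output"]["answer"],
        "predict_seconds": (completed - started).total_seconds(),
        "total_seconds": (completed - created).total_seconds(),
    }


summary = summarize(json.loads(PREDICTION_JSON))
print(summary["answer"])
print(f"predict: {summary['predict_seconds']:.6f} s")
print(f"total:   {summary['total_seconds']:.6f} s")
```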

Generated in 1 minute 43 seconds

***** Generating edge *****
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:02<00:00,  2.46s/it]
100%|██████████| 1/1 [00:02<00:00,  2.46s/it]
***** Generating depth *****
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:02<00:00,  2.32s/it]
100%|██████████| 1/1 [00:02<00:00,  2.32s/it]
***** Generating normal *****
Using cache found in /root/.cache/torch/hub/rwightman_gen-efficientnet-pytorch_master
Loading base model ()...Done.
Removing last two layers (global_pool & classifier).
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:02<00:00,  2.61s/it]
100%|██████████| 1/1 [00:02<00:00,  2.62s/it]
***** Generating objdet *****
Loading config experts/obj_detection/configs/Base-CRCNN-COCO.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
  0%|          | 0/1 [00:00<?, ?it/s]/root/.pyenv/versions/3.10.10/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
100%|██████████| 1/1 [00:02<00:00,  2.73s/it]
100%|██████████| 1/1 [00:02<00:00,  2.73s/it]
***** Generating ocrdet *****
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:02<00:00,  2.80s/it]
100%|██████████| 1/1 [00:02<00:00,  2.81s/it]
***** Generating segmentation *****
/root/.pyenv/versions/3.10.10/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Weight format of MultiScaleMaskedTransformerDecoder have changed! Please upgrade your models. Applying automatic conversion now ...
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:02<00:00,  2.58s/it]
100%|██████████| 1/1 [00:02<00:00,  2.58s/it]
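Each expert stage in this log opens with a banner line of the form "***** Generating NAME *****". A minimal sketch for listing the stages that ran is below. The function name expert_stages is hypothetical and the banner format is inferred from this one log, so it may not hold for other model versions. The sample is a shortened copy of the log that keeps all six banners.

```python
import re
from typing import List

# Banner format inferred from the log above; an assumption, not a spec.
STAGE_PATTERN = re.compile(r"^\*{5} Generating (\w+) \*{5}$", re.MULTILINE)


def expert_stages(logs: str) -> List[str]:
    """Return the stage names in the order their banners appear."""
    return STAGE_PATTERN.findall(logs)


# Shortened copy of the log above: banners kept, most other lines dropped.
SAMPLE_LOGS = "\n".join(
    [
        "***** Generating edge *****",
        "100%|##########| 1/1 [00:02<00:00,  2.46s/it]",
        "***** Generating depth *****",
        "***** Generating normal *****",
        "Loading base model ()...Done.",
        "***** Generating objdet *****",
        "***** Generating ocrdet *****",
        "***** Generating segmentation *****",
    ]
)

print(expert_stages(SAMPLE_LOGS))
# ['edge', 'depth', 'normal', 'objdet', 'ocrdet', 'segmentation']
```

The same function can be applied to the "logs" string of a prediction response, since the escaped newlines there become real line breaks once the JSON is decoded.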

This output was created using a different version of the model, nvidia/prismer:569a81a5.

Run time and cost

This model costs approximately $0.13 to run on Replicate, or 7 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
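The "7 runs per $1" figure follows from dividing the budget by the approximate per-run cost and rounding down, since a partial run cannot be bought. A small sketch of that arithmetic is below. The helper names are hypothetical, and the $0.13 figure is the approximate value quoted above, which varies with inputs.

```python
from decimal import ROUND_FLOOR, Decimal

# Approximate per-run cost quoted on this page; actual cost varies with inputs.
COST_PER_RUN = Decimal("0.13")


def runs_for_budget(budget_usd: str) -> int:
    """Number of whole runs that fit in the budget at the quoted cost."""
    runs = Decimal(budget_usd) / COST_PER_RUN
    return int(runs.to_integral_value(rounding=ROUND_FLOOR))


def cost_for_runs(runs: int) -> Decimal:
    """Estimated cost in dollars for a number of runs at the quoted cost."""
    return COST_PER_RUN * runs


print(runs_for_budget("1"))   # 7
print(cost_for_runs(7))       # 0.91
```

Decimal is used instead of float so that dollar amounts such as 0.13 are represented exactly.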

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 10 minutes. The predict time for this model varies significantly based on the inputs.

Readme

Prismer

This repository contains the source code of Prismer and PrismerZ from the paper, Prismer: A Vision-Language Model with An Ensemble of Experts.

Citation

If you find this code or work useful in your own research, please consider citing the following:

@article{liu2023prismer,
    title={Prismer: A Vision-Language Model with An Ensemble of Experts},
    author={Liu, Shikun and Fan, Linxi and Johns, Edward and Yu, Zhiding and Xiao, Chaowei and Anandkumar, Anima},
    journal={arXiv preprint arXiv:2303.02506},
    year={2023}
}

License

This work is made available under the Nvidia Source Code License-NC.

The model checkpoints are shared under CC-BY-NC-SA-4.0. If you remix, transform or build upon the material, you must distribute your contributions under the same license as the original.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.

Acknowledgement

We would like to thank all the researchers who open-sourced their work to make this project possible, and @bjoernpl for contributing an automated checkpoint download script.