Input

Run this model in Node.js with one line of code:

npx create-replicate --model=yoadtew/zero-shot-image-to-text

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run yoadtew/zero-shot-image-to-text using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "yoadtew/zero-shot-image-to-text:7f2735bab48ff6caa414a3fff239b0d5de77a97f1791dcb7e0eb17c259aa11be",
  {
    input: {
      image: "https://replicate.delivery/mgxm/0958ab0c-8d26-45f8-a5f1-a27a1f2259cc/baby.jpg",
      beam_size: 5,
      cond_text: "Image of a",
      end_factor: 1.01,
      ce_loss_scale: 0.2,
      max_seq_length: 15
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run yoadtew/zero-shot-image-to-text using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "yoadtew/zero-shot-image-to-text:7f2735bab48ff6caa414a3fff239b0d5de77a97f1791dcb7e0eb17c259aa11be",
    input={
        "image": "https://replicate.delivery/mgxm/0958ab0c-8d26-45f8-a5f1-a27a1f2259cc/baby.jpg",
        "beam_size": 5,
        "cond_text": "Image of a",
        "end_factor": 1.01,
        "ce_loss_scale": 0.2,
        "max_seq_length": 15
    }
)

print(output)

To learn more, take a look at the guide on getting started with Python.
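The `cond_text` input sets the prefix that every generated caption starts with, so one natural experiment is to run the same image with several prefixes. The following is a minimal sketch of that idea, not part of Replicate's client: `caption_with_prefixes` and its `run` parameter are hypothetical names, and the remaining inputs are simply fixed at the example values shown above. Passing `replicate.run` as `run` would issue one prediction per prefix.

```python
from typing import Any, Callable, Dict, Iterable

MODEL = (
    "yoadtew/zero-shot-image-to-text:"
    "7f2735bab48ff6caa414a3fff239b0d5de77a97f1791dcb7e0eb17c259aa11be"
)


def caption_with_prefixes(
    run: Callable[..., Any],
    image_url: str,
    prefixes: Iterable[str],
) -> Dict[str, Any]:
    """Run the model once per conditioning prefix and collect the raw outputs.

    `run` is assumed to behave like `replicate.run(ref, input=...)`. It is
    injected so the helper can be exercised without network access.
    """
    results: Dict[str, Any] = {}
    for prefix in prefixes:
        results[prefix] = run(
            MODEL,
            input={
                "image": image_url,
                "beam_size": 5,
                "cond_text": prefix,  # the only input that changes per run
                "end_factor": 1.01,
                "ce_loss_scale": 0.2,
                "max_seq_length": 15,
            },
        )
    return results
```

For example, `caption_with_prefixes(replicate.run, image_url, ["Image of a", "A photo of"])` would return a dictionary keyed by prefix. Each prefix is a separate prediction, so run time and cost scale with the number of prefixes.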

To call the HTTP API directly, set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run yoadtew/zero-shot-image-to-text using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "yoadtew/zero-shot-image-to-text:7f2735bab48ff6caa414a3fff239b0d5de77a97f1791dcb7e0eb17c259aa11be",
    "input": {
      "image": "https://replicate.delivery/mgxm/0958ab0c-8d26-45f8-a5f1-a27a1f2259cc/baby.jpg",
      "beam_size": 5,
      "cond_text": "Image of a",
      "end_factor": 1.01,
      "ce_loss_scale": 0.2,
      "max_seq_length": 15
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
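The same request can be built with Python's standard library. Because the `Prefer: wait` header only holds the connection open for a limited time, a slow prediction may come back before it has finished, and the client then has to poll the prediction's `urls.get` address until the status is final. The sketch below illustrates that flow under stated assumptions: the function names are hypothetical, and the set of terminal statuses (`succeeded`, `failed`, `canceled`) is taken from Replicate's HTTP API as I understand it rather than from this page.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

API_URL = "https://api.replicate.com/v1/predictions"
# Assumed terminal statuses; anything else is treated as still running.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def build_request(token: str, version: str, model_input: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",
        },
    )


def fetch_json(url: str, token: str) -> Dict[str, Any]:
    """GET a prediction URL and decode the JSON response."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_prediction(
    prediction: Dict[str, Any],
    fetch: Callable[[str], Dict[str, Any]],
    interval: float = 2.0,
    max_polls: int = 300,
) -> Dict[str, Any]:
    """Poll `urls.get` until the prediction reaches a terminal status."""
    for _ in range(max_polls):
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        time.sleep(interval)
        prediction = fetch(prediction["urls"]["get"])
    if prediction["status"] in TERMINAL_STATUSES:
        return prediction
    raise TimeoutError("prediction did not finish within the polling budget")
```

To use it, send the request with `urllib.request.urlopen(build_request(...))`, decode the JSON body, and pass the result to `wait_for_prediction` with `fetch=lambda url: fetch_json(url, token)`.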

You can run this model locally using Cog. First, install Cog:

brew install cog

If you don’t have Homebrew, there are other installation options available.

Run this to download the model and run it in your local environment:

cog predict r8.im/yoadtew/zero-shot-image-to-text@sha256:7f2735bab48ff6caa414a3fff239b0d5de77a97f1791dcb7e0eb17c259aa11be \
  -i 'image="https://replicate.delivery/mgxm/0958ab0c-8d26-45f8-a5f1-a27a1f2259cc/baby.jpg"' \
  -i 'beam_size=5' \
  -i 'cond_text="Image of a"' \
  -i 'end_factor=1.01' \
  -i 'ce_loss_scale=0.2' \
  -i 'max_seq_length=15'

To learn more, take a look at the Cog documentation.

Alternatively, run the model's Docker image directly. This downloads the image, starts the model's HTTP server on port 5000, and sends it a prediction request:

docker run -d -p 5000:5000 --gpus=all r8.im/yoadtew/zero-shot-image-to-text@sha256:7f2735bab48ff6caa414a3fff239b0d5de77a97f1791dcb7e0eb17c259aa11be
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "image": "https://replicate.delivery/mgxm/0958ab0c-8d26-45f8-a5f1-a27a1f2259cc/baby.jpg",
      "beam_size": 5,
      "cond_text": "Image of a",
      "end_factor": 1.01,
      "ce_loss_scale": 0.2,
      "max_seq_length": 15
    }
  }' \
  http://localhost:5000/predictions

To learn more, take a look at the Cog documentation.

Output

Best CLIP: Image of a baby sleeping in a green flower.
Best fluency: Image of a baby sleeping in a green flower.
Best mixed: Image of a baby.
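The model returns these three captions as a single string in the `text` field of its output, one labeled caption per line. A minimal sketch of splitting that string into a dictionary follows. `parse_captions` is a hypothetical helper, and it assumes the `Label: caption` layout seen in this example holds for other outputs too.

```python
from typing import Dict


def parse_captions(text: str) -> Dict[str, str]:
    """Split 'Label: caption' lines into a {label: caption} dictionary."""
    captions: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so colons inside a caption survive.
        label, sep, caption = line.partition(":")
        if sep:
            captions[label.strip()] = caption.strip()
    return captions


example = (
    "Best CLIP: Image of a baby sleeping in a green flower. \n"
    "Best fluency: Image of a baby sleeping in a green flower. \n"
    "Best mixed: Image of a baby."
)
print(parse_captions(example)["Best mixed"])  # prints "Image of a baby."
```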

{
  "completed_at": "2021-12-08T13:21:53.131201Z",
  "created_at": "2021-12-08T13:17:37.003837Z",
  "data_removed": false,
  "error": null,
  "id": "mhjecpjikfbfhnpypayu32dhc4",
  "input": {
    "image": "https://replicate.delivery/mgxm/0958ab0c-8d26-45f8-a5f1-a27a1f2259cc/baby.jpg",
    "beam_size": "5",
    "cond_text": "Image of a",
    "end_factor": "1.01",
    "ce_loss_scale": "0.2",
    "max_seq_length": "15"
  },
  "logs": "08/12/2021 13:20:37 | [' baby %% -1.9783461', ' newborn %% -2.632012', ' child %% -3.1142285', ' Baby %% -3.6589427', ' infant %% -3.8890254']\n/root/.pyenv/versions/3.8.12/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.\n08/12/2021 13:20:49 | [' baby in %% -3.415359', ' baby sleeping %% -3.425561', ' newborn photo %% -3.4403224', ' baby on %% -3.5201316', ' baby. %% -3.6261525']\nTo keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)\n  return torch.floor_divide(self, other)\n08/12/2021 13:21:01 | [' baby on leaves %% -3.4305604', ' baby in the %% -3.6233037', ' baby.! %% -3.6261525', ' baby on grass %% -3.6841075', ' baby sleeping in %% -3.69094']\n08/12/2021 13:21:13 | [' baby.!! %% -3.6261525', ' baby on leaves. %% -3.6822686', ' baby sleeping in a %% -3.8298764', ' baby sleeping in green %% -3.830517', ' baby on grass. %% -3.8318949']\n08/12/2021 13:21:26 | [' baby.!!! %% -3.6261525', ' baby on leaves.! %% -3.6822686', ' baby on grass.! %% -3.8318949', ' baby sleeping in a flower %% -3.8424401', ' baby sleeping in a green %% -3.9262307']\n08/12/2021 13:21:39 | [' baby.!!!! %% -3.6261525', ' baby on leaves.!! %% -3.6822686', ' baby on grass.!! %% -3.8318949', ' baby sleeping in a flower. %% -3.890606', ' baby sleeping in a green flower %% -4.0138764']\n08/12/2021 13:21:51 | [' baby.!!!!! %% -3.6261525', ' baby on leaves.!!! %% -3.6822686', ' baby on grass.!!! %% -3.8318949', ' baby sleeping in a flower.! %% -3.890606', ' baby sleeping in a green flower. %% -4.1707625']",
  "metrics": {
    "predict_time": 77.98017,
    "total_time": 256.127364
  },
  "output": [
    {
      "text": "Best CLIP: Image of a baby sleeping in a green flower. \nBest fluency: Image of a baby sleeping in a green flower. \nBest mixed: Image of a baby."
    }
  ],
  "started_at": "2021-12-08T13:20:35.151031Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/mhjecpjikfbfhnpypayu32dhc4",
    "cancel": "https://api.replicate.com/v1/predictions/mhjecpjikfbfhnpypayu32dhc4/cancel"
  },
  "version": "6d1ac11ed211454979edf969f99a6f59b363ffe9a2cb9b9c24f8d29b720ba35a"
}
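The `logs` field interleaves PyTorch warnings with one timestamped line per generation step, each listing five candidate continuations followed by `%%` and a number. The sketch below pulls those candidate lines out of the log text. It assumes every candidate line has the form `DD/MM/YYYY HH:MM:SS | [...]` seen here, and it treats the number after `%%` as an opaque score, since the page does not say what it measures. `parse_beam_steps` is a hypothetical name.

```python
import ast
import re
from typing import List, Tuple

# A timestamp, a pipe, then a Python-style list of 'text %% score' strings.
BEAM_LINE = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} \| (\[.*\])$")


def parse_beam_steps(logs: str) -> List[List[Tuple[str, float]]]:
    """Return one list of (candidate_text, score) pairs per logged step."""
    steps: List[List[Tuple[str, float]]] = []
    for line in logs.splitlines():
        match = BEAM_LINE.match(line.strip())
        if not match:
            continue  # skip warnings and other non-candidate lines
        entries = ast.literal_eval(match.group(1))
        step = []
        for entry in entries:
            # Split on the last ' %% ' so the text itself may contain '%'.
            text, _, score = entry.rpartition(" %% ")
            step.append((text.strip(), float(score)))
        steps.append(step)
    return steps
```

Applied to the logs above, this would yield seven steps, the first beginning with `("baby", -1.9783461)`.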

Generated in 78.0 seconds


This output was created using a different version of the model, yoadtew/zero-shot-image-to-text:6d1ac11e.

Run time and cost

This model costs approximately $0.060 to run on Replicate, or 16 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
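The "16 runs per $1" figure is the per-run cost divided into one dollar and rounded down: 1 / 0.060 is about 16.67, so 16 whole runs fit. A small sketch of that arithmetic follows, using `Decimal` to avoid floating-point rounding surprises. The helper names are hypothetical, and the per-run cost is the approximate figure quoted above, which varies with inputs.

```python
from decimal import ROUND_FLOOR, Decimal


def runs_per_budget(cost_per_run: str, budget: str = "1") -> int:
    """Number of whole runs that fit in the budget (rounded down)."""
    ratio = Decimal(budget) / Decimal(cost_per_run)
    return int(ratio.to_integral_value(rounding=ROUND_FLOOR))


def total_cost(cost_per_run: str, runs: int) -> Decimal:
    """Approximate cost in dollars of a given number of runs."""
    return Decimal(cost_per_run) * runs


print(runs_per_budget("0.060"))  # prints 16
print(total_cost("0.060", 100))  # prints 6.000
```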

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 5 minutes. The predict time for this model varies significantly based on the inputs.

Readme

PyTorch implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Citation

Please cite our work if you use it in your research:

@article{tewel2021zero,
  title={Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic},
  author={Tewel, Yoad and Shalev, Yoav and Schwartz, Idan and Wolf, Lior},
  journal={arXiv preprint arXiv:2111.14447},
  year={2021}
}
