Fast, minimal port of DALL·E Mini to PyTorch


Input

Run this model in Node.js with one line of code:

npx create-replicate --model=kuprel/min-dalle

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run kuprel/min-dalle using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "kuprel/min-dalle:2af375da21c5b824a84e1c459f45b69a117ec8649c2aa974112d7cf1840fc0ce",
  {
    input: {
      text: "Moai statue giving a TED talk",
      top_k: 128,
      seamless: false,
      grid_size: 5,
      save_as_png: false,
      temperature: 1,
      progressive_outputs: true,
      supercondition_factor: 16
    }
  }
);

// To access the file URL:
console.log(output[0].url()); //=> "http://example.com"

// To write the file to disk (JPEG, because save_as_png is false):
await fs.promises.writeFile("my-image.jpg", output[0]);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run kuprel/min-dalle using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "kuprel/min-dalle:2af375da21c5b824a84e1c459f45b69a117ec8649c2aa974112d7cf1840fc0ce",
    input={
        "text": "Moai statue giving a TED talk",
        "top_k": 128,
        "seamless": False,
        "grid_size": 5,
        "save_as_png": False,
        "temperature": 1,
        "progressive_outputs": True,
        "supercondition_factor": 16
    }
)

# The kuprel/min-dalle model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/kuprel/min-dalle/api#output-schema
    print(item)

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run kuprel/min-dalle using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "kuprel/min-dalle:2af375da21c5b824a84e1c459f45b69a117ec8649c2aa974112d7cf1840fc0ce",
    "input": {
      "text": "Moai statue giving a TED talk",
      "top_k": 128,
      "seamless": false,
      "grid_size": 5,
      "save_as_png": false,
      "temperature": 1,
      "progressive_outputs": true,
      "supercondition_factor": 16
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

{
  "completed_at": "2022-07-25T00:37:21.840787Z",
  "created_at": "2022-07-25T00:37:05.490728Z",
  "data_removed": false,
  "error": null,
  "id": "drkg55flvjet3ooohc524kvfg4",
  "input": {
    "text": "Moai statue giving a TED talk",
    "top_k": "128",
    "grid_size": "5",
    "temperature": "1",
    "progressive_outputs": true,
    "supercondition_factor": "16"
  },
  "logs": "tokenizing text\n['Ġmo', 'ai']\n['Ġstatue']\n['Ġgiving']\n['Ġa']\n['Ġted']\n['Ġtalk']\n9 text tokens [0, 924, 336, 4039, 8658, 58, 5678, 2727, 2]\nencoding text tokens\ndetokenizing image\ndetokenizing image\ndetokenizing image\ndetokenizing image\ndetokenizing image\ndetokenizing image\ndetokenizing image\ndetokenizing image",
  "metrics": {
    "predict_time": 16.147947,
    "total_time": 16.350059
  },
  "output": [
    "https://replicate.delivery/mgxm/7d78edb7-c91a-4ccc-9734-125953840fac/moai-statue-giving-a-ted-talk-iter-1.jpg",
    "https://replicate.delivery/mgxm/8644d51a-c196-4896-89f0-9b8f04779e2f/moai-statue-giving-a-ted-talk-iter-2.jpg",
    "https://replicate.delivery/mgxm/9e1c9724-d8b5-429a-aaad-90288d1ac4ab/moai-statue-giving-a-ted-talk-iter-3.jpg",
    "https://replicate.delivery/mgxm/3ae242e9-5d5e-4784-bfc7-79c0051d902f/moai-statue-giving-a-ted-talk-iter-4.jpg",
    "https://replicate.delivery/mgxm/e48cd201-4260-40bb-8632-21da067246b8/moai-statue-giving-a-ted-talk-iter-5.jpg",
    "https://replicate.delivery/mgxm/0eb4d55e-e427-4db4-92e1-682f07563076/moai-statue-giving-a-ted-talk-iter-6.jpg",
    "https://replicate.delivery/mgxm/04bc6fb5-c1c4-4f3f-9ee9-f140a9f62412/moai-statue-giving-a-ted-talk-iter-7.jpg",
    "https://replicate.delivery/mgxm/b5ecd545-efca-46ae-8496-786ad16be61b/moai-statue-giving-a-ted-talk.jpg"
  ],
  "started_at": "2022-07-25T00:37:05.692840Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/drkg55flvjet3ooohc524kvfg4",
    "cancel": "https://api.replicate.com/v1/predictions/drkg55flvjet3ooohc524kvfg4/cancel"
  },
  "version": "888c72d60932bca21344efcfdaecb3f0fbeb4bf40ee9ec601cc8fda806b5bfd9"
}
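The timing metrics in this record can be re-derived from its timestamps, and because progressive_outputs was enabled, the last entry of output is the finished grid while the earlier entries carry an -iter-N suffix. The following is a minimal Python sketch, assuming the record is available as a dict; the helper names parse_ts and summarize are illustrative and not part of Replicate's client library, and only two of the eight output URLs are reproduced for brevity.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    # Older Python versions reject "Z" in fromisoformat, so rewrite it as an offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def summarize(prediction):
    """Recompute timings and split the final image from intermediate ones."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "predict_time": (completed - started).total_seconds(),
        "total_time": (completed - created).total_seconds(),
        "final_image": prediction["output"][-1],
        "intermediate_images": prediction["output"][:-1],
    }


# Fields copied from the record above (output list shortened to its last two entries).
prediction = {
    "created_at": "2022-07-25T00:37:05.490728Z",
    "started_at": "2022-07-25T00:37:05.692840Z",
    "completed_at": "2022-07-25T00:37:21.840787Z",
    "output": [
        "https://replicate.delivery/mgxm/04bc6fb5-c1c4-4f3f-9ee9-f140a9f62412/moai-statue-giving-a-ted-talk-iter-7.jpg",
        "https://replicate.delivery/mgxm/b5ecd545-efca-46ae-8496-786ad16be61b/moai-statue-giving-a-ted-talk.jpg",
    ],
}

summary = summarize(prediction)
print(summary["predict_time"], summary["total_time"])
print(summary["final_image"])
```

For this record, the recomputed values agree with the reported predict_time (completed_at minus started_at) and total_time (completed_at minus created_at).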

Generated in 16.1 seconds


This output was created using a different version of the model, kuprel/min-dalle:888c72d6.

Run time and cost

This model costs approximately $0.0099 to run on Replicate, or 101 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
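As a quick check of the arithmetic, the sketch below turns the quoted per-run price into a budget estimate. It assumes a flat per-run cost, which is a simplification because the actual cost varies with inputs; the function names are illustrative.

```python
import math

COST_PER_RUN_USD = 0.0099  # approximate figure quoted above; varies with inputs


def runs_for_budget(budget_usd, cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit in a budget at a flat per-run cost."""
    return math.floor(budget_usd / cost_per_run)


def cost_of_runs(num_runs, cost_per_run=COST_PER_RUN_USD):
    """Estimated cost in USD of a number of runs at a flat per-run cost."""
    return num_runs * cost_per_run


print(runs_for_budget(1))            # 101
print(round(cost_of_runs(500), 2))   # 4.95
```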

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 8 seconds.

Readme

Input Parameter Descriptions

Basic

text: For long prompts, only the first 64 tokens will be used to generate the image.
save_as_png: If selected, the image is saved in lossless png format, otherwise jpg.
progressive_outputs: Show intermediate outputs while running. This adds less than a second to the run time.
seamless: Tile images in token space instead of pixel space. This has the effect of blending the images at the borders.
grid_size: Size of the image grid. 5x5 takes about 15 seconds, 9x9 takes about 40 seconds.
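Since grid_size sets the side length of a square grid, the number of generated images grows quadratically. The sketch below works through the approximate timings quoted for grid_size; the per-image figures are only as precise as those quoted timings.

```python
# Approximate timings quoted in the grid_size description above.
quoted_seconds = {5: 15.0, 9: 40.0}


def image_count(grid_size):
    """Number of images in a grid_size x grid_size grid."""
    return grid_size * grid_size


for grid_size, seconds in quoted_seconds.items():
    n = image_count(grid_size)
    print(f"{grid_size}x{grid_size}: {n} images, about {seconds / n:.2f} s per image")
# 5x5: 25 images, about 0.60 s per image
# 9x9: 81 images, about 0.49 s per image
```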

Advanced

temperature: A higher temperature increases the probability of sampling low-scoring image tokens.
top_k: Each image token is sampled from the top_k highest-scoring tokens.

Increasing temperature and/or top_k will increase variety in the generated images at the expense of the images being less coherent. Setting temperature high and top_k low can result in more variety without sacrificing coherence.
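To make the interaction between the two parameters concrete, here is a generic sketch of top-k sampling with temperature in plain Python. It illustrates the general technique only and is not taken from the model's code; the function name sample_top_k and its rng argument are assumptions made for this example.

```python
import math
import random


def sample_top_k(logits, top_k, temperature, rng=None):
    """Sample one token index from the top_k highest logits at a given temperature."""
    rng = rng or random.Random()
    # Keep only the indices of the top_k highest-scoring tokens.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Subtract the max logit for numerical stability, then divide by temperature.
    # A higher temperature flattens the weights, so low-scoring tokens become likelier.
    max_logit = logits[kept[0]]
    weights = [math.exp((logits[i] - max_logit) / temperature) for i in kept]
    # Draw one index in proportion to the (unnormalized) weights.
    return rng.choices(kept, weights=weights, k=1)[0]


logits = [2.0, 0.5, 1.0, -1.0]
rng = random.Random(0)
print(sample_top_k(logits, top_k=1, temperature=1.0, rng=rng))  # always index 0
print(sample_top_k(logits, top_k=2, temperature=4.0, rng=rng))  # index 0 or 2
```

With top_k low, only well-scoring tokens remain eligible, so a high temperature spreads probability among them without admitting the lowest-scoring ones.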

Expert

supercondition_factor: Higher values can result in better agreement with the text. Let logits_cond be the logits computed from the text prompt, logits_uncond be the logits computed from an empty text prompt, and a be the super-condition factor. Then:

logits = logits_cond * a + logits_uncond * (1 - a)
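The blend can be sketched in a few lines of Python. This is an illustration of the formula on plain lists, not the model's implementation; the function name supercondition and the toy logits are assumptions.

```python
def supercondition(logits_cond, logits_uncond, a):
    """Blend conditional and unconditional logits with super-condition factor a."""
    return [c * a + u * (1 - a) for c, u in zip(logits_cond, logits_uncond)]


logits_cond = [2.0, 1.0]    # toy logits computed from the text prompt
logits_uncond = [1.0, 1.0]  # toy logits computed from an empty prompt

print(supercondition(logits_cond, logits_uncond, 1))   # [2.0, 1.0]
print(supercondition(logits_cond, logits_uncond, 16))  # [17.0, 1.0]
```

With a = 1 the result equals logits_cond. Rearranged, the formula reads logits_uncond + a * (logits_cond - logits_uncond), so values of a above 1 amplify whatever the prompt changed relative to the empty prompt: here the gap between the two tokens grows from 1 to 16.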

Example

Consider the images generated for “panda with top hat reading a book” with different settings.

text = "panda with top hat reading a book"
temperature = 0.5
top_k = 128
supercondition_factor = 4

[Image: min-dalle output for the settings above]

text = "panda with top hat reading a book"
temperature = 4
top_k = 64
supercondition_factor = 16

[Image: min-dalle output for the settings above]