cjwbw/vq-diffusion

VQ-Diffusion for Text-to-Image Synthesis

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 10 minutes, though prediction time varies significantly depending on the inputs.

Readme

This is a Cog implementation of https://github.com/microsoft/VQ-Diffusion.
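As a quick way to try the hosted version, it can be called from Python via the Replicate client. This is a minimal sketch; the input key name (`prompt`) is an assumption and may differ from the model's actual input schema, so check the model page before running.

```python
# Minimal sketch of calling the hosted model through the Replicate Python client.
# Requires the REPLICATE_API_TOKEN environment variable to be set.
# NOTE: the input field name "prompt" is an assumption; consult the model page
# for the actual input schema.
import replicate

output = replicate.run(
    "cjwbw/vq-diffusion",
    input={"prompt": "a teddy bear playing in the pool"},
)
print(output)  # typically a URL (or list of URLs) pointing to the generated image(s)
```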

VQ-Diffusion (CVPR 2022, Oral) and
Improved VQ-Diffusion

Overview

This is the official repo for the papers Vector Quantized Diffusion Model for Text-to-Image Synthesis and Improved Vector Quantized Diffusion Models.

The code is the same as https://github.com/cientgu/VQ-Diffusion; for questions that have already been raised, please refer to the issues in that repository.

VQ-Diffusion is based on a VQ-VAE whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). It produces significantly better text-to-image generation results when compared with Autoregressive models with similar numbers of parameters. Compared with previous GAN-based methods, VQ-Diffusion can handle more complex scenes and improve the synthesized image quality by a large margin.
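The following is a minimal, self-contained PyTorch sketch of that two-stage idea: a VQ-VAE turns an image into discrete codebook indices, and a text-conditioned transformer denoises a corrupted token sequence at a given diffusion timestep. The class names, codebook size, step count, and masking scheme are illustrative placeholders, not the repo's actual modules.

```python
import torch
import torch.nn as nn

# Conceptual sketch of the two-stage VQ-Diffusion pipeline (illustrative only,
# not the repo's actual classes): a VQ-VAE maps images to discrete token
# indices, and a conditional transformer predicts the original tokens from a
# noised/masked token sequence at a given diffusion timestep, conditioned on
# a text embedding.

class ToyVQEncoder(nn.Module):
    """Hypothetical stand-in for the VQ-VAE encoder + quantizer."""
    def __init__(self, vocab_size=1024):
        super().__init__()
        self.conv = nn.Conv2d(3, 64, kernel_size=4, stride=8)  # crude downsample
        self.codebook = nn.Embedding(vocab_size, 64)

    def forward(self, images):
        # images: (B, 3, 256, 256) -> features -> nearest codebook indices
        feats = self.conv(images)                        # (B, 64, 32, 32)
        b, c, h, w = feats.shape
        flat = feats.permute(0, 2, 3, 1).reshape(-1, c)  # (B*h*w, 64)
        dists = torch.cdist(flat, self.codebook.weight)  # distance to each code
        return dists.argmin(dim=-1).view(b, h * w)       # discrete token indices


class ToyDiffusionTransformer(nn.Module):
    """Hypothetical denoiser: predicts token logits from corrupted tokens,
    a diffusion timestep, and a text condition."""
    def __init__(self, vocab_size=1024, dim=256, text_dim=512, num_steps=100):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size + 1, dim)  # +1 for a [MASK] token
        self.time_emb = nn.Embedding(num_steps, dim)
        self.text_proj = nn.Linear(text_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, noisy_tokens, t, text_feat):
        x = self.token_emb(noisy_tokens) + self.time_emb(t)[:, None, :]
        x = x + self.text_proj(text_feat)[:, None, :]
        x = self.blocks(x)
        return self.head(x)  # logits over the VQ codebook for every position


if __name__ == "__main__":
    enc = ToyVQEncoder()
    denoiser = ToyDiffusionTransformer()
    images = torch.randn(2, 3, 256, 256)
    tokens = enc(images)                                   # (2, 1024) discrete codes
    t = torch.randint(0, 100, (2,))
    # Corrupt some tokens with the [MASK] index, mimicking the discrete forward process
    mask = torch.rand_like(tokens, dtype=torch.float) < 0.3
    noisy = tokens.clone()
    noisy[mask] = 1024                                     # hypothetical [MASK] id
    text_feat = torch.randn(2, 512)                        # e.g. a text-encoder embedding
    logits = denoiser(noisy, t, text_feat)
    print(logits.shape)                                    # (2, 1024, 1024)
```

At sampling time, the reverse process starts from a fully masked (or noised) token sequence and iteratively replaces tokens with samples from the predicted logits; the final token grid is then decoded back to pixels by the VQ-VAE decoder.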

Framework

Cite VQ-Diffusion

If you find our code helpful for your research, please consider citing:

@article{gu2021vector,
  title={Vector Quantized Diffusion Model for Text-to-Image Synthesis},
  author={Gu, Shuyang and Chen, Dong and Bao, Jianmin and Wen, Fang and Zhang, Bo and Chen, Dongdong and Yuan, Lu and Guo, Baining},
  journal={arXiv preprint arXiv:2111.14822},
  year={2021}
}

Acknowledgement

Thanks to everyone who makes their code and models available.

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree.

Microsoft Open Source Code of Conduct

Contact Information

For help or issues using VQ-Diffusion, please submit a GitHub issue. For other communications related to VQ-Diffusion, please contact Shuyang Gu (gsy777@mail.ustc.edu.cn) or Dong Chen (doch@microsoft.com).