chenxwh / cogview3

Finer and Faster Text-to-Image Generation via Relay Diffusion

  • Public
  • 44 runs
  • L40S
  • GitHub
  • Weights
  • Paper
  • License

Input

  • string: Input prompt. Default: "a photo of an astronaut riding a horse on mars"
  • string: Things you do not want to see in the output (negative prompt). Default: ""
  • integer: Width of the output image. Maximum size is 1024x768 or 768x1024 because of memory limits. Default: 1024
  • integer: Height of the output image. Maximum size is 1024x768 or 768x1024 because of memory limits. Default: 1024
  • integer (minimum: 1, maximum: 500): Number of denoising steps. Default: 50
  • number (minimum: 1, maximum: 20): Scale for classifier-free guidance. Default: 7
  • integer: Random seed. Leave blank to randomize the seed.
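
A minimal sketch of calling this model through the Replicate Python client is shown below. The input field names (prompt, negative_prompt, width, height, num_inference_steps, guidance_scale, seed) are assumptions inferred from the descriptions above; the model's API page has the authoritative schema and version identifier.

import replicate

output = replicate.run(
    "chenxwh/cogview3",  # optionally pin a specific version hash
    input={
        "prompt": "a photo of an astronaut riding a horse on mars",
        "negative_prompt": "",
        "width": 1024,
        "height": 1024,
        "num_inference_steps": 50,
        "guidance_scale": 7,
        # "seed": 1234,  # omit to randomize the seed
    },
)
print(output)  # URL(s) of the generated image(s)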

Output

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

CogView3 & CogView-3Plus

Model Introduction

CogView-3-Plus builds on CogView3 (ECCV'24) by adopting the latest DiT (Diffusion Transformer) framework for further overall performance improvements. It uses Zero-SNR diffusion noise scheduling and a joint text-image attention mechanism, which reduces training and inference costs compared with the commonly used MMDiT structure while preserving the model's core capabilities. CogView-3-Plus uses a VAE with a latent dimension of 16.
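
Zero-SNR scheduling forces the final diffusion timestep to carry zero signal, so the model is trained and sampled from truly pure noise at the last step. Below is a minimal PyTorch sketch of the standard zero-terminal-SNR rescaling (following Lin et al., "Common Diffusion Noise Schedules and Sample Steps are Flawed"); it illustrates the technique only and is not CogView-3-Plus's exact implementation.

import torch

def rescale_zero_terminal_snr(betas: torch.Tensor) -> torch.Tensor:
    """Rescale a beta schedule so the final timestep has zero SNR (pure noise)."""
    alphas_bar_sqrt = (1.0 - betas).cumprod(dim=0).sqrt()

    first, last = alphas_bar_sqrt[0].clone(), alphas_bar_sqrt[-1].clone()
    alphas_bar_sqrt = alphas_bar_sqrt - last                    # shift so the last value is 0
    alphas_bar_sqrt = alphas_bar_sqrt * first / (first - last)  # rescale so the first value is unchanged

    alphas_bar = alphas_bar_sqrt ** 2
    alphas = torch.cat([alphas_bar[:1], alphas_bar[1:] / alphas_bar[:-1]])
    return 1.0 - alphas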

Citation

🌟 If you find our work helpful, feel free to cite our paper and leave a star.

@article{zheng2024cogview3,
  title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
  author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},
  journal={arXiv preprint arXiv:2403.05121},
  year={2024}
}