lucataco / cogview4-6b

CogView4-6B is a 6B-parameter text-to-image model that supports native Chinese prompts for text-to-image generation.


Input

prompt

string (required)

A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background

Text prompt to generate an image from

negative_prompt

string

Negative prompt to guide image generation away from certain concepts

width

integer

(minimum: 512, maximum: 2048)

Width of the generated image (must be between 512 and 2048, divisible by 32)

Default: 1024

height

integer

(minimum: 512, maximum: 2048)

Height of the generated image (must be between 512 and 2048, divisible by 32)

Default: 1024

num_inference_steps

integer

(minimum: 1, maximum: 100)

Number of denoising steps

Default: 50

guidance_scale

number

(minimum: 0, maximum: 20)

Guidance scale for classifier-free guidance

Default: 3.5

seed

integer

Random seed for reproducible image generation
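
The ranges and defaults listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model or the Replicate client: `build_input` and `DEFAULTS` are hypothetical names, and it assumes `prompt` is the only required field.

```python
# Defaults and limits copied from the input schema above.
DEFAULTS = {
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 3.5,
}


def build_input(prompt, negative_prompt=None, seed=None, **overrides):
    """Hypothetical helper: merge overrides into the defaults and validate them."""
    if not prompt:
        raise ValueError("prompt is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    params = {**DEFAULTS, **overrides}
    for side in ("width", "height"):
        value = params[side]
        if not 512 <= value <= 2048 or value % 32 != 0:
            raise ValueError(f"{side} must be in [512, 2048] and divisible by 32")
    if not 1 <= params["num_inference_steps"] <= 100:
        raise ValueError("num_inference_steps must be in [1, 100]")
    if not 0 <= params["guidance_scale"] <= 20:
        raise ValueError("guidance_scale must be in [0, 20]")

    payload = {"prompt": prompt, **params}
    # Optional fields are only sent when the caller sets them.
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    return payload


print(build_input("a red sports car on a coastal road", width=1280, height=736, seed=42))
```

The returned dictionary has the same shape as the `input` object used in the API examples, so it could be passed straight to a client call.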

Run this model in Node.js with one line of code:

npx create-replicate --model=lucataco/cogview4-6b

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/cogview4-6b using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/cogview4-6b:5b608630958be05f6845ed0d629f56cd01c372017d42f4e26916a12c7eab7b62",
  {
    input: {
      width: 1024,
      height: 1024,
      prompt: "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background",
      guidance_scale: 3.5,
      num_inference_steps: 50
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk:
await fs.promises.writeFile("my-image.png", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run lucataco/cogview4-6b using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/cogview4-6b:5b608630958be05f6845ed0d629f56cd01c372017d42f4e26916a12c7eab7b62",
    input={
        "width": 1024,
        "height": 1024,
        "prompt": "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background",
        "guidance_scale": 3.5,
        "num_inference_steps": 50
    }
)

# To access the file URL:
print(output.url())
#=> "http://example.com"

# To write the file to disk:
with open("my-image.png", "wb") as file:
    file.write(output.read())

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run lucataco/cogview4-6b using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/cogview4-6b:5b608630958be05f6845ed0d629f56cd01c372017d42f4e26916a12c7eab7b62",
    "input": {
      "width": 1024,
      "height": 1024,
      "prompt": "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it\'s about to burst into a sprint along a coastal road, with the ocean\'s azure waves crashing in the background",
      "guidance_scale": 3.5,
      "num_inference_steps": 50
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
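
For readers who prefer not to install a client library, the same HTTP request can be sketched with Python's standard library. This mirrors the curl call above (same URL, headers, and body shape); `build_request` and `create_prediction` are hypothetical helper names, and error handling is left out for brevity.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = (
    "lucataco/cogview4-6b:"
    "5b608630958be05f6845ed0d629f56cd01c372017d42f4e26916a12c7eab7b62"
)


def build_request(token, model_input):
    """Build the POST request without sending it (hypothetical helper)."""
    body = json.dumps({"version": VERSION, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Same header as the curl example: ask the API to hold the
            # connection open while the prediction runs.
            "Prefer": "wait",
        },
    )


def create_prediction(model_input):
    """Send the request and return the decoded JSON response."""
    request = build_request(os.environ["REPLICATE_API_TOKEN"], model_input)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `create_prediction({"prompt": "...", "width": 1024, "height": 1024})` would return a prediction object; its `status` field is worth checking before reading `output`, since the run may not have finished by the time the response arrives.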

Output

{
  "completed_at": "2025-03-06T20:07:49.411807Z",
  "created_at": "2025-03-06T20:05:22.818000Z",
  "data_removed": false,
  "error": null,
  "id": "70hp26gb09rm80cndm8a1kkm3m",
  "input": {
    "width": 1024,
    "height": 1024,
    "prompt": "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background",
    "guidance_scale": 3.5,
    "num_inference_steps": 50
  },
  "logs": "Using seed:  15597637\n  0%|          | 0/50 [00:00<?, ?it/s]\n  2%|▏         | 1/50 [00:10<08:58, 10.99s/it]\n  4%|▍         | 2/50 [00:11<03:52,  4.84s/it]\n  6%|▌         | 3/50 [00:12<02:19,  2.97s/it]\n  8%|▊         | 4/50 [00:12<01:36,  2.09s/it]\n 10%|█         | 5/50 [00:13<01:12,  1.60s/it]\n 12%|█▏        | 6/50 [00:14<00:57,  1.31s/it]\n 14%|█▍        | 7/50 [00:15<00:48,  1.12s/it]\n 16%|█▌        | 8/50 [00:15<00:42,  1.00s/it]\n 18%|█▊        | 9/50 [00:16<00:37,  1.08it/s]\n 20%|██        | 10/50 [00:17<00:34,  1.15it/s]\n 22%|██▏       | 11/50 [00:18<00:32,  1.20it/s]\n 24%|██▍       | 12/50 [00:18<00:30,  1.24it/s]\n 26%|██▌       | 13/50 [00:19<00:29,  1.27it/s]\n 28%|██▊       | 14/50 [00:20<00:28,  1.28it/s]\n 30%|███       | 15/50 [00:21<00:27,  1.28it/s]\n 32%|███▏      | 16/50 [00:22<00:26,  1.29it/s]\n 34%|███▍      | 17/50 [00:22<00:25,  1.29it/s]\n 36%|███▌      | 18/50 [00:23<00:24,  1.29it/s]\n 38%|███▊      | 19/50 [00:24<00:24,  1.29it/s]\n 40%|████      | 20/50 [00:25<00:23,  1.29it/s]\n 42%|████▏     | 21/50 [00:25<00:22,  1.29it/s]\n 44%|████▍     | 22/50 [00:26<00:21,  1.29it/s]\n 46%|████▌     | 23/50 [00:27<00:20,  1.29it/s]\n 48%|████▊     | 24/50 [00:28<00:20,  1.29it/s]\n 50%|█████     | 25/50 [00:28<00:19,  1.29it/s]\n 52%|█████▏    | 26/50 [00:29<00:18,  1.29it/s]\n 54%|█████▍    | 27/50 [00:30<00:17,  1.29it/s]\n 56%|█████▌    | 28/50 [00:31<00:17,  1.29it/s]\n 58%|█████▊    | 29/50 [00:32<00:16,  1.29it/s]\n 60%|██████    | 30/50 [00:32<00:15,  1.29it/s]\n 62%|██████▏   | 31/50 [00:33<00:14,  1.29it/s]\n 64%|██████▍   | 32/50 [00:34<00:14,  1.29it/s]\n 66%|██████▌   | 33/50 [00:35<00:13,  1.28it/s]\n 68%|██████▊   | 34/50 [00:35<00:12,  1.28it/s]\n 70%|███████   | 35/50 [00:36<00:11,  1.28it/s]\n 72%|███████▏  | 36/50 [00:37<00:10,  1.28it/s]\n 74%|███████▍  | 37/50 [00:38<00:10,  1.28it/s]\n 76%|███████▌  | 38/50 [00:39<00:09,  1.28it/s]\n 78%|███████▊  | 39/50 [00:39<00:08,  1.28it/s]\n 80%|████████  | 40/50 [00:40<00:07,  1.28it/s]\n 82%|████████▏ | 41/50 [00:41<00:07,  1.28it/s]\n 84%|████████▍ | 42/50 [00:42<00:06,  1.28it/s]\n 86%|████████▌ | 43/50 [00:42<00:05,  1.28it/s]\n 88%|████████▊ | 44/50 [00:43<00:04,  1.28it/s]\n 90%|█████████ | 45/50 [00:44<00:03,  1.28it/s]\n 92%|█████████▏| 46/50 [00:45<00:03,  1.28it/s]\n 94%|█████████▍| 47/50 [00:46<00:02,  1.28it/s]\n 96%|█████████▌| 48/50 [00:46<00:01,  1.28it/s]\n 98%|█████████▊| 49/50 [00:47<00:00,  1.28it/s]\n100%|██████████| 50/50 [00:48<00:00,  1.28it/s]\n100%|██████████| 50/50 [00:48<00:00,  1.03it/s]",
  "metrics": {
    "predict_time": 58.84104651,
    "total_time": 146.593807
  },
  "output": "https://replicate.delivery/xezq/LAoxeRLX7A2EGCMuSdaaVmiJZw5KYj5eguMih1X01YuVK7VUA/output.png",
  "started_at": "2025-03-06T20:06:50.570760Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/bcwr-hugokczqfa5u33r2yxkh6rbgsacu3huipmz54ikukawawzw5bdxa",
    "get": "https://api.replicate.com/v1/predictions/70hp26gb09rm80cndm8a1kkm3m",
    "cancel": "https://api.replicate.com/v1/predictions/70hp26gb09rm80cndm8a1kkm3m/cancel"
  },
  "version": "5b608630958be05f6845ed0d629f56cd01c372017d42f4e26916a12c7eab7b62"
}
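
The three timestamps in this response appear to account for both `metrics` values: `completed_at` minus `started_at` matches `predict_time`, and `completed_at` minus `created_at` matches `total_time`. A small sketch of that arithmetic follows; `parse_ts` and `timing_breakdown` are hypothetical helper names, and the interpretation of the gap before `started_at` as queue or boot time is an assumption.

```python
from datetime import datetime


def parse_ts(value):
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing_breakdown(prediction):
    """Split a prediction's wall-clock time into waiting and running seconds."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "waiting": (started - created).total_seconds(),
        "running": (completed - started).total_seconds(),
        "total": (completed - created).total_seconds(),
    }


# Timestamps copied from the response above.
prediction = {
    "created_at": "2025-03-06T20:05:22.818000Z",
    "started_at": "2025-03-06T20:06:50.570760Z",
    "completed_at": "2025-03-06T20:07:49.411807Z",
}
print(timing_breakdown(prediction))
```

For this run, roughly 88 of the 147 seconds elapsed before the model started, and the remaining 59 seconds were spent running.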

Generated in 58.9 seconds

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

CogView4-6B

Inference Requirements and Model Introduction

Resolution: Width and height must each be between 512px and 2048px and divisible by 32, and the total number of pixels must not exceed 2^21 (2,097,152).
Precision: BF16 or FP32. FP16 is not supported because it overflows and produces completely black images.
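
The resolution rules can be expressed as a short check. This is a minimal sketch that takes the stated limits at face value; `snap_side` and `resolution_problems` are hypothetical helper names, not part of CogView4 or any library.

```python
MIN_SIDE, MAX_SIDE, STEP = 512, 2048, 32
MAX_PIXELS = 2 ** 21  # 2,097,152, the pixel limit stated above


def snap_side(value):
    """Round to the nearest multiple of 32, then clamp to the allowed range."""
    snapped = round(value / STEP) * STEP
    # Both bounds are multiples of 32, so clamping keeps the result aligned.
    return max(MIN_SIDE, min(MAX_SIDE, snapped))


def resolution_problems(width, height):
    """Return a list of rule violations; an empty list means the size is valid."""
    problems = []
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name} {side} is outside [{MIN_SIDE}, {MAX_SIDE}]")
        if side % STEP != 0:
            problems.append(f"{name} {side} is not divisible by {STEP}")
    if width * height > MAX_PIXELS:
        problems.append(f"{width * height} pixels exceeds {MAX_PIXELS}")
    return problems


print(snap_side(1080))                  # nearest valid side length to 1080
print(resolution_problems(1024, 1024))  # no violations expected
print(resolution_problems(2048, 2048))  # each side is in range, but 2^22 pixels
```

Under these rules as written, a side length alone does not decide validity: 2048 × 1024 is exactly at the pixel limit, while 2048 × 2048 is over it.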

Memory usage measured with BF16 precision and a batch size of 4 is shown in the table below:

Resolution	enable_model_cpu_offload OFF	enable_model_cpu_offload ON	enable_model_cpu_offload ON, Text Encoder 4-bit
512 * 512	33GB	20GB	13GB
1280 * 720	35GB	20GB	13GB
1024 * 1024	35GB	20GB	13GB
1920 * 1280	39GB	20GB	14GB
2048 * 2048	43GB	21GB	14GB
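
One way to use these measurements is to pick the least restrictive configuration that fits a given GPU. The sketch below only looks up the figures from the table (BF16, batch size 4), so it says nothing about other batch sizes or resolutions; `MEMORY_GB`, `MODES`, and `lightest_fitting_mode` are hypothetical names.

```python
# Memory figures in GB, copied from the table above.
# Tuple order: offload off, offload on, offload on with 4-bit text encoder.
MEMORY_GB = {
    (512, 512): (33, 20, 13),
    (1280, 720): (35, 20, 13),
    (1024, 1024): (35, 20, 13),
    (1920, 1280): (39, 20, 14),
    (2048, 2048): (43, 21, 14),
}
MODES = ("offload off", "offload on", "offload on + 4-bit text encoder")


def lightest_fitting_mode(resolution, vram_gb):
    """Return the first mode whose measured usage fits in vram_gb, else None."""
    for mode, needed_gb in zip(MODES, MEMORY_GB[resolution]):
        if needed_gb <= vram_gb:
            return mode
    return None


for vram in (48, 24, 16, 8):
    print(vram, "GB:", lightest_fitting_mode((1024, 1024), vram))
```

The measured figures leave no headroom for other processes on the GPU, so treating them as lower bounds is the safer reading.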

Model Metrics

We’ve tested on multiple benchmarks and achieved the following scores:

DPG-Bench

Model	Overall	Global	Entity	Attribute	Relation	Other
SDXL	74.65	83.27	82.43	80.91	86.76	80.41
PixArt-alpha	71.11	74.97	79.32	78.60	82.57	76.96
SD3-Medium	84.08	87.90	91.01	88.83	80.70	88.68
DALL-E 3	83.50	90.97	89.61	88.39	90.58	89.83
Flux.1-dev	83.79	85.80	86.79	89.98	90.04	89.90
Janus-Pro-7B	84.19	86.90	88.90	89.40	89.32	89.48
CogView4-6B	85.13	83.85	90.35	91.17	91.14	87.29

GenEval

Model	Overall	Single Obj.	Two Obj.	Counting	Colors	Position	Color attribution
SDXL	0.55	0.98	0.74	0.39	0.85	0.15	0.23
PixArt-alpha	0.48	0.98	0.50	0.44	0.80	0.08	0.07
SD3-Medium	0.74	0.99	0.94	0.72	0.89	0.33	0.60
DALL-E 3	0.67	0.96	0.87	0.47	0.83	0.43	0.45
Flux.1-dev	0.66	0.98	0.79	0.73	0.77	0.22	0.45
Janus-Pro-7B	0.80	0.99	0.89	0.59	0.90	0.79	0.66
CogView4-6B	0.73	0.99	0.86	0.66	0.79	0.48	0.58

T2I-CompBench

Model	Color	Shape	Texture	2D-Spatial	3D-Spatial	Numeracy	Non-spatial Clip	Complex 3-in-1
SDXL	0.5879	0.4687	0.5299	0.2133	0.3566	0.4988	0.3119	0.3237
PixArt-alpha	0.6690	0.4927	0.6477	0.2064	0.3901	0.5058	0.3197	0.3433
SD3-Medium	0.8132	0.5885	0.7334	0.3200	0.4084	0.6174	0.3140	0.3771
DALL-E 3	0.7785	0.6205	0.7036	0.2865	0.3744	0.5880	0.3003	0.3773
Flux.1-dev	0.7572	0.5066	0.6300	0.2700	0.3992	0.6165	0.3065	0.3628
Janus-Pro-7B	0.5145	0.3323	0.4069	0.1566	0.2753	0.4406	0.3137	0.3806
CogView4-6B	0.7786	0.5880	0.6983	0.3075	0.3708	0.6626	0.3056	0.3869

Chinese Text Accuracy Evaluation

Model	Precision	Recall	F1 Score	Pick@4
Kolors	0.6094	0.1886	0.2880	0.1633
CogView4-6B	0.6969	0.5532	0.6168	0.3265
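
The F1 column can be cross-checked against the precision and recall columns. This sketch assumes F1 here is the usual harmonic mean of precision and recall, which the source does not state explicitly; the values are copied from the table above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
rows = {
    "Kolors": (0.6094, 0.1886, 0.2880),
    "CogView4-6B": (0.6969, 0.5532, 0.6168),
}

for model, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{model}: computed {computed:.4f}, reported {reported:.4f}")
```

Both recomputed values agree with the reported ones to within rounding of the published four-decimal inputs.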

Citation

🌟 If you find our work helpful, please consider citing our paper and starring the repository.

@article{zheng2024cogview3,
  title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
  author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},
  journal={arXiv preprint arXiv:2403.05121},
  year={2024}
}

License

This model is released under the Apache 2.0 License.