chenxwh / nova-t2i

Autoregressive Image Generation without Vector Quantization (Updated 5 months, 3 weeks ago)

  • Public
  • 15 runs
  • GitHub
  • Weights
  • Paper
  • License

Input

  • Input prompt (string). Default: "a shiba inu wearing a beret and black turtleneck."
  • Things you do not want to see in the output (string). Default: "low quality, deformed, distorted, disfigured, fused fingers, bad anatomy, weird hand"
  • Number of inference steps (integer, minimum: 1, maximum: 128). Default: 64
  • Number of diffusion steps (integer, minimum: 1, maximum: 50). Default: 25
  • Scale for classifier-free guidance (number, minimum: 1, maximum: 10). Default: 5
  • Random seed (integer). Leave blank to randomize the seed.
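
Below is a minimal sketch of calling this model with the Replicate Python client. The input field names (prompt, negative_prompt, num_inference_steps, num_diffusion_steps, guidance_scale, seed) are assumptions inferred from the descriptions above, not confirmed keys from the API schema; the values mirror the listed defaults.

import replicate

# Field names are assumed from the input descriptions above; check the
# model's API schema for the exact keys before relying on them.
output = replicate.run(
    "chenxwh/nova-t2i",
    input={
        "prompt": "a shiba inu wearing a beret and black turtleneck.",
        "negative_prompt": "low quality, deformed, distorted, disfigured, "
                           "fused fingers, bad anatomy, weird hand",
        "num_inference_steps": 64,  # integer, 1-128
        "num_diffusion_steps": 25,  # integer, 1-50
        "guidance_scale": 5,        # classifier-free guidance, 1-10
        # "seed": 42,               # omit to randomize the seed
    },
)
print(output)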

Output


Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Autoregressive Video Generation without Vector Quantization

This is the text-to-image demo; see the text-to-video demo here.

We present NOVA (NOn-Quantized Video Autoregressive Model), a model that enables autoregressive image/video generation with high efficiency. NOVA reformulates the video generation problem as non-quantized autoregressive modeling of temporal frame-by-frame prediction and spatial set-by-set prediction. NOVA generalizes well and enables diverse zero-shot generation abilities in one unified model.
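
To make the idea concrete, the toy sketch below illustrates what "non-quantized autoregressive modeling" means in practice: instead of classifying each token against a discrete VQ codebook, a small per-token diffusion head regresses continuous token values conditioned on the autoregressive context (temporal frame-by-frame, spatial set-by-set). This is an illustrative sketch, not the NOVA implementation; all class and function names here are hypothetical.

import torch
import torch.nn as nn

class DiffusionHead(nn.Module):
    # Hypothetical toy head: predicts the noise added to a continuous token,
    # conditioned on the autoregressive context vector and a timestep.
    def __init__(self, token_dim=16, ctx_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(token_dim + ctx_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, token_dim),
        )

    def forward(self, noisy_token, ctx, t):
        return self.net(torch.cat([noisy_token, ctx, t], dim=-1))

def per_token_diffusion_loss(head, clean_tokens, ctx):
    # clean_tokens: (B, N, D) continuous latents; ctx: (B, N, C) context from
    # the autoregressive backbone. No codebook and no cross-entropy: the
    # supervision is a denoising regression on continuous values.
    b, n, _ = clean_tokens.shape
    t = torch.rand(b, n, 1)                      # random diffusion timesteps
    noise = torch.randn_like(clean_tokens)
    noisy = (1 - t) * clean_tokens + t * noise   # simple interpolation schedule
    return ((head(noisy, ctx, t) - noise) ** 2).mean()

# Random tensors stand in for real latents/context so the sketch runs as-is.
head = DiffusionHead()
loss = per_token_diffusion_loss(head, torch.randn(2, 8, 16), torch.randn(2, 8, 64))
loss.backward()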

✨ Highlights

  • 🔥 Novel Approach: Non-quantized video autoregressive generation.
  • 🔥 State-of-the-art Performance: High efficiency with state-of-the-art t2i/t2v results.
  • 🔥 Unified Modeling: Multi-task capabilities in a single unified model.

Citation

If you find this repository useful, please consider giving it a star ⭐ and a citation 🦖:

@article{deng2024nova,
  title={Autoregressive Video Generation without Vector Quantization},
  author={Deng, Haoge and Pan, Ting and Diao, Haiwen and Luo, Zhengxiong and Cui, Yufeng and Lu, Huchuan and Shan, Shiguang and Qi, Yonggang and Wang, Xinlong},
  journal={arXiv preprint arXiv:2412.14169},
  year={2024}
}

Acknowledgement

We thank the following repositories: MAE, MAR, MaskGIT, DiT, Open-Sora-Plan, CogVideo, and CodeWithGPU.

License

Code and models are licensed under Apache License 2.0.