vectorspacelab / omnigen

OmniGen: Unified Image Generation

  • Public
  • 11.2K runs
  • L40S
  • GitHub
  • Weights
  • Paper
  • License

Input

string

Input prompt. For multi-modal-to-image generation with one or more input images, each image placeholder in the prompt should have the format <img><|image_*|></img> (the placeholder for the first image is <|image_1|>, for the second <|image_2|>, and so on). Refer to the examples for more details, and see the sketch after the input list below.

Default: "a photo of an astronaut riding a horse on mars"

file
img1

Input image 1. Optional

file

Input image 2. Optional

file

Input image 3. Optional

integer
(minimum: 128, maximum: 2048)

Width of the output image

Default: 1024

integer
(minimum: 128, maximum: 2048)

Height of the output image

Default: 1024

integer
(minimum: 1, maximum: 100)

Number of denoising steps

Default: 50

number
(minimum: 1, maximum: 5)

Classifier-free guidance scale for text prompt

Default: 2.5

number
(minimum: 1, maximum: 2)

Classifier-free guidance scale for images

Default: 1.6

integer

Random seed. Leave blank to randomize the seed

integer
(minimum: 128, maximum: 2048)

Maximum input image size

Default: 1024

boolean

Whether to use a separate inference process for each guidance branch. This reduces the memory cost.

Default: true

boolean

Offload the model to CPU, which significantly reduces the memory cost but slows down generation. You can disable separate_cfg_infer and set offload_model=True instead. If both separate_cfg_infer and offload_model are True, memory use is reduced further, but generation is slowest.

Default: false

boolean

Automatically adjust the output image size to match the input image size. For editing and ControlNet tasks, this ensures the output image has the same size as the input image, which leads to better results.

Default: false
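
To make the input schema concrete, here is a minimal sketch of calling this model with the replicate Python client. The model identifier is taken from this page; img1 and separate_cfg_infer are documented above, while the remaining field names (prompt, width, height, guidance_scale, img_guidance_scale) are assumptions inferred from the descriptions and should be checked against the API schema.

import replicate

# Multi-modal editing: the prompt references the uploaded image via the
# <img><|image_1|></img> placeholder described in the input list above.
output = replicate.run(
    "vectorspacelab/omnigen",  # model identifier from this page
    input={
        "prompt": "The woman in <img><|image_1|></img> waves her hand happily in the crowd",
        "img1": open("woman.jpg", "rb"),  # hypothetical local file
        "width": 1024,                    # assumed field name
        "height": 1024,                   # assumed field name
        "guidance_scale": 2.5,            # text CFG scale (assumed field name)
        "img_guidance_scale": 1.6,        # image CFG scale (assumed field name)
        "separate_cfg_infer": True,       # documented above; reduces memory cost
    },
)
print(output)  # URL(s) of the generated image(s)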


Run time and cost

This model costs approximately $0.10 to run on Replicate, or 10 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 104 seconds. The predict time for this model varies significantly based on the inputs.

Readme

OmniGen: Unified Image Generation

OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible, and easy to use. We provide inference code so that everyone can explore more functionalities of OmniGen.
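
For local use, the snippet below is a sketch of what inference looks like with the OmniGenPipeline API shown in the project's GitHub README; it assumes the OmniGen package has been installed from that repository, and the exact signature should be verified there.

from OmniGen import OmniGenPipeline

# Load the pretrained pipeline; weights are downloaded on first use.
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# Plain text-to-image generation with the defaults listed above.
images = pipe(
    prompt="a photo of an astronaut riding a horse on mars",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("astronaut.png")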

Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, or Reference-Net) and performing extra preprocessing steps (e.g., face detection, pose estimation, or cropping) to generate a satisfactory image. However, we believe that the future image generation paradigm should be simpler and more flexible: generating diverse images directly from arbitrary multi-modal instructions, without additional plugins or operations, similar to how GPT works in language generation.

Due to limited resources, OmniGen still has room for improvement. We will continue to optimize it, and we hope it inspires more universal image-generation models. You can also easily fine-tune OmniGen without worrying about designing networks for specific tasks: you just need to prepare the corresponding data and run the training script. Imagination is no longer limited; everyone can construct any image-generation task, and perhaps we can achieve very interesting, wonderful, and creative things.
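
To give a flavor of what "prepare the corresponding data" means, the record below is a hypothetical fine-tuning example in JSONL form, modeled loosely on the toy data in the GitHub repository; the field names and file paths are illustrative assumptions, so verify them against the repo's fine-tuning guide before training.

import json

# Hypothetical training record: field names and file paths are assumptions,
# not the repository's confirmed schema.
record = {
    "instruction": "Make the sky in <img><|image_1|></img> look like a sunset",
    "input_images": ["data/house_day.png"],
    "output_image": "data/house_sunset.png",
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")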

If you have any questions, ideas, or interesting tasks you want OmniGen to accomplish, feel free to discuss them with us: 2906698981@qq.com, wangyueze@tju.edu.cn, zhengliu1026@gmail.com. We welcome any feedback to help us improve the model.

License

This repo is licensed under the MIT License.

Citation

If you find this repository useful, please consider giving it a star ⭐ and a citation:

@article{xiao2024omnigen,
  title={Omnigen: Unified image generation},
  author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
  journal={arXiv preprint arXiv:2409.11340},
  year={2024}
}