vectorspacelab / omnigen

OmniGen: Unified Image Generation

  • Public
  • 11.2K runs
  • L40S
  • GitHub
  • Weights
  • Paper
  • License

Input

string

Input prompt. For multi-modal-to-image generation with one or more input images, each image placeholder in the prompt should have the format <img><|image_*|></img> (the placeholder for the first image is <|image_1|>, for the second <|image_2|>, and so on). Refer to the examples for more details, and see the sketch after the input list below.

Default: "a photo of an astronaut riding a horse on mars"

file
img1

Input image 1. Optional

file

Input image 2. Optional

file

Input image 3. Optional

integer
(minimum: 128, maximum: 2048)

Width of the output image

Default: 1024

integer
(minimum: 128, maximum: 2048)

Height of the output image

Default: 1024

integer
(minimum: 1, maximum: 100)

Number of denoising steps

Default: 50

number
(minimum: 1, maximum: 5)

Classifier-free guidance scale for text prompt

Default: 2.5

number
(minimum: 1, maximum: 2)

Classifier-free guidance scale for images

Default: 1.6

integer

Random seed. Leave blank to randomize the seed

integer
(minimum: 128, maximum: 2048)

Maximum input image size

Default: 1024

boolean

Whether to use a separate inference process for each guidance branch. This reduces the memory cost.

Default: true

boolean

Offload the model to CPU, which significantly reduces the memory cost but slows down generation. You can disable separate_cfg_infer and set offload_model=True instead. If both separate_cfg_infer and offload_model are True, memory use is reduced further, but generation is slowest.

Default: false

boolean

Automatically adjust the output image size to match the input image size. For editing and ControlNet tasks, this ensures the output image has the same size as the input image, which leads to better results.

Default: false
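
To make the input schema concrete, here is a minimal sketch of calling this model with the replicate Python client. The model identifier is taken from this page; img1 and separate_cfg_infer are documented above, while the remaining field names (prompt, width, height, guidance_scale, img_guidance_scale) are assumptions inferred from the descriptions and should be checked against the API schema.

import replicate

# Multi-modal editing: the prompt references the uploaded image via the
# <img><|image_1|></img> placeholder described in the input list above.
output = replicate.run(
    "vectorspacelab/omnigen",  # model identifier from this page
    input={
        "prompt": "The woman in <img><|image_1|></img> waves her hand happily in the crowd",
        "img1": open("woman.jpg", "rb"),  # hypothetical local file
        "width": 1024,                    # assumed field name
        "height": 1024,                   # assumed field name
        "guidance_scale": 2.5,            # text CFG scale (assumed field name)
        "img_guidance_scale": 1.6,        # image CFG scale (assumed field name)
        "separate_cfg_infer": True,       # documented above; reduces memory cost
    },
)
print(output)  # URL(s) of the generated image(s)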


Run time and cost

This model costs approximately $0.10 to run on Replicate, or 10 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 104 seconds. The predict time for this model varies significantly based on the inputs.

Readme

OmniGen: Unified Image Generation

OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible, and easy to use. We provide inference code so that everyone can explore more functionalities of OmniGen.
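
For local use, the snippet below is a sketch of what inference looks like with the OmniGenPipeline API shown in the project's GitHub README; it assumes the OmniGen package has been installed from that repository, and the exact signature should be verified there.

from OmniGen import OmniGenPipeline

# Load the pretrained pipeline; weights are downloaded on first use.
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# Plain text-to-image generation with the defaults listed above.
images = pipe(
    prompt="a photo of an astronaut riding a horse on mars",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("astronaut.png")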

Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, or Reference-Net) and performing extra preprocessing steps (e.g., face detection, pose estimation, or cropping) to generate a satisfactory image. However, we believe that the future image generation paradigm should be simpler and more flexible: generating diverse images directly from arbitrary multi-modal instructions, without additional plugins or operations, similar to how GPT works in language generation.

Due to limited resources, OmniGen still has room for improvement. We will continue to optimize it, and we hope it inspires more universal image-generation models. You can also easily fine-tune OmniGen without worrying about designing networks for specific tasks: you just need to prepare the corresponding data and run the training script. Imagination is no longer limited; everyone can construct any image-generation task, and perhaps we can achieve very interesting, wonderful, and creative things.
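
To give a flavor of what "prepare the corresponding data" means, the record below is a hypothetical fine-tuning example in JSONL form, modeled loosely on the toy data in the GitHub repository; the field names and file paths are illustrative assumptions, so verify them against the repo's fine-tuning guide before training.

import json

# Hypothetical training record: field names and file paths are assumptions,
# not the repository's confirmed schema.
record = {
    "instruction": "Make the sky in <img><|image_1|></img> look like a sunset",
    "input_images": ["data/house_day.png"],
    "output_image": "data/house_sunset.png",
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")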

If you have any questions, ideas, or interesting tasks you want OmniGen to accomplish, feel free to discuss them with us: 2906698981@qq.com, wangyueze@tju.edu.cn, zhengliu1026@gmail.com. We welcome any feedback to help us improve the model.

License

This repo is licensed under the MIT License.

Citation

If you find this repository useful, please consider giving it a star ⭐ and a citation:

@article{xiao2024omnigen,
  title={Omnigen: Unified image generation},
  author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
  journal={arXiv preprint arXiv:2409.11340},
  year={2024}
}