jhorovitz / omini-schnell

Place items in a scene without needing to train on them

Run time and cost

This model costs approximately $0.11 to run on Replicate, or about 9 runs per $1, though the exact cost varies with your inputs. It is also open source, and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 80 seconds, though predict time varies significantly with the inputs.
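
For programmatic use, the model can be called through Replicate's Python client. The sketch below is illustrative only: the input names image and prompt are assumptions, so check the model's API schema on Replicate for the actual parameters. It also assumes REPLICATE_API_TOKEN is set in your environment.

import replicate

# Input names here are hypothetical -- consult the model's API page
# for the real schema before running this.
output = replicate.run(
    "jhorovitz/omini-schnell",
    input={
        "image": open("subject.png", "rb"),  # reference image of the subject
        "prompt": "the subject on a beach at sunset",
    },
)
print(output)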

Readme

OminiControl - Subject Control for Diffusion Models

A minimal implementation for incorporating subject-specific control into pretrained Diffusion Transformer (DiT) models, focusing on preserving subject identity while generating new views and contexts.

Key Features

  • Lightweight control mechanism requiring only 0.1% additional parameters
  • Preserves subject identity and characteristics while allowing flexible pose/scene changes
  • Built for DiT-based models (tested on FLUX.1)
  • Simple integration using multi-modal attention rather than complex control modules (a minimal sketch follows this list)
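
To illustrate the last point: instead of bolting on a ControlNet-style side network, OminiControl encodes the reference image into tokens, appends them to the denoiser's latent token sequence, and lets the DiT's existing attention layers attend across both sets. The following PyTorch sketch shows the idea only; names and shapes are illustrative, not the actual implementation.

import torch
import torch.nn.functional as F

def joint_attention(x_tokens, cond_tokens, to_q, to_k, to_v, num_heads):
    # x_tokens:    (B, N_x, D) noisy latent tokens
    # cond_tokens: (B, N_c, D) tokens from the encoded reference image
    # to_q/to_k/to_v: the block's existing projection layers; in
    # OminiControl the only new weights are small LoRA updates to
    # layers like these, which is where the ~0.1% parameter overhead
    # comes from.
    seq = torch.cat([x_tokens, cond_tokens], dim=1)  # (B, N_x + N_c, D)
    B, N, D = seq.shape
    q = to_q(seq).view(B, N, num_heads, -1).transpose(1, 2)
    k = to_k(seq).view(B, N, num_heads, -1).transpose(1, 2)
    v = to_v(seq).view(B, N, num_heads, -1).transpose(1, 2)
    out = F.scaled_dot_product_attention(q, k, v)  # every token attends to both sets
    out = out.transpose(1, 2).reshape(B, N, D)
    return out[:, : x_tokens.shape[1]]  # carry forward only the latent tokens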

Training Data

The model is trained on Subjects200K, a dataset of 200,000+ paired images showing the same subject in different contexts. Each pair maintains consistent subject identity while varying:

  • Pose/angle
  • Lighting conditions
  • Background/environment
  • Context/scene
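
To inspect the pairs yourself, the dataset can be loaded with the Hugging Face datasets library. The hub identifier below is an assumption (verify the exact repo name), and field names vary by release, so inspect a record rather than hard-coding keys.

from datasets import load_dataset

# Hub id is assumed -- confirm the actual dataset repo before use.
ds = load_dataset("Yuanshi/Subjects200K", split="train", streaming=True)

# Stream one record to see how an image pair is stored.
example = next(iter(ds))
print(example.keys())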

Limitations

  • Works best with clearly defined subjects/objects
  • Requires high-quality reference images
  • Performance may vary based on subject complexity

Citation

@article{tan2024ominicontrol,
  title={OminiControl: Minimal and Universal Control for Diffusion Transformer},
  author={Tan, Zhenxiong and Liu, Songhua and Yang, Xingyi and Xue, Qiaochu and Wang, Xinchao},
  journal={arXiv preprint arXiv:2411.15098},
  year={2024}
}

For more details on the full OminiControl framework and other control capabilities, please refer to the original paper.