jhorovitz / omini-schnell

Place items in a scene without needing to train on them

OminiControl - Subject Control for Diffusion Models

A minimal implementation for incorporating subject-specific control into pretrained Diffusion Transformer (DiT) models, focusing on preserving subject identity while generating new views and contexts.
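
For quick experimentation, here is a hedged sketch of calling this model through the Replicate Python client. The input field names used below (`prompt`, `image`) are assumptions rather than the model's confirmed schema; the API tab on this page lists the actual inputs.

```python
# Hedged example: running this model via the Replicate Python client.
# NOTE: the input names "prompt" and "image" are assumptions and may not
# match the real input schema; check the model's API tab for the actual fields.
import replicate

output = replicate.run(
    "jhorovitz/omini-schnell",
    input={
        "prompt": "the subject sitting on a park bench at sunset",  # assumed field name
        "image": open("reference_subject.png", "rb"),               # assumed field name
    },
)
print(output)
```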

Key Features

  • Lightweight control mechanism requiring only 0.1% additional parameters
  • Preserves subject identity and characteristics while allowing flexible pose/scene changes
  • Built for DiT-based models (tested on FLUX.1)
  • Simple integration through the model's existing multi-modal attention rather than complex control modules (see the sketch after this list)
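
The last point is the core design choice: instead of bolting on a separate control network, tokens from the subject (condition) image are appended to the token sequence that the DiT's multi-modal attention already processes, and only lightweight adapters on top of the frozen weights are trained. The PyTorch sketch below illustrates that token-concatenation idea with illustrative shapes and a plain attention layer; it is not the FLUX.1 implementation.

```python
# Minimal PyTorch sketch of OminiControl-style subject conditioning:
# condition-image tokens join the same joint-attention sequence as the
# text and noisy-image tokens, so no extra control branch is needed.
# Shapes, dims, and the attention layer here are illustrative assumptions.
import torch
import torch.nn as nn

class JointAttentionWithSubject(nn.Module):
    def __init__(self, dim: int = 1024, heads: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_tok, image_tok, subject_tok):
        # One shared sequence: [text | noisy image latents | subject condition latents].
        # In practice the pretrained attention weights stay frozen and only small
        # adapters (e.g. LoRA on the q/k/v projections) are trained.
        seq = torch.cat([text_tok, image_tok, subject_tok], dim=1)
        out, _ = self.attn(seq, seq, seq)
        # Only the noisy-image positions feed the denoising prediction.
        start = text_tok.shape[1]
        end = start + image_tok.shape[1]
        return out[:, start:end]

# Toy shapes: batch=1, 77 text tokens, 256 image patches, 256 subject patches.
block = JointAttentionWithSubject()
text = torch.randn(1, 77, 1024)
image = torch.randn(1, 256, 1024)
subject = torch.randn(1, 256, 1024)
print(block(text, image, subject).shape)  # torch.Size([1, 256, 1024])
```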

Training Data

The model is trained on Subjects200K, a dataset of 200,000+ paired images showing the same subject in different contexts. Each pair maintains consistent subject identity while varying the following (a data-loading sketch follows the list):

  • Pose/angle
  • Lighting conditions
  • Background/environment
  • Context/scene
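
A hypothetical sketch of streaming the pairs with the Hugging Face `datasets` library is shown below. The dataset ID ("Yuanshi/Subjects200K") and the field names are assumptions; the dataset card documents the real schema.

```python
# Hypothetical sketch: iterating over Subjects200K pairs with Hugging Face datasets.
# The dataset ID and field names below are assumptions, not a confirmed schema.
from datasets import load_dataset

ds = load_dataset("Yuanshi/Subjects200K", split="train", streaming=True)

for record in ds:
    pair_image = record["image"]         # assumed field: the paired views of one subject
    caption = record.get("description")  # assumed field: text describing subject/scene
    break  # inspect a single record
```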

Limitations

  • Works best with clearly defined subjects/objects
  • Requires high-quality reference images
  • Performance may vary based on subject complexity

Citation

@article{tan2024ominicontrol,
  title={OminiControl: Minimal and Universal Control for Diffusion Transformer},
  author={Tan, Zhenxiong and Liu, Songhua and Yang, Xingyi and Xue, Qiaochu and Wang, Xinchao},
  journal={arXiv preprint arXiv:2411.15098},
  year={2024}
}

For more details on the full OminiControl framework and other control capabilities, please refer to the original paper.