jhorovitz / omini-schnell

Place items in a scene without needing to train on them

OminiControl - Subject Control for Diffusion Models

A minimal implementation for incorporating subject-specific control into pretrained Diffusion Transformer (DiT) models, focusing on preserving subject identity while generating new views and contexts.
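
For quick experimentation, here is a hedged sketch of calling this model through the Replicate Python client. The input field names used below (`prompt`, `image`) are assumptions rather than the model's confirmed schema; the API tab on this page lists the actual inputs.

```python
# Hedged example: running this model via the Replicate Python client.
# NOTE: the input names "prompt" and "image" are assumptions and may not
# match the real input schema; check the model's API tab for the actual fields.
import replicate

output = replicate.run(
    "jhorovitz/omini-schnell",
    input={
        "prompt": "the subject sitting on a park bench at sunset",  # assumed field name
        "image": open("reference_subject.png", "rb"),               # assumed field name
    },
)
print(output)
```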

Key Features

  • Lightweight control mechanism requiring only 0.1% additional parameters
  • Preserves subject identity and characteristics while allowing flexible pose/scene changes
  • Built for DiT-based models (tested on FLUX.1)
  • Simple integration through the model's existing multi-modal attention rather than complex control modules (see the sketch after this list)
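
The last point is the core design choice: instead of bolting on a separate control network, tokens from the subject (condition) image are appended to the token sequence that the DiT's multi-modal attention already processes, and only lightweight adapters on top of the frozen weights are trained. The PyTorch sketch below illustrates that token-concatenation idea with illustrative shapes and a plain attention layer; it is not the FLUX.1 implementation.

```python
# Minimal PyTorch sketch of OminiControl-style subject conditioning:
# condition-image tokens join the same joint-attention sequence as the
# text and noisy-image tokens, so no extra control branch is needed.
# Shapes, dims, and the attention layer here are illustrative assumptions.
import torch
import torch.nn as nn

class JointAttentionWithSubject(nn.Module):
    def __init__(self, dim: int = 1024, heads: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_tok, image_tok, subject_tok):
        # One shared sequence: [text | noisy image latents | subject condition latents].
        # In practice the pretrained attention weights stay frozen and only small
        # adapters (e.g. LoRA on the q/k/v projections) are trained.
        seq = torch.cat([text_tok, image_tok, subject_tok], dim=1)
        out, _ = self.attn(seq, seq, seq)
        # Only the noisy-image positions feed the denoising prediction.
        start = text_tok.shape[1]
        end = start + image_tok.shape[1]
        return out[:, start:end]

# Toy shapes: batch=1, 77 text tokens, 256 image patches, 256 subject patches.
block = JointAttentionWithSubject()
text = torch.randn(1, 77, 1024)
image = torch.randn(1, 256, 1024)
subject = torch.randn(1, 256, 1024)
print(block(text, image, subject).shape)  # torch.Size([1, 256, 1024])
```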

Training Data

The model is trained on Subjects200K, a dataset of 200,000+ paired images showing the same subject in different contexts. Each pair maintains consistent subject identity while varying the following (a data-loading sketch follows the list):

  • Pose/angle
  • Lighting conditions
  • Background/environment
  • Context/scene
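
A hypothetical sketch of streaming the pairs with the Hugging Face `datasets` library is shown below. The dataset ID ("Yuanshi/Subjects200K") and the field names are assumptions; the dataset card documents the real schema.

```python
# Hypothetical sketch: iterating over Subjects200K pairs with Hugging Face datasets.
# The dataset ID and field names below are assumptions, not a confirmed schema.
from datasets import load_dataset

ds = load_dataset("Yuanshi/Subjects200K", split="train", streaming=True)

for record in ds:
    pair_image = record["image"]         # assumed field: the paired views of one subject
    caption = record.get("description")  # assumed field: text describing subject/scene
    break  # inspect a single record
```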

Limitations

  • Works best with clearly defined subjects/objects
  • Requires high-quality reference images
  • Performance may vary based on subject complexity

Citation

@article{tan2024ominicontrol,
  title={OminiControl: Minimal and Universal Control for Diffusion Transformer},
  author={Tan, Zhenxiong and Liu, Songhua and Yang, Xingyi and Xue, Qiaochu and Wang, Xinchao},
  journal={arXiv preprint arXiv:2411.15098},
  year={2024}
}

For more details on the full OminiControl framework and other control capabilities, please refer to the original paper.