DreamO: Unified Image Customization (Cog Implementation)
This Replicate model runs DreamO, a unified framework for image customization developed by ByteDance. It handles subject-driven generation (in the style of IP-Adapter and PuLID), virtual try-on, and style transfer, using the FLUX.1-dev model as its backbone.
Original Project (GitHub): bytedance/DreamO
arXiv Paper: 2504.16915: DreamO: A Unified Framework for Image Customization
Core HF Weights: black-forest-labs/FLUX.1-dev (DreamO Pipeline) & PramaLLC/BEN2 (Background Removal)
About the DreamO Model
DreamO is a powerful image customization framework designed to handle a variety of conditioning inputs simultaneously. By leveraging VAE-based feature encoding and a novel feature routing constraint, DreamO can effectively mitigate conflicts and entanglement among multiple entities or style conditions. This allows for high-fidelity generation across different tasks such as character/object insertion (IP), face identity preservation (ID), virtual try-on, and style application.
Key Features & Capabilities
- IP (Identity Preservation - General): Similar to IP-Adapter, supports a wide range of inputs including characters, objects, and animals. Achieves high fidelity in preserving entity identity.
- ID (Identity Preservation - Face): Focuses specifically on facial identity, similar to InstantID and PuLID.
- Try-On: Supports virtual try-on for items like tops, bottoms, glasses, and hats, even with multiple garments (a capability generalized from its training).
- Style Transfer: Applies the style of a reference image to a new generation. (Note: Currently less stable than other tasks and cannot be combined with other conditions in the original implementation.)
- Multi-Condition Generation: Can combine multiple conditions (e.g., ID + IP, multiple IPs) to generate more creative and complex images, effectively managing potential conflicts between conditions.
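The combination rules in the list above (several IP/ID references may be mixed, but style stands alone) can be checked before a request is sent. The following is a minimal sketch, not part of DreamO or this Cog implementation: the `Reference` class, the `validate_references` helper, and the task labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical task labels mirroring the feature list; the model's real
# input values may differ.
TASKS = {"ip", "id", "try_on", "style"}


@dataclass(frozen=True)
class Reference:
    image: str  # path or URL of the reference image
    task: str   # one of TASKS


def validate_references(refs):
    """Return the references as a list if the combination is allowed, else raise ValueError."""
    if not refs:
        raise ValueError("at least one reference is required")
    for ref in refs:
        if ref.task not in TASKS:
            raise ValueError(f"unknown task: {ref.task!r}")
    # Style cannot be combined with any other condition.
    has_style = any(ref.task == "style" for ref in refs)
    if has_style and len(refs) > 1:
        raise ValueError("style cannot be combined with other conditions")
    return list(refs)
```

For example, `validate_references([Reference("face.png", "id"), Reference("toy.png", "ip")])` passes, while adding a `"style"` reference to that list raises `ValueError`.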
Underlying Technologies & Concepts
- FLUX Backbone: Leverages the powerful FLUX.1-dev text-to-image model. DreamO uses FLUX-turbo LoRA by default for faster inference.
- VAE-based Feature Encoding: Utilized for encoding reference images to capture high-fidelity details.
- Feature Routing Constraint: A key proposal in the DreamO paper to mitigate conflicts and entanglement when multiple conditions are applied.
Use Cases
- Creating personalized avatars or character portraits with specific facial identities.
- Generating images of objects or characters in new scenes or styles.
- Virtually trying on clothing or accessories.
- Applying artistic styles from one image to another.
- Combining multiple reference subjects or styles into a single cohesive image.
Limitations
- Style Task Stability: As noted in the original repository, style consistency is currently less stable than the other tasks, and in the current version style cannot be combined with other conditions.
- ID Task Nuances: While DreamO achieves high facial fidelity for ID tasks, the original paper notes it may introduce more model contamination than SOTA approaches like PuLID. Lowering guidance can sometimes help with "glossy" faces.
- Resource Intensive: Requires a capable GPU (NVIDIA A100 80GB on Replicate).
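To try the lower-guidance tip from the list above, a request could be assembled as sketched below. This is an illustration under stated assumptions: the input field names (`prompt`, `ref_image1`, `ref_task1`, `guidance`) and the helper names are hypothetical, so check the model's API schema on Replicate for the real ones. Only `replicate.run(ref, input=...)` comes from the official Replicate Python client, which needs `REPLICATE_API_TOKEN` set in the environment.

```python
def build_input(prompt, ref_image, ref_task, guidance):
    """Assemble a request payload; all field names here are assumed, not confirmed."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if guidance <= 0:
        raise ValueError("guidance must be positive")
    return {
        "prompt": prompt,
        "ref_image1": ref_image,     # hypothetical field name
        "ref_task1": ref_task,       # hypothetical field name
        "guidance": float(guidance), # hypothetical field name
    }


def run_dreamo(model_ref, payload):
    """Send the payload to Replicate; model_ref is the 'owner/name' string from the model page."""
    import replicate  # third-party client, imported lazily so build_input works without it

    return replicate.run(model_ref, input=payload)
```

If a face comes out glossy, one option is to rebuild the payload with a smaller `guidance` value and compare results from a second `run_dreamo` call.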
License & Disclaimer
The original DreamO project is licensed under the Apache-2.0 License. See the LICENSE file in the original repository.
Disclaimer (from bytedance/DreamO): This project strives to impact the domain of AI-driven image generation positively. Users are granted the freedom to create images using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.
This Replicate endpoint is provided for experimentation based on the original work. Users must adhere to the original license and disclaimer.
Citation
If you find DreamO useful for your research, please consider citing the original paper:
@misc{wu2025dreamo,
title={DreamO: A Unified Framework for Image Customization},
author={Chong Mou and Yanze Wu and Wenxu Wu and Zinan Guo and Pengze Zhang and Yufeng Cheng and Yiming Luo and Fei Ding and Shiwen Zhang and Xinghui Li and others},
year={2025},
eprint={2504.16915},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Cog implementation managed by zsxkib.
Star the original repo on GitHub: bytedance/DreamO
Follow me on Twitter/X