zsxkib / dream-o

👗 Bytedance's DreamO: unified image customization model (IP, ID, Style, Try-On, etc.) 🧣

  • Public
  • 604 runs
  • A100 (80GB)

Input

string (required)

Prompt for image generation

file
ref_image1

Reference image 1 (optional)

string

Task for reference image 1 ('ip': object/character, 'id': face identity, 'style': preserve style/background)

Default: "ip"

file
ref_image2

Reference image 2 (optional)

string

Task for reference image 2 ('ip': object/character, 'id': face identity, 'style': preserve style/background)

Default: "ip"

integer
(minimum: 768, maximum: 1024)

Width of the output image (must be a multiple of 16)

Default: 1024

integer
(minimum: 768, maximum: 1024)

Height of the output image (must be a multiple of 16)

Default: 1024
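The output width and height must each lie between 768 and 1024 and be multiples of 16. You might compute a size programmatically, for example from a reference image's aspect ratio. In that case a small helper can clamp and round the value before you submit it. `snap_dimension` below is a hypothetical helper, not part of the model's API. Rounding down rather than to the nearest multiple is an arbitrary choice in this sketch.

```python
# Hypothetical helper (not part of the model's API): clamp a requested size to the
# documented 768..1024 range, then round down to the nearest multiple of 16.
MIN_SIZE = 768
MAX_SIZE = 1024
MULTIPLE = 16


def snap_dimension(value: int) -> int:
    """Clamp `value` to [768, 1024] and floor it to a multiple of 16."""
    clamped = max(MIN_SIZE, min(MAX_SIZE, int(value)))
    # Both bounds are multiples of 16, so flooring can never leave the range.
    return clamped - (clamped % MULTIPLE)


if __name__ == "__main__":
    for requested in (700, 900, 1000, 1100):
        print(requested, "->", snap_dimension(requested))
```

Because both bounds are themselves multiples of 16, the clamp-then-floor order always yields a valid value.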

integer
(minimum: 8, maximum: 30)

Number of inference steps

Default: 12

number
(minimum: 1, maximum: 10)

Guidance scale. Lower for less intensity/more realism (e.g., faces), higher for stronger prompt adherence.

Default: 3.5

integer

Random seed. Leave blank or set to -1 for random.

integer
(minimum: 256, maximum: 1024)

Resolution for non-ID reference image preprocessing (target pixel area)

Default: 512
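The phrase "target pixel area" suggests how a non-ID reference image is preprocessed. It is probably rescaled so that its area is roughly the square of this value (512 × 512 by default), with its aspect ratio kept. The sketch below computes such a size under that assumption; it is not the model's code. `area_preserving_size` is a hypothetical name. Flooring each side to a multiple of 16 is borrowed from the output-size rule and is not documented for reference images.

```python
import math
from typing import Tuple


def area_preserving_size(width: int, height: int, ref_res: int = 512,
                         multiple: int = 16) -> Tuple[int, int]:
    """Return (new_width, new_height) whose area is close to ref_res * ref_res.

    The input aspect ratio is kept; each side is floored to `multiple`
    (an assumption), so the area lands at or slightly below the target.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Uniform scale factor that maps the current area onto the target area.
    scale = math.sqrt((ref_res * ref_res) / (width * height))
    new_w = int(width * scale)
    new_h = int(height * scale)
    return new_w - new_w % multiple, new_h - new_h % multiple


if __name__ == "__main__":
    print(area_preserving_size(1024, 1024))  # (512, 512)
    print(area_preserving_size(2048, 1024))  # (720, 352)
```

Under this reading, raising the value keeps more detail from the reference at the cost of more tokens for the model to process.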

string

Negative prompt

Default: ""

number
(minimum: 1, maximum: 10)

Negative guidance scale

Default: 3.5

number
(minimum: 1, maximum: 5)

True CFG scale (advanced, requires distilled CFG LoRA)

Default: 1

integer
(minimum: 0, maximum: 30)

CFG start step (advanced)

Default: 0

integer
(minimum: 0, maximum: 30)

CFG end step (advanced)

Default: 0

number
(minimum: 0, maximum: 10)

First step guidance scale override (advanced, 0 uses main guidance)

Default: 0
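The advanced guidance settings interact per denoising step. The following Python is a hedged illustration of that interaction, not the Cog implementation. It rests on three assumptions:

  • True CFG uses the standard classifier-free guidance combination `neg + scale * (pos - neg)`.
  • True CFG is active only when its scale exceeds 1 and the step index lies in the half-open range [CFG start step, CFG end step).
  • The first-step override replaces the main guidance only at step 0.

All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def true_cfg_combine(pos: Sequence[float], neg: Sequence[float], scale: float) -> List[float]:
    """Classifier-free guidance: push the prediction away from the negative branch.

    With scale == 1 the result equals `pos`, so the negative pass has no effect.
    """
    if len(pos) != len(neg):
        raise ValueError("predictions must have the same length")
    return [n + scale * (p - n) for p, n in zip(pos, neg)]


def guidance_for_step(step: int, guidance: float, first_step_guidance: float) -> float:
    """Guidance for one step; an override of 0 means 'use the main guidance'."""
    if step == 0 and first_step_guidance > 0:
        return first_step_guidance
    return guidance


def uses_true_cfg(step: int, true_cfg: float, cfg_start: int, cfg_end: int) -> bool:
    """Assumed rule: the extra negative-prompt pass runs only for cfg_start <= step < cfg_end."""
    return true_cfg > 1 and cfg_start <= step < cfg_end


def schedule(num_steps: int, guidance: float, first_step_guidance: float,
             true_cfg: float, cfg_start: int, cfg_end: int) -> List[Tuple[int, float, bool]]:
    """(step, guidance, true-CFG active?) for every denoising step."""
    return [
        (step,
         guidance_for_step(step, guidance, first_step_guidance),
         uses_true_cfg(step, true_cfg, cfg_start, cfg_end))
        for step in range(num_steps)
    ]


if __name__ == "__main__":
    # Page defaults: 12 steps, guidance 3.5, no override, true CFG 1, start/end 0.
    for row in schedule(12, 3.5, 0, 1, 0, 0)[:3]:
        print(row)
```

With the page defaults, every step uses guidance 3.5 and no step runs the extra pass. This is consistent with a default true CFG scale of 1 being a no-op. Each step where true CFG is active typically needs an additional forward pass with the negative prompt, so widening the start/end range tends to slow generation.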

string

Format of the output image

Default: "webp"

integer
(minimum: 1, maximum: 100)

Output quality for lossy formats (jpg, webp)

Default: 90
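Before submitting a prediction, you can check inputs locally against the documented ranges. The page does not show the API names for most inputs. Every key in this sketch (`prompt`, `width`, `num_steps`, `ref_task1`, and so on) is therefore a hypothetical placeholder; substitute the real names from the model's API schema. Only the ranges, the multiple-of-16 rule, and the task values come from the list above.

```python
from typing import Dict, List, Tuple

# Documented (minimum, maximum) ranges; the key names are hypothetical placeholders.
RANGES: Dict[str, Tuple[float, float]] = {
    "width": (768, 1024),
    "height": (768, 1024),
    "num_steps": (8, 30),
    "guidance": (1, 10),
    "ref_res": (256, 1024),
    "neg_guidance": (1, 10),
    "true_cfg": (1, 5),
    "cfg_start_step": (0, 30),
    "cfg_end_step": (0, 30),
    "first_step_guidance": (0, 10),
    "output_quality": (1, 100),
}
VALID_TASKS = {"ip", "id", "style"}


def validate_inputs(inputs: dict) -> List[str]:
    """Return a list of problems; an empty list means the inputs pass these checks."""
    problems: List[str] = []
    if not inputs.get("prompt"):
        problems.append("prompt is required")
    # Numeric range checks, only for keys the caller actually set.
    for key, (lo, hi) in RANGES.items():
        if key in inputs and not lo <= inputs[key] <= hi:
            problems.append(f"{key}={inputs[key]} outside [{lo}, {hi}]")
    # Output dimensions must also be multiples of 16.
    for key in ("width", "height"):
        if key in inputs and inputs[key] % 16 != 0:
            problems.append(f"{key} must be a multiple of 16")
    # Per-reference task selectors accept only the three documented values.
    for key in ("ref_task1", "ref_task2"):
        if key in inputs and inputs[key] not in VALID_TASKS:
            problems.append(f"{key} must be one of {sorted(VALID_TASKS)}")
    return problems


if __name__ == "__main__":
    print(validate_inputs({"prompt": "a cat on a sofa", "width": 1000, "num_steps": 40}))
```

Returning a list of problems instead of raising on the first one lets a caller report every issue at once.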


Run time and cost

This model costs approximately $0.33 to run on Replicate, or 3 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
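The runs-per-dollar figure is a budget divided by the approximate per-run price. The hypothetical helper below works in whole cents to avoid floating-point rounding. Its 33-cent default is the approximate figure quoted above; real costs vary with inputs and run time.

```python
def runs_for_budget(budget_cents: int, cost_per_run_cents: int = 33) -> int:
    """Number of whole runs a budget covers at an approximate per-run price (in cents)."""
    if cost_per_run_cents <= 0:
        raise ValueError("cost per run must be positive")
    # Integer floor division: partial runs are not counted.
    return budget_cents // cost_per_run_cents


if __name__ == "__main__":
    print(runs_for_budget(100))   # $1  -> 3
    print(runs_for_budget(1000))  # $10 -> 30
```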

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 4 minutes, but prediction time varies significantly with the inputs.

Readme

DreamO: Unified Image Customization 🎨 (Cog Implementation)


This Replicate model runs DreamO, a unified framework for image customization developed by Bytedance. It excels at tasks like subject-driven generation (IP-Adapter/PuLID style), virtual try-on, and style transfer, leveraging the FLUX.1-dev model as its backbone.

  • Original project (GitHub): bytedance/DreamO
  • arXiv paper: 2504.16915, DreamO: A Unified Framework for Image Customization
  • Core HF weights: black-forest-labs/FLUX.1-dev (DreamO pipeline) and PramaLLC/BEN2 (background removal)


About the DreamO Model

DreamO is a powerful image customization framework designed to handle a variety of conditioning inputs simultaneously. By leveraging VAE-based feature encoding and a novel feature routing constraint, DreamO can effectively mitigate conflicts and entanglement among multiple entities or style conditions. This allows for high-fidelity generation across different tasks such as character/object insertion (IP), face identity preservation (ID), virtual try-on, and style application.

Key Features & Capabilities ✨

  • IP (Identity Preservation - General) 🖼️: Similar to IP-Adapter, supports a wide range of inputs including characters, objects, and animals. Achieves high fidelity in preserving entity identity.
  • ID (Identity Preservation - Face) 👩: Focuses specifically on facial identity, similar to InstantID and PuLID.
  • Try-On 👚👒: Supports virtual try-on for items like tops, bottoms, glasses, and hats, even with multiple garments (a capability generalized from its training).
  • Style Transfer 🎨: Applies the style of a reference image to a new generation. (Note: currently less stable than other tasks and cannot be combined with other conditions in the original implementation.)
  • Multi-Condition Generation ➕: Can combine multiple conditions (e.g., ID + IP, multiple IPs) to generate more creative and complex images, effectively managing potential conflicts between conditions.
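Each reference image is paired with one of three tasks ('ip', 'id', 'style'), and the style task cannot be combined with other conditions. A client-side check for these rules might look like the sketch below. `check_tasks` is a hypothetical helper, not part of the model. It conservatively rejects any second reference when one of them uses 'style', which is one reading of the documented limitation.

```python
from typing import List, Optional

VALID_TASKS = {"ip", "id", "style"}


def check_tasks(task1: Optional[str], task2: Optional[str] = None) -> List[str]:
    """Validate the per-reference tasks and return the ones in use.

    Pass None for an empty reference slot. Raises ValueError for an unknown task,
    or for 'style' paired with a second reference.
    """
    tasks = [t for t in (task1, task2) if t is not None]
    for t in tasks:
        if t not in VALID_TASKS:
            raise ValueError(f"unknown task {t!r}; expected one of {sorted(VALID_TASKS)}")
    if "style" in tasks and len(tasks) > 1:
        raise ValueError("'style' cannot be combined with another reference condition")
    return tasks


if __name__ == "__main__":
    print(check_tasks("id", "ip"))  # face identity from image 1, object from image 2
    print(check_tasks("style"))     # style transfer on its own
```

An ID + IP combination, such as a face from the first reference and an object from the second, passes this check. Pairing 'style' with anything else is rejected before a paid prediction is started.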

Underlying Technologies & Concepts πŸ”¬

  • FLUX Backbone: Leverages the powerful FLUX.1-dev text-to-image model. DreamO uses FLUX-turbo LoRA by default for faster inference.
  • VAE-based Feature Encoding: Utilized for encoding reference images to capture high-fidelity details.
  • Feature Routing Constraint: A key proposal in the DreamO paper to mitigate conflicts and entanglement when multiple conditions are applied.

Use Cases πŸ’‘

  • Creating personalized avatars or character portraits with specific facial identities.
  • Generating images of objects or characters in new scenes or styles.
  • Virtually trying on clothing or accessories.
  • Applying artistic styles from one image to another.
  • Combining multiple reference subjects or styles into a single cohesive image.

Limitations ⚠️

  • Style Task Stability: As noted in the original repository, style consistency is currently less stable than in other tasks, and in the current version style cannot be combined with other conditions.
  • ID Task Nuances: While DreamO achieves high facial fidelity for ID tasks, the original paper notes it may introduce more model contamination than state-of-the-art approaches such as PuLID. Lowering guidance can sometimes help with “glossy” faces.
  • Resource Intensive: Requires a capable GPU (Nvidia A100 80GB on Replicate).

License & Disclaimer πŸ“œ

The original DreamO project is licensed under the Apache-2.0 License. See the LICENSE file in the original repository.

Disclaimer (from bytedance/DreamO): This project strives to impact the domain of AI-driven image generation positively. Users are granted the freedom to create images using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.

This Replicate endpoint is provided for experimentation based on the original work. Users must adhere to the original license and disclaimer.

Citation πŸ“š

If you find DreamO useful for your research, please consider citing their paper:

@misc{wu2025dreamo,
      title={DreamO: A Unified Framework for Image Customization}, 
      author={Yanze Wu and Yutong Feng and Difan Liu and Jiarui Sabir IARIVOAHY and Zicheng Liu and Qiang Wen and Yuedong Yang and Ming-Hsuan Yang and Chong Mou},
      year={2025},
      eprint={2504.16915},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Cog implementation managed by zsxkib.

Star the original repo on GitHub: bytedance/DreamO ⭐
