zsxkib / dream-o

👗Bytedance's DreamO: unified image customization model (IP, ID, Style, Try-On, etc.)🧣


Input

prompt (string, required): Prompt for image generation.
ref_image1 (file, optional): Reference image 1.
ref_task1 (string, default "ip"): Task for reference image 1 ('ip': object/character, 'id': face identity, 'style': preserve style/background).
ref_image2 (file, optional): Reference image 2.
ref_task2 (string, default "ip"): Task for reference image 2 ('ip': object/character, 'id': face identity, 'style': preserve style/background).
width (integer, 768 to 1024, default 1024): Width of the output image (must be a multiple of 16).
height (integer, 768 to 1024, default 1024): Height of the output image (must be a multiple of 16).
num_steps (integer, 8 to 30, default 12): Number of inference steps.
guidance (number, 1 to 10, default 3.5): Guidance scale. Lower for less intensity/more realism (e.g., faces), higher for stronger prompt adherence.
seed (integer, optional): Random seed. Leave blank or set to -1 for random.
ref_res (integer, 256 to 1024, default 512): Resolution for non-ID reference image preprocessing (target pixel area).
neg_prompt (string, default ""): Negative prompt.
neg_guidance (number, 1 to 10, default 3.5): Negative guidance scale.
true_cfg (number, 1 to 5, default 1): True CFG scale (advanced, requires distilled CFG LoRA).
cfg_start_step (integer, 0 to 30, default 0): CFG start step (advanced).
cfg_end_step (integer, 0 to 30, default 0): CFG end step (advanced).
first_step_guidance (number, 0 to 10, default 0): First step guidance scale override (advanced, 0 uses main guidance).
output_format (string, default "webp"): Format of the output image.
output_quality (integer, 1 to 100, default 90): Output quality for lossy formats (jpg, webp).
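
The size and step constraints listed above are easy to violate when the values come from user input. The following is a minimal sketch of a client-side check, assuming only the ranges documented on this page; the function names check_inputs and snap_to_multiple_of_16 are hypothetical helpers, not part of the model or the Replicate client.

```python
def check_inputs(width=1024, height=1024, num_steps=12, guidance=3.5):
    """Raise ValueError if a value falls outside the ranges documented above."""
    for name, value in (("width", width), ("height", height)):
        if not 768 <= value <= 1024:
            raise ValueError(f"{name} must be between 768 and 1024, got {value}")
        if value % 16 != 0:
            raise ValueError(f"{name} must be a multiple of 16, got {value}")
    if not 8 <= num_steps <= 30:
        raise ValueError(f"num_steps must be between 8 and 30, got {num_steps}")
    if not 1 <= guidance <= 10:
        raise ValueError(f"guidance must be between 1 and 10, got {guidance}")
    return {"width": width, "height": height,
            "num_steps": num_steps, "guidance": guidance}


def snap_to_multiple_of_16(value):
    """Round down to a multiple of 16, then clamp to the 768..1024 range."""
    return max(768, min(1024, value - value % 16))


# Example: a requested width of 1000 is not a multiple of 16, so snap it first.
inputs = check_inputs(width=snap_to_multiple_of_16(1000))
print(inputs["width"])  # 992
```

Checking locally avoids paying for a prediction that the server would reject during input validation.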

Run this model in Node.js with one line of code:

npx create-replicate --model=zsxkib/dream-o

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run zsxkib/dream-o using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "zsxkib/dream-o:8f8d84ebe012e94a126b21c953b8dc33be86e4cf92b133b144bda94aa84e616b",
  {
    input: {
      seed: 7698454872441022000,
      width: 1024,
      height: 1024,
      prompt: "the woman wearing a dress, In the banquet hall",
      ref_res: 512,
      guidance: 3.5,
      true_cfg: 1,
      num_steps: 12,
      ref_task1: "id",
      ref_task2: "ip",
      neg_prompt: "",
      ref_image1: "https://replicate.delivery/pbxt/MzZo0OcsWW6NBl10nsCZoIRcRP9aGSczwDwVSLW5QRn8zD42/0_1.webp",
      ref_image2: "https://raw.githubusercontent.com/bytedance/DreamO/main/example_inputs/dress.png",
      cfg_end_step: 0,
      neg_guidance: 3.5,
      output_format: "webp",
      cfg_start_step: 0,
      output_quality: 90,
      first_step_guidance: 0
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (this example requests output_format "webp"):
import { writeFile } from "node:fs/promises";
await writeFile("output.webp", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run zsxkib/dream-o using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "zsxkib/dream-o:8f8d84ebe012e94a126b21c953b8dc33be86e4cf92b133b144bda94aa84e616b",
    input={
        "seed": 7698454872441022000,
        "width": 1024,
        "height": 1024,
        "prompt": "the woman wearing a dress, In the banquet hall",
        "ref_res": 512,
        "guidance": 3.5,
        "true_cfg": 1,
        "num_steps": 12,
        "ref_task1": "id",
        "ref_task2": "ip",
        "neg_prompt": "",
        "ref_image1": "https://replicate.delivery/pbxt/MzZo0OcsWW6NBl10nsCZoIRcRP9aGSczwDwVSLW5QRn8zD42/0_1.webp",
        "ref_image2": "https://raw.githubusercontent.com/bytedance/DreamO/main/example_inputs/dress.png",
        "cfg_end_step": 0,
        "neg_guidance": 3.5,
        "output_format": "webp",
        "cfg_start_step": 0,
        "output_quality": 90,
        "first_step_guidance": 0
    }
)
print(output)
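
The print call above only displays the result. What replicate.run returns depends on the installed client version: newer releases are understood to return a file-like object with a read() method, while older ones return a URL string. The sketch below handles both under that assumption; save_output is a hypothetical helper, not part of the client library.

```python
import urllib.request
from pathlib import Path


def save_output(output, path="output.webp"):
    """Write a prediction result to disk and return the path written."""
    if hasattr(output, "read"):
        # File-like result: read the bytes directly.
        data = output.read()
    else:
        # Assumed to be a URL string: download it.
        with urllib.request.urlopen(str(output)) as response:
            data = response.read()
    target = Path(path)
    target.write_bytes(data)
    return target


# Usage, after the replicate.run call above:
# save_output(output, "dress.webp")
```

The file extension should match the output_format input, which is "webp" in this example.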

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run zsxkib/dream-o using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "zsxkib/dream-o:8f8d84ebe012e94a126b21c953b8dc33be86e4cf92b133b144bda94aa84e616b",
    "input": {
      "seed": 7698454872441022000,
      "width": 1024,
      "height": 1024,
      "prompt": "the woman wearing a dress, In the banquet hall",
      "ref_res": 512,
      "guidance": 3.5,
      "true_cfg": 1,
      "num_steps": 12,
      "ref_task1": "id",
      "ref_task2": "ip",
      "neg_prompt": "",
      "ref_image1": "https://replicate.delivery/pbxt/MzZo0OcsWW6NBl10nsCZoIRcRP9aGSczwDwVSLW5QRn8zD42/0_1.webp",
      "ref_image2": "https://raw.githubusercontent.com/bytedance/DreamO/main/example_inputs/dress.png",
      "cfg_end_step": 0,
      "neg_guidance": 3.5,
      "output_format": "webp",
      "cfg_start_step": 0,
      "output_quality": 90,
      "first_step_guidance": 0
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
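
The Prefer: wait header asks the API to hold the connection open until the prediction finishes, but only for a bounded time (assumed here to be about 60 seconds, per Replicate's documentation at the time of writing). The sample run on this page took about 74 seconds, so the response may come back with a status that is not yet final. A minimal polling sketch, assuming the response contains the status and urls.get fields shown in the Output section; fetch_json and wait_for_prediction are hypothetical names.

```python
import json
import time
import urllib.request

# Statuses after which a prediction no longer changes.
TERMINAL = {"succeeded", "failed", "canceled"}


def fetch_json(url, token):
    """GET a Replicate API URL and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_prediction(prediction, token, fetch=fetch_json,
                        interval=2.0, timeout=600.0):
    """Poll prediction["urls"]["get"] until the status is terminal."""
    deadline = time.monotonic() + timeout
    while prediction["status"] not in TERMINAL:
        if time.monotonic() > deadline:
            raise TimeoutError("prediction did not finish in time")
        time.sleep(interval)
        prediction = fetch(prediction["urls"]["get"], token)
    return prediction


# Usage: pass the JSON body returned by the POST request above.
# final = wait_for_prediction(first_response, os.environ["REPLICATE_API_TOKEN"])
# print(final["status"], final.get("output"))
```

The fetch parameter exists so the loop can be exercised without network access.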

You can run this model locally using Cog. First, install Cog:

brew install cog

If you don’t have Homebrew, there are other installation options available.

Run this to download the model and run it in your local environment:

cog predict r8.im/zsxkib/dream-o@sha256:8f8d84ebe012e94a126b21c953b8dc33be86e4cf92b133b144bda94aa84e616b \
  -i 'seed=7698454872441022000' \
  -i 'width=1024' \
  -i 'height=1024' \
  -i 'prompt="the woman wearing a dress, In the banquet hall"' \
  -i 'ref_res=512' \
  -i 'guidance=3.5' \
  -i 'true_cfg=1' \
  -i 'num_steps=12' \
  -i 'ref_task1="id"' \
  -i 'ref_task2="ip"' \
  -i 'neg_prompt=""' \
  -i 'ref_image1="https://replicate.delivery/pbxt/MzZo0OcsWW6NBl10nsCZoIRcRP9aGSczwDwVSLW5QRn8zD42/0_1.webp"' \
  -i 'ref_image2="https://raw.githubusercontent.com/bytedance/DreamO/main/example_inputs/dress.png"' \
  -i 'cfg_end_step=0' \
  -i 'neg_guidance=3.5' \
  -i 'output_format="webp"' \
  -i 'cfg_start_step=0' \
  -i 'output_quality=90' \
  -i 'first_step_guidance=0'

To learn more, take a look at the Cog documentation.

Alternatively, run the model as a Docker container and send it a prediction request over HTTP:

docker run -d -p 5000:5000 --gpus=all r8.im/zsxkib/dream-o@sha256:8f8d84ebe012e94a126b21c953b8dc33be86e4cf92b133b144bda94aa84e616b
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "seed": 7698454872441022000,
      "width": 1024,
      "height": 1024,
      "prompt": "the woman wearing a dress, In the banquet hall",
      "ref_res": 512,
      "guidance": 3.5,
      "true_cfg": 1,
      "num_steps": 12,
      "ref_task1": "id",
      "ref_task2": "ip",
      "neg_prompt": "",
      "ref_image1": "https://replicate.delivery/pbxt/MzZo0OcsWW6NBl10nsCZoIRcRP9aGSczwDwVSLW5QRn8zD42/0_1.webp",
      "ref_image2": "https://raw.githubusercontent.com/bytedance/DreamO/main/example_inputs/dress.png",
      "cfg_end_step": 0,
      "neg_guidance": 3.5,
      "output_format": "webp",
      "cfg_start_step": 0,
      "output_quality": 90,
      "first_step_guidance": 0
    }
  }' \
  http://localhost:5000/predictions


Output

{
  "completed_at": "2025-05-12T17:08:08.314742Z",
  "created_at": "2025-05-12T17:06:45.209000Z",
  "data_removed": false,
  "error": null,
  "id": "wnq85a4nb5rj60cprnv8kh2bgm",
  "input": {
    "seed": 7698454872441022000,
    "width": 1024,
    "height": 1024,
    "prompt": "the woman wearing a dress, In the banquet hall",
    "ref_res": 512,
    "guidance": 3.5,
    "true_cfg": 1,
    "num_steps": 12,
    "ref_task1": "id",
    "ref_task2": "ip",
    "neg_prompt": "",
    "ref_image1": "https://replicate.delivery/pbxt/MzZo0OcsWW6NBl10nsCZoIRcRP9aGSczwDwVSLW5QRn8zD42/0_1.webp",
    "ref_image2": "https://raw.githubusercontent.com/bytedance/DreamO/main/example_inputs/dress.png",
    "cfg_end_step": 0,
    "neg_guidance": 3.5,
    "output_format": "webp",
    "cfg_start_step": 0,
    "output_quality": 90,
    "first_step_guidance": 0
  },
  "logs": "--- Prediction Start ---\nPrompt: the woman wearing a dress, In the banquet hall\nSeed: 7698454872441022000\nDimensions: 1024x1024\nSteps: 12, Guidance: 3.5\nOutput Format: webp, Quality: 90\nProcessing reference image 1 (/tmp/tmppzjoabeu0_1.webp) with task: id\nTask: ID - Aligning face...\nFace alignment complete.\nReference image 1 processed successfully.\nProcessing reference image 2 (/tmp/tmpfykiz509dress.png) with task: ip\nTask: IP - Removing background...\nBackground removal complete.\nResizing reference image 2 towards target area 262144...\nResized to shape: (544, 464, 3)\nReference image 2 processed successfully.\nUsing seed for generation: 7698454872441022000\nStarting DreamO pipeline inference...\n  0%|          | 0/12 [00:00<?, ?it/s]\n  8%|▊         | 1/12 [00:06<01:06,  6.05s/it]\n 17%|█▋        | 2/12 [00:09<00:44,  4.46s/it]\n 25%|██▌       | 3/12 [00:15<00:46,  5.19s/it]\n 33%|███▎      | 4/12 [00:21<00:44,  5.53s/it]\n 42%|████▏     | 5/12 [00:27<00:39,  5.71s/it]\n 50%|█████     | 6/12 [00:33<00:34,  5.83s/it]\n 58%|█████▊    | 7/12 [00:39<00:29,  5.90s/it]\n 67%|██████▋   | 8/12 [00:45<00:23,  5.94s/it]\n 75%|███████▌  | 9/12 [00:51<00:17,  5.98s/it]\n 83%|████████▎ | 10/12 [00:57<00:11,  6.00s/it]\n 92%|█████████▏| 11/12 [01:03<00:06,  6.01s/it]\n100%|██████████| 12/12 [01:09<00:00,  6.02s/it]\n100%|██████████| 12/12 [01:09<00:00,  5.82s/it]\nInference complete in 73.17 seconds.\nUsing output quality: 90\nOutput image saved to /tmp/output.webp\n--- Prediction End ---",
  "metrics": {
    "predict_time": 74.334400784,
    "total_time": 83.105742
  },
  "output": "https://replicate.delivery/yhqm/h78bCImZ6HrtBVM0pE0EEXW4naneIvVBkdviQJML2rC85erUA/output.webp",
  "started_at": "2025-05-12T17:06:53.980342Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/qoxq-2x2sfnkoi2e4kaon6bufunssldbrx37fa4gii3cztehep46dataa",
    "get": "https://api.replicate.com/v1/predictions/wnq85a4nb5rj60cprnv8kh2bgm",
    "cancel": "https://api.replicate.com/v1/predictions/wnq85a4nb5rj60cprnv8kh2bgm/cancel"
  },
  "version": "8f8d84ebe012e94a126b21c953b8dc33be86e4cf92b133b144bda94aa84e616b"
}
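
The logs above show reference image 2 being resized "towards target area 262144", which equals 512 × 512 for ref_res=512, and ending at shape (544, 464, 3). One computation consistent with that output is to scale the image to the target area while keeping its aspect ratio, then round each side down to a multiple of 16. The sketch below is an assumption inferred from the log, not the confirmed preprocessing code, and the 1200 × 1024 source size is hypothetical.

```python
import math


def target_shape(height, width, ref_res=512, multiple=16):
    """Return (height, width) scaled toward an area of ref_res * ref_res."""
    # Uniform scale factor that maps the source area onto the target area.
    scale = math.sqrt(ref_res * ref_res / (height * width))
    # Round each side down to the nearest multiple (assumed to be 16).
    new_h = int(height * scale) // multiple * multiple
    new_w = int(width * scale) // multiple * multiple
    return new_h, new_w


# A hypothetical 1200 x 1024 source image lands on the shape seen in the log.
print(target_shape(1200, 1024))  # (544, 464)
```

Because both sides are rounded down, the resulting area (544 × 464 = 252416) stays at or below the target area.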

Run time and cost

This model costs approximately $0.33 to run on Replicate, or 3 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 4 minutes. The predict time for this model varies significantly based on the inputs.
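
The timestamps in a prediction record make it possible to separate queue time from compute time. A small sketch using the created_at, started_at, and completed_at values from the Output section; the helper names parse_ts and timing are hypothetical.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp that ends in Z (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing(prediction):
    """Return queue, run, and total durations in seconds."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "queued": (started - created).total_seconds(),
        "running": (completed - started).total_seconds(),
        "total": (completed - created).total_seconds(),
    }


sample = {
    "created_at": "2025-05-12T17:06:45.209000Z",
    "started_at": "2025-05-12T17:06:53.980342Z",
    "completed_at": "2025-05-12T17:08:08.314742Z",
}
print(timing(sample))
```

For the sample run this gives roughly 8.8 seconds queued and 74.3 seconds running, which agrees with the predict_time and total_time values in its metrics field.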

Readme

DreamO: Unified Image Customization 🎨 (Cog Implementation)

This Replicate model runs DreamO, a unified framework for image customization developed by Bytedance. It excels at tasks like subject-driven generation (IP-Adapter/PuLID style), virtual try-on, and style transfer, leveraging the FLUX.1-dev model as its backbone.

Original Project (GitHub): bytedance/DreamO
arXiv Paper: 2504.16915: DreamO: A Unified Framework for Image Customization
Core HF Weights: black-forest-labs/FLUX.1-dev (DreamO Pipeline) & PramaLLC/BEN2 (Background Removal)

About the DreamO Model

DreamO is a powerful image customization framework designed to handle a variety of conditioning inputs simultaneously. By leveraging VAE-based feature encoding and a novel feature routing constraint, DreamO can effectively mitigate conflicts and entanglement among multiple entities or style conditions. This allows for high-fidelity generation across different tasks such as character/object insertion (IP), face identity preservation (ID), virtual try-on, and style application.

Key Features & Capabilities ✨

IP (Identity Preservation - General) 🖼️: Similar to IP-Adapter, supports a wide range of inputs including characters, objects, and animals. Achieves high fidelity in preserving entity identity.
ID (Identity Preservation - Face) 👩: Focuses specifically on facial identity, similar to InstantID and PuLID.
Try-On 👚👒: Supports virtual try-on for items like tops, bottoms, glasses, and hats, even with multiple garments (a capability generalized from its training).
Style Transfer 🎨: Applies the style of a reference image to a new generation. (Note: Currently less stable than other tasks and cannot be combined with other conditions in the original implementation).
Multi-Condition Generation ➕: Can combine multiple conditions (e.g., ID + IP, multiple IPs) to generate more creative and complex images, effectively managing potential conflicts between conditions.

Underlying Technologies & Concepts 🔬

FLUX Backbone: Leverages the powerful FLUX.1-dev text-to-image model. DreamO uses FLUX-turbo LoRA by default for faster inference.
VAE-based Feature Encoding: Utilized for encoding reference images to capture high-fidelity details.
Feature Routing Constraint: A key proposal in the DreamO paper to mitigate conflicts and entanglement when multiple conditions are applied.

Use Cases 💡

Creating personalized avatars or character portraits with specific facial identities.
Generating images of objects or characters in new scenes or styles.
Virtually trying on clothing or accessories.
Applying artistic styles from one image to another.
Combining multiple reference subjects or styles into a single cohesive image.

Limitations ⚠️

Style Task Stability: As noted in the original repository, style consistency is currently less stable compared to other tasks, and in the current version, style cannot be combined with other conditions.
ID Task Nuances: While DreamO achieves high facial fidelity for ID tasks, the original paper notes it may introduce more model contamination compared to SOTA approaches like PuLID. Lowering guidance can sometimes help with “glossy” faces.
Resource Intensive: Requires a capable GPU (Nvidia A100 80GB on Replicate).
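
The style limitation above can be caught before a request is sent. The sketch below assumes the three task values documented for ref_task1 and ref_task2 ('ip', 'id', 'style') and conservatively rejects any request that pairs a style reference with a second reference; check_tasks is a hypothetical helper, and this endpoint may or may not enforce the same rule itself.

```python
VALID_TASKS = {"ip", "id", "style"}


def check_tasks(*tasks):
    """Validate the tasks of the reference images actually supplied.

    Pass one task per supplied reference image; omit tasks for unused slots.
    """
    for task in tasks:
        if task not in VALID_TASKS:
            raise ValueError(f"unknown task {task!r}, expected one of {sorted(VALID_TASKS)}")
    if "style" in tasks and len(tasks) > 1:
        raise ValueError("style cannot be combined with another reference condition")
    return list(tasks)


# The example on this page combines a face identity with a garment reference.
print(check_tasks("id", "ip"))  # ['id', 'ip']
```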

License & Disclaimer 📜

The original DreamO project is licensed under the Apache-2.0 License. See the LICENSE file in the original repository.

Disclaimer (from bytedance/DreamO): This project strives to impact the domain of AI-driven image generation positively. Users are granted the freedom to create images using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.

This Replicate endpoint is provided for experimentation based on the original work. Users must adhere to the original license and disclaimer.

Citation 📚

If you find DreamO useful for your research, please consider citing their paper:

@misc{mou2025dreamo,
      title={DreamO: A Unified Framework for Image Customization},
      author={Chong Mou and Yanze Wu and Wenxu Wu and Zinan Guo and Pengze Zhang and Yufeng Cheng and Yiming Luo and Fei Ding and Shiwen Zhang and Xinghui Li and others},
      year={2025},
      eprint={2504.16915},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Cog implementation managed by zsxkib.
