goodguy1963 / hidream-l1-full-img2img

IMG2IMG for HiDream FULL AND DEV - does creative variations

  • Public
  • 455 runs
  • A100 (80GB)
  • Weights

Input

image
file

Input image file (required)

boolean

Use Florence2 for automatic caption generation

Default: true

string

With Florence2: Prefix to add to generated caption. Without Florence2: Complete prompt to use

Default: "Photo"
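
The two roles of this text field can be sketched as a small helper. This is only an illustration of the rule stated above: the function name `build_prompt`, its arguments, and the ", " separator between prefix and caption are assumptions, not taken from the model's code.

```python
def build_prompt(use_florence2: bool, text: str, caption: str = "") -> str:
    """Return the positive prompt under the rule described above (hypothetical helper)."""
    if not use_florence2:
        # Without Florence2, the text is used as the complete prompt.
        return text
    # With Florence2, the text is a prefix for the generated caption.
    # The ", " separator is an assumption; the actual joining is not documented.
    parts = [p.strip() for p in (text, caption) if p.strip()]
    return ", ".join(parts)


print(build_prompt(True, "Photo", "a cat sleeping on a sofa"))
print(build_prompt(False, "Photo of a cat sleeping on a sofa"))
```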

string

Florence2 task to perform

Default: "prompt_gen_mixed_caption_plus"

string

Negative prompt to guide the model away from certain elements

Default: "bad ugly jpeg artifacts"

string

HiDream model variant to use

Default: "full"

integer

Random seed

Default: 42

integer
(minimum: 1, maximum: 100)

Number of inference steps

Default: 50

number
(minimum: 1, maximum: 20)

Guidance scale

Default: 5.54

number
(minimum: 0, maximum: 1)

Denoising strength

Default: 0.6

number
(minimum: 0, maximum: 20)

ModelSamplingSD3 shift parameter (0-20)

Default: 3
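
To give a feel for what the shift does, here is a minimal sketch. It assumes the node applies the SD3-style timestep shift, sigma' = shift * sigma / (1 + (shift - 1) * sigma), to noise levels in [0, 1]; that formula is an assumption about the node, not something stated on this page. The function name `shift_sigma` is hypothetical, and the sketch only handles shift > 0 even though the input allows 0.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Apply an SD3-style shift to a noise level in [0, 1] (assumed formula)."""
    if shift <= 0:
        raise ValueError("this sketch only handles shift > 0")
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# With shift > 1, intermediate noise levels are pushed toward 1,
# while the endpoints 0 and 1 stay where they are.
for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(s, round(shift_sigma(s, 3.0), 4))
```

Under this assumption, shift = 1 leaves the schedule unchanged and larger values spend more of the sampling steps at high noise.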

string

Sampling algorithm to use

Default: "uni_pc"

string

Scheduler for the sampler

Default: "normal"
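
The numeric ranges and defaults listed above can be checked locally before submitting a run. The sketch below is an assumption-laden illustration: the key names (`steps`, `guidance_scale`, `denoise`, `shift`) are hypothetical, since this page shows only types, ranges, and defaults; check the model's API schema for the real input names.

```python
# Hypothetical key names; ranges and defaults are the ones listed above.
RANGES = {
    "steps": (1, 100),
    "guidance_scale": (1, 20),
    "denoise": (0, 1),
    "shift": (0, 20),
}
DEFAULTS = {"steps": 50, "guidance_scale": 5.54, "denoise": 0.6, "shift": 3}


def check_inputs(overrides: dict) -> dict:
    """Merge overrides into the defaults and reject out-of-range values."""
    values = {**DEFAULTS, **overrides}
    if not isinstance(values["steps"], int):
        raise ValueError("steps must be an integer")
    for name, (lo, hi) in RANGES.items():
        if not lo <= values[name] <= hi:
            raise ValueError(f"{name}={values[name]} is outside [{lo}, {hi}]")
    return values


print(check_inputs({"denoise": 0.4}))
```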

Including florence2_text_input

Output


This output was created using a different version of the model, goodguy1963/hidream-l1-full-img2img:bba8d853.

Run time and cost

This model costs approximately $0.12 to run on Replicate, or 8 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 85 seconds. The predict time for this model varies significantly based on the inputs.
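
The figures above are consistent with simple arithmetic, sketched here. The constants come from this page; the per-second figure is derived from them and is only a rough estimate, since both cost and run time vary with the inputs.

```python
import math

COST_PER_RUN_USD = 0.12   # approximate cost per run, from the page
TYPICAL_SECONDS = 85      # typical prediction time, from the page

# Whole runs that fit into one dollar.
runs_per_dollar = math.floor(1 / COST_PER_RUN_USD)

# Implied cost per second of predict time (derived, approximate).
cost_per_second = COST_PER_RUN_USD / TYPICAL_SECONDS


def estimate_cost(runs: int) -> float:
    """Rough budget for a batch of runs at the approximate per-run cost."""
    return round(runs * COST_PER_RUN_USD, 2)


print(runs_per_dollar, round(cost_per_second, 5), estimate_cost(100))
```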

Readme

HiDream Img2Img ComfyUI Workflow with settings that worked for me

This workflow enables advanced image-to-image generation using the HiDream model suite and Florence-2 prompt generator, designed for use with ComfyUI and Replicate.

Overview

  • Image-to-image generation with HiDream diffusion model
  • Florence-2 for prompt generation and captioning
  • VAE encoding/decoding and advanced CLIP-based text encoding
  • Negative prompt support for artifact reduction
  • LOW VRAM MODE

Required Models & Credits

Diffusion Model

  • hidream_i1_full_fp16.safetensors
    Place in: ComfyUI/models/diffusion_models
    Download
    Thanks to HiDream.ai for the model!

For low-VRAM users (GPUs with less than 24GB VRAM):

Text Encoders

Place all in: ComfyUI/models/text_encoders

  • clip_g_hidream.safetensors
    Download
  • clip_l_hidream.safetensors
    Download
  • llama_3.1_8b_instruct_fp8_scaled.safetensors
    Download
  • t5xxl_fp8_e4m3fn_scaled.safetensors
    Download

VAE

  • ae.safetensors
    Place in: ComfyUI/models/vae
    Download

Florence-2 Prompt Generator (no need to download; it is downloaded automatically at runtime)

Usage

  1. Download all required models and place them in the correct directories as listed above.
  2. Drag the workflow image into ComfyUI.
  3. Use the workflow to generate new images from your input images and prompts.
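
Before loading the workflow, it can help to confirm that step 1 is complete. The sketch below checks for the files listed under Required Models in the directories given there. The function name `missing_models` and the `ComfyUI` root path in the usage line are assumptions; adjust the path to your own install.

```python
from pathlib import Path

# File names and target folders as listed under "Required Models & Credits".
REQUIRED = {
    "diffusion_models": ["hidream_i1_full_fp16.safetensors"],
    "text_encoders": [
        "clip_g_hidream.safetensors",
        "clip_l_hidream.safetensors",
        "llama_3.1_8b_instruct_fp8_scaled.safetensors",
        "t5xxl_fp8_e4m3fn_scaled.safetensors",
    ],
    "vae": ["ae.safetensors"],
}


def missing_models(comfy_root: str) -> list[str]:
    """Return 'folder/file' entries that are not present under <root>/models."""
    models = Path(comfy_root) / "models"
    return [
        f"{folder}/{name}"
        for folder, names in REQUIRED.items()
        for name in names
        if not (models / folder / name).is_file()
    ]


# Example: "ComfyUI" is a placeholder for the path to your install.
for entry in missing_models("ComfyUI"):
    print("missing:", entry)
```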

For low-VRAM users (GPUs with less than 24GB VRAM):

Workflow Diagram

See the full workflow structure here:
WORKFLOW-HIDREAM-IMG2IMG.png

Acknowledgements

  • HiDream.ai for the diffusion model and encoders
  • Microsoft for Florence-2
  • MiaoshouAI for the Florence-2 prompt generator implementation
  • ComfyUI team for the UI and workflow engine

(wait for HiDream-E1 for even better results)