jagilley/controlnet

Modify images with a prompt while preserving their structure

Public
63.6K runs

Input

image
file (required)

Input image

string (required)

Prompt for the model

string

ControlNet model type to use

Default: "canny"

string

Number of samples (higher values may OOM)

Default: "1"

string

Image resolution to be generated

Default: "512"

integer

Steps

Default: 20

number
(minimum: 0.1, maximum: 30)

Scale for classifier-free guidance

Default: 9

integer

Seed

number

Controls the amount of noise added to the input data during the denoising diffusion process; higher values add more noise

Default: 0

string

Additional text to be appended to prompt

Default: "best quality, extremely detailed"

string

Negative Prompt

Default: "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"

integer
(minimum: 128, maximum: 1024)

Resolution at which the detection method will be applied

Default: 512

integer
(minimum: 1, maximum: 255)

Canny line detection low threshold (only applicable when model type is 'canny')

Default: 100

integer
(minimum: 1, maximum: 255)

Canny line detection high threshold (only applicable when model type is 'canny')

Default: 200

number
(minimum: 0, maximum: 1)

Background Threshold (only applicable when model type is 'normal')

Default: 0

number
(minimum: 0.01, maximum: 2)

Value Threshold (only applicable when model type is 'MLSD')

Default: 0.1

number
(minimum: 0.01, maximum: 20)

Distance Threshold (only applicable when model type is 'MLSD')

Default: 0.1
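
The sketch below shows one way to call this model with the Replicate Python client. Only the "image" field name is shown explicitly above, so the other input key names are inferred from the field descriptions and should be checked against the live API schema before use.

import replicate

# Minimal call sketch; key names other than "image" are assumptions inferred
# from the input descriptions above and may differ from the actual schema.
output = replicate.run(
    "jagilley/controlnet",  # pinning a specific version may be required
    input={
        "image": open("input.png", "rb"),       # Input image
        "prompt": "a futuristic city at dusk",  # Prompt for the model
        "model_type": "canny",                  # ControlNet model type (default "canny")
        "num_samples": "1",                     # Number of samples (higher values may OOM)
        "image_resolution": "512",              # Image resolution to be generated
        "ddim_steps": 20,                       # Steps
        "scale": 9,                             # Scale for classifier-free guidance
        "low_threshold": 100,                   # Canny low threshold
        "high_threshold": 200,                  # Canny high threshold
    },
)
print(output)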

Output


Run time and cost

This model costs approximately $0.092 to run on Replicate, or 10 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 66 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Model by Lvmin Zhang

Usage

Input an image, and prompt the model to generate an image as you would for Stable Diffusion.

Detail detection methods

Use one of eight different methods for detecting the details in the original image:

- Canny edge detection: automatically detect edges in the image using adjustable thresholds (see the threshold preview sketch after this list)
- Depth detection: automatically detect the depths within the image, then diffuse based on the detected depths
- HED: detect edges in the image more softly than with the ‘canny’ method
- MLSD (Hough line detection): detect straight lines in the image, then diffuse based on the detected lines
- Normal maps: automatically detect the geometry of the input image, then diffuse based on the original geometry
- Scribble: use a user-drawn scribble image as a basis for the final image
- Seg: apply semantic segmentation to the input image, then diffuse with respect to the resulting partition
- Openpose: detect the pose of any humans in the image, then generate an image with a human in the same pose
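
For the ‘canny’ model type, the low and high threshold inputs (defaults 100 and 200) control which edges survive detection. A quick local preview with OpenCV, sketched below, can help pick values before calling the model; it only illustrates the thresholds and is not the hosted model's own preprocessing code.

import cv2

# Preview the Canny edge map locally to tune the low/high thresholds
# (these correspond to the canny threshold inputs listed above).
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 100, 200)  # low threshold, high threshold
cv2.imwrite("edges_preview.png", edges)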

Model description

ControlNet is a neural network structure which allows control of pretrained large diffusion models to support additional input conditions beyond prompts. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k samples). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts of training data (millions to billions of rows). Large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc.
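
As a rough illustration of that idea, the toy PyTorch-style sketch below pairs a frozen pretrained block with a trainable copy whose output is injected through a zero-initialized convolution, so training starts from the unmodified base model. This is a simplification for intuition, not the actual Stable Diffusion block structure or the authors' code.

import copy
import torch.nn as nn

class ControlledBlock(nn.Module):
    """Toy sketch: frozen base block plus a trainable control branch."""

    def __init__(self, pretrained_block: nn.Module, channels: int):
        super().__init__()
        self.base = pretrained_block
        for p in self.base.parameters():
            p.requires_grad_(False)                       # the large diffusion model stays frozen
        self.control = copy.deepcopy(pretrained_block)    # trainable copy learns the condition
        self.zero_conv = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.zero_conv.weight)             # zero-init: at step 0 the output equals the base model
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, x, condition):
        # condition: an encoded control map (edges, depth, pose, ...)
        return self.base(x) + self.zero_conv(self.control(x + condition))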

Original model & code on GitHub

Other ControlNet Models

This is a general ControlNet model that lets you select any of the eight detail detection methods. However, you can also use a model specific to one particular method. For applications where you expect to call the model many times via the API, these specialized models may perform better; a call sketch follows the list below.

ControlNet for generating images from drawings:
- Scribble: https://replicate.com/jagilley/controlnet-scribble

ControlNet for generating humans based on an input image:
- Human Pose Detection: https://replicate.com/jagilley/controlnet-pose

ControlNets for preserving general qualities of an input image:
- Edge detection: https://replicate.com/jagilley/controlnet-canny
- HED maps: https://replicate.com/jagilley/controlnet-hed
- Depth map: https://replicate.com/jagilley/controlnet-depth2img
- Hough line detection: https://replicate.com/jagilley/controlnet-hough
- Normal map: https://replicate.com/jagilley/controlnet-normal
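
Calling one of the method-specific models looks the same as calling this one. The example below uses the Canny variant; the input names are assumptions rather than taken from that model's schema.

import replicate

output = replicate.run(
    "jagilley/controlnet-canny",
    input={
        "image": open("input.png", "rb"),
        "prompt": "a watercolor painting of the same scene",
    },
)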

Citation

@misc{https://doi.org/10.48550/arxiv.2302.05543,
  doi = {10.48550/ARXIV.2302.05543},
  url = {https://arxiv.org/abs/2302.05543},
  author = {Zhang, Lvmin and Agrawala, Maneesh},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), Artificial Intelligence (cs.AI), Graphics (cs.GR), Human-Computer Interaction (cs.HC), Multimedia (cs.MM), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Adding Conditional Control to Text-to-Image Diffusion Models},
  publisher = {arXiv},
  year = {2023},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
Related models

flux-canny-pro

black-forest-labs/flux-canny-pro

Generate images from a text prompt guided by Canny edge maps to preserve structure and composition. Accept a control image (photo, sketch, or render) and a prompt, then use edge-guided conditioning to retexture subjects, perform controlled style transfer, and turn sketches into detailed art. Useful for architectural visualization and other cases requiring precise layout fidelity. Outputs an image.

346.9k runs
Official
controlnet

rossjillian/controlnet

Generate images from a text prompt while matching the structure of an input image using ControlNet for Stable Diffusion. Condition on canny edges, HED edges, depth (MiDaS), normals, Hough/MLSD lines, human pose keypoints, scribble drawings, or semantic segmentation to preserve layout and geometry while changing style or content. Inputs: image, prompt, structure type; Output: image(s).

7.6m runs
sdxl-controlnet

lucataco/sdxl-controlnet

Generate images from an input image and a text prompt, guided by SDXL ControlNet (Canny). Leverage edge maps from the source image to preserve layout and composition while reimagining style and details to match the prompt. Tune conditioning strength to balance adherence to edges vs. creative variation. Handles large inputs and auto-resizes to SDXL-friendly ratios while keeping the original aspect ratio. Inputs: image and text prompt. Output: image.

3.2m runs
flux-canny-dev

black-forest-labs/flux-canny-dev

Generate images from a text prompt guided by a sketch or edge map using Canny edge detection. Accepts a control image (sketch, line drawing, or photo) plus a prompt and outputs images that follow the input edges and composition while allowing style and detail changes. Useful for turning drafts into finished artwork, layout-preserving iterations, and reimagining scenes while maintaining structure. Open-weight edge-guided image generation for precise composition control.

141.8k runs
Official
controlnet-1.1-x-realistic-vision-v2.0

usamaehsan/controlnet-1.1-x-realistic-vision-v2.0

Generate photorealistic images from an input image and a text prompt while preserving the input’s line art and composition. Use ControlNet 1.1 Lineart with Realistic Vision to turn sketches or outlines into detailed images, or to guide image-to-image synthesis using edges extracted from photos. Adjust control strength and inference steps to balance structural fidelity with prompt-driven changes. Outputs an image.

5.6m runs
flux-dev-controlnet

xlabs-ai/flux-dev-controlnet

Generate images from a control image and prompt using ControlNet (canny, depth, soft edge) on Flux.1 Dev. Condition generation on edges or depth to preserve structure while guiding content with text. Choose preprocessors for depth (DepthAnything, Midas, Zoe, Zoe-DepthAnything) and soft-edge (HED, TEED, PiDiNet), or use a canny map. Adjust control strength and image-to-image strength, and optionally return the preprocessed control map for debugging. Optionally load a LoRA by URL to customize style or subject. Outputs images.

251.3k runs
controlnet-deliberate

philz1337x/controlnet-deliberate

Edit images from an input image and text prompt using ControlNet with Canny edge guidance on the Deliberate (SD 1.5) checkpoint. Preserve composition via extracted edge maps while changing style and details with the prompt. Control fidelity with adjustable ControlNet weight, Canny low/high thresholds, and detection resolution; output 1–4 images at selected resolutions.

1.1m runs
controlnet_1-1

rossjillian/controlnet_1-1

Generate images from a text prompt conditioned on the structure of an input image. Preserve and reinterpret layout, edges, depth, or pose using ControlNet 1.1 with Stable Diffusion. Supports canny edges (with low/high thresholds), depth maps (MiDaS), softedge/HED, MLSD lines, normal maps, human pose, scribble/sketch, semantic segmentation, line art, shuffle, and instruct-pix2pix for text-guided edits. Inputs: image, prompt, structure type. Output: image.

8.2k runs