zsxkib / sd3-controlnet

✨Stable Diffusion 3 w/ ⚡InstantX's Canny, Pose, and Tile ControlNets🖼️

  • Public
  • 1.2K runs
  • L40S
  • GitHub
  • Paper
  • License

Input

  • input_image (file, required): Input image
  • Prompt (string, required)
  • Negative prompt (string). Default: "NSFW, nude, naked, porn, ugly"
  • Structure type (string). Default: "canny"
  • Aspect ratio (string): Aspect ratio for the generated image. Note that the model performs best at 1024x1024 resolution; other sizes may yield suboptimal results. Default: "1:1"
  • Number of outputs (integer, minimum: 1, maximum: 4): Number of images to output. Default: 1
  • Inference steps (integer, minimum: 1, maximum: 50). Default: 25
  • Guidance scale (number, minimum: 0, maximum: 50). Default: 7
  • Control weight (number, minimum: 0, maximum: 1). Default: 0.7
  • Canny low threshold (integer, minimum: 1, maximum: 255): [Canny only] Line detection low threshold. Default: 100
  • Canny high threshold (integer, minimum: 1, maximum: 255): [Canny only] Line detection high threshold. Default: 200
  • Seed (integer): Random seed. Leave blank to randomize the seed.
  • Output format (string): Format of the output images. Default: "webp"
  • Output quality (integer, minimum: 0, maximum: 100): Quality of the output images, from 0 to 100. 100 is best quality, 0 is lowest quality. Default: 80
  • Disable safety checker (boolean): Disable the safety checker for generated images. This feature is only available through the API; the safety checker can't be disabled when running on the website. See [https://replicate.com/docs/how-does-replicate-work#safety](https://replicate.com/docs/how-does-replicate-work#safety). Default: false
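
For reference, an API call might look like the sketch below, using the Replicate Python client. The field names are assumptions inferred from the labels above (only input_image is confirmed); check the model's API schema for the exact names and accepted values.

```python
# Hedged sketch of an API call with the Replicate Python client.
# Field names other than "input_image" are guesses from the labels above.
import replicate

output = replicate.run(
    "zsxkib/sd3-controlnet",  # pin a specific version hash in production
    input={
        "input_image": open("control.jpg", "rb"),
        "prompt": "a watercolor lighthouse at dusk",
        "structure_type": "canny",  # assumed values: "canny", "pose", "tile"
        "control_weight": 0.7,
        "guidance_scale": 7,
        "steps": 25,                # assumed name for "Inference steps"
        "output_format": "webp",
    },
)
print(output)  # typically one or more image URLs
```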


Run time and cost

This model costs approximately $0.035 to run on Replicate, or 28 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 36 seconds. The predict time for this model varies significantly based on the inputs.

Readme

✨Stable Diffusion 3 w/ ⚡InstantX's Canny, Pose, and Tile ControlNets🖼️

About

Implementation of InstantX/SD3-Controlnet-Canny, InstantX/SD3-Controlnet-Pose, and InstantX/SD3-Controlnet-Tile.

Changelog

  • Added InstantX/SD3-Controlnet-Canny.
  • PNGs with alpha (transparency) channels are now converted to RGB.
  • Added InstantX/SD3-Controlnet-Pose.
  • Added InstantX/SD3-Controlnet-Tile. Implemented lazy loading for controlnets to manage GPU memory limitations, as loading all three controlnets simultaneously is not feasible (see the sketch below).
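
The alpha-flattening and lazy-loading notes above might look roughly like this sketch, assuming the diffusers SD3ControlNetModel class; the helper names and structure are illustrative, not the repo's actual code.

```python
# Illustrative sketch, not the repo's actual code: flatten alpha PNGs to
# RGB, and keep only one SD3 controlnet resident on the GPU at a time.
import gc
import torch
from PIL import Image
from diffusers import SD3ControlNetModel

CONTROLNET_IDS = {
    "canny": "InstantX/SD3-Controlnet-Canny",
    "pose": "InstantX/SD3-Controlnet-Pose",
    "tile": "InstantX/SD3-Controlnet-Tile",
}

def to_rgb(path: str) -> Image.Image:
    """Convert PNGs with alpha (transparency) channels to plain RGB."""
    return Image.open(path).convert("RGB")

_loaded = {"name": None, "model": None}

def get_controlnet(name: str) -> SD3ControlNetModel:
    """Lazily load the requested controlnet, evicting the previous one."""
    if _loaded["name"] != name:
        _loaded["model"] = None      # drop the old controlnet
        gc.collect()
        torch.cuda.empty_cache()     # reclaim its GPU memory
        _loaded["model"] = SD3ControlNetModel.from_pretrained(
            CONTROLNET_IDS[name], torch_dtype=torch.float16
        ).to("cuda")
        _loaded["name"] = name
    return _loaded["model"]
```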

Examples

Tile

Here are examples of outputs with different weights applied to the control image:

[Image grid: control image and outputs at control weights 0.0, 0.3, 0.5, 0.7, and 0.9]

Pose

[Image grid: control image and outputs at control weights 0.0, 0.3, 0.5, 0.7, and 0.9]

Canny

[Image grid: control image and outputs at control weights 0.0, 0.3, 0.5, 0.7, and 0.9]
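
As an aside on the two [Canny only] thresholds in the input schema, the sketch below shows the standard OpenCV edge detection they correspond to; this is illustrative, and the model's own preprocessor may differ in detail.

```python
# Illustrative Canny preprocessing with OpenCV (cv2); the model's internal
# preprocessor may differ, but the two thresholds play the same role.
import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
# Gradient magnitudes above the high threshold (default 200) become edges;
# magnitudes between the low (default 100) and high thresholds are kept
# only if connected to a strong edge (hysteresis).
edges = cv2.Canny(gray, 100, 200)
cv2.imwrite("canny_control.png", edges)
```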

Limitations

Because only 1024x1024 resolution was used during training, inference performs best at this size; other sizes yield suboptimal results. We will initiate multi-resolution training in the future, and we will open-source the new weights at that time.

Stable Diffusion 3 Medium is a 2 billion parameter text-to-image model developed by Stability AI. It excels at photorealism, typography, and prompt following.

Stable Diffusion 3 on Replicate can be used for commercial work.

Core Model

[Architecture diagram]

Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

For more technical details, please refer to the Research paper.

Safety

As part of our safety-by-design and responsible AI deployment approach, Stability AI implements safety measures throughout the development of our models, from the time we begin pre-training a model through ongoing development, fine-tuning, and deployment. We have implemented a number of safety mitigations intended to reduce the risk of severe harms; however, we recommend that developers conduct their own testing and apply additional mitigations based on their specific use cases.

For more about our approach to Safety, please visit our Safety page.

Support

All credit goes to the InstantX team. Give me a follow on Twitter if you like my work! @zsakib_