zylim0702 / controlnet-v1-1-multi

CLIP Interrogator with ControlNet SDXL for the canny structure and ControlNet v1.1 for the other structures

Cold

Public
2K runs
L40S


Input

image

file (required)

Input image

prompt

string

a dog in a bright sunshine jungle, hard lighting

Prompt for the model

autogenerated_prompt

boolean

Auto Generate Prompt for image

Default: false

structure

string

Structure to condition on

Default: "canny"

num_samples

integer

Number of samples (higher values may OOM)

Default: 1

ddim_steps

integer

Steps

Default: 20

strength

number

Control strength

Default: 1

scale

number

(minimum: 0.1, maximum: 30)

Scale for classifier-free guidance

Default: 9

seed

integer

Seed

eta

number

Controls the amount of noise added to the input data during the denoising diffusion process. A higher value adds more noise.

Default: 0

preprocessor_resolution

integer

Preprocessor resolution

Default: 512

a_prompt

string

Additional text to be appended to prompt

Default: "Best quality, extremely detailed"

n_prompt

string

Longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality

Negative prompt

Default: "Longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"

low_threshold

integer

(minimum: 1, maximum: 255)

[canny only] Line detection low threshold

Default: 100

high_threshold

integer

(minimum: 1, maximum: 255)

[canny only] Line detection high threshold

Default: 200

image_upscaler

boolean

Enable image Upscale

Default: false
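The defaults and ranges listed above can be gathered into one place before calling the API. The following is a minimal sketch, not part of the model or of Replicate's client: `build_input`, `DEFAULTS`, and `RANGES` are hypothetical names, and the values are transcribed from the input list above. `prompt` and `seed` have no documented default, so they are accepted as overrides only.

```python
# Defaults transcribed from the input list above.
DEFAULTS = {
    "autogenerated_prompt": False,
    "structure": "canny",
    "num_samples": 1,
    "ddim_steps": 20,
    "strength": 1,
    "scale": 9,
    "eta": 0,
    "preprocessor_resolution": 512,
    "a_prompt": "Best quality, extremely detailed",
    "n_prompt": (
        "Longbody, lowres, bad anatomy, bad hands, missing fingers, "
        "extra digit, fewer digits, cropped, worst quality, low quality"
    ),
    "low_threshold": 100,
    "high_threshold": 200,
    "image_upscaler": False,
}

# Inclusive (minimum, maximum) pairs for the fields that document a range.
RANGES = {
    "scale": (0.1, 30),
    "low_threshold": (1, 255),
    "high_threshold": (1, 255),
}


def build_input(image, **overrides):
    """Hypothetical helper: merge overrides into the defaults and check ranges."""
    unknown = set(overrides) - set(DEFAULTS) - {"prompt", "seed"}
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides, "image": image}
    for name, (lowest, highest) in RANGES.items():
        if not lowest <= payload[name] <= highest:
            raise ValueError(
                f"{name}={payload[name]} is outside [{lowest}, {highest}]"
            )
    return payload


payload = build_input("dog.png", prompt="a dog in a jungle", scale=12)
print(payload["structure"], payload["scale"])
```

Checking ranges locally is only a convenience; the API remains the authority on which inputs are valid.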

Run this model in Node.js with one line of code:

npx create-replicate --model=zylim0702/controlnet-v1-1-multi

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run zylim0702/controlnet-v1-1-multi using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "zylim0702/controlnet-v1-1-multi:211486c3a33e26c7513c3ae4db00621f155bff401d3a241e260995e04bbbd88a",
  {
    input: {
      eta: 0,
      image: "https://replicate.delivery/pbxt/JREI44b9KCW78ynS9sH9je7wCckmEHcSF3EXwJBlhDhbh0jH/dog.png",
      scale: 9,
      prompt: "a dog in a bright sunshine jungle, hard lighting",
      a_prompt: "Best quality, extremely detailed",
      n_prompt: "Longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
      strength: 1,
      structure: "canny",
      ddim_steps: 20,
      num_samples: 1,
      low_threshold: 100,
      high_threshold: 200,
      image_upscaler: false,
      autogenerated_prompt: true,
      preprocessor_resolution: 512
    }
  }
);

// To access the file URL:
console.log(output[0].url()); //=> "http://example.com"

// To write the file to disk:
await fs.promises.writeFile("my-image.png", output[0]);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run zylim0702/controlnet-v1-1-multi using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "zylim0702/controlnet-v1-1-multi:211486c3a33e26c7513c3ae4db00621f155bff401d3a241e260995e04bbbd88a",
    input={
        "eta": 0,
        "image": "https://replicate.delivery/pbxt/JREI44b9KCW78ynS9sH9je7wCckmEHcSF3EXwJBlhDhbh0jH/dog.png",
        "scale": 9,
        "prompt": "a dog in a bright sunshine jungle, hard lighting",
        "a_prompt": "Best quality, extremely detailed",
        "n_prompt": "Longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
        "strength": 1,
        "structure": "canny",
        "ddim_steps": 20,
        "num_samples": 1,
        "low_threshold": 100,
        "high_threshold": 200,
        "image_upscaler": False,
        "autogenerated_prompt": True,
        "preprocessor_resolution": 512
    }
)

# To access the file URL:
print(output[0].url())
#=> "http://example.com"

# To write the file to disk:
with open("my-image.png", "wb") as file:
    file.write(output[0].read())

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run zylim0702/controlnet-v1-1-multi using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "zylim0702/controlnet-v1-1-multi:211486c3a33e26c7513c3ae4db00621f155bff401d3a241e260995e04bbbd88a",
    "input": {
      "eta": 0,
      "image": "https://replicate.delivery/pbxt/JREI44b9KCW78ynS9sH9je7wCckmEHcSF3EXwJBlhDhbh0jH/dog.png",
      "scale": 9,
      "prompt": "a dog in a bright sunshine jungle, hard lighting",
      "a_prompt": "Best quality, extremely detailed",
      "n_prompt": "Longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
      "strength": 1,
      "structure": "canny",
      "ddim_steps": 20,
      "num_samples": 1,
      "low_threshold": 100,
      "high_threshold": 200,
      "image_upscaler": false,
      "autogenerated_prompt": true,
      "preprocessor_resolution": 512
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

{
  "completed_at": "2023-08-31T01:13:17.775722Z",
  "created_at": "2023-08-31T01:13:09.043216Z",
  "data_removed": false,
  "error": null,
  "id": "tntmvudbhpfncycnoocai6z7xa",
  "input": {
    "image": "https://replicate.delivery/pbxt/JREI44b9KCW78ynS9sH9je7wCckmEHcSF3EXwJBlhDhbh0jH/dog.png",
    "scale": 9,
    "prompt": "a dog in a bright sunshine jungle, hard lighting",
    "a_prompt": "Best quality, extremely detailed",
    "n_prompt": "Longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
    "strength": 1,
    "structure": "canny",
    "ddim_steps": 20,
    "num_samples": 1,
    "low_threshold": 100,
    "high_threshold": 200,
    "image_upscaler": false,
    "autogenerated_prompt": true,
    "preprocessor_resolution": 512
  },
  "logs": "0%|          | 0/55 [00:00<?, ?it/s]\n 56%|█████▋    | 31/55 [00:00<00:00, 304.70it/s]\n100%|██████████| 55/55 [00:00<00:00, 309.42it/s]\na dog in a bright sunshine jungle, hard lighting, there is a dog sitting on a bench in a field, sitting on a bench, sitting on bench, sit on a bench, sitting on a park bench, happy dog, portrait, benches, portrait shot, award - winning pet photography, portrait image, sittin, a wooden, four legged, medium portrait, on a sunny day, at a park, peaceful mood\nUsing seed: 25123\nThe following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['sunny day, at a park, peaceful mood']\nThe following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['sunny day, at a park, peaceful mood']\n  0%|          | 0/20 [00:00<?, ?it/s]\n  5%|▌         | 1/20 [00:00<00:05,  3.44it/s]\n 10%|█         | 2/20 [00:00<00:05,  3.43it/s]\n 15%|█▌        | 3/20 [00:00<00:04,  3.42it/s]\n 20%|██        | 4/20 [00:01<00:04,  3.41it/s]\n 25%|██▌       | 5/20 [00:01<00:04,  3.41it/s]\n 30%|███       | 6/20 [00:01<00:04,  3.41it/s]\n 35%|███▌      | 7/20 [00:02<00:03,  3.41it/s]\n 40%|████      | 8/20 [00:02<00:03,  3.41it/s]\n 45%|████▌     | 9/20 [00:02<00:03,  3.41it/s]\n 50%|█████     | 10/20 [00:02<00:02,  3.41it/s]\n 55%|█████▌    | 11/20 [00:03<00:02,  3.40it/s]\n 60%|██████    | 12/20 [00:03<00:02,  3.40it/s]\n 65%|██████▌   | 13/20 [00:03<00:02,  3.40it/s]\n 70%|███████   | 14/20 [00:04<00:01,  3.40it/s]\n 75%|███████▌  | 15/20 [00:04<00:01,  3.40it/s]\n 80%|████████  | 16/20 [00:04<00:01,  3.40it/s]\n 85%|████████▌ | 17/20 [00:04<00:00,  3.41it/s]\n 90%|█████████ | 18/20 [00:05<00:00,  3.42it/s]\n 95%|█████████▌| 19/20 [00:05<00:00,  3.42it/s]\n100%|██████████| 20/20 [00:05<00:00,  3.42it/s]\n100%|██████████| 20/20 [00:05<00:00,  3.41it/s]",
  "metrics": {
    "predict_time": 8.817962,
    "total_time": 8.732506
  },
  "output": [
    "https://replicate.delivery/pbxt/kARY8cHjYeSXdyeeXaOsKlECvnhwD0ijD7MHYvP7X3NZZzeFB/out-0.png"
  ],
  "started_at": "2023-08-31T01:13:08.957760Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/tntmvudbhpfncycnoocai6z7xa",
    "cancel": "https://api.replicate.com/v1/predictions/tntmvudbhpfncycnoocai6z7xa/cancel"
  },
  "version": "211486c3a33e26c7513c3ae4db00621f155bff401d3a241e260995e04bbbd88a"
}
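The timestamps in a prediction object like the one above can be used to work out how long the run took. This is a small sketch using only the Python standard library; `parse_timestamp` and `elapsed_seconds` are hypothetical helper names, and the dictionary is a trimmed copy of the response shown above.

```python
from datetime import datetime


def parse_timestamp(value):
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def elapsed_seconds(prediction):
    """Seconds between started_at and completed_at."""
    started = parse_timestamp(prediction["started_at"])
    completed = parse_timestamp(prediction["completed_at"])
    return (completed - started).total_seconds()


# Trimmed copy of the prediction shown above.
prediction = {
    "started_at": "2023-08-31T01:13:08.957760Z",
    "completed_at": "2023-08-31T01:13:17.775722Z",
    "status": "succeeded",
}

print(elapsed_seconds(prediction))  # 8.817962
```

For this prediction the result agrees with the `predict_time` value reported under `metrics`.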

Generated in 8.8 seconds



Run time and cost

This model costs approximately $0.015 to run on Replicate, or 66 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
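The "66 runs per $1" figure follows from the approximate per-run cost by division, rounded down to whole runs. A quick sketch of that arithmetic (the function name `runs_per_budget` is hypothetical, and the cost is the approximate figure quoted above, so real totals will vary with inputs):

```python
import math

COST_PER_RUN_USD = 0.015  # approximate figure quoted above


def runs_per_budget(budget_usd, cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit into a budget at a flat per-run cost."""
    return math.floor(budget_usd / cost_per_run)


print(runs_per_budget(1))   # 66
print(runs_per_budget(10))  # 666
```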

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 16 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Abstract: ControlNet stands as a pioneering artificial intelligence model that synergizes cutting-edge technology in the domains of computer vision and natural language processing. Rooted in the fundamental principles of neural network architectures, ControlNet demonstrates a unique proficiency in the realm of automatic image-to-text captioning, enhanced by its unrivaled abilities in image adaptation and upscale augmentation. This model embodies a convergence of innovation, combining the realms of image analysis and linguistic expression.

Introduction: ControlNet represents a novel AI model that redefines the landscape of image description and understanding. By harnessing the prowess of deep learning and neural networks, ControlNet endeavors to bridge the semantic gap between visual content and textual interpretation. Central to its capabilities are advanced techniques in image-to-text captioning, supported by adaptive image processing and high-quality image upscaling. The model showcases a sophisticated architecture designed to bring forth comprehensive and contextually coherent textual descriptions for a diverse range of images, catering to various sizes and dimensions.

Auto AI Image-to-Text Captioning: ControlNet’s core competence resides in its state-of-the-art automatic image-to-text captioning prowess. It is equipped with an intricate network architecture that seamlessly synthesizes visual content and linguistic constructs. This process entails the extraction of salient features from input images, which are then mapped to semantically rich textual representations. The captions generated exhibit a nuanced understanding of the visual scene, fostering a harmonious amalgamation of image content and descriptive context.

Adaptive Image Support: One of ControlNet’s distinctive attributes is its adaptability to accommodate images of varying sizes and dimensions. Irrespective of the input image’s resolution or aspect ratio, ControlNet maintains its proficiency in generating precise and contextually relevant textual captions. This adaptability is a testament to the model’s robustness, enabling it to effectively handle a multitude of image sources without compromising on descriptive quality.

AI Image Upscaler Integration: ControlNet integrates a cutting-edge AI image upscaling mechanism, contributing to its holistic image processing capabilities. Leveraging advanced algorithms, the model enhances the visual fidelity of input images by increasing their resolution while preserving key details and minimizing artifacts. This integration augments the overall image quality, thereby enhancing the accuracy and expressiveness of the generated captions.

Implications and Applications: ControlNet’s multifaceted capabilities hold profound implications across a spectrum of applications. From enriching media accessibility for visually impaired individuals to enhancing content understanding for search engines and recommendation systems, the model’s potential is far-reaching. Additionally, it serves as a valuable tool for content creators, enabling them to automate the process of generating engaging and contextually apt image descriptions.

Conclusion: In the evolving landscape of AI-driven technologies, ControlNet emerges as a trailblazing model that converges the realms of computer vision and natural language processing. Its prowess in auto AI image-to-text captioning, adaptive image support, and AI image upscaling reflects a harmonious fusion of innovation. ControlNet stands as a testament to the power of AI to unravel the intricate relationship between visual stimuli and textual comprehension, offering a myriad of applications across various domains.

Below is ControlNet 1.0

Official implementation of Adding Conditional Control to Text-to-Image Diffusion Models.

ControlNet is a neural network structure to control diffusion models by adding extra conditions.

It copies the weights of neural network blocks into a “locked” copy and a “trainable” copy.

The “trainable” one learns your condition. The “locked” one preserves your model.

Thanks to this, training with a small dataset of image pairs will not destroy the production-ready diffusion model.

The “zero convolution” is a 1×1 convolution with both weight and bias initialized to zero.

Before training, all zero convolutions output zeros, and ControlNet will not cause any distortion.
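To see why a zero-initialized 1×1 convolution leaves the locked model's output unchanged, here is a toy sketch in plain Python. It is an illustration under stated assumptions, not the official implementation: the helper names (`conv1x1`, `add_feature_maps`) and the tiny 2-channel feature maps are made up for the example, and it assumes the zero convolution's output is added to the locked branch's features.

```python
def conv1x1(x, weight, bias):
    """1x1 convolution: at each pixel, mix input channels and add a bias.

    x:      [in_channels][height][width]
    weight: [out_channels][in_channels]
    bias:   [out_channels]
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [
                sum(weight[o][c] * x[c][h][w] for c in range(len(x))) + bias[o]
                for w in range(width)
            ]
            for h in range(height)
        ]
        for o in range(len(weight))
    ]


def add_feature_maps(a, b):
    """Element-wise sum of two feature maps with the same shape."""
    return [
        [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


# Toy 2-channel, 2x2 feature maps standing in for the two branches.
locked_out = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
trainable_out = [[[0.5, -1.0], [2.0, 0.0]], [[9.0, 1.5], [-3.0, 4.0]]]

# Zero convolution: weight and bias both start at zero.
zero_weight = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]

control = conv1x1(trainable_out, zero_weight, zero_bias)
combined = add_feature_maps(locked_out, control)

print(combined == locked_out)  # True
```

Once training updates the weights away from zero, `control` becomes non-zero and starts to steer the locked model's features.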

No layer is trained from scratch. You are still fine-tuning. Your original model is safe.

This allows training on small-scale or even personal devices.

This also makes it easy to merge, replace, or offset models, weights, blocks, and layers.