jagilley/controlnet-seg | Run with an API on Replicate

Input

image (file, required): Input image

prompt (string, required): Prompt for the model

num_samples (string): Number of samples (higher values may OOM). Default: "1"

image_resolution (string): Image resolution to be generated. Default: "512"

ddim_steps (integer): Steps. Default: 20

scale (number, minimum: 0.1, maximum: 30): Guidance Scale. Default: 9

seed (integer): Seed

eta (number): eta (DDIM). Default: 0

a_prompt (string): Added Prompt. Default: "best quality, extremely detailed"

n_prompt (string): Negative Prompt. Default: "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"

detect_resolution (integer, minimum: 128, maximum: 1024): Resolution for detection (only applicable when model type is 'HED' or 'Segmentation'). Default: 512
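
The documented ranges for scale and detect_resolution can be checked locally before a prediction is submitted. The following is a minimal sketch, not part of Replicate's client: the helper name build_input and the RANGES and REQUIRED tables are assumptions made for illustration, and only the bounds stated in the input listing are enforced.

```python
# Numeric bounds taken from the input listing; other fields state no bounds.
RANGES = {
    "scale": (0.1, 30),
    "detect_resolution": (128, 1024),
}
# The two inputs the listing marks as required.
REQUIRED = ("image", "prompt")


def build_input(**fields):
    """Hypothetical helper: return an input dict, or raise ValueError
    for a missing required field or an out-of-range value."""
    missing = [name for name in REQUIRED if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required input(s): {', '.join(missing)}")
    for name, (low, high) in RANGES.items():
        if name in fields and not (low <= fields[name] <= high):
            raise ValueError(f"{name}={fields[name]} is outside [{low}, {high}]")
    return dict(fields)


example = build_input(
    image="https://replicate.delivery/pbxt/IJYtXSDZ6sxDVWj3tcrf4JvNHT4f9LH5BAQhVSjJWf9BU3v4/house.png",
    prompt="A modernist house in a nice landscape",
    scale=9,
    detect_resolution=512,
)
```

The resulting dict could then be passed as the input argument of a prediction; the server remains the authority on what it accepts.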

Run this model in Node.js with one line of code:

npx create-replicate --model=jagilley/controlnet-seg

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run jagilley/controlnet-seg using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "jagilley/controlnet-seg:f967b165f4cd2e151d11e7450a8214e5d22ad2007f042f2f891ca3981dbfba0d",
  {
    input: {
      eta: 0,
      image: "https://replicate.delivery/pbxt/IJYtXSDZ6sxDVWj3tcrf4JvNHT4f9LH5BAQhVSjJWf9BU3v4/house.png",
      scale: 9,
      prompt: "A modernist house in a nice landscape",
      a_prompt: "best quality, extremely detailed",
      n_prompt: "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
      ddim_steps: 20,
      num_samples: "1",
      image_resolution: "512",
      detect_resolution: 512
    }
  }
);

// To access the file URL:
console.log(output[0].url()); //=> "http://example.com"

// To write the file to disk:
await fs.promises.writeFile("my-image.png", output[0]);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run jagilley/controlnet-seg using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "jagilley/controlnet-seg:f967b165f4cd2e151d11e7450a8214e5d22ad2007f042f2f891ca3981dbfba0d",
    input={
        "eta": 0,
        "image": "https://replicate.delivery/pbxt/IJYtXSDZ6sxDVWj3tcrf4JvNHT4f9LH5BAQhVSjJWf9BU3v4/house.png",
        "scale": 9,
        "prompt": "A modernist house in a nice landscape",
        "a_prompt": "best quality, extremely detailed",
        "n_prompt": "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
        "ddim_steps": 20,
        "num_samples": "1",
        "image_resolution": "512",
        "detect_resolution": 512
    }
)

# To access the file URL:
print(output[0].url)
#=> "http://example.com"

# To write the file to disk:
with open("my-image.png", "wb") as file:
    file.write(output[0].read())

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run jagilley/controlnet-seg using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "jagilley/controlnet-seg:f967b165f4cd2e151d11e7450a8214e5d22ad2007f042f2f891ca3981dbfba0d",
    "input": {
      "eta": 0,
      "image": "https://replicate.delivery/pbxt/IJYtXSDZ6sxDVWj3tcrf4JvNHT4f9LH5BAQhVSjJWf9BU3v4/house.png",
      "scale": 9,
      "prompt": "A modernist house in a nice landscape",
      "a_prompt": "best quality, extremely detailed",
      "n_prompt": "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
      "ddim_steps": 20,
      "num_samples": "1",
      "image_resolution": "512",
      "detect_resolution": 512
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
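
The same HTTP request can be assembled without the client library. The sketch below uses only Python's standard library and builds the request without sending it, so it runs without a token or network access. The function name build_prediction_request is an assumption for illustration; the URL, headers, and body mirror the curl example.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = "jagilley/controlnet-seg:f967b165f4cd2e151d11e7450a8214e5d22ad2007f042f2f891ca3981dbfba0d"


def build_prediction_request(token, model_input):
    """Hypothetical helper: build (but do not send) the POST request
    from the curl example."""
    body = json.dumps({"version": VERSION, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",
        },
    )


request = build_prediction_request(
    os.environ.get("REPLICATE_API_TOKEN", "<paste-your-token-here>"),
    {
        "image": "https://replicate.delivery/pbxt/IJYtXSDZ6sxDVWj3tcrf4JvNHT4f9LH5BAQhVSjJWf9BU3v4/house.png",
        "prompt": "A modernist house in a nice landscape",
    },
)

# Sending is left commented out so the sketch has no side effects:
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```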

Output

{
  "completed_at": "2023-02-14T20:53:33.321081Z",
  "created_at": "2023-02-14T20:46:52.659530Z",
  "data_removed": false,
  "error": null,
  "id": "ywxddjn4xbbp5i73roshr6bbyy",
  "input": {
    "image": "https://replicate.delivery/pbxt/IJYtXSDZ6sxDVWj3tcrf4JvNHT4f9LH5BAQhVSjJWf9BU3v4/house.png",
    "scale": 9,
    "prompt": "A modernist house in a nice landscape",
    "a_prompt": "best quality, extremely detailed",
    "n_prompt": "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
    "ddim_steps": 20,
    "num_samples": "1",
    "image_resolution": "512",
    "detect_resolution": 512
  },
  "logs": "/src/annotator/uniformer/mmseg/models/segmentors/base.py:271: UserWarning: show==False and out_file is not specified, only result image will be returned\nwarnings.warn('show==False and out_file is not specified, only '\nGlobal seed set to 88655\nData shape for DDIM sampling is (1, 4, 64, 64), eta 0.0\nRunning DDIM Sampling with 20 timesteps\nDDIM Sampler:   0%|          | 0/20 [00:00<?, ?it/s]\nDDIM Sampler:   5%|▌         | 1/20 [00:00<00:17,  1.08it/s]\nDDIM Sampler:  10%|█         | 2/20 [00:01<00:14,  1.24it/s]\nDDIM Sampler:  15%|█▌        | 3/20 [00:02<00:13,  1.30it/s]\nDDIM Sampler:  20%|██        | 4/20 [00:03<00:12,  1.33it/s]\nDDIM Sampler:  25%|██▌       | 5/20 [00:03<00:11,  1.35it/s]\nDDIM Sampler:  30%|███       | 6/20 [00:04<00:10,  1.36it/s]\nDDIM Sampler:  35%|███▌      | 7/20 [00:05<00:09,  1.36it/s]\nDDIM Sampler:  40%|████      | 8/20 [00:06<00:08,  1.37it/s]\nDDIM Sampler:  45%|████▌     | 9/20 [00:06<00:08,  1.37it/s]\nDDIM Sampler:  50%|█████     | 10/20 [00:07<00:07,  1.37it/s]\nDDIM Sampler:  55%|█████▌    | 11/20 [00:08<00:06,  1.37it/s]\nDDIM Sampler:  60%|██████    | 12/20 [00:08<00:05,  1.37it/s]\nDDIM Sampler:  65%|██████▌   | 13/20 [00:09<00:05,  1.37it/s]\nDDIM Sampler:  70%|███████   | 14/20 [00:10<00:04,  1.37it/s]\nDDIM Sampler:  75%|███████▌  | 15/20 [00:11<00:03,  1.37it/s]\nDDIM Sampler:  80%|████████  | 16/20 [00:11<00:02,  1.37it/s]\nDDIM Sampler:  85%|████████▌ | 17/20 [00:12<00:02,  1.36it/s]\nDDIM Sampler:  90%|█████████ | 18/20 [00:13<00:01,  1.36it/s]\nDDIM Sampler:  95%|█████████▌| 19/20 [00:14<00:00,  1.36it/s]\nDDIM Sampler: 100%|██████████| 20/20 [00:14<00:00,  1.36it/s]\nDDIM Sampler: 100%|██████████| 20/20 [00:14<00:00,  1.35it/s]",
  "metrics": {
    "predict_time": 21.184482,
    "total_time": 400.661551
  },
  "output": [
    "https://replicate.delivery/pbxt/mBNpEeiNkrSNBypupIfKvNLrQcUaBYfh6of4wahaHj9ysp5BB/output_0.png",
    "https://replicate.delivery/pbxt/YKVetNkuroyWJC4mxzrgvQbJ2vS5eFOS1B8xkJMkAiXMbaegA/output_1.png"
  ],
  "started_at": "2023-02-14T20:53:12.136599Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/ywxddjn4xbbp5i73roshr6bbyy",
    "cancel": "https://api.replicate.com/v1/predictions/ywxddjn4xbbp5i73roshr6bbyy/cancel"
  },
  "version": "f967b165f4cd2e151d11e7450a8214e5d22ad2007f042f2f891ca3981dbfba0d"
}
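
The timestamps in the example prediction can be cross-checked against its metrics. This sketch copies the relevant fields from the JSON above and uses only the standard library; the variable names are illustrative. Under these values, the gap between started_at and completed_at matches predict_time, and the gap between created_at and completed_at matches total_time.

```python
from datetime import datetime

# Fields copied from the example prediction.
prediction = {
    "created_at": "2023-02-14T20:46:52.659530Z",
    "started_at": "2023-02-14T20:53:12.136599Z",
    "completed_at": "2023-02-14T20:53:33.321081Z",
    "metrics": {"predict_time": 21.184482, "total_time": 400.661551},
}


def parse_timestamp(value):
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


created = parse_timestamp(prediction["created_at"])
started = parse_timestamp(prediction["started_at"])
completed = parse_timestamp(prediction["completed_at"])

wait_seconds = (started - created).total_seconds()       # created -> started
predict_seconds = (completed - started).total_seconds()  # started -> completed
total_seconds = (completed - created).total_seconds()    # created -> completed

print(f"waited {wait_seconds:.1f}s, ran {predict_seconds:.1f}s, total {total_seconds:.1f}s")
```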

Generated in 21.2 seconds


Run time and cost

This model costs approximately $0.0085 to run on Replicate, or 117 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 7 seconds.
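
The runs-per-dollar figure follows directly from the approximate per-run cost. A small sketch of that arithmetic, where the batch size of 1,000 runs is a hypothetical value chosen only for illustration:

```python
import math

cost_per_run = 0.0085  # approximate USD per run, as quoted above

# Whole runs that fit into one dollar.
runs_per_dollar = math.floor(1 / cost_per_run)
print(runs_per_dollar)  # 117

# Rough estimate for a hypothetical batch; actual cost varies with inputs.
batch_size = 1000
batch_cost = batch_size * cost_per_run
print(f"${batch_cost:.2f}")
```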

Readme

Model by Lvmin Zhang

Usage

Input an image and a prompt, as you would for Stable Diffusion. A model called Uniformer then detects the segmentation of the input image, and that segmentation is used to control the output image.

Model Description

This model is a ControlNet that adapts Stable Diffusion to use a semantic segmentation of an input image, in addition to a text prompt, when generating an output image. The segmentation model first segments the input image into semantic regions, and those regions are then used as conditioning input when generating a new image. The model was trained on the ADE20K dataset, captioned by BLIP to obtain 164K segmentation-image-caption pairs. Training took 200 GPU-hours on Nvidia A100 80G hardware. The base model is Stable Diffusion 1.5.
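
To make the idea of a segmentation used as conditioning input more concrete, the toy sketch below turns a grid of class ids into a color-coded image, which is one common way to represent a segmentation map. It is an illustration only: the palette, class ids, and function name are hypothetical, and the actual annotator and its color scheme may differ.

```python
# Hypothetical palette: class id -> RGB color.
PALETTE = {
    0: (120, 120, 120),
    1: (6, 230, 230),
    2: (4, 200, 3),
}


def colorize(label_map):
    """Turn a 2D grid of class ids into a 2D grid of RGB tuples."""
    return [[PALETTE[class_id] for class_id in row] for row in label_map]


# A tiny 2x3 "segmentation": one class across the top row, two below.
label_map = [
    [1, 1, 1],
    [0, 0, 2],
]
control_image = colorize(label_map)
```

Each region of the label map becomes a flat block of color, so the layout of the scene is kept while its appearance is left to the prompt.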

ControlNet is a neural network structure which allows control of pretrained large diffusion models to support additional input conditions beyond prompts. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k samples). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts of training data (millions to billions of rows). Large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc.

Original model & code on GitHub

Other ControlNets

There are many different ways to use a ControlNet to modify the output of Stable Diffusion. Here are a few different options, all of which use an input image in addition to a prompt to generate an output. The methods process the input in different ways; try them out to see which works best for a given application.

ControlNet for generating images from drawings
Scribble: https://replicate.com/jagilley/controlnet-scribble

ControlNets for generating humans based on input image
Human Pose Detection: https://replicate.com/jagilley/controlnet-pose

ControlNets for preserving general qualities about an input image
Edge detection: https://replicate.com/jagilley/controlnet-canny
HED maps: https://replicate.com/jagilley/controlnet-hed
Depth map: https://replicate.com/jagilley/controlnet-depth2img
Hough line detection: https://replicate.com/jagilley/controlnet-hough
Normal map: https://replicate.com/jagilley/controlnet-normal

Citation

@misc{https://doi.org/10.48550/arxiv.2302.05543,
  doi = {10.48550/ARXIV.2302.05543},
  url = {https://arxiv.org/abs/2302.05543},
  author = {Zhang, Lvmin and Agrawala, Maneesh},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), Artificial Intelligence (cs.AI), Graphics (cs.GR), Human-Computer Interaction (cs.HC), Multimedia (cs.MM), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Adding Conditional Control to Text-to-Image Diffusion Models},
  publisher = {arXiv},
  year = {2023},
  copyright = {arXiv.org perpetual, non-exclusive license}
}