pwntus/stable-diffusion-depth2img | Run with an API on Replicate

pwntus / stable-diffusion-depth2img

Create variations of an image while preserving shape and depth.


Public
7.9K runs
A100 (80GB)

Input

prompt

string


Example: "A fantasy landscape at a dark moonlit night, black sky, trending on artstation"

The prompt to guide the image generation.

Default: "A fantasy landscape, trending on artstation"

negative_prompt

string


Text describing what the generated image should NOT contain. Ignored when classifier-free guidance is not used.

image

file (required)

Image that will be used as the starting point for the process.

prompt_strength

number

Prompt strength when an input image is provided. A value of 1.0 corresponds to full destruction of the information in the initial image.

Default: 0.8
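In image-to-image pipelines, the strength value usually also decides how many of the requested denoising steps actually run: the input image is noised part-way and only the remaining part of the schedule is executed. The sketch below assumes the convention `int(num_inference_steps * strength)` used by Hugging Face diffusers img2img pipelines. That this model follows it is an assumption, though it is consistent with the 45/45 progress bar in the example log below (50 steps at prompt_strength 0.9). The helper name is hypothetical.

```python
def effective_steps(num_inference_steps: int, prompt_strength: float) -> int:
    """Estimate how many denoising steps actually run for a given strength.

    Assumes the diffusers-style img2img convention of truncating the schedule
    to int(num_inference_steps * strength) steps; not documented by this model.
    """
    # Assumed valid range: 0.0 (keep the image) to 1.0 (full destruction).
    if not 0.0 <= prompt_strength <= 1.0:
        raise ValueError("prompt_strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * prompt_strength), num_inference_steps)


# The example request below uses 50 steps with prompt_strength 0.9.
print(effective_steps(50, 0.9))  # 45
print(effective_steps(50, 0.8))  # 40 (the default strength)
```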

num_outputs

integer

(minimum: 1, maximum: 8)

Number of images to output. Requesting more outputs may cause the GPU to run out of memory (OOM).

Default: 1

num_inference_steps

integer

(minimum: 1, maximum: 500)

The number of denoising steps. More denoising steps usually lead to higher-quality images at the expense of slower inference.

Default: 50

guidance_scale

number

(minimum: 1, maximum: 20)

Scale for classifier-free guidance. A higher guidance scale encourages the model to generate images that are more closely linked to the text prompt, usually at the expense of lower image quality.

Default: 7.5
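Classifier-free guidance combines two noise predictions at each step: one conditioned on the prompt and one unconditional. In many Stable Diffusion pipelines, including Hugging Face diffusers, the negative prompt takes the place of the empty unconditional prompt, which is why negative_prompt has no effect without guidance. Whether this model does exactly that is an assumption. The sketch below applies the standard combination to plain Python lists; the function name is hypothetical, and real pipelines apply it to UNet output tensors.

```python
from typing import List, Sequence


def apply_cfg(noise_uncond: Sequence[float], noise_cond: Sequence[float],
              guidance_scale: float) -> List[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond).

    noise_uncond is the prediction for the empty (or negative) prompt,
    noise_cond the prediction for the text prompt.
    """
    if len(noise_uncond) != len(noise_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


uncond = [0.0, 1.0]
cond = [1.0, 1.5]
# A scale of 1 reproduces the conditional prediction, i.e. no extra guidance.
print(apply_cfg(uncond, cond, 1.0))  # [1.0, 1.5]
# The example requests use guidance_scale 5.5.
print(apply_cfg(uncond, cond, 5.5))  # [5.5, 3.75]
```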

scheduler

string

The scheduler (sampler) used for denoising, for example "DPMSolverMultistep" or "K_EULER_ANCESTRAL".

Default: "DPMSolverMultistep"

seed

integer

Random seed. Leave blank to randomize the seed.
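Before calling the API, it can help to check a payload against the ranges listed above. The sketch below is a hypothetical client-side helper (validate_input is not part of Replicate's client). The [0, 1] range for prompt_strength is an assumption inferred from its description, because the schema above gives no explicit bounds for it.

```python
from typing import Any, Dict, List

# Ranges copied from the input schema above.
INT_RANGES = {"num_outputs": (1, 8), "num_inference_steps": (1, 500)}
# prompt_strength bounds are an assumption; the schema does not state them.
FLOAT_RANGES = {"guidance_scale": (1.0, 20.0), "prompt_strength": (0.0, 1.0)}


def validate_input(payload: Dict[str, Any]) -> List[str]:
    """Return the problems found in a depth2img input payload (empty if none)."""
    problems = []
    if "image" not in payload:  # the only required input
        problems.append("image is required")
    for name, (lo, hi) in INT_RANGES.items():
        if name in payload:
            value = payload[name]
            # bool is a subclass of int, so reject it explicitly.
            if not isinstance(value, int) or isinstance(value, bool) or not lo <= value <= hi:
                problems.append(f"{name} must be an integer in [{lo}, {hi}]")
    for name, (lo, hi) in FLOAT_RANGES.items():
        if name in payload:
            value = payload[name]
            if not isinstance(value, (int, float)) or isinstance(value, bool) or not lo <= value <= hi:
                problems.append(f"{name} must be a number in [{lo}, {hi}]")
    return problems


example = {
    "image": "https://example.com/input.jpeg",  # placeholder URL
    "num_outputs": 1,
    "guidance_scale": 5.5,
    "prompt_strength": 0.9,
    "num_inference_steps": 50,
}
print(validate_input(example))  # []
```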

Run this model in Node.js with one line of code:

npx create-replicate --model=pwntus/stable-diffusion-depth2img

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run pwntus/stable-diffusion-depth2img using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "pwntus/stable-diffusion-depth2img:f7b8c40a2476c36633005f208aec0ca16a8e2ac2ab69e7bdf77b6aa6eb81af6b",
  {
    input: {
      image: "https://replicate.delivery/pbxt/IAh0VDQpBV4STGq3JtNAFGA09FCYini5FZHMZl39OL7PwafI/sketch-mountains-input.jpeg",
      prompt: "A fantasy landscape at a dark moonlit night, black sky, trending on artstation",
      scheduler: "K_EULER_ANCESTRAL",
      num_outputs: 1,
      guidance_scale: 5.5,
      negative_prompt: "light sky, day",
      prompt_strength: 0.9,
      num_inference_steps: 50
    }
  }
);

// To access the file URL:
console.log(output[0].url()); //=> "http://example.com"

// To write the file to disk:
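// fs.writeFile requires a callback; the promise API accepts the streamed file output directly.
await fs.promises.writeFile("my-image.png", output[0]);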

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run pwntus/stable-diffusion-depth2img using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "pwntus/stable-diffusion-depth2img:f7b8c40a2476c36633005f208aec0ca16a8e2ac2ab69e7bdf77b6aa6eb81af6b",
    input={
        "image": "https://replicate.delivery/pbxt/IAh0VDQpBV4STGq3JtNAFGA09FCYini5FZHMZl39OL7PwafI/sketch-mountains-input.jpeg",
        "prompt": "A fantasy landscape at a dark moonlit night, black sky, trending on artstation",
        "scheduler": "K_EULER_ANCESTRAL",
        "num_outputs": 1,
        "guidance_scale": 5.5,
        "negative_prompt": "light sky, day",
        "prompt_strength": 0.9,
        "num_inference_steps": 50
    }
)

# To access the file URL:
print(output[0].url())
#=> "http://example.com"

# To write the file to disk:
with open("my-image.png", "wb") as file:
    file.write(output[0].read())

To learn more, take a look at the guide on getting started with Python.
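When num_outputs is greater than 1, output is a list with one file per image, while the snippet above writes only output[0]. A minimal sketch, assuming each item exposes read() as in that snippet; save_outputs and its file-naming scheme are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Protocol


class Readable(Protocol):
    """Anything with a read() method returning bytes, such as a file output."""

    def read(self) -> bytes: ...


def save_outputs(outputs: Iterable[Readable], directory: str = ".",
                 prefix: str = "my-image") -> List[Path]:
    """Write every returned image to <directory>/<prefix>-<index>.png."""
    paths = []
    for index, item in enumerate(outputs):
        path = Path(directory) / f"{prefix}-{index}.png"
        path.write_bytes(item.read())
        paths.append(path)
    return paths
```

After `output = replicate.run(...)`, calling `save_outputs(output)` writes my-image-0.png, my-image-1.png, and so on to the current directory.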

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run pwntus/stable-diffusion-depth2img using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "pwntus/stable-diffusion-depth2img:f7b8c40a2476c36633005f208aec0ca16a8e2ac2ab69e7bdf77b6aa6eb81af6b",
    "input": {
      "image": "https://replicate.delivery/pbxt/IAh0VDQpBV4STGq3JtNAFGA09FCYini5FZHMZl39OL7PwafI/sketch-mountains-input.jpeg",
      "prompt": "A fantasy landscape at a dark moonlit night, black sky, trending on artstation",
      "scheduler": "K_EULER_ANCESTRAL",
      "num_outputs": 1,
      "guidance_scale": 5.5,
      "negative_prompt": "light sky, day",
      "prompt_strength": 0.9,
      "num_inference_steps": 50
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
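If the Python client library is not available, the same request as the curl call above can be assembled with the standard library. This sketch only builds the request (it does not send it); build_prediction_request is a hypothetical helper that mirrors the curl headers and body.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.replicate.com/v1/predictions"
MODEL_VERSION = ("pwntus/stable-diffusion-depth2img:"
                 "f7b8c40a2476c36633005f208aec0ca16a8e2ac2ab69e7bdf77b6aa6eb81af6b")


def build_prediction_request(inputs: Dict[str, Any], token: str,
                             wait: bool = True) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if wait:
        headers["Prefer"] = "wait"  # same header as the curl example
    body = json.dumps({"version": MODEL_VERSION, "input": inputs}).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

To send it, pass the request to `urllib.request.urlopen` with the token from `os.environ["REPLICATE_API_TOKEN"]` and decode the body with `json.load`. With `Prefer: wait` the response may already hold the finished prediction; if it does not, poll the `urls.get` endpoint returned in the prediction.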

You can run this model locally using Cog. First, install Cog:

brew install cog

If you don’t have Homebrew, there are other installation options available.

Run this to download the model and run it in your local environment:

cog predict r8.im/pwntus/stable-diffusion-depth2img@sha256:f7b8c40a2476c36633005f208aec0ca16a8e2ac2ab69e7bdf77b6aa6eb81af6b \
  -i 'image="https://replicate.delivery/pbxt/IAh0VDQpBV4STGq3JtNAFGA09FCYini5FZHMZl39OL7PwafI/sketch-mountains-input.jpeg"' \
  -i 'prompt="A fantasy landscape at a dark moonlit night, black sky, trending on artstation"' \
  -i 'scheduler="K_EULER_ANCESTRAL"' \
  -i 'num_outputs=1' \
  -i 'guidance_scale=5.5' \
  -i 'negative_prompt="light sky, day"' \
  -i 'prompt_strength=0.9' \
  -i 'num_inference_steps=50'

To learn more, take a look at the Cog documentation.

Alternatively, run the model as a Docker container and send it a prediction request over HTTP:

docker run -d -p 5000:5000 --gpus=all r8.im/pwntus/stable-diffusion-depth2img@sha256:f7b8c40a2476c36633005f208aec0ca16a8e2ac2ab69e7bdf77b6aa6eb81af6b
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "image": "https://replicate.delivery/pbxt/IAh0VDQpBV4STGq3JtNAFGA09FCYini5FZHMZl39OL7PwafI/sketch-mountains-input.jpeg",
      "prompt": "A fantasy landscape at a dark moonlit night, black sky, trending on artstation",
      "scheduler": "K_EULER_ANCESTRAL",
      "num_outputs": 1,
      "guidance_scale": 5.5,
      "negative_prompt": "light sky, day",
      "prompt_strength": 0.9,
      "num_inference_steps": 50
    }
  }' \
  http://localhost:5000/predictions

To learn more, take a look at the Cog documentation.

Output

{
  "completed_at": "2023-01-20T21:06:26.195744Z",
  "created_at": "2023-01-20T21:06:08.365524Z",
  "data_removed": false,
  "error": null,
  "id": "kguyn6gfsnbttmmngrr3hj6kwq",
  "input": {
    "image": "https://replicate.delivery/pbxt/IAh0VDQpBV4STGq3JtNAFGA09FCYini5FZHMZl39OL7PwafI/sketch-mountains-input.jpeg",
    "prompt": "A fantasy landscape at a dark moonlit night, black sky, trending on artstation",
    "scheduler": "K_EULER_ANCESTRAL",
    "num_outputs": 1,
    "guidance_scale": "5.5",
    "negative_prompt": "light sky, day",
    "prompt_strength": 0.9,
    "num_inference_steps": 50
  },
  "logs": "Using seed: 19973\n  0%|          | 0/45 [00:00<?, ?it/s]\n  2%|▏         | 1/45 [00:00<00:15,  2.82it/s]\n  4%|▍         | 2/45 [00:00<00:15,  2.81it/s]\n  7%|▋         | 3/45 [00:01<00:15,  2.77it/s]\n  9%|▉         | 4/45 [00:01<00:14,  2.78it/s]\n 11%|█         | 5/45 [00:01<00:14,  2.78it/s]\n 13%|█▎        | 6/45 [00:02<00:14,  2.78it/s]\n 16%|█▌        | 7/45 [00:02<00:13,  2.78it/s]\n 18%|█▊        | 8/45 [00:02<00:13,  2.77it/s]\n 20%|██        | 9/45 [00:03<00:13,  2.77it/s]\n 22%|██▏       | 10/45 [00:03<00:12,  2.77it/s]\n 24%|██▍       | 11/45 [00:03<00:12,  2.77it/s]\n 27%|██▋       | 12/45 [00:04<00:11,  2.77it/s]\n 29%|██▉       | 13/45 [00:04<00:11,  2.78it/s]\n 31%|███       | 14/45 [00:05<00:11,  2.77it/s]\n 33%|███▎      | 15/45 [00:05<00:10,  2.76it/s]\n 36%|███▌      | 16/45 [00:05<00:10,  2.77it/s]\n 38%|███▊      | 17/45 [00:06<00:10,  2.76it/s]\n 40%|████      | 18/45 [00:06<00:09,  2.77it/s]\n 42%|████▏     | 19/45 [00:06<00:09,  2.77it/s]\n 44%|████▍     | 20/45 [00:07<00:09,  2.75it/s]\n 47%|████▋     | 21/45 [00:07<00:08,  2.76it/s]\n 49%|████▉     | 22/45 [00:07<00:08,  2.75it/s]\n 51%|█████     | 23/45 [00:08<00:08,  2.75it/s]\n 53%|█████▎    | 24/45 [00:08<00:07,  2.75it/s]\n 56%|█████▌    | 25/45 [00:09<00:07,  2.75it/s]\n 58%|█████▊    | 26/45 [00:09<00:06,  2.75it/s]\n 60%|██████    | 27/45 [00:09<00:06,  2.75it/s]\n 62%|██████▏   | 28/45 [00:10<00:06,  2.75it/s]\n 64%|██████▍   | 29/45 [00:10<00:05,  2.74it/s]\n 67%|██████▋   | 30/45 [00:10<00:05,  2.74it/s]\n 69%|██████▉   | 31/45 [00:11<00:05,  2.74it/s]\n 71%|███████   | 32/45 [00:11<00:04,  2.74it/s]\n 73%|███████▎  | 33/45 [00:11<00:04,  2.74it/s]\n 76%|███████▌  | 34/45 [00:12<00:04,  2.74it/s]\n 78%|███████▊  | 35/45 [00:12<00:03,  2.74it/s]\n 80%|████████  | 36/45 [00:13<00:03,  2.74it/s]\n 82%|████████▏ | 37/45 [00:13<00:02,  2.75it/s]\n 84%|████████▍ | 38/45 [00:13<00:02,  2.75it/s]\n 87%|████████▋ | 39/45 [00:14<00:02,  2.74it/s]\n 89%|████████▉ | 40/45 [00:14<00:01,  2.74it/s]\n 91%|█████████ | 41/45 [00:14<00:01,  2.74it/s]\n 93%|█████████▎| 42/45 [00:15<00:01,  2.74it/s]\n 96%|█████████▌| 43/45 [00:15<00:00,  2.74it/s]\n 98%|█████████▊| 44/45 [00:15<00:00,  2.74it/s]\n100%|██████████| 45/45 [00:16<00:00,  2.74it/s]\n100%|██████████| 45/45 [00:16<00:00,  2.75it/s]",
  "metrics": {
    "predict_time": 17.792457,
    "total_time": 17.83022
  },
  "output": [
    "https://replicate.delivery/pbxt/o9adZ8mV3s6gGlrYV7eNnMV8OH5pwFKGESWfOHGWErcRRLWQA/out-0.png"
  ],
  "started_at": "2023-01-20T21:06:08.403287Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/kguyn6gfsnbttmmngrr3hj6kwq",
    "cancel": "https://api.replicate.com/v1/predictions/kguyn6gfsnbttmmngrr3hj6kwq/cancel"
  },
  "version": "90a616d5e8e49d4387c8e1de57f6a62f8182a99644468a22890e8a956a45a964"
}
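The metrics block can be cross-checked against the timestamps: completed_at minus started_at gives predict_time (17.792457 s), and completed_at minus created_at gives total_time (17.83022 s). A small standard-library sketch of that arithmetic; the helper names are hypothetical, and reading the metrics as exactly these differences is an inference from this one example.

```python
from datetime import datetime
from typing import Dict


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp ending in 'Z' (UTC)."""
    # datetime.fromisoformat accepts a trailing 'Z' only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def prediction_timings(prediction: Dict[str, str]) -> Dict[str, float]:
    """Derive queue, predict and total durations (seconds) from a prediction."""
    created = parse_timestamp(prediction["created_at"])
    started = parse_timestamp(prediction["started_at"])
    completed = parse_timestamp(prediction["completed_at"])
    return {
        "queue_time": (started - created).total_seconds(),
        "predict_time": (completed - started).total_seconds(),
        "total_time": (completed - created).total_seconds(),
    }


# Timestamps copied from the output above.
example = {
    "created_at": "2023-01-20T21:06:08.365524Z",
    "started_at": "2023-01-20T21:06:08.403287Z",
    "completed_at": "2023-01-20T21:06:26.195744Z",
}
print(prediction_timings(example))
```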

Generated in 17.8 seconds.



This output was created using a different version of the model, pwntus/stable-diffusion-depth2img:90a616d5.


Run time and cost

This model costs approximately $0.0095 to run on Replicate, or 105 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
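The runs-per-dollar figure follows from the per-run estimate: 1 / 0.0095 ≈ 105.3, so 105 whole runs fit in $1. A sketch for budgeting a batch with that approximate figure; since real costs depend on the inputs, treat the constant as an estimate, and the helper names as hypothetical.

```python
import math

# Approximate per-run cost quoted above; actual cost varies with the inputs.
COST_PER_RUN_USD = 0.0095


def runs_per_dollar(cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Number of whole runs that fit in one dollar."""
    return math.floor(1 / cost_per_run)


def estimated_cost(num_runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Rough cost in USD for a batch of runs, rounded to 4 decimal places."""
    return round(num_runs * cost_per_run, 4)


print(runs_per_dollar())     # 105
print(estimated_cost(1000))  # 9.5
```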

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 7 seconds. The predict time for this model varies significantly based on the inputs.

Readme

This stable-diffusion-depth2img model was resumed from stable-diffusion-2-base (512-base-ema.ckpt) and fine-tuned for 200k steps. An extra input channel was added to process the relative depth prediction produced by MiDaS (dpt_hybrid), which serves as additional conditioning.
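The extra input channel means the UNet receives the 4-channel image latent concatenated with a 1-channel depth map at latent resolution. The sketch below is modeled on the Hugging Face diffusers StableDiffusionDepth2ImgPipeline, which resizes the MiDaS prediction to the latent size and min-max normalizes it to [-1, 1] before concatenation. That this model's code does the same is an assumption. The helpers work on plain lists, only illustrate shapes and normalization, and have hypothetical names.

```python
from typing import List, Tuple

Grid = List[List[float]]

LATENT_CHANNELS = 4  # autoencoder latent channels (see Training below)
DEPTH_CHANNELS = 1   # the extra MiDaS depth channel
DOWNSAMPLE = 8       # autoencoder downsampling factor f


def unet_input_shape(height: int, width: int) -> Tuple[int, int, int]:
    """(channels, h, w) of the UNet input for a height x width image."""
    return (LATENT_CHANNELS + DEPTH_CHANNELS, height // DOWNSAMPLE, width // DOWNSAMPLE)


def normalize_depth(depth: Grid) -> Grid:
    """Min-max normalize a relative depth map to [-1, 1]."""
    values = [v for row in depth for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # flat depth map: avoid division by zero
        return [[0.0 for _ in row] for row in depth]
    return [[2.0 * (v - lo) / span - 1.0 for v in row] for row in depth]


print(unet_input_shape(512, 512))                   # (5, 64, 64)
print(normalize_depth([[0.0, 5.0], [10.0, 2.5]]))   # [[-1.0, 0.0], [1.0, -0.5]]
```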

Model description

Developed by: Robin Rombach, Patrick Esser
Model type: Diffusion-based text-to-image generation model
Language(s): English
License: CreativeML Open RAIL++-M License
Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H).
Resources for more information: GitHub Repository.

Intended use

See stabilityai/stable-diffusion-2-depth for direct use, misuse, malicious use, out-of-scope use, limitations, and bias.

Training

Training data: The model developers used the following dataset for training the model:

LAION-5B and subsets (details below). The training data is further filtered using LAION’s NSFW detector, with a “p_unsafe” score of 0.1 (conservative). For more details, please refer to LAION-5B’s NeurIPS 2022 paper and reviewer discussions on the topic.

Training procedure: Stable Diffusion v2 is a latent diffusion model that combines an autoencoder with a diffusion model trained in the latent space of the autoencoder. During training:

Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
Text prompts are encoded through the OpenCLIP-ViT/H text-encoder.
The output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention.
The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet. We also use the so-called v-objective, see https://arxiv.org/abs/2202.00512.
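The v-objective from the linked paper (Salimans and Ho, "Progressive Distillation for Fast Sampling of Diffusion Models") changes the regression target. With a noised latent z_t = α_t·x + σ_t·ε and a variance-preserving schedule (α_t² + σ_t² = 1), the UNet predicts v = α_t·ε − σ_t·x, and the clean latent is recoverable as x = α_t·z_t − σ_t·v. The scalar sketch below illustrates that identity and a mean-squared loss; the function names are hypothetical, and real training operates on latent tensors.

```python
import math
from typing import Sequence


def noisy_latent(x: float, eps: float, alpha: float, sigma: float) -> float:
    """Forward diffusion for one latent value: z_t = alpha_t * x + sigma_t * eps."""
    return alpha * x + sigma * eps


def v_target(x: float, eps: float, alpha: float, sigma: float) -> float:
    """v-objective target: v = alpha_t * eps - sigma_t * x."""
    return alpha * eps - sigma * x


def recover_x(z: float, v: float, alpha: float, sigma: float) -> float:
    """Recover the clean latent from z_t and v (valid when alpha^2 + sigma^2 = 1)."""
    return alpha * z - sigma * v


def v_loss(v_pred: Sequence[float], v_true: Sequence[float]) -> float:
    """Mean squared error between predicted and target v."""
    return sum((p - t) ** 2 for p, t in zip(v_pred, v_true)) / len(v_true)


# Variance-preserving schedule at an arbitrary angle: alpha = cos(phi), sigma = sin(phi).
phi = 0.7
alpha, sigma = math.cos(phi), math.sin(phi)
x, eps = 0.25, -1.3
z = noisy_latent(x, eps, alpha, sigma)
v = v_target(x, eps, alpha, sigma)
print(abs(recover_x(z, v, alpha, sigma) - x) < 1e-12)  # True
```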