jagilley / stable-diffusion-depth2img

Create variations of an image while preserving shape and depth

  • Public
  • 57.4K runs
  • A100 (80GB)
  • GitHub
  • License

Input

string

The prompt to guide the image generation.

Default: "Wanderer above the sea of fog, digital art"

string

Keywords to exclude from the resulting image

file (required)
input_image

Input image to be used as the starting point

number

Prompt strength when providing the input image. A value of 1.0 corresponds to full destruction of the information in the init image.

Default: 0.8

integer
(minimum: 1, maximum: 8)

Number of images to generate

Default: 1

integer
(minimum: 1, maximum: 500)

The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

Default: 50

number
(minimum: 1, maximum: 20)

Scale for classifier-free guidance. A higher guidance scale encourages the model to generate images that are more closely linked to the text prompt, usually at the expense of lower image quality.

Default: 7.5

string

Choose a scheduler

Default: "DPMSolverMultistep"

integer

Random seed. Leave blank to randomize the seed

file

Depth image (optional). Specifies the depth of each pixel in the input image.
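
A request to this model might look like the sketch below. It is a minimal example assuming the Replicate Python client and a REPLICATE_API_TOKEN in the environment; every parameter name except input_image is inferred from the field descriptions above, so check the model's current API schema before relying on them.

```python
# Minimal sketch using the Replicate Python client (pip install replicate).
# Requires REPLICATE_API_TOKEN in the environment.
# Parameter names other than input_image are inferred from the descriptions above.
import replicate

output = replicate.run(
    "jagilley/stable-diffusion-depth2img",  # a specific version can be pinned as owner/model:<version>
    input={
        "input_image": open("room.jpg", "rb"),     # starting-point image (example file name)
        "prompt": "Wanderer above the sea of fog, digital art",
        "negative_prompt": "blurry, low quality",  # keywords to exclude (name assumed)
        "prompt_strength": 0.8,
        "num_inference_steps": 50,
        "guidance_scale": 7.5,
        "num_outputs": 1,
    },
)
print(output)  # typically a list of URLs to the generated image(s)
```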

Output

One or more generated images, depending on the number of outputs requested.

Run time and cost

This model costs approximately $0.054 to run on Replicate, or 18 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 39 seconds. The predict time for this model varies significantly based on the inputs.
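
The quoted figures imply an effective per-second rate; the tiny sketch below only rechecks that arithmetic and is not Replicate's actual billing formula.

```python
# Back-of-the-envelope check of the figures above; not Replicate's billing formula.
cost_per_run = 0.054      # USD, approximate per-run cost quoted above
typical_seconds = 39      # typical predict time quoted above

rate_per_second = cost_per_run / typical_seconds  # ~= $0.0014 per second of A100 time
runs_per_dollar = int(1 / cost_per_run)           # ~= 18 runs per $1
print(f"~${rate_per_second:.4f}/s on A100, roughly {runs_per_dollar} runs per $1")
```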

Readme

Create variations of an image while preserving shape and depth.

This stable-diffusion-2-depth model was resumed from stable-diffusion-2-base (512-base-ema.ckpt) and fine-tuned for 200k steps. An extra input channel was added to process the (relative) depth prediction produced by MiDaS (dpt_hybrid), which is used as additional conditioning.

  • Developed by: Robin Rombach, Patrick Esser
  • Model type: Diffusion-based text-to-image generation model
  • Language(s): English
  • License: CreativeML Open RAIL++-M License
  • Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H).
  • Resources for more information: GitHub Repository.
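
The underlying stabilityai/stable-diffusion-2-depth weights can also be run locally with the Hugging Face diffusers library. The sketch below assumes diffusers' StableDiffusionDepth2ImgPipeline, a CUDA GPU, and placeholder file names; when no explicit depth map is supplied, the pipeline estimates one with MiDaS, matching the conditioning described above.

```python
# Local sketch with Hugging Face diffusers; assumes a CUDA GPU and example file names.
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from PIL import Image

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("room.jpg")  # starting-point image
result = pipe(
    prompt="Wanderer above the sea of fog, digital art",
    image=init_image,
    negative_prompt="blurry",
    strength=0.8,  # analogous to the prompt strength input above
).images[0]
result.save("variation.png")
```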

Intended use

See stability-ai/stable-diffusion for direct use, misuse, malicious use, out-of-scope use, limitations, and bias.