cjwbw / stable-diffusion-high-resolution

Detailed, higher-resolution images from Stable Diffusion

  • Public
  • 73K runs
  • A100 (80GB)

Input

string

The prompt to render.

Default: "female cyborg assimilated by alien fungus, intricate Three-point lighting portrait, by Ching Yeh and Greg Rutkowski, detailed cyberpunk in the style of GitS 1995"

integer

Width of the original Stable Diffusion output image. The final output doubles this width. Note that 1024x1024 may run out of memory; if so, lower the width or height.

Default: 512

integer

Height of the original Stable Diffusion output image. The final output doubles this height. Note that 1024x1024 may run out of memory; if so, lower the width or height.

Default: 512

number

Unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty)).

Default: 7.5
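
The guidance formula above can be illustrated with a minimal sketch. It assumes the two noise predictions are flat lists of floats; in the real model they are tensors, and the function name `guided_eps` is hypothetical, not part of this model's code.

```python
from typing import List, Sequence


def guided_eps(
    eps_empty: Sequence[float],
    eps_cond: Sequence[float],
    scale: float,
) -> List[float]:
    """Apply eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty)).

    eps_empty: noise predicted with an empty prompt (illustrative flat list).
    eps_cond:  noise predicted with the actual prompt.
    scale:     guidance scale; 1.0 returns the conditional prediction unchanged.
    """
    if len(eps_empty) != len(eps_cond):
        raise ValueError("both predictions must have the same length")
    return [e + scale * (c - e) for e, c in zip(eps_empty, eps_cond)]


if __name__ == "__main__":
    # Toy values only; a scale above 1 pushes the result past the
    # conditional prediction, away from the unconditional one.
    print(guided_eps([0.0, 1.0], [1.0, 3.0], 7.5))
```

With a scale of 0 the prompt is ignored entirely, and larger values make the sampler follow the prompt more strongly.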

integer

Number of sampling steps.

Default: 50

integer

The seed (for reproducible sampling).

Output


Run time and cost

This model costs approximately $0.054 to run on Replicate, or 18 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 39 seconds. The predict time for this model varies significantly based on the inputs.

Readme

This is a Cog implementation of "Detailed, higher-resolution images from Stable Diffusion", originally implemented by @jquesnelle at https://github.com/jquesnelle/txt2imghd/blob/master/txt2imghd.py. A safety checker has been added.

txt2imghd is a port of the GOBIG mode from progrockdiffusion applied to Stable Diffusion, with Real-ESRGAN as the upscaler. It creates detailed, higher-resolution images in four steps: generating an image from a prompt, upscaling it, running img2img on smaller pieces of the upscaled image, and blending the results back into the original image.
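
As a rough illustration of the "smaller pieces" step, the sketch below computes overlapping tile rectangles that cover an upscaled image. It is not taken from txt2imghd: the function names, the 512-pixel tile size, and the 64-pixel overlap are all assumptions chosen for the example, and the actual script may slice and blend differently.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so that tiles of size `tile` cover `length`."""
    if overlap < 0 or tile <= overlap:
        raise ValueError("need 0 <= overlap < tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 512, overlap: int = 64) -> List[Box]:
    """Overlapping boxes in row-major order (hypothetical tile and overlap sizes)."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    # A 512x512 image upscaled 2x gives 1024x1024 pixels to cover.
    for box in tile_boxes(1024, 1024):
        print(box)
```

Each box would be passed through img2img separately. The overlap gives neighbouring tiles shared pixels, so the results can be blended without visible seams.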

txt2imghd with default settings has the same VRAM requirements as regular Stable Diffusion, although generation of the detailed images will take longer.