cjwbw / stable-diffusion-high-resolution

Detailed, higher-resolution images from Stable Diffusion

Version: 231e401d


Run time and cost

Predictions run on Nvidia A100 (40GB) GPU hardware and typically complete within 132 seconds. Prediction time varies significantly depending on the inputs.

This is a Cog implementation of "Detailed, higher-resolution images from Stable Diffusion" (txt2imghd), originally implemented by @jquesnelle at https://github.com/jquesnelle/txt2imghd/blob/master/txt2imghd.py. This version also adds a safety checker.

txt2imghd is a port of the GOBIG mode from progrockdiffusion to Stable Diffusion, with Real-ESRGAN as the upscaler. It creates detailed, higher-resolution images in four steps: it generates an image from the prompt, upscales it, runs img2img on smaller overlapping pieces of the upscaled image, and blends the refined pieces back into the upscaled image.
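The upscale, tile, and blend steps can be illustrated with a small plain-Python sketch. This is not the txt2imghd code. It rests on several assumptions: images are grayscale nested lists, `upscale_nearest` is a hypothetical nearest-neighbour stand-in for Real-ESRGAN, `refine` is a hypothetical callable standing in for the img2img pass, and the feathered weighted average used to hide tile seams is an assumed blending scheme.

```python
from typing import Callable, List

# Assumption for brevity: a grayscale image stored as rows of pixel values.
Image = List[List[float]]


def upscale_nearest(img: Image, factor: int) -> Image:
    """Hypothetical stand-in for Real-ESRGAN: nearest-neighbour upscaling."""
    return [[row[x // factor] for x in range(len(row) * factor)]
            for row in img for _ in range(factor)]


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of overlapping tiles that together cover [0, length).

    Requires 0 <= overlap < tile so that the stride is positive.
    """
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def feather(i: int, size: int, overlap: int) -> float:
    """Linear weight: ramps up over the first `overlap` pixels, down over the last."""
    if overlap == 0:
        return 1.0
    return min(1.0, (i + 1) / (overlap + 1), (size - i) / (overlap + 1))


def refine_in_tiles(img: Image, tile: int, overlap: int,
                    refine: Callable[[Image], Image]) -> Image:
    """Run `refine` on overlapping tiles and blend them with feathered weights."""
    h, w = len(img), len(img[0])
    th, tw = min(tile, h), min(tile, w)
    acc = [[0.0] * w for _ in range(h)]   # weighted sum of refined pixels
    wsum = [[0.0] * w for _ in range(h)]  # sum of weights per pixel
    for y0 in tile_starts(h, tile, overlap):
        for x0 in tile_starts(w, tile, overlap):
            crop = [row[x0:x0 + tw] for row in img[y0:y0 + th]]
            out = refine(crop)  # hypothetical img2img pass on one tile
            for dy in range(th):
                wy = feather(dy, th, overlap)
                for dx in range(tw):
                    wgt = wy * feather(dx, tw, overlap)
                    acc[y0 + dy][x0 + dx] += wgt * out[dy][dx]
                    wsum[y0 + dy][x0 + dx] += wgt
    # Every pixel is covered by at least one tile with a positive weight.
    return [[a / s for a, s in zip(ra, rs)] for ra, rs in zip(acc, wsum)]


if __name__ == "__main__":
    big = upscale_nearest([[0.0, 1.0], [2.0, 3.0]], 4)  # 2x2 -> 8x8
    # Toy "refinement" that brightens each tile by 1.
    result = refine_in_tiles(big, tile=4, overlap=2,
                             refine=lambda t: [[v + 1 for v in r] for r in t])
    print(len(result), len(result[0]))
```

Because overlapping tiles are averaged with weights that fade toward each tile's border, a refinement that treats every tile consistently leaves no visible seams. A real pipeline would work on RGB images and call an actual upscaler and img2img model in place of the stand-ins.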

txt2imghd with default settings has the same VRAM requirements as regular Stable Diffusion, although generation of the detailed images will take longer.