andreasjansson / stable-diffusion-inpainting

Inpainting using RunwayML's stable-diffusion-inpainting checkpoint

Run time and cost

Predictions run on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 10 seconds.

Stable Diffusion Inpainting

Checkpoint: https://huggingface.co/runwayml/stable-diffusion-v1-5

Tip: Get a high-quality image mask by using https://replicate.com/arielreplicate/dichotomous_image_segmentation
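The mask the model expects is a black-and-white image in which white marks the region to repaint. If the segmentation model above returns a grayscale probability map, a minimal thresholding sketch (the `binarize_mask` helper and the 0.5 threshold are illustrative assumptions, not part of this model's API) could look like:

```python
import numpy as np

def binarize_mask(seg_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn a [0, 1] segmentation map into a binary inpainting mask.

    White (255) marks pixels to be repainted; black (0) is kept.
    The helper name and threshold are illustrative assumptions.
    """
    return np.where(seg_map >= threshold, 255, 0).astype(np.uint8)

seg = np.array([[0.9, 0.2],
                [0.6, 0.1]])
mask = binarize_mask(seg)
# mask marks the two left pixels (values >= 0.5) as white
```

The resulting array can be saved with any image library (e.g. Pillow's `Image.fromarray`) and passed as the mask input.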

Stable Diffusion Inpainting is a latent text-to-image diffusion model that generates photo-realistic images from any text input and can additionally inpaint pictures using a mask.

Stable-Diffusion-Inpainting was initialized with the weights of Stable-Diffusion-v-1-2. It received 595k steps of regular training, followed by 440k steps of inpainting training at resolution 512x512 on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, synthetic masks are generated, and in 25% of cases everything is masked.
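The 5 additional channels described above sit on top of the usual 4-channel diffusion latent, giving the inpainting UNet a 9-channel input. A minimal NumPy sketch of that concatenation (the shapes assume a 64x64 latent grid for 512x512 images under the standard 8x VAE downsampling; the array names are illustrative):

```python
import numpy as np

batch, h, w = 1, 64, 64  # 512x512 image -> 64x64 latent grid (8x VAE downsampling, assumed)

noisy_latent = np.random.randn(batch, 4, h, w)        # 4 channels: the diffusion latent
masked_image_latent = np.random.randn(batch, 4, h, w)  # 4 channels: encoded masked image
mask = np.random.rand(batch, 1, h, w).round()          # 1 channel: downsampled binary mask

# The inpainting UNet consumes all three, concatenated along the channel axis:
# 4 + 4 + 1 = 9 input channels.
unet_input = np.concatenate([noisy_latent, masked_image_latent, mask], axis=1)
print(unet_input.shape)  # (1, 9, 64, 64)
```

Zero-initializing the weights for the 5 new channels means the restored checkpoint initially behaves like the plain text-to-image model, and the inpainting conditioning is learned during the 440k inpainting steps.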