stability-ai / stable-diffusion-inpainting
Fill in masked parts of images with Stable Diffusion
- c11bac58203367db93a3c552bd49a25a5418458ddffb7e90dae55780765e26d6
  Version: 22.04; Commit: 8820991880abea5f1a57722e6ca1625117e39312
  Inpainting with Stable Diffusion 2.0. The same weights as the previous Stable Diffusion 2.0 checkpoint, but with AITemplate acceleration. Also supports variable width and height.
- c2172c447eb69551b59f62fd2d61dd84054e9fb7bc8a42fbe398c2a7a072ed68
  Version: 22.04; Commit: 8820991880abea5f1a57722e6ca1625117e39312
  Inpainting with Stable Diffusion 1.5. Exactly the same as the previous Stable Diffusion 1.5 checkpoint, but with AITemplate acceleration.
- c28b92a7ecd66eee4aefcd8a94eb9e7f6c3805d5f06038165407fb5cb355ba67
- e5a34f913de0adc560d20e002c45ad43a80031b62caacc3d84010c6b6a64870c
  Inpainting with Stable Diffusion 2.0. Resumed from stable-diffusion-2-base (512-base-ema.ckpt) and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA, which, in combination with the latent VAE representation of the masked image, is used as additional conditioning.
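The LAMA-style masks are irregular strokes and boxes; as a rough illustration, here is a minimal numpy sketch of a simplified synthetic-mask generator (rectangle unions only, with the 25% mask-everything rule from the 1.5 recipe below exposed as a parameter; the function name, box counts, and size ranges are all illustrative assumptions, not the actual training code):

```python
import numpy as np

def synthetic_mask(h, w, rng, p_full=0.25, n_boxes=3):
    # With probability p_full, mask the whole image (the 1.5 recipe
    # masks everything in 25% of cases); otherwise take the union of a
    # few random rectangles -- a simplification of LAMA's irregular masks.
    if rng.random() < p_full:
        return np.ones((h, w), dtype=np.float32)
    mask = np.zeros((h, w), dtype=np.float32)
    for _ in range(n_boxes):
        y0 = int(rng.integers(0, h // 2))
        x0 = int(rng.integers(0, w // 2))
        y1 = y0 + int(rng.integers(h // 8, h // 2))
        x1 = x0 + int(rng.integers(w // 8, w // 2))
        mask[y0:y1, x0:x1] = 1.0  # 1 = masked region to be inpainted
    return mask

rng = np.random.default_rng(0)
m = synthetic_mask(64, 64, rng)
```

In the actual pipeline the mask is downsampled to the latent resolution and concatenated with the VAE latent of the masked image to form the extra conditioning channels.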
- a234d8ccd1a8492263ab03b753196b3a796cb5e2e93eff0eb820d26626b28297
  Inpainting with Stable Diffusion 1.5. Resumed from stable-diffusion-v1-5, then trained for a further 440,000 steps of inpainting at 512x512 resolution on the "laion-aesthetics v2 5+" dataset, with 10% dropping of the text conditioning. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself), whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, synthetic masks are generated, and in 25% of cases the whole image is masked.
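The zero-initialization of the 5 extra input channels can be sketched in numpy: the pretrained 4-channel input-convolution weights are copied over and the new channels start at zero, so the restored checkpoint initially ignores the mask conditioning and behaves like the non-inpainting model. The exact tensor shapes and the ordering of the concatenated channels here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretrained input-conv weights: (out_ch, in_ch=4, kh, kw).
# 320 output channels matches the SD UNet's first conv; 4 input
# channels are the VAE latent.
old_w = rng.normal(size=(320, 4, 3, 3)).astype(np.float32)

# Expand to 9 input channels: 4 latent + 4 masked-image latent + 1 mask.
# The 5 new channels are zero-initialized.
new_w = np.zeros((320, 9, 3, 3), dtype=np.float32)
new_w[:, :4] = old_w  # copy pretrained weights for the original channels

# Assembling the 9-channel UNet input (channel order is an assumption):
latent = np.zeros((1, 4, 64, 64), dtype=np.float32)        # noisy latent
masked_latent = np.zeros((1, 4, 64, 64), dtype=np.float32) # VAE(masked image)
mask = np.zeros((1, 1, 64, 64), dtype=np.float32)          # downsampled mask
unet_in = np.concatenate([latent, masked_latent, mask], axis=1)
```

Because the new weights are exactly zero, the first training step starts from the pretrained model's output, and the inpainting conditioning is learned gradually.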