Predictions run on Nvidia A100 GPU hardware. Predictions typically complete within 18 seconds.
waifu-diffusion is a latent text-to-image diffusion model that has been conditioned on high-quality anime images through fine-tuning.
The current model has been fine-tuned with a learning rate of 5.0e-6 for 4 epochs on 56k Danbooru text-image pairs, all of which have an aesthetic rating greater than 6.0.
The data used for fine-tuning came from a random sample of 56k Danbooru images, which were filtered with CLIP Aesthetic Scoring so that only images with an aesthetic score greater than 6.0 were used.
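The filtering step above keeps an image only when its aesthetic score is strictly greater than 6.0. The sketch below illustrates that threshold rule. It assumes the CLIP aesthetic scores have already been computed, because the scoring model itself is not shown here. The `Sample` record, the `filter_by_aesthetic` helper, and the example captions and scores are all hypothetical and are included only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from the text: only images scoring above 6.0 were kept.
AESTHETIC_THRESHOLD = 6.0


@dataclass
class Sample:
    """Hypothetical image-caption record with a precomputed aesthetic score."""
    image_id: int
    caption: str
    aesthetic_score: float


def filter_by_aesthetic(samples: List[Sample],
                        threshold: float = AESTHETIC_THRESHOLD) -> List[Sample]:
    """Keep only samples whose score is strictly greater than the threshold."""
    return [s for s in samples if s.aesthetic_score > threshold]


if __name__ == "__main__":
    # Illustrative values only; these are not real Danbooru scores.
    samples = [
        Sample(1, "1girl, solo, long_hair", 6.4),
        Sample(2, "1boy, outdoors", 5.9),
        Sample(3, "scenery, sky", 6.0),  # exactly 6.0 is excluded
    ]
    kept = filter_by_aesthetic(samples)
    print([s.image_id for s in kept])  # → [1]
```

Because the comparison is strict, an image that scores exactly 6.0 is dropped. This matches the "greater than 6.0" wording.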
Captions are Danbooru-style captions.
This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
The CreativeML OpenRAIL License specifies:
This project would not have been possible without the incredible work by the CompVis Researchers.
To reach us, join our Discord server.