chenxwh / stable-diffusion-aesthetic-gradients

Stable Diffusion with Aesthetic Gradients

  • Public
  • 353 runs
  • GitHub
  • License

Input

Output

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 4 minutes. The predict time for this model varies significantly based on the inputs.

Readme

cog implementation of https://github.com/vicgalle/stable-diffusion-aesthetic-gradients

Stable Diffusion with Aesthetic Gradients 🎨

This is the codebase for the article Personalizing Text-to-Image Generation via Aesthetic Gradients:

This work proposes aesthetic gradients, a method to personalize a CLIP-conditioned diffusion model by guiding the generative process towards custom aesthetics defined by the user from a set of images. The approach is validated with qualitative and quantitative experiments, using the recent stable diffusion model and several aesthetically-filtered datasets.

In particular, this reposiory allows the user to use the aesthetic gradients technique described in the previous paper to personalize stable diffusion.

tl;dr

With this, you don’t have to learn a lot of spells/modifiers to improve the quality of the generated image.

Usage

You can use the same arguments as with the original stable diffusion repository. The script scripts/txt2img.py has the additional arguments:

  • --aesthetic_steps: number of optimization steps when doing the personalization. For a given prompt, it is recommended to start with few steps (2 or 3), and then gradually increase it (trying 5, 10, 15, 20, etc). The greater the value, the more the resulting image will be biased towards the aesthetic embedding.
  • --aesthetic_lr: learning rate for the aesthetic gradient optimization. The default value is 0.0001. This value almost usually works well enough, so you can just only tune the previous argument.
  • --aesthetic_embedding: path to the stored pytorch tensor (.pt format) containing the aesthetic embedding. It must be of shape 1x768 (CLIP-L/14 size). See below for computing your own aesthetic embeddings.

In this repository we include all the aesthetic embeddings used in the paper. All of them are in the directory aesthetic_embeddings: * sac_8plus.pt * laion_7plus.pt * aivazovsky.pt * cloudcore.pt * gloomcore.pt * glowwave.pt

See the paper to see how they were obtained.

In addition, new aesthetic embeddings have been incorporated: * fantasy.pt: created from https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus by filtering only the images with word “fantasy” in the caption. The top 2000 images by score are selected for the embedding. * flower_plant.pt: created from https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus by filtering only the images with word “plant”, “flower”, “floral”, “vegetation” or “garden” in the caption. The top 2000 images by score are selected for the embedding.

Citation

If you find this is useful for your research, please cite our paper:

@article{gallego2022personalizing,
  title={Personalizing Text-to-Image Generation via Aesthetic Gradients},
  author={Gallego, Victor},
  journal={arXiv preprint arXiv:2209.12330},
  year={2022}
}