lucataco / train-text-to-image-lora

Huggingface Diffusers: SDv1.4/1.5/2.0/2.1 finetuner


About

This is a Cog wrapper around the Diffusers text-to-image LoRA training method. It is meant to train LoRAs for any of the following Stable Diffusion base models: SDv1.4, SDv1.5, SDv2.0, and SDv2.1.

For more information, see the Hugging Face Diffusers documentation.

Training with LoRA

Low-Rank Adaptation of Large Language Models was first introduced by Microsoft in LoRA: Low-Rank Adaptation of Large Language Models by Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen.

In a nutshell, LoRA adapts a pretrained model by adding pairs of rank-decomposition matrices to existing weights and training only those newly added weights. This has several advantages:

  • The original pretrained weights stay frozen, so the model is not prone to catastrophic forgetting.
  • The rank-decomposition matrices have far fewer parameters than the original model, so trained LoRA weights are easily portable.
  • LoRA attention layers let you control how strongly the model is adapted toward the new training images via a scale parameter.

cloneofsimo was the first to try out LoRA training for Stable Diffusion, in the popular lora GitHub repository.
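The idea can be sketched in a few lines of plain Python. This is a toy illustration with made-up dimensions, not the actual Diffusers implementation: the frozen weight W is left untouched, and only the small low-rank pair B and A would be trained and shipped.

```python
# Minimal sketch of a LoRA update on one weight matrix (toy sizes, pure Python).
# W is the frozen pretrained weight; B @ A is the trainable low-rank pair.

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def add(M, N, scale=1.0):
    return [[M[i][j] + scale * N[i][j] for j in range(len(M[0]))]
            for i in range(len(M))]

d, rank = 4, 1                       # toy dimensions; real SD layers are far larger
W = [[float(i == j) for j in range(d)] for i in range(d)]  # frozen weight (identity here)
A = [[0.5, 0.0, 0.0, 0.0]]           # trainable down-projection (rank x d)
B = [[0.0], [1.0], [0.0], [0.0]]     # trainable up-projection (d x rank)

# Effective weight after adaptation: W + scale * B @ A.
# Only A and B (2 * d * rank values) are trained and stored,
# not the full d * d matrix.
W_adapted = add(W, matmul(B, A), scale=1.0)
print(W_adapted[1][0])  # 0.5 contribution from the low-rank update
```

The scale argument here plays the role of the LoRA scale parameter mentioned above: set it to 0.0 and the adapted layer is exactly the pretrained one.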

Training

Make sure you have a Hugging Face model and dataset. In the default populated example we use SDv1.5 (runwayml/stable-diffusion-v1-5) and the Naruto dataset (lambdalabs/naruto-blip-captions).
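As a rough sketch, the main training settings might be collected like this. The parameter names below mirror common flags of the Diffusers train_text_to_image_lora.py example script and are assumptions, not necessarily this Cog model's exact input names:

```python
# Hypothetical training configuration; names follow the Diffusers
# train_text_to_image_lora.py example flags and may differ from this
# model's actual input schema.
training_config = {
    "pretrained_model_name_or_path": "runwayml/stable-diffusion-v1-5",
    "dataset_name": "lambdalabs/naruto-blip-captions",
    "resolution": 512,          # use 768 for the stable-diffusion-2 768x768 model
    "max_train_steps": 1000,
    "output_dir": "./lora-out",
}

for key, value in training_config.items():
    print(f"{key}={value}")
```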

Note: Change the resolution to 768 if you are using the stable-diffusion-2 768x768 model.

Note: It is quite useful to monitor training progress by regularly generating sample images during training. Weights & Biases is a convenient way to view generated images as training runs. All you need to do is run pip install wandb before training to automatically log images.

For this example we will directly return a tar file containing the trained LoRA weights. If you want to upload them to Hugging Face instead, you just need to enter your hf_token.
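On the receiving side, unpacking the returned tar file is a one-liner with the standard library. The weights file name below is hypothetical; the actual name inside the archive may differ:

```python
import tarfile
import tempfile
from pathlib import Path

# Hypothetical file name; the actual name inside the returned tar may differ.
weights_name = "pytorch_lora_weights.safetensors"

workdir = Path(tempfile.mkdtemp())
(workdir / weights_name).write_bytes(b"\x00" * 16)  # stand-in for real weights

# Pack the trained weights the way the model might return them...
archive = workdir / "trained_lora.tar"
with tarfile.open(archive, "w") as tar:
    tar.add(workdir / weights_name, arcname=weights_name)

# ...and unpack them on the receiving side.
out = workdir / "extracted"
with tarfile.open(archive) as tar:
    tar.extractall(out)

print(sorted(p.name for p in out.iterdir()))
```

Once extracted, the weights can typically be attached to a Diffusers pipeline with its load_lora_weights method.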

Now you can start training!

In this example, the demo training run for SDv1.5 with the Naruto dataset at 1000 max_train_steps takes about 18 minutes on an A40.
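From those numbers you can roughly extrapolate the wall-clock cost of longer runs, assuming throughput stays constant (which is only approximately true in practice):

```python
# Rough throughput from the demo run: 1000 steps in ~18 minutes on an A40.
steps = 1000
minutes = 18
steps_per_minute = steps / minutes  # ~55.6

# Estimated wall-clock time for a longer run at the same throughput.
longer_run_steps = 4000
estimated_minutes = longer_run_steps / steps_per_minute
print(round(estimated_minutes))  # 72
```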