edenartlab / sdxl-lora-trainer

LoRA trainer for both SDXL and SD15

  • Public
  • 7K runs
  • L40S

Input

string

Name of the new LoRA concept

Default: "unnamed"

string (required)

Training images for the new LoRA concept (can be image URLs or a URL to a .zip file of images)

string

What are you trying to learn? (one of the three training modes: "style", "face", or "object")

Default: "style"

string

SDXL gives much better LoRAs if you just need static images. If you want to make AnimateDiff animations, train an SD15 LoRA.

Default: "sdxl"

integer

Number of training steps. Increasing this usually leads to overfitting and is only viable if you have > 100 training images. For faces you may want to reduce it to e.g. 300

Default: 300

integer

Save a checkpoint every n steps (the final checkpoint is always saved)

Default: 10000

integer

Square pixel resolution to which your images will be resized for training; 512 or 768 is highly recommended

Default: 512

number

Final learning rate of the UNet (after warmup); increasing this usually leads to strong overfitting

Default: 0.0003

number

Learning rate for training textual inversion embeddings. Don't alter unless you know what you're doing.

Default: 0.001

integer

Rank of the LoRA embeddings for the UNet.

Default: 16

integer
(minimum: 1, maximum: 4)

How many new tokens to train (highly recommended to leave this at 2)

Default: 3

integer

Batch size (per device) for training (don't increase unless running on a big GPU)

Default: 4

integer

Number of sample images in validation grid

Default: 4

integer

Resolution of sample images in validation grid

Default: 1024

number

Scale factor for the LoRA when generating sample images. If not provided, it will be set automatically

integer

Random seed for reproducible training. Leave empty to use a random seed
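
Taken together, these inputs form the payload for a training run. Below is a minimal sketch of launching one with the Replicate Python client. The input keys ("name", "lora_training_urls", "concept_mode", "sd_model_version", "max_train_steps", "resolution", "seed") are assumptions inferred from the parameter descriptions above, not confirmed field names, and the version hash is a placeholder; check the model's API schema for the exact names.

import replicate

# Hypothetical sketch: the input keys below are inferred from the parameter
# descriptions on this page, not confirmed against the model's API schema.
output = replicate.run(
    "edenartlab/sdxl-lora-trainer:<version-hash>",  # placeholder version hash
    input={
        "name": "my_style",                # name of the new LoRA concept
        "lora_training_urls": "https://example.com/images.zip",  # or image URLs
        "concept_mode": "style",           # "style" / "face" / "object"
        "sd_model_version": "sdxl",        # use "sd15" for AnimateDiff work
        "max_train_steps": 300,            # default; raise only with > 100 images
        "resolution": 512,                 # 512 or 768 recommended
        "seed": 42,                        # omit to use a random seed
    },
)
print(output)  # reference(s) to the trained checkpoint files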


Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

This trainer uses a single training script that is compatible with both SDXL and SD15.

The trainer has the following capabilities:

  • automatic image captioning using BLIP
  • automatic segmentation using CLIPSeg
  • textual inversion training of a new token to represent the concept
  • 3 training modes: "style" / "face" / "object"
  • full finetuning, LoRA, and DoRA training modes are supported in the code
  • LoRA modules are possible for both the UNet and the text encoders
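
As a rough illustration of the first two capabilities, the sketch below runs BLIP captioning and CLIPSeg segmentation via their Hugging Face transformers implementations. This is not the trainer's own preprocessing code; the model IDs, the image path, and the mask threshold are assumptions chosen for the example.

# Illustrative sketch of BLIP captioning + CLIPSeg segmentation, assuming
# the Hugging Face transformers implementations of both models. This is
# NOT the trainer's actual preprocessing code; model IDs and the 0.5 mask
# threshold are assumptions.
import torch
from PIL import Image
from transformers import (
    BlipProcessor, BlipForConditionalGeneration,
    CLIPSegProcessor, CLIPSegForImageSegmentation,
)

image = Image.open("training_image.jpg").convert("RGB")  # placeholder path

# 1) Automatic captioning with BLIP
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
caption_ids = blip.generate(**blip_processor(image, return_tensors="pt"), max_new_tokens=30)
caption = blip_processor.decode(caption_ids[0], skip_special_tokens=True)

# 2) Automatic segmentation with CLIPSeg, prompted by the caption
seg_processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
clipseg = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")
seg_inputs = seg_processor(text=[caption], images=[image], return_tensors="pt")
with torch.no_grad():
    logits = clipseg(**seg_inputs).logits  # low-res heatmap for the text prompt
mask = torch.sigmoid(logits) > 0.5         # assumed binarization threshold

print(caption, mask.shape)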

The generated checkpoint files are compatible with ComfyUI and AUTOMATIC1111.
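
Beyond ComfyUI and AUTOMATIC1111, a standard LoRA .safetensors checkpoint can typically also be loaded with diffusers. Here is a minimal sketch for an SDXL-trained checkpoint, assuming diffusers' load_lora_weights API; the checkpoint filename and the trained-token prompt are placeholders, and the 0.8 scale mirrors the "Scale factor for the LoRA" input described above.

# Sketch of loading the trained checkpoint with diffusers (an alternative to
# ComfyUI / AUTOMATIC1111, assuming a standard LoRA .safetensors file).
# The checkpoint path and the prompt token are placeholders.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("my_style_lora.safetensors")  # placeholder filename
image = pipe(
    "a photo in the style of <s0><s1>",        # placeholder trained-token prompt
    cross_attention_kwargs={"scale": 0.8},     # LoRA scale for sampling
).images[0]
image.save("sample.png")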