lucataco / ssd-lora-training

POC for cheaper and faster LoRA training using SSD-1B

  • Public
  • 249 runs
  • L40S
  • GitHub

Input

*file

A .zip or .tar file containing the image files that will be used for fine-tuning

integer

Random seed for reproducible training. Leave empty to use a random seed

integer

Square pixel resolution to which your images will be resized for training

Default: 768

integer

Batch size (per device) for training

Default: 4

integer

Number of epochs to loop through your training dataset

Default: 2000

integer

Number of individual training steps. Takes precedence over num_train_epochs

Default: 500

boolean

Whether to use LoRA training. If set to False, full fine-tuning will be used instead

Default: true

number

Learning rate for the U-Net. We recommend a value between `1e-6` and `1e-5`.

Default: 0.000001

number

Scaling of learning rate for training textual inversion embeddings. Don't alter unless you know what you're doing.

Default: 0.0003

number

Scaling of learning rate for training LoRA embeddings. Don't alter unless you know what you're doing.

Default: 0.0001

integer

Rank of LoRA embeddings. Don't alter unless you know what you're doing.

Default: 32

string

Learning rate scheduler to use for training

Default: "constant"

integer

Number of warmup steps for lr schedulers with warmups.

Default: 100

string

A unique string that will be trained to refer to the concept in the input images. Can be anything, but TOK works well

Default: "TOK"

string

Text which will be used as prefix during automatic captioning. Must contain the `token_string`. For example, if caption text is 'a photo of TOK', automatic captioning will expand to 'a photo of TOK under a bridge', 'a photo of TOK holding a cup', etc.

Default: "a photo of TOK, "

string

Prompt that describes the important part of the image. For example, if you are fine-tuning on your pet, `photo of a dog` would be a good prompt. Prompt-based masking is used to focus the fine-tuning process on the important/salient parts of the image

boolean

If you want to crop the image to `target_size` based on the important parts of the image, set this to True. If you want to crop the image based on face detection, set this to False

Default: true

boolean

Whether to use face detection instead of CLIPSeg for masking. For face applications, we recommend enabling this option.

Default: false

number

How blurry you want the CLIPSeg mask to be. We recommend a value between `0.5` and `1.0`. For a sharper (but more error-prone) mask, decrease this value.

Default: 1

boolean

Verbose output

Default: true

integer

Number of steps between saving checkpoints. Set to a very high number to disable checkpointing if you don't need it.

Default: 999999

string

Filetype of the input images. Can be either `zip` or `tar`. The default is `infer`, which infers the type from the extension of the input file.

Default: "infer"

Run time and cost

This model costs approximately $0.26 to run on Replicate, or 3 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 5 minutes. The predict time for this model varies significantly based on the inputs.

Readme

About

This is a hacked-together, proof-of-concept model to train your own SSD-1B LoRAs. At this time, it does not support the standard Replicate training method (`replicate.trainings.create`).

I took the original SDXL parameters and simply halved `num_train_epochs` and `max_train_steps`.

Goal: to create SSD-1B LoRAs in half the time and at half the cost of SDXL, with no loss in quality.

All you need are your training images in a zip file; select `use_face_detection_instead` if you are training on faces. A sketch of preparing the zip follows below.
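For reference, packaging a folder of images with Python's standard library might look like this (the directory and file names are illustrative):

```python
import zipfile
from pathlib import Path

# Bundle every JPEG/PNG in a folder into the archive the trainer expects.
image_dir = Path("my_training_photos")  # illustrative path
with zipfile.ZipFile("training_images.zip", "w") as zf:
    for img in sorted(image_dir.iterdir()):
        if img.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            zf.write(img, arcname=img.name)
```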

Run inference on SSD-1B LoRAs here
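Assuming the companion inference model follows the same prediction pattern, a call might look like the sketch below. The model name, version, and input fields here are all assumptions, so check the linked model page for the actual schema:

```python
import replicate

# Hypothetical: model name, version hash, and input fields are assumptions.
image = replicate.run(
    "lucataco/ssd-lora-inference:<version>",
    input={
        "prompt": "a photo of TOK wearing a red hat",
        # URL of the trained weights archive returned by the training run
        "lora_url": "https://replicate.delivery/.../trained_model.tar",
    },
)
print(image)
```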