alexgenovese / train-sdxl-lora

Train on RealVisXL 4.0 (Realistic Vision XL 4) | Mixed precision bf16 any LoRA

  • Public
  • 587 runs
  • A100 (80GB)
  • GitHub

Input

*file

A .zip or .tar file containing the image files that will be used for fine-tuning

integer

Random seed for reproducible training. Leave empty to use a random seed

integer

Square pixel resolution which your images will be resized to for training

Default: 1024

integer

Batch size (per device) for training

Default: 3

integer

Number of epochs to loop through your training dataset

Default: 20

integer

Number of individual training steps. Takes precedence over num_train_epochs
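As a rough illustration of how the step count relates to the epoch count, the sketch below computes the number of optimizer steps for a run. It assumes a single device and no gradient accumulation; `max_train_steps` and `train_batch_size` are hypothetical names for the inputs described above (only `num_train_epochs` is named on this page).

```python
import math
from typing import Optional


def total_training_steps(
    num_images: int,
    train_batch_size: int = 3,              # assumed name; default taken from this page
    num_train_epochs: int = 20,             # default taken from this page
    max_train_steps: Optional[int] = None,  # hypothetical name for the step-count input
) -> int:
    """Return the number of optimizer steps a run would take (single device)."""
    if max_train_steps is not None:
        # An explicit step count takes precedence over the epoch-based value.
        return max_train_steps
    # One epoch is one pass over the dataset; the last batch may be partial.
    steps_per_epoch = math.ceil(num_images / train_batch_size)
    return steps_per_epoch * num_train_epochs
```

For example, 10 images with the defaults give 4 steps per epoch, so 80 steps in total, unless a step count is passed explicitly.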

string

Token class to pass to CLIPSeg to identify the object

Default: "bag"

boolean

Whether to use LoRA training. If set to False, full fine-tuning is used instead

Default: true

number

Learning rate for the U-Net. We recommend a value between `1e-6` and `1e-5`.

Default: 0.0001

number

Scaling of learning rate for training textual inversion embeddings. Don't alter unless you know what you're doing.

Default: 0.0001

number

Scaling of learning rate for training LoRA embeddings. Don't alter unless you know what you're doing.

Default: 0.0004

integer

Rank of LoRA embeddings. Don't alter unless you know what you're doing.

Default: 32

integer

LoRA alpha, the scaling factor applied to the LoRA update.

Default: 16
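In the common LoRA formulation, the low-rank update is multiplied by alpha divided by the rank. Whether this trainer applies exactly that convention is an assumption; the sketch below only shows the arithmetic for the defaults listed here, with `lora_alpha` and `lora_rank` as hypothetical parameter names.

```python
def lora_scale(lora_alpha: int = 16, lora_rank: int = 32) -> float:
    """Scale applied to the low-rank update under the usual alpha / rank convention."""
    if lora_rank <= 0:
        raise ValueError("lora_rank must be a positive integer")
    return lora_alpha / lora_rank
```

With the defaults above this gives 16 / 32 = 0.5, so raising the rank without raising alpha would shrink the relative weight of the update under this convention.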

string

Learning rate scheduler to use for training

Default: "constant"

string

Optimizer to use for training

Default: "AdamW"

integer

Number of warmup steps for lr schedulers with warmups.

Default: 0

string

A unique string that will be trained to refer to the concept in the input images. Can be anything, but TOK works well

Default: "siduhc"

string

Text which will be used as prefix during automatic captioning. Must contain the `token_string`. For example, if caption text is 'a photo of TOK', automatic captioning will expand to 'a photo of TOK under a bridge', 'a photo of TOK holding a cup', etc.

Default: "a photo of siduhc "
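The constraint that the caption prefix must contain the token string can be checked before a run is submitted. The helper below is a minimal sketch, not the trainer's own code; `build_caption` and its argument names are hypothetical, and the generated text stands in for whatever the automatic captioner produces.

```python
def build_caption(
    caption_prefix: str,
    generated_text: str,
    token_string: str = "siduhc",  # default taken from this page
) -> str:
    """Join the prefix with captioner output, enforcing the token requirement."""
    if token_string not in caption_prefix:
        raise ValueError(
            f"caption prefix {caption_prefix!r} must contain the token {token_string!r}"
        )
    # Normalise whitespace so a trailing space in the prefix is not doubled.
    return f"{caption_prefix.strip()} {generated_text.strip()}"
```

Calling `build_caption("a photo of siduhc ", "under a bridge")` yields `a photo of siduhc under a bridge`, while a prefix without the token raises `ValueError`.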

string

Prompt that describes the part of the image that you find important. For example, if you are fine-tuning on your pet, `photo of a dog` is a good prompt. Prompt-based masking is used to focus the fine-tuning process on the important/salient parts of the image

boolean

If you want to crop the image to `target_size` based on the important parts of the image, set this to True. If you want to crop the image based on face detection, set this to False

Default: true

boolean

Set this to True to use face detection instead of CLIPSeg for masking. For face applications, we recommend using this option.

Default: false

number

How blurry you want the CLIPSeg mask to be. We recommend a value between `0.5` and `1.0`. If you want a sharper (but more error-prone) mask, you can decrease this value.

Default: 1

boolean

Verbose output

Default: true

integer

Number of steps between saving checkpoints. Set this to a very high number to disable checkpointing if you don't need intermediate checkpoints.

Default: 999999

string

Filetype of the input images. Can be either `zip` or `tar`. The default is `infer`, in which case the type is inferred from the extension of the input file.

Default: "infer"
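The `infer` behaviour described above can be pictured as a simple extension check. This is a sketch under that assumption, not the model's actual logic; `resolve_input_filetype` and `input_images_filetype` are hypothetical names.

```python
def resolve_input_filetype(filename: str, input_images_filetype: str = "infer") -> str:
    """Return "zip" or "tar", inferring from the file extension when asked to."""
    if input_images_filetype in ("zip", "tar"):
        # An explicit choice is used as-is, whatever the file is called.
        return input_images_filetype
    if input_images_filetype != "infer":
        raise ValueError(f"unsupported filetype option: {input_images_filetype!r}")
    lowered = filename.lower()
    if lowered.endswith(".zip"):
        return "zip"
    if lowered.endswith(".tar"):
        return "tar"
    raise ValueError(f"cannot infer archive type from {filename!r}")
```

Setting the value explicitly is mainly useful when the uploaded file has no extension or a misleading one.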

Output


Run time and cost

This model costs approximately $0.77 to run on Replicate, or 1 run per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 10 minutes. The predict time for this model varies significantly based on the inputs.

Readme

Custom Training based on Realistic Vision XL 4.0

How it works

  1. Upload a single ZIP file with all images you want to train
  2. Set up the parameters
  3. Run the inference
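For the first step, the archive can be built with Python's standard library. The sketch below packs the image files from one folder into a flat ZIP; the function name, the folder layout, and the set of accepted extensions are assumptions for illustration, not requirements stated by the model.

```python
import zipfile
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def zip_training_images(image_dir: str, zip_path: str) -> int:
    """Write every image in image_dir (non-recursive) to zip_path; return the count."""
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(image_dir).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                # Store files at the archive root, without the folder prefix.
                archive.write(path, arcname=path.name)
                count += 1
    return count
```

The returned count is a quick sanity check that the archive contains as many images as expected before uploading it.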

Default hard-coded settings:

  • mixed_precision: bf16
  • save_precision: fp16

Optional settings:

  • scheduler: constant, linear, cosine, cosine with restarts, polynomial, constant with warmup, inverse sqrt, reduce lr on plateau
  • optimization: AdamW, Adafactor, AdamWeightDecay