
edenartlab /sdxl-lora-trainer:f5d47d95

Input

string

Name of the new LoRA concept

Default: "unnamed"

string

Training images for the new LoRA concept (image URLs or a .zip file of images)
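
Since the training images can be supplied as a single .zip file, here is a minimal sketch of packaging a local folder of images; the folder and file names are hypothetical, not from this page.

```python
# Minimal sketch: package local training images into a zip archive that can be
# uploaded and passed as the training-images input. Paths are hypothetical.
import zipfile
from pathlib import Path

image_dir = Path("my_concept_images")  # assumed local folder of .jpg/.png files
with zipfile.ZipFile("training_images.zip", "w") as zf:
    for img_path in sorted(image_dir.glob("*")):
        if img_path.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            zf.write(img_path, arcname=img_path.name)
```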

string

Type of concept to train: 'face' / 'style' / 'object' (default)

Default: "object"

integer

Random seed for reproducible training. Leave empty to use a random seed

integer

Square pixel resolution to which your images will be resized for training (recommended range: 768-1024)

Default: 960

integer

Batch size (per device) for training

Default: 4

integer

Number of epochs to loop through your training dataset

Default: 10000

integer

Number of individual training steps. Takes precedence over num_train_epochs

Default: 600
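
Because the step count takes precedence over the epoch count, the effective number of passes over the dataset follows from the dataset size and batch size. A rough illustration, with an assumed dataset size that is not from this page:

```python
# Rough illustration of how training steps relate to epochs (assumed numbers).
num_images  = 20    # e.g. after augmentation (assumed)
batch_size  = 4     # default above
train_steps = 600   # default above

steps_per_epoch  = num_images / batch_size        # 5 steps per pass over the data
effective_epochs = train_steps / steps_per_epoch  # ~120 passes over the dataset
print(steps_per_epoch, effective_epochs)          # 5.0 120.0
```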

integer

Number of steps between saving checkpoints. Set to a very high number to effectively disable checkpointing if you don't need intermediate checkpoints.

Default: 10000

boolean

Whether to use LoRA training. If set to False, full fine-tuning will be used instead

Default: true

number

Multiplier for the internal learning rate of the Prodigy optimizer

Default: 0.8
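
Prodigy estimates its own step size internally, and this multiplier scales that estimate. A minimal sketch of how such a multiplier is typically passed, using the `prodigyopt` package; whether this trainer wires it up exactly this way is an assumption.

```python
# Minimal sketch, assuming the prodigyopt package; d_coef scales Prodigy's
# internally estimated step size, analogous to the multiplier above.
import torch
from prodigyopt import Prodigy

model = torch.nn.Linear(16, 16)  # stand-in for the trainable LoRA parameters
optimizer = Prodigy(model.parameters(), lr=1.0, d_coef=0.8, weight_decay=0.002)

x = torch.randn(4, 16)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```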

number

Learning rate for training textual inversion embeddings. Don't alter unless you know what you're doing.

Default: 0.001

number

Weight decay for the textual inversion embeddings. Don't alter unless you know what you're doing.

Default: 0.0003

number

Weight decay for the LoRA parameters. Don't alter unless you know what you're doing.

Default: 0.002

number

Sparsity penalty for the LoRA matrices; increases mergeability and possibly improves generalization

Default: 0.1

number

Multiplier for the initial weights of the LoRA matrices

Default: 0.5

number

SNR weighting gamma (see https://arxiv.org/pdf/2303.09556.pdf). Set to None to disable SNR-weighted training

Default: 5
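
Conceptually, the gamma from the linked paper clips the per-timestep loss weight at min(SNR, gamma) / SNR. A minimal sketch of that weighting; the SNR and loss values below are placeholders, not this trainer's exact code.

```python
# Minimal sketch of min-SNR loss weighting (https://arxiv.org/pdf/2303.09556.pdf).
# `snr` would normally come from the noise scheduler; here it is a placeholder.
import torch

snr_gamma = 5.0
snr = torch.tensor([0.5, 2.0, 10.0, 50.0])       # placeholder per-timestep SNR values
per_example_loss = torch.rand(4)                 # placeholder per-example MSE losses

weights = torch.clamp(snr, max=snr_gamma) / snr  # min(SNR, gamma) / SNR
weighted_loss = (weights * per_example_loss).mean()
```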

integer

Rank of the LoRA matrices. For faces, 5 works well; for complex concepts or styles, try 8 or 12

Default: 12
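
The rank sets the size of the low-rank update matrices. As an illustration only, a rank-12 LoRA configuration via the `peft` library might look like the sketch below; whether this trainer uses peft, and these target module names, are assumptions.

```python
# Illustrative only: a rank-12 LoRA configuration via the peft library.
# The target module names are typical SDXL attention projections (assumed).
from peft import LoraConfig

lora_config = LoraConfig(
    r=12,        # LoRA rank, matching the default above
    lora_alpha=12,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
```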

string

Prefix text prepended to the automatic captions. Must contain 'TOK', e.g. 'a photo of TOK, '. If empty, ChatGPT will handle this automatically

Default: ""

boolean

Add a left-right flipped version of each image to the training data; recommended for most cases. If you are learning a face, you probably want to disable this

Default: true

integer

Apply data augmentation (excluding left-right flipping) until there are n training samples (0 disables augmentation completely)

Default: 20
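
Taken together, the two augmentation inputs above mean that flipping (when enabled) mirrors each image, and further augmentation pads the dataset up to n samples. A minimal sketch of the left-right flip with PIL; this is illustrative, not this trainer's exact pipeline, and the file names are hypothetical.

```python
# Minimal sketch of left-right flip augmentation with PIL (illustrative only).
from PIL import Image

img = Image.open("example.jpg")                 # hypothetical training image
flipped = img.transpose(Image.FLIP_LEFT_RIGHT)  # mirrored copy added to the dataset
flipped.save("example_flipped.jpg")
```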

integer

How many new tokens to inject per concept

Default: 2

string

Prompt describing the most important part of the image, used for CLIPSeg segmentation. For example, if you are learning a person, 'face' would be a good segmentation prompt

boolean

If you want to crop the image to `target_size` based on the important parts of the image, set this to True. If you want to crop the image based on face detection, set this to False

Default: true

boolean

Whether to use face detection instead of CLIPSeg for masking. For face applications, we recommend enabling this option.

Default: false

number

How blurry you want the CLIPSeg mask to be. We recommend a value between `0.5` and `1.0`. For a sharper (but more error-prone) mask, decrease this value.

Default: 0.6
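
The segmentation prompt and blur value above feed a CLIPSeg-based masking step. A minimal sketch of producing a CLIPSeg mask with the transformers library; this is illustrative, the trainer's own masking code may differ, and the image path is hypothetical.

```python
# Minimal sketch of CLIPSeg masking with transformers (illustrative only).
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("example.jpg")  # hypothetical training image
inputs = processor(text=["face"], images=[image], return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # low-resolution segmentation logits
mask = torch.sigmoid(logits)         # soft mask in [0, 1]
```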

boolean

Verbose output

Default: true

string

Subdirectory where all files will be saved

Default: "1717014314"

boolean

For debugging locally only (don't activate this on Replicate)

Default: false

boolean

Use a hard freeze for ti_lr. If set to False, a soft transition of learning rates will be used

Default: false

number

How strongly to correct the embedding std vs the avg-std (0=off, 0.05=weak, 0.1=standard)

Default: 0.1
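
Putting the inputs together, a minimal sketch of starting a training run with the Replicate Python client is shown below. The input field names are assumptions (this page view lists descriptions and defaults but not the field names), so check the model's API schema for the real names before using them.

```python
# Minimal sketch, assuming the Replicate Python client and hypothetical input
# field names; consult the model's API schema for the actual names.
import replicate

output = replicate.run(
    "edenartlab/sdxl-lora-trainer:f5d47d95",  # version id as shown on this page
    input={
        "name": "my_concept",            # hypothetical field: name of the new LoRA concept
        "lora_training_urls": "https://example.com/training_images.zip",  # hypothetical field
        "concept_mode": "face",          # hypothetical field: 'face' / 'style' / 'object'
        "max_train_steps": 600,
        "resolution": 960,
    },
)
print(output)
```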
