Before fine-tuning starts, the input images are preprocessed using SwinIR for upscaling, BLIP for captioning, and CLIPSeg for removing regions of the images that are not interesting or helpful for training.
Below is a list of all fine-tuning parameters.
input_images (required): A .zip or .tar file containing the image files that will be used for fine-tuning.
seed: Random seed integer for reproducible training. Leave empty to use a random seed.
resolution: Square pixel resolution which your images will be resized to for training. Defaults to
train_batch_size: Batch size (per device) for training. Defaults to
num_train_epochs: Number of epochs to loop through your training dataset. Defaults to
max_train_steps: Number of individual training steps. Takes precedence over num_train_epochs. Defaults to
is_lora: Boolean indicating whether to use LoRA training. If set to False, full fine-tuning is used instead. Defaults to
unet_learning_rate: Learning rate for the U-Net as a float. We recommend this value to be somewhere between 1e-5. Defaults to
ti_lr: Scaling of learning rate for training textual inversion embeddings. Don’t alter unless you know what you’re doing. Defaults to
lora_lr: Scaling of learning rate for training LoRA embeddings. Don’t alter unless you know what you’re doing. Defaults to
lr_scheduler: Learning rate scheduler to use for training. Allowable values are linear. Defaults to
lr_warmup_steps: Number of warmup steps for lr schedulers with warmups. Defaults to
token_string: A unique string that will be trained to refer to the concept in the input images. Can be anything, but TOK works well. Defaults to
caption_prefix: Text used as the prefix during automatic captioning. Must contain the token_string. For example, if the caption prefix is ‘a photo of TOK’, automatic captioning will expand it to ‘a photo of TOK under a bridge’, ‘a photo of TOK holding a cup’, etc. Defaults to a photo of TOK.
mask_target_prompts: Prompt that describes the part of the image that you consider important. For example, if you are fine-tuning on photos of your pet, ‘photo of a dog’ will be a good prompt. Prompt-based masking is used to focus the fine-tuning process on the important/salient parts of the image. Defaults to None.
crop_based_on_salience: If you want to crop the image to target_size based on the important parts of the image, set this to True. If you want to crop the image based on face detection, set this to False. Defaults to
use_face_detection_instead: Whether to use face detection instead of CLIPSeg for masking. For face applications, we recommend using this option. Defaults to
clipseg_temperature: How blurry you want the CLIPSeg mask to be. We recommend this value be something between 1.0. For a sharper (but more error-prone) mask, decrease this value. Defaults to
verbose: Verbose output. Defaults to
checkpointing_steps: Number of steps between saving checkpoints. If you don’t need checkpoints, set this to a very high number to disable checkpointing. Defaults to
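
Two constraints in the list above are easy to check before starting a run: input_images must be a .zip or .tar archive, and caption_prefix must contain the token_string. The following is a minimal Python sketch of packaging a folder of images and sanity-checking a parameter dictionary. The helper names package_images and check_training_input, the IMAGE_SUFFIXES set, and the file name pet_photos.zip are hypothetical and chosen for illustration; they are not part of the fine-tuning API, and the accepted image formats are an assumption.

```python
import tempfile
import zipfile
from pathlib import Path

# Assumption: the accepted image formats are not listed in the parameter
# descriptions, so this set is only a plausible guess for illustration.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def package_images(image_dir, archive_path):
    """Zip every image file found directly in image_dir; return the file count."""
    image_dir = Path(image_dir)
    images = sorted(
        p for p in image_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    with zipfile.ZipFile(archive_path, "w") as archive:
        for image in images:
            # Store each file under its bare name so the archive is flat.
            archive.write(image, arcname=image.name)
    return len(images)


def check_training_input(params):
    """Return a list of problems found in params (an empty list means none)."""
    problems = []
    archive = str(params.get("input_images", ""))
    if not archive:
        problems.append("input_images is required")
    elif not archive.endswith((".zip", ".tar")):
        problems.append("input_images must be a .zip or .tar file")
    token = params.get("token_string")
    prefix = params.get("caption_prefix")
    # Only compare the two when both are given explicitly.
    if token is not None and prefix is not None and token not in prefix:
        problems.append("caption_prefix must contain token_string")
    return problems


# Hypothetical parameter set, using values quoted in the descriptions above.
params = {
    "input_images": "pet_photos.zip",
    "token_string": "TOK",
    "caption_prefix": "a photo of TOK",
    "mask_target_prompts": "photo of a dog",
}
print(check_training_input(params))  # prints []
```

Returning a list of problems instead of raising on the first one lets you report every issue with a parameter set at once. The sketch only inspects values and never uploads or trains anything.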