cloneofsimo / lora-advanced-training

LoRA model trainer, advanced version (Updated 2 years, 4 months ago)

  • Public
  • 2.9K runs
  • GitHub
  • License

Input

*file

A ZIP file containing your training images (JPG, PNG, etc.; size is not restricted). These images should contain the 'subject' you want the trained model to embed, so that it can later generate customized scenes beyond the training images. For best results, use images without noise or unrelated objects in the background.

integer

A seed for reproducible training

Default: 1337

integer

The resolution for input images. All the images in the train/validation dataset will be resized to this resolution.

Default: 512

boolean

Whether to train the text encoder

Default: true

integer

Batch size (per device) for the training dataloader.

Default: 1

integer

Number of update steps to accumulate before performing a backward/update pass.

Default: 4

boolean

Whether or not to use gradient checkpointing to save memory at the expense of a slower backward pass.

Default: false

boolean

Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.

Default: true

string

The scheduler type to use

Default: "constant"

integer

Number of steps for the warmup in the lr scheduler.

Default: 0

boolean

Whether or not to apply the Bayesian Learning Rule to the norm of the CLIP latent.

Default: true

boolean

Whether or not to cache the VAE latents.

Default: true

boolean

Whether or not to use color jitter during augmentation.

Default: true

boolean

Whether or not to continue inversion.

Default: false

number

The learning rate for continuing an inversion.

Default: 0.0001

string

The tokens to use for the initializer. If not provided, the embeddings will be randomly initialized from a Gaussian N(0, 0.017^2).

number

The learning rate for the text encoder.

Default: 0.00001

number

The learning rate for the TI.

Default: 0.0005

number

The learning rate for the unet.

Default: 0.0001

integer

Rank of the LoRA. A larger rank is more likely to capture fidelity but less likely to remain editable, and it increases the size of the trained weights.

Default: 4

number

Dropout for the LoRA layer. See the LoRA paper for more details.

Default: 0.1

number

Scaling parameter at the end of the LoRA layer.

Default: 1

string

The scheduler type to use

Default: "constant"

integer

Number of steps for the warmup in the lr scheduler.

Default: 0

integer

The maximum number of training steps for the TI.

Default: 500

integer

The maximum number of training steps for the tuning.

Default: 1000

string

If this value is provided as 'X|Y', the target word X will be replaced with Y in the caption. Captions must be provided as the image filenames (ignoring the extension), and Y must contain the placeholder tokens below. You must also set `use_template` to `None` to use this feature.

string

The placeholder tokens to use for the initializer. If not provided, the first tokens of the data will be used.

Default: "<s1>|<s2>"

boolean

Whether or not to use the face segmentation condition.

Default: false

string

The template to use for the inversion.

Default: "object"

number

The weight decay used for LoRA training.

Default: 0.001

number

The weight decay for the TI.

Default: 0


Run time and cost

This model runs on Nvidia A100 (80GB) GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

LoRA Pivotal Tuning Inversion Training

Model description

There are many methods to fine-tune Stable Diffusion models. One of them is low-rank adaptation (LoRA) combined with pivotal tuning inversion, which yields highly editable, efficient fine-tuning. Output models can be used with Replicate's LoRA for inference.

If you don’t want to set all of the hyperparameters yourself, you can use https://replicate.com/cloneofsimo/lora-training which has presets for faces, objects, and styles.
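
For reference, a training run can be started from Python with the Replicate client, roughly as sketched below. This is an illustrative sketch rather than an official example: the version hash is a placeholder, the folder and file names are made up, and the input names simply follow the argument documentation further down this page.

```python
# Minimal sketch: pack the training images and start a run with the Replicate
# Python client. Assumes `pip install replicate` and that REPLICATE_API_TOKEN
# is set. Replace <version> with the current version hash from the model page.
import zipfile
from pathlib import Path

import replicate

# 1. Pack the training images into a ZIP for the `instance_data` input.
with zipfile.ZipFile("training_images.zip", "w") as zf:
    for img in Path("my_subject_photos").glob("*.jpg"):
        zf.write(img, arcname=img.name)

# 2. Kick off training with a few of the documented hyperparameters.
output = replicate.run(
    "cloneofsimo/lora-advanced-training:<version>",
    input={
        "instance_data": open("training_images.zip", "rb"),
        "resolution": 512,
        "train_text_encoder": True,
        "lora_rank": 4,
        "max_train_steps_ti": 500,
        "max_train_steps_tuning": 1000,
        "use_template": "object",
    },
)
print(output)  # location of the trained LoRA weights
```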

Ethical considerations

Do not use this model to produce harmful results. Since this method builds directly on Stable Diffusion, the ethical considerations addressed in the BigScience OpenRAIL-M license apply here as well.

Caveats and recommendations

  • Use a large, diverse, high-quality dataset. Blur, noise, and artifacts will negatively affect the training process. Varied lighting conditions, shapes, angles, and sizes help a great deal.

  • Images will be resized and cropped to 512 x 512 by default, so it is recommended to prepare images larger than 512 x 512 (see the preview sketch after this list).

  • Using the face template requires every input image to contain exactly one human face. It will not work with animal faces or highly non-human character faces.
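
As a quick sanity check for the 512 x 512 recommendation above, a small Pillow script can flag undersized images and preview an approximate center-crop-and-resize. The crop-then-resize order and the folder names here are assumptions for illustration, not necessarily the trainer's exact preprocessing.

```python
# Flag images smaller than 512 x 512 and write an approximate preview of the
# default preprocessing (assumed: center crop to a square, then resize).
from pathlib import Path

from PIL import Image

src = Path("my_subject_photos")   # hypothetical input folder
dst = Path("preview_512")
dst.mkdir(exist_ok=True)

for path in sorted(src.glob("*.jpg")) + sorted(src.glob("*.png")):
    img = Image.open(path)
    if min(img.size) < 512:
        print(f"{path.name}: {img.size} is smaller than 512 x 512, consider replacing it")
        continue
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img.crop((left, top, left + side, top + side)).resize((512, 512)).save(dst / path.name)
```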

Advanced Argument Documentation

  • instance_data: A ZIP file containing your training images (JPG, PNG, etc.; size is not restricted). These images should contain the 'subject' you want the trained model to embed, so that it can later generate customized scenes beyond the training images. For best results, use images without noise or unrelated objects in the background. (Type: Path, Default: None)

  • seed: A seed for reproducible training (Type: int, Default: 1337)

  • resolution: The resolution for input images. All the images in the train/validation dataset will be resized to this resolution. (Type: int, Default: 512)

  • train_text_encoder: Whether to train the text encoder. (Type: bool, Default: True)

  • train_batch_size: Batch size (per device) for the training dataloader. (Type: int, Default: 1)

  • gradient_accumulation_steps: Number of update steps to accumulate before performing a backward/update pass. (Type: int, Default: 4)

  • gradient_checkpointing: Whether or not to use gradient checkpointing to save memory at the expense of a slower backward pass. (Type: bool, Default: False)

  • scale_lr: Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size. (Type: bool, Default: True)

  • lr_scheduler: The scheduler type to use. (Type: str, Choices: [“linear”, “cosine”, “cosine_with_restarts”, “polynomial”, “constant”, “constant_with_warmup”], Default: “constant”)

  • lr_warmup_steps: Number of steps for the warmup in the lr scheduler. (Type: int, Default: 0)

  • clip_ti_decay: Whether or not to apply the Bayesian Learning Rule to the norm of the CLIP latent. (Type: bool, Default: True)

  • color_jitter: Whether or not to use color jitter during augmentation. (Type: bool, Default: True)

  • continue_inversion: Whether or not to continue inversion. (Type: bool, Default: False)

  • continue_inversion_lr: The learning rate for continuing an inversion. (Type: float, Default: 1e-4)

  • initializer_tokens: The tokens to use for the initializer. If not provided, the embeddings will be randomly initialized from a Gaussian N(0, 0.017^2). (Type: str, Default: None)

  • learning_rate_text, learning_rate_ti, learning_rate_unet: The learning rates for the text encoder, the textual inversion embeddings, and the UNet, respectively. Recommended values: 1e-5, 5e-4, 1e-4. (Type: float, Defaults: 1e-5, 5e-4, 1e-4)

  • lora_rank: Rank of the LoRA. A larger rank is more likely to capture fidelity but less likely to remain editable, and it increases the size of the trained weights (see the minimal LoRA layer sketch after this list). (Type: int, Default: 4)

  • lora_dropout_p: Dropout for the LoRA layer. See [1] for details. (Type: float, Default: 0.1)

  • lora_scale: Scaling parameter applied at the end of the LoRA layer. See [1] for details. (Type: float, Default: 1.0)

  • lr_scheduler_lora: The LR scheduler for LoRA. (Type: str, Choices: [“linear”, “cosine”, “cosine_with_restarts”, “polynomial”, “constant”, “constant_with_warmup”], Default: “constant”)

  • lr_warmup_steps_lora: Number of steps for the warmup in the LR scheduler. (Type: int, Default: 0)

  • max_train_steps_ti: The maximum number of training steps for the TI. (Type: int, Default: 500)

  • max_train_steps_tuning: The maximum number of training steps for the tuning. (Type: int, Default: 1000)

  • placeholder_token_at_data: If this value is provided as “X|Y”, the target word X will be replaced with Y in the caption. Captions must be provided as the image filenames (ignoring the extension), and Y must contain the placeholder tokens below. You must also set use_template to None to use this feature. (Type: str, Default: None)

  • placeholder_tokens: The placeholder tokens to use for the initializer. (Type: str, Default: “<s1>|<s2>”)

  • use_face_segmentation_condition: Whether or not to use the face segmentation condition. (Type: bool, Default: False)

  • use_template: The template to use for the inversion. (Type: str, Choices: [“object”, “style”, “none”], Default: “object”)

  • weight_decay_lora: The weight decay used for LoRA training. (Type: float, Default: 0.001)

  • weight_decay_ti: The weight decay for the TI. (Type: float, Default: 0.00)
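
To make lora_rank, lora_dropout_p, and lora_scale concrete, here is a minimal PyTorch sketch of a LoRA-adapted linear layer in the spirit of [1]. It illustrates the general technique only and is not this trainer's actual implementation.

```python
# Minimal LoRA linear layer: frozen base projection plus a scaled low-rank
# update. `rank`, `dropout_p` and `scale` correspond to the lora_rank,
# lora_dropout_p and lora_scale arguments above.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, dropout_p: float = 0.1, scale: float = 1.0):
        super().__init__()
        self.base = base.requires_grad_(False)                      # frozen pretrained projection
        self.down = nn.Linear(base.in_features, rank, bias=False)   # d_in -> r
        self.up = nn.Linear(rank, base.out_features, bias=False)    # r -> d_out
        nn.init.normal_(self.down.weight, std=1.0 / rank)
        nn.init.zeros_(self.up.weight)                              # the update starts at zero
        self.dropout = nn.Dropout(dropout_p)                        # lora_dropout_p
        self.scale = scale                                          # lora_scale

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.dropout(self.down(x)))

layer = LoRALinear(nn.Linear(768, 768), rank=4)
y = layer(torch.randn(2, 768))   # only the rank-4 `down`/`up` factors are trainable
```

Only the small down/up factors of each adapted layer are trained and saved, which is why a larger lora_rank both captures more detail and produces a larger output file.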

[1] Hu, Edward J., et al. “LoRA: Low-Rank Adaptation of Large Language Models.” arXiv preprint arXiv:2106.09685 (2021).