prompthunt / cog-realvisxl2-lora-training

  • Public
  • 35 runs

Run prompthunt/cog-realvisxl2-lora-training with an API

Use one of our client libraries to get started quickly. Clicking on a library will take you to the Playground tab where you can tweak different inputs, see the results, and copy the corresponding code to use in your own project.

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.

Field Type Default value Description
input_images
string
A .zip or .tar file containing the image files that will be used for fine-tuning
seed
integer
Random seed for reproducible training. Leave empty to use a random seed
resolution
integer
768
Square pixel resolution to which your images will be resized for training
train_batch_size
integer
4
Batch size (per device) for training
num_train_epochs
integer
4000
Number of epochs to loop through your training dataset
max_train_steps
integer
1000
Number of individual training steps. Takes precedence over num_train_epochs
is_lora
boolean
True
Whether to use LoRA training. If set to False, full fine-tuning is used instead
unet_learning_rate
number
0.000001
Learning rate for the U-Net. We recommend a value between `1e-6` and `1e-5`.
ti_lr
number
0.0003
Scaling of learning rate for training textual inversion embeddings. Don't alter unless you know what you're doing.
lora_lr
number
0.0001
Scaling of learning rate for training LoRA embeddings. Don't alter unless you know what you're doing.
lora_rank
integer
32
Rank of LoRA embeddings. Don't alter unless you know what you're doing.
lr_scheduler
string (enum)
constant
Options: constant, linear
Learning rate scheduler to use for training
lr_warmup_steps
integer
100
Number of warmup steps for learning rate schedulers that use warmup.
token_string
string
TOK
A unique string that will be trained to refer to the concept in the input images. Can be anything, but TOK works well
caption_prefix
string
a photo of TOK,
Text which will be used as prefix during automatic captioning. Must contain the `token_string`. For example, if caption text is 'a photo of TOK', automatic captioning will expand to 'a photo of TOK under a bridge', 'a photo of TOK holding a cup', etc.
mask_target_prompts
string
Prompt that describes the part of the image that you consider important. For example, if you are fine-tuning on photos of your pet, `photo of a dog` is a good prompt. Prompt-based masking is used to focus the fine-tuning process on the important/salient parts of the image
crop_based_on_salience
boolean
True
If you want to crop the image to `target_size` based on the important parts of the image, set this to True. If you want to crop the image based on face detection, set this to False
use_face_detection_instead
boolean
False
Set to True to use face detection instead of CLIPSeg for masking. For face applications, we recommend using this option.
clipseg_temperature
number
1
How blurry you want the CLIPSeg mask to be. We recommend a value between `0.5` and `1.0`. If you want a sharper (but more error-prone) mask, decrease this value.
verbose
boolean
True
Whether to print verbose output
checkpointing_steps
integer
999999
Number of steps between saving checkpoints. Set this to a very high number to disable checkpointing, since you typically don't need checkpoints.
input_images_filetype
string (enum)
infer
Options: zip, tar, infer
Filetype of the input images. Can be either `zip` or `tar`. The default is `infer`, in which case the filetype is inferred from the extension of the input file.
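
As a rough illustration of how the fields above fit together, the following Python sketch assembles an input payload and checks a few of the constraints stated in the descriptions before anything is sent. The helper names `build_training_input` and `infer_filetype` are hypothetical and not part of any client library, and the extension check is only a client-side approximation of what `infer` might do on the server.

```python
import os
from urllib.parse import urlparse

# Enum options copied from the input schema above.
LR_SCHEDULERS = {"constant", "linear"}
FILETYPES = {"zip", "tar", "infer"}


def infer_filetype(path_or_url: str) -> str:
    """Guess 'zip' or 'tar' from the file extension.

    Assumption: this mirrors the `infer` option only loosely; the
    server-side logic is not documented here.
    """
    name = os.path.basename(urlparse(path_or_url).path).lower()
    if name.endswith(".zip"):
        return "zip"
    if name.endswith(".tar"):
        return "tar"
    raise ValueError(f"cannot infer filetype from {name!r}")


def build_training_input(input_images: str,
                         token_string: str = "TOK",
                         caption_prefix: str = None,
                         lr_scheduler: str = "constant",
                         input_images_filetype: str = "infer",
                         **overrides) -> dict:
    """Build an input dict; fields left out fall back to their defaults."""
    if caption_prefix is None:
        caption_prefix = f"a photo of {token_string},"
    # The schema says caption_prefix must contain the token_string.
    if token_string not in caption_prefix:
        raise ValueError("caption_prefix must contain token_string")
    if lr_scheduler not in LR_SCHEDULERS:
        raise ValueError(f"lr_scheduler must be one of {sorted(LR_SCHEDULERS)}")
    if input_images_filetype not in FILETYPES:
        raise ValueError(f"input_images_filetype must be one of {sorted(FILETYPES)}")
    if input_images_filetype == "infer":
        infer_filetype(input_images)  # raises early on an unknown extension

    payload = {
        "input_images": input_images,
        "token_string": token_string,
        "caption_prefix": caption_prefix,
        "lr_scheduler": lr_scheduler,
        "input_images_filetype": input_images_filetype,
    }
    # Any other schema field (e.g. max_train_steps, lora_rank) passes through.
    payload.update(overrides)
    return payload


if __name__ == "__main__":
    example = build_training_input(
        "https://example.com/photos.zip",  # placeholder URL
        max_train_steps=500,  # takes precedence over num_train_epochs
    )
    print(example)
```

The resulting dict could then be passed as the `input` argument when running the model through a client library, for example the Replicate Python client's `replicate.run`. The model version identifier is not shown on this page, so it is left out here.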

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{
  "type": "string",
  "title": "Output",
  "format": "uri"
}
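
Since the output is a single string in URI format, a client would typically validate it and then download whatever it points to. The sketch below uses only the Python standard library. It assumes the URI is an `http` or `https` URL, which the schema does not guarantee, and the schema also does not say what the file contains. The function names `is_uri_output` and `save_output` are hypothetical.

```python
from urllib.parse import urlparse
from urllib.request import urlretrieve


def is_uri_output(value) -> bool:
    """Return True if value looks like an http(s) URL string.

    Assumption: outputs are served over http or https; the schema only
    promises a string with format "uri".
    """
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def save_output(uri: str, dest: str) -> str:
    """Download the file behind the output URI to dest and return the path."""
    if not is_uri_output(uri):
        raise ValueError(f"unexpected output value: {uri!r}")
    path, _headers = urlretrieve(uri, dest)
    return path
```

For example, `save_output(output, "training_output.bin")` would write the result to a local file, where `output` is the string returned by the API and the destination filename is a placeholder.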