prompthunt/cog-realvis-training:579bedb6 | Run with an API on Replicate


Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

Field	Type	Default value	Description
input_images	string		A .zip or .tar file containing the image files that will be used for fine-tuning
seed	integer		Random seed for reproducible training. Leave empty to use a random seed
resolution	integer	768	Square pixel resolution which your images will be resized to for training
train_batch_size	integer	4	Batch size (per device) for training
num_train_epochs	integer	4000	Number of epochs to loop through your training dataset
max_train_steps	integer	1000	Number of individual training steps. Takes precedence over num_train_epochs
is_lora	boolean	True	Whether to use LoRA training. If set to False, full fine-tuning is used instead
unet_learning_rate	number	0.000001	Learning rate for the U-Net. We recommend a value between `1e-6` and `1e-5`.
ti_lr	number	0.0003	Scaling of learning rate for training textual inversion embeddings. Don't alter unless you know what you're doing.
lora_lr	number	0.0001	Scaling of learning rate for training LoRA embeddings. Don't alter unless you know what you're doing.
lora_rank	integer	32	Rank of LoRA embeddings. Don't alter unless you know what you're doing.
lr_scheduler	None	constant	Learning rate scheduler to use for training
lr_warmup_steps	integer	100	Number of warmup steps for lr schedulers with warmups.
token_string	string	TOK	A unique string that will be trained to refer to the concept in the input images. Can be anything, but TOK works well
caption_prefix	string	a photo of TOK,	Text which will be used as prefix during automatic captioning. Must contain the `token_string`. For example, if caption text is 'a photo of TOK', automatic captioning will expand to 'a photo of TOK under a bridge', 'a photo of TOK holding a cup', etc.
mask_target_prompts	string		Prompt that describes the part of the image you consider important. For example, if you are fine-tuning on photos of your pet, `photo of a dog` would be a good prompt. Prompt-based masking is used to focus the fine-tuning process on the important/salient parts of the image
crop_based_on_salience	boolean	True	If you want to crop the image to `target_size` based on the important parts of the image, set this to True. If you want to crop the image based on face detection, set this to False
use_face_detection_instead	boolean	False	Whether to use face detection instead of CLIPSeg for masking. For face applications, we recommend using this option.
clipseg_temperature	number	1	How blurry you want the CLIPSeg mask to be. We recommend a value between `0.5` and `1.0`. If you want a sharper (but more error-prone) mask, decrease this value.
verbose	boolean	True	Whether to print verbose output
checkpointing_steps	integer	999999	Number of steps between saving checkpoints. Set to a very high number to disable checkpointing, since intermediate checkpoints are usually not needed.
input_images_filetype	None	infer	Filetype of the input images. Can be either `zip` or `tar`. The default is `infer`, which infers the type from the extension of the input file.
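
As an illustration of how these fields fit together, the following is a minimal sketch of assembling an input payload on the client side. The helper names `build_input` and `infer_filetype` are hypothetical, the `DEFAULTS` dictionary copies only a subset of the table above, and the extension check is an assumption about what `infer` does, not the model's actual implementation. The only rule enforced from the table is that `caption_prefix` must contain `token_string`.

```python
from urllib.parse import urlparse

# Subset of defaults copied from the input schema table (not exhaustive).
DEFAULTS = {
    "resolution": 768,
    "train_batch_size": 4,
    "max_train_steps": 1000,
    "is_lora": True,
    "token_string": "TOK",
    "caption_prefix": "a photo of TOK,",
    "input_images_filetype": "infer",
}


def infer_filetype(url: str) -> str:
    """Guess the archive type from the file extension.

    Assumption: this mirrors what `infer` does; the real logic may differ.
    """
    path = urlparse(url).path.lower()
    if path.endswith(".zip"):
        return "zip"
    if path.endswith(".tar"):
        return "tar"
    raise ValueError(f"cannot infer archive type from {url!r}")


def build_input(input_images: str, **overrides) -> dict:
    """Merge overrides onto the defaults and run basic sanity checks."""
    payload = {**DEFAULTS, **overrides, "input_images": input_images}
    # The table states that caption_prefix must contain the token_string.
    if payload["token_string"] not in payload["caption_prefix"]:
        raise ValueError("caption_prefix must contain token_string")
    # Fail early if the extension is neither .zip nor .tar.
    if payload["input_images_filetype"] == "infer":
        infer_filetype(input_images)
    return payload


payload = build_input("https://example.com/images.zip", max_train_steps=500)
```

A payload like this could then be passed as the `input` argument to a client call such as `replicate.run(...)` from the Replicate Python client, together with the model's full version identifier (this page shows only the abbreviated `579bedb6`).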

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{'format': 'uri', 'title': 'Output', 'type': 'string'}
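
Since the schema describes the output as a single string in `uri` format, a caller might want to sanity-check the response before using it. The sketch below is a loose check only: the function name `looks_like_uri` is hypothetical, and requiring just a non-empty scheme is a simplification of full URI validation.

```python
from urllib.parse import urlparse


def looks_like_uri(value) -> bool:
    """Return True if value is a string with a URI scheme (loose check)."""
    if not isinstance(value, str):
        return False
    return bool(urlparse(value).scheme)


# Example values are made up for illustration.
ok = looks_like_uri("https://example.com/trained_model.tar")
bad = looks_like_uri("not a uri")
```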