
nsfw-api/hunyuan-character-lora-trainer:b1af78df

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.

input_videos
string
A zip file containing videos and/or images, optionally with matching .txt caption files. Each caption file must share the base name of its media file: video.mp4 pairs with video.txt, and the same applies to images (.jpg, .jpeg, or .png).

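For reference, a dataset zip with this layout could be assembled with Python's zipfile module; the file names below are placeholders:

import zipfile

# Hypothetical dataset: each media file is paired with a same-named .txt caption.
files = [
    "dance.mp4", "dance.txt",        # video + its caption
    "portrait.jpg", "portrait.txt",  # image + its caption
]

with zipfile.ZipFile("training_data.zip", "w") as zf:
    for name in files:
        zf.write(name)
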
trigger_word
string (default: TOK)
The trigger word refers to the object, style, or concept you are training on. Pick a string that isn't a real word, like TOK, or something related to what's being trained, like STYLE3D. The trigger word you specify here will be associated with all videos during training; when you use your LoRA, include the trigger word in prompts to help activate it.

autocaption
boolean (default: true)
Automatically caption videos using Qwen-VL.

autocaption_prefix
string
Optional: text to prepend to every generated caption; for example, 'a video of TOK, '. You can include your trigger word in the prefix. Prefixes help set the right context for your captions.

autocaption_suffix
string
Optional: text to append to every generated caption; for example, ' in the style of TOK'. You can include your trigger word in the suffix. Suffixes help set the right concept for your captions.

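When both options are set, the final caption is presumably the prefix, the auto-generated text, and the suffix concatenated in order; a minimal illustration (the generated caption here is made up):

prefix = "a video of TOK, "
generated = "a person dancing in a park"  # produced by autocaptioning
suffix = " in the style of TOK"

print(prefix + generated + suffix)
# a video of TOK, a person dancing in a park in the style of TOK
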
epochs
integer (default: 16, min: 1, max: 2000)
Number of training epochs. Each epoch processes all your videos once. Note: if max_train_steps is set, training may end before completing all epochs.

max_train_steps
integer (default: -1, min: -1, max: 1000000)
Maximum number of training steps to perform. Each step processes one batch of frames. Set to -1 to train for the full number of epochs. If positive, training stops after this many steps even if all epochs aren't complete.

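Assuming one step per batch, as described above, you can estimate a run's length before launching it; the sample count below is hypothetical and depends on how many clips frame extraction yields:

import math

num_training_samples = 120  # hypothetical: clips after frame extraction
batch_size = 4
epochs = 16

steps_per_epoch = math.ceil(num_training_samples / batch_size)  # 30
total_steps = epochs * steps_per_epoch                          # 480
print(total_steps)  # set max_train_steps below this to stop early
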
rank
integer (default: 32, min: 1, max: 128)
LoRA rank for training. Higher ranks take longer to train but can capture more complex features. Caption quality is more important at higher ranks.

batch_size
integer (default: 4, min: 1, max: 8)
Batch size for training. Lower values use less memory but train more slowly.

learning_rate
number (default: 0.001, min: 0.00001, max: 1)
Learning rate for training. If you're new to training, you probably don't need to change this.

optimizer
enum (default: adamw8bit)
Optimizer type for training. If you're unsure, leave the default.

timestep_sampling
enum (default: sigmoid)
Controls how timesteps are sampled during training: 'sigmoid' (the default) concentrates samples in the middle of the diffusion process; 'uniform' samples evenly across all timesteps; 'sigma' samples based on the noise schedule; 'shift' uses shifted sampling with a discrete flow shift. If unsure, use 'sigmoid'.

consecutive_target_frames
enum (default: [1, 25, 45])
The lengths of the consecutive-frame clips to extract from each video.

frame_extraction_method
enum (default: head)
Method used to extract frames from videos during training.

frame_stride
integer (default: 10, min: 1, max: 100)
Frame stride for the 'slide' extraction method.

frame_sample
integer (default: 4, min: 1, max: 20)
Number of samples for the 'uniform' extraction method.

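A rough sketch of how the three extraction methods plausibly pick clips; the actual trainer's logic may differ, and all numbers here are illustrative:

def extract_clip_starts(num_frames, target_len, method="head",
                        frame_stride=10, frame_sample=4):
    """Illustrative only: start indices of clips of length target_len."""
    last_start = num_frames - target_len
    if last_start < 0:
        return []
    if method == "head":
        return [0]  # one clip from the start of the video
    if method == "slide":
        return list(range(0, last_start + 1, frame_stride))  # every frame_stride frames
    if method == "uniform":
        if frame_sample == 1:
            return [0]
        step = last_start / (frame_sample - 1)
        return [round(i * step) for i in range(frame_sample)]  # evenly spaced
    raise ValueError(f"unknown method: {method}")

print(extract_clip_starts(100, 25, "slide"))    # [0, 10, 20, 30, 40, 50, 60, 70]
print(extract_clip_starts(100, 25, "uniform"))  # [0, 25, 50, 75]
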
seed
integer (default: 0)
Random seed for training. Use a value <= 0 for a random seed.

hf_repo_id
string
Hugging Face repository ID, if you'd like to upload the trained LoRA to Hugging Face. For example, username/my-video-lora. If the given repo does not exist, a new public repo will be created.

hf_token
string
Hugging Face token, if you'd like to upload the trained LoRA to Hugging Face.
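
Putting the fields together, a training call via the Replicate Python client might look like the following; every input value is illustrative, and the Hugging Face fields are optional:

import replicate

output = replicate.run(
    "nsfw-api/hunyuan-character-lora-trainer:b1af78df",
    input={
        "input_videos": open("training_data.zip", "rb"),  # placeholder local file
        "trigger_word": "TOK",
        "autocaption": True,
        "autocaption_prefix": "a video of TOK, ",
        "epochs": 16,
        "rank": 32,
        "batch_size": 4,
        # Optional upload to Hugging Face (placeholder values):
        # "hf_repo_id": "username/my-video-lora",
        # "hf_token": "hf_...",
    },
)
print(output)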

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{
  "properties": {
    "weights": {
      "format": "uri",
      "title": "Weights",
      "type": "string"
    }
  },
  "required": ["weights"],
  "title": "Output",
  "type": "object"
}
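
Per this schema, a successful run returns an object whose weights field holds a URI pointing at the trained LoRA. A minimal sketch of saving it locally (the URI and file name are placeholders):

import urllib.request

output = {"weights": "https://example.com/lora-weights.safetensors"}  # placeholder URI
urllib.request.urlretrieve(output["weights"], "lora-weights.safetensors")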