nsfw-api /hunyuan-character-lora-trainer:7d5b6acc
Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.
| Field | Type | Default value | Description |
|---|---|---|---|
| input_videos | string | | A zip file containing videos and/or images, optionally with matching `.txt` captions. A caption file must share its media file's base name: `video.mp4` is captioned by `video.txt`, and the same applies to images (`.jpg`, `.jpeg`, or `.png`). |
| trigger_word | string | TOK | The trigger word refers to the object, style, or concept you are training on. Pick a string that isn't a real word, like TOK, or something related to what's being trained, like STYLE3D. The trigger word is associated with all videos during training; when you use your LoRA, include the trigger word in prompts to help activate the LoRA. |
| autocaption | boolean | True | Automatically caption videos using Qwen-VL. |
| autocaption_prefix | string | | Optional: text to prepend to all generated captions, for example 'a video of TOK, '. You can include your trigger word in the prefix. Prefixes help set the right context for your captions. |
| autocaption_suffix | string | | Optional: text to append to all generated captions, for example ' in the style of TOK'. You can include your trigger word in suffixes. Suffixes help set the right concept for your captions. |
| epochs | integer | 16 (min: 1, max: 2000) | Number of training epochs. Each epoch processes all your videos once. Note: if max_train_steps is set, training may end before completing all epochs. |
| max_train_steps | integer | -1 (min: -1, max: 1000000) | Maximum number of training steps. Each step processes one batch of frames. Set to -1 to train for the full number of epochs. If positive, training stops after this many steps even if all epochs aren't complete. |
| rank | integer | 32 (min: 1, max: 128) | LoRA rank for training. Higher ranks take longer to train but can capture more complex features. Caption quality matters more at higher ranks. |
| batch_size | integer | 4 (min: 1, max: 8) | Batch size for training. Lower values use less memory but train more slowly. |
| learning_rate | number | 0.001 (min: 0.00001, max: 1) | Learning rate for training. If you're new to training, you probably don't need to change this. |
| optimizer | enum | adamw8bit | Optimizer type for training. If you're unsure, leave as default. |
| timestep_sampling | enum | sigmoid | Controls how timesteps are sampled during training. 'sigmoid' (default) concentrates samples in the middle of the diffusion process; 'uniform' samples evenly across all timesteps; 'sigma' samples based on the noise schedule; 'shift' uses shifted sampling with discrete flow shift. If unsure, use 'sigmoid'. |
| consecutive_target_frames | enum | [1, 25, 45] | The lengths of consecutive frame sequences to extract from each video. |
| frame_extraction_method | enum | head | Method used to extract frames from videos during training. |
| frame_stride | integer | 10 (min: 1, max: 100) | Frame stride for the 'slide' extraction method. |
| frame_sample | integer | 4 (min: 1, max: 20) | Number of samples for the 'uniform' extraction method. |
| seed | integer | 0 | Random seed for training. Use a value <= 0 for a random seed. |
| hf_repo_id | string | | Hugging Face repository ID, if you'd like to upload the trained LoRA to Hugging Face; for example, username/my-video-lora. If the given repo does not exist, a new public repo will be created. |
| hf_token | string | | Hugging Face token, if you'd like to upload the trained LoRA to Hugging Face. |
Output schema
The shape of the response you’ll get when you run this model with an API.
```json
{
  "properties": {
    "weights": {
      "format": "uri",
      "title": "Weights",
      "type": "string"
    }
  },
  "required": ["weights"],
  "title": "Output",
  "type": "object"
}
```
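Per the schema above, the output is an object with a single required `weights` field holding a URI string. A minimal sketch of checking a response against that shape (the sample URL and filename are hypothetical):

```python
def validate_output(obj):
    """Check a response against the output schema:
    an object whose required 'weights' field is a URI string."""
    if not isinstance(obj, dict):
        raise TypeError("output must be an object")
    weights = obj.get("weights")
    if not isinstance(weights, str):
        raise ValueError("'weights' must be a URI string")
    return weights

# Hypothetical response from a completed training run.
response = {"weights": "https://example.com/trained_lora.safetensors"}
uri = validate_output(response)
```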