
zsxkib/humo:86fa063d

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

| Field | Type | Default value | Description |
| --- | --- | --- | --- |
| prompt | string | A person dancing to energetic music | Text description of the desired video |
| image | string | | Optional reference image for character/scene (for text+image or text+image+audio modes) |
| audio | string | | Optional audio file for synchronization (for text+audio or text+image+audio modes) |
| mode | None | text_only | Generation mode based on available inputs |
| frames | integer | 97 (Min: 1, Max: 97) | Number of frames to generate (HuMo is trained on 97-frame sequences) |
| height | None | 720 | Video height in pixels |
| width | None | 1280 | Video width in pixels (recommended: 832 for 480p, 1280 for 720p) |
| steps | integer | 50 (Min: 30, Max: 50) | Number of denoising steps (higher = better quality, slower generation) |
| scale_t | number | 1 (Max: 2) | Text guidance strength (higher = better text adherence) |
| scale_a | number | 1 (Max: 2) | Audio guidance strength (higher = better audio synchronization) |
| seed | integer | -1 | Random seed for reproducible results. Use -1 for a random seed |
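
The defaults and limits listed for the input fields can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model's own tooling: `DEFAULTS` and `build_input` are hypothetical names, and the checks cover only the bounds documented here (frames 1 to 97, steps 30 to 50, scale_t and scale_a at most 2). The `mode` value is passed through unchecked because only `text_only` is named in this section.

```python
from typing import Any, Dict

# Defaults copied from the input schema; image and audio have no default.
DEFAULTS: Dict[str, Any] = {
    "prompt": "A person dancing to energetic music",
    "mode": "text_only",
    "frames": 97,
    "height": 720,
    "width": 1280,
    "steps": 50,
    "scale_t": 1,
    "scale_a": 1,
    "seed": -1,
}

# Optional fields that are accepted but carry no default value.
OPTIONAL_FIELDS = {"image", "audio"}


def build_input(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults and check the documented bounds."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    payload = {**DEFAULTS, **overrides}

    if not 1 <= payload["frames"] <= 97:
        raise ValueError("frames must be between 1 and 97")
    if not 30 <= payload["steps"] <= 50:
        raise ValueError("steps must be between 30 and 50")
    for name in ("scale_t", "scale_a"):
        # Only an upper bound is documented for the guidance scales.
        if payload[name] > 2:
            raise ValueError(f"{name} must be at most 2")
    return payload


if __name__ == "__main__":
    print(build_input(prompt="A singer on stage", frames=49, steps=30))
```

The resulting dictionary has the shape of the `input` argument that an API client would send. Rejecting out-of-range values locally avoids a round trip for requests the schema would refuse anyway.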

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"format": "uri", "title": "Output", "type": "string"}
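
Since the output schema describes a single string in URI format, a caller can sanity-check the response before trying to fetch it. This is a small sketch under that assumption only: `check_output` is a hypothetical helper, and the example URL is a placeholder rather than a real model output.

```python
from urllib.parse import ParseResult, urlparse


def check_output(output: object) -> ParseResult:
    """Confirm the response is a string with a URI scheme and return its parsed parts."""
    if not isinstance(output, str):
        raise TypeError("expected the output to be a string")
    parts = urlparse(output)
    if not parts.scheme:
        raise ValueError("expected the output to be a URI with a scheme")
    return parts


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    parts = check_output("https://example.com/video.mp4")
    print(parts.scheme, parts.path)
```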