You're looking at a specific version of this model.

zsxkib/humo:121a2140

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.
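The default rule can be illustrated with a minimal Python sketch. It assumes the defaults listed in this input schema are the complete set: user-supplied values override defaults, and fields with no documented default (reference_image, audio, seed) are left out unless you pass them. The helper name apply_defaults is hypothetical. The service fills in defaults on its side, so this only shows the behavior and is not something a client needs to do.

```python
# Documented defaults from the input schema (zsxkib/humo:121a2140).
# reference_image, audio, and seed have no documented default, so they are omitted.
DEFAULTS = {
    "prompt": "A person walking confidently down a busy street",
    "width": 1280,
    "height": 720,
    "num_frames": 49,
    "num_inference_steps": 20,
    "guidance_scale": 4,
    "audio_guidance_scale": 5.5,
    "negative_prompt": "blurry, low quality, distorted, bad anatomy",
}


def apply_defaults(user_input: dict) -> dict:
    """Hypothetical helper: documented defaults, overridden by any user-supplied values."""
    return {**DEFAULTS, **user_input}


effective = apply_defaults({"prompt": "A chef plating pasta in a bright kitchen", "seed": 42})
print(effective["width"], effective["height"], effective["seed"])  # 1280 720 42
```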

Each field is listed with its type, default value (if any), allowed range (if any), and description.
prompt (string; default: "A person walking confidently down a busy street")
  Text description of the video. Be detailed about the person, actions, and scene.
reference_image (string; optional, no default)
  Reference image that controls the person's appearance.
audio (string; optional, no default)
  Audio file used for lip-sync and movement synchronization.
width (integer; default: 1280; min: 640; max: 1344)
  Video width in pixels, rounded to the nearest multiple of 8.
height (integer; default: 720; min: 384; max: 768)
  Video height in pixels, rounded to the nearest multiple of 8.
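num_frames (integer; default: 49; min: 1; max: 97)
  Number of frames to generate at 25 fps (25 frames = 1 second). The default of 49 frames is about 2 seconds, and the maximum of 97 frames is about 3.9 seconds.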
num_inference_steps (integer; default: 20; min: 5; max: 100)
  Number of denoising steps. More steps generally give higher quality but slower generation.
guidance_scale (number; default: 4; min: 1; max: 20)
  Text guidance strength. Higher values follow the prompt more closely; lower values (3 to 5) often produce more natural lighting.
audio_guidance_scale (number; default: 5.5; min: 1; max: 20)
  Audio guidance strength, applied when audio is provided. Higher values give tighter audio synchronization.
seed (integer; no default; max: 2147483647)
  Random seed for reproducible generation.
negative_prompt (string; default: "blurry, low quality, distorted, bad anatomy")
  What to avoid in the video.
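The documented ranges and the rounding rule can be checked on the client before a request is sent. The Python sketch below is hypothetical: check_input, round_to_multiple_of_8, and duration_seconds are illustrative names and are not part of any API. It makes two assumptions the schema does not state. First, out-of-range values should be rejected rather than clamped. Second, rounding ties (for example 644, which is exactly between 640 and 648) go upward. The service does its own rounding, so this only previews the result.

```python
FPS = 25  # documented: 25 frames = 1 second

# (min, max) from the input schema; None means no documented bound.
RANGES = {
    "width": (640, 1344),
    "height": (384, 768),
    "num_frames": (1, 97),
    "num_inference_steps": (5, 100),
    "guidance_scale": (1, 20),
    "audio_guidance_scale": (1, 20),
    "seed": (None, 2147483647),
}


def round_to_multiple_of_8(pixels: int) -> int:
    """Round to the nearest multiple of 8; ties go upward (tie rule is an assumption)."""
    return ((pixels + 4) // 8) * 8


def check_input(params: dict) -> dict:
    """Reject out-of-range values, then return a copy with width/height rounded."""
    for field, (lo, hi) in RANGES.items():
        if field not in params:
            continue
        value = params[field]
        if lo is not None and value < lo:
            raise ValueError(f"{field}={value} is below the minimum {lo}")
        if hi is not None and value > hi:
            raise ValueError(f"{field}={value} is above the maximum {hi}")
    checked = dict(params)
    for field in ("width", "height"):
        if field in checked:
            checked[field] = round_to_multiple_of_8(checked[field])
    return checked


def duration_seconds(num_frames: int) -> float:
    """Clip length implied by num_frames at the documented 25 fps."""
    return num_frames / FPS


params = check_input({"width": 1000, "height": 562, "num_frames": 97})
print(params["width"], params["height"], duration_seconds(params["num_frames"]))  # 1000 560 3.88
```

Because every documented minimum and maximum for width and height is itself a multiple of 8, rounding an in-range value cannot push it out of range. That is why the sketch checks ranges first and rounds afterwards.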

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"type": "string", "format": "uri", "title": "Output"}

The output is a single string: a URI pointing to the generated video file.
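Because the output schema is a plain URI string, a client mainly needs to confirm that it received a string and then extract what it needs from it. The sketch below assumes an http(s) URL. parse_output is a hypothetical helper, and the example URL is a placeholder, not a real prediction output.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_output(output: object) -> str:
    """Hypothetical helper: check the output matches the schema (an http(s) URI
    string) and return the file name at the end of its path."""
    if not isinstance(output, str):
        raise TypeError(f"expected a URI string, got {type(output).__name__}")
    parsed = urlparse(output)
    # Assumption: the URI is an absolute http(s) URL with a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {output!r}")
    return PurePosixPath(parsed.path).name


print(parse_output("https://example.com/outputs/abc123/output.mp4"))  # output.mp4
```

To save the video locally, the URL could then be passed to urllib.request.urlretrieve from the Python standard library.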