
zsxkib/humo:2cca6792

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

| Field | Type | Default value | Description |
| --- | --- | --- | --- |
| prompt | string | A person walking confidently down a busy street | Text description of the video. Be detailed about the person, actions, and scene. |
| reference_image | string | | Reference image to control the person's appearance (optional). |
| audio | string | | Audio file for lip-sync and movement synchronization (optional). |
| width | None | 1280 | Video width in pixels. |
| height | None | 720 | Video height in pixels. |
| num_frames | integer | 49 (min 9, max 97) | Number of frames (25 fps, so 25 frames = 1 second). Model trained on up to 97 frames. |
| num_inference_steps | integer | 50 (min 10, max 100) | Denoising steps. More steps = higher quality but slower. Research default is 50. |
| guidance_scale | number | 5 (min 2, max 15) | Text guidance strength. Research default is 5.0. Lower values (3-5) often produce more natural lighting. |
| audio_guidance_scale | number | 5.5 (min 2, max 15) | Audio guidance strength (when audio is provided). Higher = better sync. Research default is 5.5. |
| seed | integer | (max 2147483647) | Random seed for reproducible generation. |
| negative_prompt | string | blurry, low quality, distorted, bad anatomy | What to avoid in the video. |
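
The defaults and ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model's own code: the names `DEFAULTS`, `RANGES`, `build_input`, and `clip_seconds` are hypothetical helpers, and the values are copied from the fields above. The API applies defaults on its own, so merging them here is only for illustration.

```python
# Defaults copied from the input schema; fields without a default are omitted.
DEFAULTS = {
    "prompt": "A person walking confidently down a busy street",
    "width": 1280,
    "height": 720,
    "num_frames": 49,
    "num_inference_steps": 50,
    "guidance_scale": 5.0,
    "audio_guidance_scale": 5.5,
    "negative_prompt": "blurry, low quality, distorted, bad anatomy",
}

# (min, max) bounds from the schema; None means no bound is listed on that side.
RANGES = {
    "num_frames": (9, 97),
    "num_inference_steps": (10, 100),
    "guidance_scale": (2, 15),
    "audio_guidance_scale": (2, 15),
    "seed": (None, 2147483647),
}

# Fields that have no default value in the schema.
OPTIONAL_FIELDS = {"reference_image", "audio", "seed"}

FPS = 25  # stated in the num_frames description


def build_input(**overrides):
    """Merge overrides onto the schema defaults and check numeric ranges."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides}
    for name, (low, high) in RANGES.items():
        if name not in payload:
            continue  # optional field that was not supplied
        value = payload[name]
        if low is not None and value < low:
            raise ValueError(f"{name}={value} is below the minimum {low}")
        if high is not None and value > high:
            raise ValueError(f"{name}={value} is above the maximum {high}")
    return payload


def clip_seconds(num_frames, fps=FPS):
    """Clip duration in seconds for a given frame count."""
    return num_frames / fps
```

For example, `build_input(num_frames=97, seed=42)` returns the defaults with those two fields overridden, and `clip_seconds(97)` gives the length of the longest clip the schema allows. The resulting dictionary could then be passed as the `input` argument of the Replicate Python client's `replicate.run`, together with the model's full version identifier (this page shows only the abbreviated form `2cca6792`).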

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"type": "string", "title": "Output", "format": "uri"}
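
Since the output is a single URI string, a caller will usually want to check it and pick a local filename before downloading. The sketch below assumes the URI is an `http` or `https` URL and that the file is an MP4 video; neither is stated in the schema. `parse_output` and `output_filename` are hypothetical helper names.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_output(value):
    """Check that the model output looks like an http(s) URL and parse it."""
    if not isinstance(value, str):
        raise TypeError("output should be a string")
    parts = urlparse(value)
    # Assumption: only http(s) URLs are handled in this sketch.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {value!r}")
    return parts


def output_filename(value, fallback="output.mp4"):
    """Derive a local filename from the last path segment of the output URL."""
    name = PurePosixPath(parse_output(value).path).name
    # The fallback name (and its .mp4 extension) is an assumption.
    return name or fallback
```

With a placeholder URL such as `https://example.com/files/video.mp4`, `output_filename` returns `video.mp4`; the file itself could then be fetched with any HTTP client, for instance `urllib.request.urlretrieve`.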