Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.
Field | Type | Default value | Description
---|---|---|---
image | string | | Input portrait image. The resolution must be at least 256x256; the image will be cropped to 256x256.
audio | string | | Input audio file. Supported formats are wav, mp3, m4a, and mp4 (video with sound).
style_clip | string | | Optional style clip specifying the reference speaking style, as a .mat or .txt file.
pose | string | | Optional head pose, as a .mat file.
max_gen_len | integer | 1000 | Maximum length, in seconds, of the generated video.
cfg_scale | number | 1 | Classifier-free guidance scale; adjusts the intensity of the speaking style.
num_inference_steps | integer | 10 | Number of denoising steps (min: 1, max: 500).
crop_image | boolean | True | Enable cropping of the input image. If your portrait is already cropped to 256x256, set this to False.
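As a minimal sketch of how these fields fit together, the helper below merges user-supplied values with the documented defaults and checks the one documented range constraint before the payload is sent to the API. The function name `build_input` is hypothetical; the resulting dict would typically be passed as the `input` argument of a client call such as `replicate.run(...)`.

```python
# Sketch: assemble an input payload matching the schema above.
# Field names and defaults come from the table; build_input itself
# is an illustrative helper, not part of any client library.

DEFAULTS = {
    "max_gen_len": 1000,        # maximum video length in seconds
    "cfg_scale": 1,             # classifier-free guidance scale
    "num_inference_steps": 10,  # denoising steps (1-500)
    "crop_image": True,         # crop the input portrait to 256x256
}

def build_input(image, audio, style_clip=None, pose=None, **overrides):
    """Merge user values with schema defaults and validate constraints."""
    payload = {"image": image, "audio": audio, **DEFAULTS, **overrides}
    if style_clip is not None:
        payload["style_clip"] = style_clip
    if pose is not None:
        payload["pose"] = pose
    steps = payload["num_inference_steps"]
    if not 1 <= steps <= 500:
        raise ValueError("num_inference_steps must be between 1 and 500")
    return payload
```

Only `image` and `audio` are required; the optional `style_clip` and `pose` files are included only when provided, so the model falls back to its own defaults otherwise.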
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema

```json
{
  "format": "uri",
  "title": "Output",
  "type": "string"
}
```
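Since the output is a single URI string pointing at the generated video, a caller usually just downloads it. The sketch below, using only the Python standard library, derives a local filename from the URI and optionally fetches the file; the example URI in the test is a placeholder, not a real output.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

def save_output(output_uri, dest_dir=".", download=True):
    """Derive a local filename from the output URI and optionally download it."""
    name = os.path.basename(urlparse(output_uri).path) or "output.mp4"
    path = os.path.join(dest_dir, name)
    if download:
        urlretrieve(output_uri, path)  # fetch the generated video
    return path
```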