Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.
Field | Type | Default value | Description |
---|---|---|---|
image | string | | Input portrait image. The resolution should be larger than 256x256; the image will be cropped to 256x256. |
audio | string | | Input audio file. wav, mp3, m4a, and mp4 (video with sound) files should all be compatible. |
style_clip | string (enum) | data/style_clip/3DMM/M030_front_neutral_level1_001.mat | Input style_clip_mat, optional. Specifies the reference speaking style. Options: data/style_clip/3DMM/M030_front_happy_level3_001.mat, data/style_clip/3DMM/M030_front_contempt_level3_001.mat, data/style_clip/3DMM/W011_front_surprised_level3_001.mat, data/style_clip/3DMM/W009_front_angry_level3_001.mat, data/style_clip/3DMM/M030_front_disgusted_level3_001.mat, data/style_clip/3DMM/W009_front_fear_level3_001.mat, data/style_clip/3DMM/W011_front_neutral_level1_001.mat, data/style_clip/3DMM/M030_front_fear_level3_001.mat, data/style_clip/3DMM/W011_front_angry_level3_001.mat, data/style_clip/3DMM/M030_front_sad_level3_001.mat, data/style_clip/3DMM/W009_front_sad_level3_001.mat, data/style_clip/3DMM/W011_front_sad_level3_001.mat, data/style_clip/3DMM/M030_front_neutral_level1_001.mat, data/style_clip/3DMM/W011_front_disgusted_level3_001.mat, data/style_clip/3DMM/W009_front_contempt_level3_001.mat, data/style_clip/3DMM/W009_front_happy_level3_001.mat, data/style_clip/3DMM/W011_front_contempt_level3_001.mat, data/style_clip/3DMM/M030_front_angry_level3_001.mat, data/style_clip/3DMM/W009_front_surprised_level3_001.mat, data/style_clip/3DMM/W011_front_fear_level3_001.mat, data/style_clip/3DMM/W009_front_neutral_level1_001.mat, data/style_clip/3DMM/W011_front_happy_level3_001.mat, data/style_clip/3DMM/W009_front_disgusted_level3_001.mat, data/style_clip/3DMM/M030_front_surprised_level3_001.mat |
pose | string (enum) | data/pose/RichardShelby_front_neutral_level1_001.mat | Input pose. Specifies the head pose and should be a .mat file. Options: data/pose/RichardShelby_front_neutral_level1_001.mat |
max_gen_len | integer | 1000 | Maximum length, in seconds, of the generated video. |
cfg_scale | number | 1 | Scale of classifier-free guidance. Adjusts the intensity of the speaking style. |
num_inference_steps | integer | 10 | Number of denoising steps. Min: 1, Max: 500 |
crop_image | boolean | True | Enable cropping of the input image. If your portrait is already cropped to 256x256, set this to False. |
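As a rough illustration of how these fields fit together, the sketch below assembles an input payload and checks it against the constraints listed in the table (the enum values, the 1 to 500 range for `num_inference_steps`, and the defaults). The helper names `style_clip_path` and `build_input` are hypothetical and not part of any client library, and the example assumes `image` and `audio` are passed as URL strings, which the schema does not spell out.

```python
from itertools import product

# Values copied from the input schema table above.
SPEAKERS = ("M030", "W009", "W011")
LEVEL3_EMOTIONS = ("angry", "contempt", "disgusted", "fear", "happy", "sad", "surprised")
DEFAULT_POSE = "data/pose/RichardShelby_front_neutral_level1_001.mat"


def style_clip_path(speaker: str, emotion: str) -> str:
    """Build a style_clip path (hypothetical helper; the listed options follow this pattern)."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    if emotion == "neutral":
        level = "level1"  # neutral clips are listed at level1 only
    elif emotion in LEVEL3_EMOTIONS:
        level = "level3"  # every other emotion is listed at level3 only
    else:
        raise ValueError(f"unknown emotion: {emotion!r}")
    return f"data/style_clip/3DMM/{speaker}_front_{emotion}_{level}_001.mat"


# All style_clip options in the schema, rebuilt from the pattern.
STYLE_CLIPS = frozenset(
    style_clip_path(speaker, emotion)
    for speaker, emotion in product(SPEAKERS, LEVEL3_EMOTIONS + ("neutral",))
)


def build_input(
    image: str,
    audio: str,
    style_clip: str = "data/style_clip/3DMM/M030_front_neutral_level1_001.mat",
    pose: str = DEFAULT_POSE,
    max_gen_len: int = 1000,
    cfg_scale: float = 1.0,
    num_inference_steps: int = 10,
    crop_image: bool = True,
) -> dict:
    """Return an input dict after checking the constraints stated in the schema."""
    if style_clip not in STYLE_CLIPS:
        raise ValueError(f"style_clip is not one of the listed options: {style_clip!r}")
    if pose != DEFAULT_POSE:
        raise ValueError("pose has a single listed option in this schema")
    if not 1 <= num_inference_steps <= 500:
        raise ValueError("num_inference_steps must be between 1 and 500")
    return {
        "image": image,  # assumed to be a URL string
        "audio": audio,  # assumed to be a URL string
        "style_clip": style_clip,
        "pose": pose,
        "max_gen_len": max_gen_len,
        "cfg_scale": cfg_scale,
        "num_inference_steps": num_inference_steps,
        "crop_image": crop_image,
    }


payload = build_input(
    image="https://example.com/portrait.png",  # placeholder URL
    audio="https://example.com/speech.wav",  # placeholder URL
    style_clip=style_clip_path("W009", "happy"),
    num_inference_steps=20,
)
```

A dict like `payload` could then be passed as the `input` argument when calling the model through an API client; the model identifier and version are not stated in this section, so they are left out here.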
Output schema
The shape of the response you’ll get when you run this model with an API.
{'format': 'uri', 'title': 'Output', 'type': 'string'}
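In other words, the response is a single string holding a URI. A minimal sketch of handling it is shown below, using only the Python standard library. The helper names `is_uri_string` and `save_output` are hypothetical, and the check assumes the URI is a fetchable URL with a scheme and host, which is stricter than the `uri` format itself requires.

```python
from urllib.parse import urlparse
from urllib.request import urlretrieve


def is_uri_string(value: object) -> bool:
    """Loosely check a response against {'type': 'string', 'format': 'uri'}."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    # Assumption: the output is a URL with both a scheme and a host.
    return bool(parsed.scheme) and bool(parsed.netloc)


def save_output(uri: str, destination: str) -> str:
    """Download the file the output URI points at and return the local path."""
    if not is_uri_string(uri):
        raise ValueError(f"output is not a URI string: {uri!r}")
    local_path, _headers = urlretrieve(uri, destination)
    return local_path
```

Calling `save_output(output, "result.mp4")` would write the generated file to disk; the `.mp4` name is only a guess, since the schema does not state the output file type.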