
cjwbw/dreamtalk:a22ed728

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.
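A minimal client-side sketch, assuming you want to see exactly what the model will receive before sending a request: it fills in the defaults from the field list for any omitted field and rejects values outside the documented limits (audio extension, style_clip options, num_inference_steps range). The names DEFAULTS, style_clip_path, and build_input are hypothetical helpers, not part of any Replicate client, and the sketch makes no network call. style_clip_path assumes the naming pattern seen in the listed style_clip options holds for every speaker and emotion.

```python
from pathlib import PurePath

# Defaults for the optional fields, copied from the field list on this page.
DEFAULTS = {
    "style_clip": "data/style_clip/3DMM/M030_front_neutral_level1_001.mat",
    "pose": "data/pose/RichardShelby_front_neutral_level1_001.mat",
    "max_gen_len": 1000,
    "cfg_scale": 1.0,
    "num_inference_steps": 10,
    "crop_image": True,
}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".mp4"}
SPEAKERS = ("M030", "W009", "W011")
EMOTIONS = ("angry", "contempt", "disgusted", "fear",
            "happy", "neutral", "sad", "surprised")


def style_clip_path(speaker: str, emotion: str) -> str:
    """Return the style_clip option for a speaker/emotion pair (hypothetical helper)."""
    if speaker not in SPEAKERS or emotion not in EMOTIONS:
        raise ValueError(f"no style clip for {speaker}/{emotion}")
    # In the listed options, neutral clips are level1 and all others are level3.
    level = 1 if emotion == "neutral" else 3
    return f"data/style_clip/3DMM/{speaker}_front_{emotion}_level{level}_001.mat"


# The 24 enum values accepted for style_clip (3 speakers x 8 emotions).
STYLE_CLIPS = {style_clip_path(s, e) for s in SPEAKERS for e in EMOTIONS}


def build_input(image: str, audio: str, **overrides) -> dict:
    """Fill in defaults for omitted fields and check the documented limits."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {"image": image, "audio": audio, **DEFAULTS, **overrides}
    # Suffix check works for local paths and plain URLs without a query string.
    if PurePath(audio).suffix.lower() not in AUDIO_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {audio}")
    if payload["style_clip"] not in STYLE_CLIPS:
        raise ValueError(f"unknown style_clip: {payload['style_clip']}")
    if not 1 <= payload["num_inference_steps"] <= 500:
        raise ValueError("num_inference_steps must be between 1 and 500")
    return payload
```

For example, `build_input("face.png", "speech.wav", style_clip=style_clip_path("W009", "happy"))` changes only the speaking style and leaves every other optional field at its default.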

Each field below is listed as: name, type, default value (if any), allowed values or range (if any), and description.
image
string
Input portrait image. Its resolution should be at least 256x256; when crop_image is True, the image is cropped to 256x256.
audio
string
Input audio file. The wav, mp3, m4a, and mp4 (video with sound) formats should all be compatible.
style_clip
string (enum)
data/style_clip/3DMM/M030_front_neutral_level1_001.mat

Options:
data/style_clip/3DMM/M030_front_angry_level3_001.mat
data/style_clip/3DMM/M030_front_contempt_level3_001.mat
data/style_clip/3DMM/M030_front_disgusted_level3_001.mat
data/style_clip/3DMM/M030_front_fear_level3_001.mat
data/style_clip/3DMM/M030_front_happy_level3_001.mat
data/style_clip/3DMM/M030_front_neutral_level1_001.mat
data/style_clip/3DMM/M030_front_sad_level3_001.mat
data/style_clip/3DMM/M030_front_surprised_level3_001.mat
data/style_clip/3DMM/W009_front_angry_level3_001.mat
data/style_clip/3DMM/W009_front_contempt_level3_001.mat
data/style_clip/3DMM/W009_front_disgusted_level3_001.mat
data/style_clip/3DMM/W009_front_fear_level3_001.mat
data/style_clip/3DMM/W009_front_happy_level3_001.mat
data/style_clip/3DMM/W009_front_neutral_level1_001.mat
data/style_clip/3DMM/W009_front_sad_level3_001.mat
data/style_clip/3DMM/W009_front_surprised_level3_001.mat
data/style_clip/3DMM/W011_front_angry_level3_001.mat
data/style_clip/3DMM/W011_front_contempt_level3_001.mat
data/style_clip/3DMM/W011_front_disgusted_level3_001.mat
data/style_clip/3DMM/W011_front_fear_level3_001.mat
data/style_clip/3DMM/W011_front_happy_level3_001.mat
data/style_clip/3DMM/W011_front_neutral_level1_001.mat
data/style_clip/3DMM/W011_front_sad_level3_001.mat
data/style_clip/3DMM/W011_front_surprised_level3_001.mat

Optional. The style clip (a 3DMM .mat file) that specifies the reference speaking style.
pose
string (enum)
data/pose/RichardShelby_front_neutral_level1_001.mat

Options:

data/pose/RichardShelby_front_neutral_level1_001.mat

Input pose: a .mat file that specifies the head pose.
max_gen_len
integer
1000
Maximum length of the generated video, in seconds.
cfg_scale
number
1
Classifier-free guidance scale. Adjusts the intensity of the speaking style.
num_inference_steps
integer
10
Min: 1, Max: 500
Number of denoising steps.
crop_image
boolean
True
Whether to crop the input image. If your portrait is already cropped to 256x256, set this to False.
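This page does not say exactly how cfg_scale is applied inside the model. In the standard classifier-free guidance formulation, which this sketch assumes, each denoising step blends a prediction made with the condition and one made without it; here the condition is assumed to be the reference style clip. With s denoting cfg_scale:

```latex
\hat{y} = \hat{y}_{\mathrm{uncond}} + s\,\bigl(\hat{y}_{\mathrm{cond}} - \hat{y}_{\mathrm{uncond}}\bigr)
```

Under this assumption, the default s = 1 returns the conditional prediction unchanged, s > 1 extrapolates further toward the reference style (a stronger style), and s = 0 ignores the style condition entirely.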

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"format": "uri", "title": "Output", "type": "string"}
The output is a single string: a URI for the generated video.
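Because the output schema is only a string with format uri, a caller typically checks the value and derives a local file name before downloading it. A minimal sketch, assuming the URI is an absolute http(s) URL (the schema itself allows any URI scheme); output_filename is a hypothetical helper and performs no download.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_filename(output: object) -> str:
    """Validate a response against the output schema and return its file name."""
    # Output schema: {"type": "string", "format": "uri"}
    if not isinstance(output, str):
        raise TypeError(f"expected a string, got {type(output).__name__}")
    parts = urlparse(output)
    # Assumption: the URI is an absolute http(s) URL.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {output!r}")
    # Use the last path segment as the file name; any query string is ignored.
    return PurePosixPath(parts.path).name
```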