cjwbw/dreamtalk:a22ed728 | Run with an API on Replicate

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

Field	Type	Default value	Description
image	string		Input image. This specifies the input portrait. The resolution should be larger than 256x256; the image will be cropped to 256x256.
audio	string		Input audio file. Files with the extensions wav, mp3, m4a, and mp4 (video with sound) should all be compatible.
style_clip	string (enum)	data/style_clip/3DMM/M030_front_neutral_level1_001.mat Options: data/style_clip/3DMM/M030_front_happy_level3_001.mat, data/style_clip/3DMM/M030_front_contempt_level3_001.mat, data/style_clip/3DMM/W011_front_surprised_level3_001.mat, data/style_clip/3DMM/W009_front_angry_level3_001.mat, data/style_clip/3DMM/M030_front_disgusted_level3_001.mat, data/style_clip/3DMM/W009_front_fear_level3_001.mat, data/style_clip/3DMM/W011_front_neutral_level1_001.mat, data/style_clip/3DMM/M030_front_fear_level3_001.mat, data/style_clip/3DMM/W011_front_angry_level3_001.mat, data/style_clip/3DMM/M030_front_sad_level3_001.mat, data/style_clip/3DMM/W009_front_sad_level3_001.mat, data/style_clip/3DMM/W011_front_sad_level3_001.mat, data/style_clip/3DMM/M030_front_neutral_level1_001.mat, data/style_clip/3DMM/W011_front_disgusted_level3_001.mat, data/style_clip/3DMM/W009_front_contempt_level3_001.mat, data/style_clip/3DMM/W009_front_happy_level3_001.mat, data/style_clip/3DMM/W011_front_contempt_level3_001.mat, data/style_clip/3DMM/M030_front_angry_level3_001.mat, data/style_clip/3DMM/W009_front_surprised_level3_001.mat, data/style_clip/3DMM/W011_front_fear_level3_001.mat, data/style_clip/3DMM/W009_front_neutral_level1_001.mat, data/style_clip/3DMM/W011_front_happy_level3_001.mat, data/style_clip/3DMM/W009_front_disgusted_level3_001.mat, data/style_clip/3DMM/M030_front_surprised_level3_001.mat	Input style_clip_mat, optional. This specifies the reference speaking style.
pose	string (enum)	data/pose/RichardShelby_front_neutral_level1_001.mat Options: data/pose/RichardShelby_front_neutral_level1_001.mat	Input pose, specifies the head pose and should be a .mat file.
max_gen_len	integer	1000	The maximum length, in seconds, of the generated video.
cfg_scale	number	1	The scale of classifier-free guidance. It can adjust the intensity of speaking styles.
num_inference_steps	integer	10 Min: 1 Max: 500	Number of denoising steps.
crop_image	boolean	True	Enable cropping the input image. If your portrait is already cropped to 256x256, set this to False.
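
The fields above can be assembled into an input payload along the following lines. This is a minimal sketch, not the model author's code: `build_input` and `style_clip_path` are hypothetical helper names, the example URLs are placeholders, and the assumption that every `style_clip` option follows the `<speaker>_front_<emotion>_<level>_001.mat` pattern is inferred from the option list in the table.

```python
from typing import Any, Dict

# Defaults copied from the input schema table.
DEFAULTS: Dict[str, Any] = {
    "style_clip": "data/style_clip/3DMM/M030_front_neutral_level1_001.mat",
    "pose": "data/pose/RichardShelby_front_neutral_level1_001.mat",
    "max_gen_len": 1000,
    "cfg_scale": 1,
    "num_inference_steps": 10,
    "crop_image": True,
}

# Speakers and emotions that appear in the style_clip option list.
SPEAKERS = {"M030", "W009", "W011"}
EMOTIONS = {"angry", "contempt", "disgusted", "fear",
            "happy", "neutral", "sad", "surprised"}


def style_clip_path(speaker: str, emotion: str) -> str:
    """Hypothetical helper: build a style_clip option from its parts."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker}")
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion}")
    # In the option list, neutral clips use level1 and all others level3.
    level = "level1" if emotion == "neutral" else "level3"
    return f"data/style_clip/3DMM/{speaker}_front_{emotion}_{level}_001.mat"


def build_input(image: str, audio: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides onto the schema defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {"image": image, "audio": audio, **DEFAULTS, **overrides}
    # The schema limits num_inference_steps to the range 1..500.
    if not 1 <= payload["num_inference_steps"] <= 500:
        raise ValueError("num_inference_steps must be between 1 and 500")
    return payload


if __name__ == "__main__":
    payload = build_input(
        "https://example.com/portrait.png",  # placeholder URL
        "https://example.com/speech.wav",    # placeholder URL
        style_clip=style_clip_path("W009", "happy"),
        num_inference_steps=20,
    )
    print(payload)
```

A dictionary built this way could then be passed as the `input` argument of the Replicate Python client's `replicate.run`, together with the full model version identifier, which is abbreviated on this page.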

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{'format': 'uri', 'title': 'Output', 'type': 'string'}
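
The schema above describes the output as a single string in URI format. A caller might sanity-check a response before trying to download it, as in the sketch below. `is_output_uri` is a hypothetical helper, and the sketch assumes the client returns the raw string described by the schema rather than a wrapper object.

```python
from urllib.parse import urlparse


def is_output_uri(value: object) -> bool:
    """Hypothetical check: is value a string that parses as a URI?"""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    # Require a scheme plus either a host or a path component.
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


if __name__ == "__main__":
    # Placeholder URL, not a real model output.
    print(is_output_uri("https://example.com/output.mp4"))
    print(is_output_uri("not a uri"))
```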