
cjwbw/dreamtalk:1716d488

Input

*file

Input image. This specifies the input portrait. Its resolution should be larger than 256x256; the image will be cropped to 256x256.

*file

Input audio file. The wav, mp3, m4a, and mp4 (video with sound) formats should all be compatible.
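
As a quick pre-flight check before uploading, the extension list above can be tested locally. This is a minimal sketch: the helper name `is_supported_audio` is hypothetical, and it only inspects the file name, not the actual contents of the file.

```python
from pathlib import Path

# Extensions the page lists as compatible audio inputs.
SUPPORTED_AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".mp4"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats.

    Hypothetical helper: checks the name only, case-insensitively.
    """
    return Path(filename).suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS
```

For example, `is_supported_audio("speech.WAV")` passes, while a `.flac` file would need converting first.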

string

Input style_clip_mat, optional. This specifies the reference speaking style.

Default: "data/style_clip/3DMM/M030_front_neutral_level1_001.mat"

string

Input pose. This specifies the head pose and should be a .mat file.

Default: "data/pose/RichardShelby_front_neutral_level1_001.mat"

integer

The maximum length, in seconds, of the generated video.

Default: 1000

number

The scale of classifier-free guidance. It adjusts the intensity of the speaking style.

Default: 1
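
To illustrate what the guidance scale does, here is a sketch of the usual classifier-free guidance blend. It assumes the common formulation (unconditional prediction plus the scale times the conditional difference); the page does not say how this model applies it internally, and `apply_guidance` is a hypothetical name operating on plain lists of floats.

```python
def apply_guidance(uncond, cond, scale):
    """Blend unconditional and style-conditioned predictions.

    Assumed formulation: uncond + scale * (cond - uncond).
    scale == 1 returns the conditional prediction unchanged;
    larger values push further toward the conditioning signal.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```

Under this formulation the default of 1 leaves the style-conditioned prediction as-is, and values above 1 exaggerate it.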

integer
(minimum: 1, maximum: 500)

Number of denoising steps.

Default: 10

boolean

Crop the input image. If your portrait is already cropped to 256x256, set this to false.

Default: true
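
The defaults and limits listed above can be gathered into a single input dictionary before submitting. This is a sketch only: the page shows types but not field names, so every key below (`image`, `audio`, `style_clip`, `pose`, `max_gen_len`, `cfg_scale`, `num_inference_steps`, `crop`) is an assumed name, and `build_inputs` is a hypothetical helper.

```python
# Defaults copied from the page; the key names are assumptions.
DEFAULTS = {
    "style_clip": "data/style_clip/3DMM/M030_front_neutral_level1_001.mat",
    "pose": "data/pose/RichardShelby_front_neutral_level1_001.mat",
    "max_gen_len": 1000,
    "cfg_scale": 1.0,
    "num_inference_steps": 10,
    "crop": True,
}


def build_inputs(image, audio, **overrides):
    """Merge required files with defaults and check the documented step range."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input(s): {sorted(unknown)}")
    inputs = {"image": image, "audio": audio, **DEFAULTS, **overrides}
    steps = inputs["num_inference_steps"]
    # The page lists a minimum of 1 and a maximum of 500 denoising steps.
    if not isinstance(steps, int) or not 1 <= steps <= 500:
        raise ValueError("num_inference_steps must be an integer in [1, 500]")
    return inputs
```

A call such as `build_inputs("portrait.png", "speech.wav", crop=False)` keeps the other defaults and skips cropping for a portrait that is already 256x256.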
