zsxkib/sonic:a2aad29e

Input

file (required)

Input portrait image. If a face is detected, the image is cropped to it.

file (required)

Input audio file (WAV, MP3, etc.) for the voice.

number
(minimum: 0.5, maximum: 2)

Controls movement intensity. Values above the default produce more movement; values below it produce less.

Default: 1

integer
(minimum: 256, maximum: 1024)

Minimum image resolution for processing. Lower values use less memory but may reduce quality.

Default: 512

integer
(minimum: 5, maximum: 50)

Number of diffusion steps. Higher values may improve quality but take longer.

Default: 25

boolean

If true, the output video matches the original image resolution. Otherwise, it uses min_resolution after cropping.

Default: false

integer

Random seed for reproducible results. Leave blank for a random seed.
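
The ranges and defaults listed above can be checked on the client before a prediction is submitted. The following is a minimal sketch, not the model's own code: it only mirrors the limits stated on this page. Apart from min_resolution, which the page names, every field name here (image_path, audio_path, movement_scale, diffusion_steps, keep_original_resolution, seed) is a hypothetical placeholder, because the real parameter names are not shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SonicInputs:
    """Client-side mirror of the documented inputs.

    Field names are hypothetical placeholders, except min_resolution,
    which is the only name that appears on the page.
    """
    image_path: str                      # portrait image (required)
    audio_path: str                      # voice audio, e.g. WAV or MP3 (required)
    movement_scale: float = 1.0          # documented range: 0.5 to 2
    min_resolution: int = 512            # documented range: 256 to 1024
    diffusion_steps: int = 25            # documented range: 5 to 50
    keep_original_resolution: bool = False
    seed: Optional[int] = None           # None means "leave blank for a random seed"

    def validate(self) -> None:
        """Raise ValueError if any value falls outside the documented range."""
        if not self.image_path or not self.audio_path:
            raise ValueError("both an image and an audio file are required")
        if not 0.5 <= self.movement_scale <= 2.0:
            raise ValueError("movement_scale must be between 0.5 and 2")
        if not 256 <= self.min_resolution <= 1024:
            raise ValueError("min_resolution must be between 256 and 1024")
        if not 5 <= self.diffusion_steps <= 50:
            raise ValueError("diffusion_steps must be between 5 and 50")


# Usage: defaults pass, an out-of-range step count is rejected.
inputs = SonicInputs("face.png", "voice.wav")
inputs.validate()
```

Checking the values locally is only a convenience: it surfaces an out-of-range setting before any upload happens, and the hosted model remains the authority on what it accepts.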

Output