
zsxkib/v-express:e0122658

Input

file

Path to the reference image that will be used as the base for the generated video.

file

Path to the audio file that will be used to drive the motion in the generated video.

boolean

If true and driving_video is provided, the audio track of the driving video is used instead of driving_audio.

Default: false

file

Path to the video file from which head motion is extracted. If not provided, head motion is generated according to the selected motion_mode.

string

Mode for generating the head motion in the output video.

Default: "fast"
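The fallback rules described for the audio flag and for driving_video can be sketched as a small resolver. This is a minimal Python sketch and not the model's actual code: the function name, the use_video_audio flag name, and the returned dictionary are hypothetical. Only driving_audio, driving_video, and motion_mode appear in the parameter descriptions.

```python
from typing import Optional


def resolve_sources(
    driving_audio: str,
    driving_video: Optional[str] = None,
    use_video_audio: bool = False,  # hypothetical name for the boolean flag
    motion_mode: str = "fast",
) -> dict:
    """Pick the audio and head-motion sources following the documented rules."""
    # Audio comes from the driving video only when the flag is set
    # AND a driving video was actually provided.
    if use_video_audio and driving_video is not None:
        audio_source = driving_video
    else:
        audio_source = driving_audio

    # Head motion is extracted from the driving video when present;
    # otherwise it is generated according to motion_mode.
    if driving_video is not None:
        motion_source = ("video", driving_video)
    else:
        motion_source = ("mode", motion_mode)

    return {"audio": audio_source, "motion": motion_source}
```

Note that setting the flag without supplying a driving video leaves driving_audio in effect, which matches the "if true and driving_video is provided" wording.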

number
(minimum: 0, maximum: 1)

Amount of attention to pay to the reference image vs. the driving motion. Higher values make the generated video adhere more closely to the reference image.

Default: 0.95

number
(minimum: 0, maximum: 10)

Amount of attention to pay to the driving audio vs. the reference image. Higher values make the generated video's motion match the driving audio more closely.

Default: 3

integer
(minimum: 1, maximum: 100)

Number of diffusion steps to perform during generation. More steps generally produce better-quality results but take longer to run.

Default: 25

integer
(minimum: 64, maximum: 2048)

Width of the generated video frames.

Default: 512

integer
(minimum: 64, maximum: 2048)

Height of the generated video frames.

Default: 512

number
(minimum: 1, maximum: 60)

Frame rate of the generated video.

Default: 30

number
(minimum: 1, maximum: 20)

Guidance scale for the diffusion model. Higher values will result in the generated video following the driving motion and audio more closely.

Default: 3.5

integer
(minimum: 1, maximum: 24)

Number of context frames to use for motion estimation.

Default: 12

integer
(minimum: 1, maximum: 10)

Stride of the context frames.

Default: 1

integer
(minimum: 0, maximum: 24)

Number of overlapping frames between context windows.

Default: 4
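One way to picture how the context-window length and the overlap interact is a plain sliding window over frame indices. The sketch below is an illustration under stated assumptions, not V-Express's actual scheduler: the function name is hypothetical, the stride is fixed at its default of 1, and consecutive windows are assumed to advance by (window length minus overlap) frames.

```python
def context_windows(num_frames: int, context_frames: int = 12, overlap: int = 4):
    """Split frame indices 0..num_frames-1 into overlapping windows (stride 1 only)."""
    if not 0 <= overlap < context_frames:
        raise ValueError("overlap must be >= 0 and smaller than context_frames")
    if num_frames <= 0:
        return []

    step = context_frames - overlap  # how far each new window advances
    windows = []
    start = 0
    while True:
        end = min(start + context_frames, num_frames)
        windows.append(list(range(start, end)))
        if end >= num_frames:  # the last frame is covered, so stop
            break
        start += step
    return windows


# With the defaults (12 frames per window, 4 shared), 20 frames need two windows.
for window in context_windows(20):
    print(window[0], "to", window[-1])
```

In this sketch an overlap equal to or larger than the window length would never advance, so it is rejected up front, even though the listed ranges for the two parameters would permit such a combination.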

integer
(minimum: 0, maximum: 10)

Number of audio frames to pad on each side of the driving audio.

Default: 2

integer

Random seed. Leave blank to use a random seed.
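The minimums, maximums, and defaults listed above can be checked on the client side before a request is submitted. The following is a minimal Python sketch of such a check. The dictionary keys are hypothetical labels chosen for illustration, since the listing above does not show the model's real input names. Only the numeric limits and defaults are taken from the parameter list.

```python
# (minimum, maximum, default) copied from the parameter list.
# The keys are hypothetical labels, NOT the model's real input names.
NUMERIC_LIMITS = {
    "reference_attention": (0, 1, 0.95),
    "audio_attention": (0, 10, 3),
    "diffusion_steps": (1, 100, 25),
    "width": (64, 2048, 512),
    "height": (64, 2048, 512),
    "frame_rate": (1, 60, 30),
    "guidance_scale": (1, 20, 3.5),
    "context_frames": (1, 24, 12),
    "context_stride": (1, 10, 1),
    "context_overlap": (0, 24, 4),
    "audio_padding_frames": (0, 10, 2),
}


def apply_defaults_and_validate(user_input: dict) -> dict:
    """Fill in defaults and reject values outside the listed ranges."""
    unknown = set(user_input) - set(NUMERIC_LIMITS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    resolved = {}
    for name, (lowest, highest, default) in NUMERIC_LIMITS.items():
        value = user_input.get(name, default)
        if not lowest <= value <= highest:
            raise ValueError(f"{name}={value} is outside [{lowest}, {highest}]")
        resolved[name] = value
    return resolved
```

For example, apply_defaults_and_validate({"width": 1024}) returns all eleven values with width set to 1024 and the rest at their defaults, while {"frame_rate": 120} raises a ValueError.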
