skytells-research / wav2lip

  • Public
  • 45.9K runs
  • L40S
Input

*file

Video or image that contains the face(s) to use.

*file

Video or audio file to use as the raw audio source.

string

Padding for the detected face bounding box. Adjust it so that at least the chin is included. Format: "top bottom left right"

Default: "0 10 0 0"
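As a rough illustration of the "top bottom left right" format, the sketch below parses the padding string and grows a face box by those amounts. The names `Box`, `parse_pads`, and `pad_box` are hypothetical, and clamping to the frame bounds is an assumption about how such padding is typically applied, not something this page states.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical face box in pixel coordinates (top-left origin)."""
    x1: int
    y1: int
    x2: int
    y2: int


def parse_pads(pads: str) -> Tuple[int, int, int, int]:
    """Parse a "top bottom left right" string into four integers."""
    parts = pads.split()
    if len(parts) != 4:
        raise ValueError(f'expected "top bottom left right", got {pads!r}')
    top, bottom, left, right = (int(p) for p in parts)
    return top, bottom, left, right


def pad_box(box: Box, pads: str, frame_w: int, frame_h: int) -> Box:
    """Grow a box by the padding, clamped to the frame (assumed behaviour)."""
    top, bottom, left, right = parse_pads(pads)
    return Box(
        x1=max(0, box.x1 - left),
        y1=max(0, box.y1 - top),
        x2=min(frame_w, box.x2 + right),
        y2=min(frame_h, box.y2 + bottom),
    )
```

With the default "0 10 0 0", only the bottom edge moves: a box ending at y = 230 would end at y = 240, which is what pulls the chin into the crop.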

boolean

Smooth face detections over a short temporal window

Default: true
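One common way to smooth detections over a short temporal window is a moving average of the box coordinates across neighbouring frames. The sketch below shows that idea only; the function name `smooth_boxes` and the window length of 5 frames are assumptions, and the model's actual smoothing may differ.

```python
from typing import List, Sequence


def smooth_boxes(
    boxes: Sequence[Sequence[float]], window: int = 5
) -> List[List[float]]:
    """Replace each per-frame box with the mean of a short window of boxes.

    The window starts at the current frame and looks forward; near the end
    of the sequence it falls back to the last `window` frames.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(boxes)
    smoothed: List[List[float]] = []
    for i in range(n):
        # Use the trailing window when a forward window would run off the end.
        start = max(0, n - window) if i + window > n else i
        chunk = boxes[start:start + window]
        # Average each coordinate (x1, y1, x2, y2) across the window.
        smoothed.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return smoothed
```

Averaging trades a little responsiveness for stability: small frame-to-frame jitter in the detector is damped, while fast head motion is followed with a slight lag.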

number

Frame rate (frames per second) of the output video. Can be specified only if the input is a static image.

Default: 25

integer

Output video height in pixels. Best results are obtained at 480 or 720.

Default: 480
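The inputs above can be collected into a single payload before submitting a prediction. The sketch below is illustrative only: the field names did not survive on this page, so the keys `face`, `audio`, `pads`, `smooth`, `fps`, and `out_height` are placeholders, and `build_input` and its `face_is_image` flag are hypothetical helpers. Check the model's API schema for the real names.

```python
from typing import Any, Dict

DEFAULT_FPS = 25.0


def build_input(
    face: str,
    audio: str,
    pads: str = "0 10 0 0",
    smooth: bool = True,
    fps: float = DEFAULT_FPS,
    out_height: int = 480,
    face_is_image: bool = False,
) -> Dict[str, Any]:
    """Assemble an input payload using the defaults listed on this page.

    All dictionary keys are placeholder names, not confirmed field names.
    """
    if len(pads.split()) != 4:
        raise ValueError('pads must have the form "top bottom left right"')
    if out_height <= 0:
        raise ValueError("out_height must be positive")
    payload: Dict[str, Any] = {
        "face": face,
        "audio": audio,
        "pads": pads,
        "smooth": smooth,
        "out_height": out_height,
    }
    if face_is_image:
        # The frame rate applies only when the face input is a static image.
        payload["fps"] = fps
    elif fps != DEFAULT_FPS:
        raise ValueError("fps can be specified only for a static image input")
    return payload
```

For a video input the frame rate is left out entirely, since the page says it can be set only for a static image.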

Run time and cost

This model costs approximately $0.0068 to run on Replicate, or 147 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
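The two figures quoted above are the same number seen from both sides: 1 / 0.0068 is about 147.06, which rounds down to 147 whole runs per dollar. A small sketch of that arithmetic follows; the function names are illustrative, and the per-run price is the approximate value from this page, which varies with inputs.

```python
COST_PER_RUN_USD = 0.0068  # approximate per-run price quoted on this page


def runs_per_dollar(cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Whole number of runs that one dollar buys at the given price."""
    return int(1 / cost_per_run)


def estimated_cost(runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Rough total in USD for a number of runs, rounded to 4 decimals."""
    return round(runs * cost_per_run, 4)


print(runs_per_dollar())     # whole runs per dollar
print(estimated_cost(1000))  # rough cost of 1,000 runs in USD
```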

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 7 seconds.

Readme

This model doesn't have a readme.