devxpy / cog-wav2lip

Faster version now available at https://gooey.ai/Lipsync/ & https://gooey.ai/lipsync-maker/

  • Public
  • 2.8M runs
  • A100 (80GB)
  • GitHub

Input

*file

Video or image that contains the faces to use

*file

Video or audio file to use as the raw audio source

string

Padding for the detected face bounding box. Adjust it so that the box includes at least the chin. Format: "top bottom left right"

Default: "0 10 0 0"
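As a rough illustration of the "top bottom left right" format, the sketch below parses a padding string and grows a face box by it. The helper names `parse_pads` and `pad_box`, the `(x1, y1, x2, y2)` box layout, and the clamping to the frame size are assumptions made for this example, not code taken from the model.

```python
from typing import Tuple


def parse_pads(pads: str) -> Tuple[int, int, int, int]:
    """Parse a "top bottom left right" string into four integers."""
    parts = pads.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 values (top bottom left right), got {len(parts)}")
    top, bottom, left, right = (int(p) for p in parts)
    return top, bottom, left, right


def pad_box(box: Tuple[int, int, int, int], pads: str,
            frame_w: int, frame_h: int) -> Tuple[int, int, int, int]:
    """Grow an (x1, y1, x2, y2) box by the padding, clamped to the frame (assumed layout)."""
    top, bottom, left, right = parse_pads(pads)
    x1, y1, x2, y2 = box
    return (max(0, x1 - left), max(0, y1 - top),
            min(frame_w, x2 + right), min(frame_h, y2 + bottom))


# The default "0 10 0 0" only extends the bottom edge, toward the chin.
print(pad_box((100, 80, 200, 220), "0 10 0 0", 640, 480))  # (100, 80, 200, 230)
```

With the default value, only the bottom edge moves, which matches the advice to make sure the chin is inside the box.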

boolean

Smooth face detections over a short temporal window

Default: true
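One way to picture smoothing over a short temporal window is a moving average of the per-frame face boxes. The sketch below is a simplified illustration only: the function name `smooth_boxes`, the forward-looking window, and the window length of 5 frames are assumptions, and the model's own smoothing may differ in detail.

```python
from typing import List, Sequence


def smooth_boxes(boxes: List[Sequence[float]], window: int = 5) -> List[List[float]]:
    """Replace each (x1, y1, x2, y2) box with the mean of a short window of boxes."""
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(boxes)
    smoothed = []
    for i in range(n):
        # Near the end of the clip, fall back to the last `window` boxes
        # so the average is still taken over a full window where possible.
        start = i if i + window <= n else max(0, n - window)
        chunk = boxes[start:start + window]
        smoothed.append([sum(b[k] for b in chunk) / len(chunk) for k in range(4)])
    return smoothed


detections = [[0, 0, 10, 10], [2, 2, 12, 12], [4, 4, 14, 14]]
print(smooth_boxes(detections, window=2))
```

Averaging reduces frame-to-frame jitter in the crop, at the cost of the box lagging slightly when the face moves quickly.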

number

Frame rate (frames per second) of the output video. Can be specified only if the face input is a static image

Default: 25
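For a static image there is no source frame rate, so this value determines how many frames are generated for a given audio length. The sketch below estimates that count as duration times frame rate. The helper name `still_image_frame_count` is hypothetical, and the exact count the model produces may differ slightly because it depends on how the audio is chunked.

```python
import math


def still_image_frame_count(audio_seconds: float, fps: float = 25.0) -> int:
    """Estimate how many video frames cover `audio_seconds` of audio at `fps`."""
    if audio_seconds < 0 or fps <= 0:
        raise ValueError("audio_seconds must be >= 0 and fps must be > 0")
    return math.ceil(audio_seconds * fps)


print(still_image_frame_count(4.0))  # 100 frames at the default 25 fps
```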

integer

Reduce the input resolution by this factor. The best results are sometimes obtained at 480p or 720p.

Default: 1
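To choose a factor, it can help to work out the resolution it produces. The sketch below assumes that both dimensions are divided by the factor using integer division. The helper names `resized_dims` and `factor_for_target_height` are hypothetical and exist only for this example.

```python
from typing import Tuple


def resized_dims(width: int, height: int, resize_factor: int) -> Tuple[int, int]:
    """Return (width, height) after dividing both by an integer factor (assumed behaviour)."""
    if resize_factor < 1:
        raise ValueError("resize_factor must be a positive integer")
    return width // resize_factor, height // resize_factor


def factor_for_target_height(height: int, target_height: int = 720) -> int:
    """Smallest integer factor that brings `height` down to `target_height` or below."""
    return max(1, -(-height // target_height))  # ceiling division


print(resized_dims(1920, 1080, 2))     # (960, 540)
print(factor_for_target_height(2160))  # 3, i.e. 2160p becomes 720p
```

Because the factor is an integer, not every source resolution can land exactly on 480p or 720p. A 1080p input, for example, goes to 540p with a factor of 2.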

Output


Run time and cost

This model costs approximately $0.037 to run on Replicate, or 27 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
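The per-run figure can be turned into a quick budget estimate. The sketch below uses the approximate cost quoted above. The function names are hypothetical, and actual charges vary with the inputs.

```python
COST_PER_RUN_USD = 0.037  # approximate figure quoted on this page


def runs_per_dollar(cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Whole number of runs that fit into one dollar."""
    return int(1 / cost_per_run)


def batch_cost(runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Approximate cost in USD of `runs` predictions, rounded to cents."""
    return round(runs * cost_per_run, 2)


print(runs_per_dollar())  # 27
print(batch_cost(1000))   # 37.0
```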

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 27 seconds, but the prediction time varies significantly with the inputs.
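A prediction can also be requested from code with the `replicate` Python client. The sketch below makes several assumptions. The input keys (`face`, `audio`, `pads`, `smooth`, `fps`, `resize_factor`) are guesses based on the descriptions in the Input section, because this page does not show the field names. The model reference is taken from the page title, and depending on the client version a version identifier may need to be appended to it. The client is expected to read an API token from the `REPLICATE_API_TOKEN` environment variable.

```python
from typing import Any, Dict, Optional

MODEL = "devxpy/cog-wav2lip"  # from the page title; a ":<version>" suffix may be required


def build_input(face: Any, audio: Any, pads: str = "0 10 0 0", smooth: bool = True,
                fps: Optional[float] = None, resize_factor: int = 1) -> Dict[str, Any]:
    """Assemble the input payload. All key names here are ASSUMED, not confirmed."""
    if len(pads.split()) != 4:
        raise ValueError('pads must have the form "top bottom left right"')
    if resize_factor < 1:
        raise ValueError("resize_factor must be a positive integer")
    payload: Dict[str, Any] = {
        "face": face,
        "audio": audio,
        "pads": pads,
        "smooth": smooth,
        "resize_factor": resize_factor,
    }
    # The frame rate only applies to static images, so it is sent only when given.
    if fps is not None:
        payload["fps"] = fps
    return payload


def run_prediction(face_path: str, audio_path: str, **options: Any) -> Any:
    """Upload two local files and run the model (needs network access and an API token)."""
    import replicate  # third-party package: pip install replicate

    with open(face_path, "rb") as face, open(audio_path, "rb") as audio:
        return replicate.run(MODEL, input=build_input(face, audio, **options))


# Building the payload works offline; run_prediction() performs the actual call.
print(build_input("face.mp4", "speech.wav"))
```

Keeping payload construction separate from the network call makes it possible to check the inputs locally before spending a run.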

Readme

Model description

Intended use

Ethical considerations

Caveats and recommendations