basord / lip-reading-ai-vsr

Lip Read silent videos with AI (Updated 7 months, 4 weeks ago)

  • Public
  • 1.8K runs

Run time and cost

Each run of this model costs approximately $0.058 on Replicate (about 17 runs per $1), though the cost varies with your inputs. The model is also open source, so you can run it on your own computer with Docker.
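As a rough budgeting aid, the arithmetic behind these figures can be sketched as follows. The function names are hypothetical, and the per-run price is only the approximate figure quoted above, so treat the results as estimates.

```python
# Approximate per-run price quoted on this page; actual cost varies with inputs.
COST_PER_RUN_USD = 0.058


def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit into one dollar."""
    return int(1 / cost_per_run)


def estimated_cost(n_runs, cost_per_run=COST_PER_RUN_USD):
    """Estimated total cost in USD for n_runs, rounded to cents."""
    return round(n_runs * cost_per_run, 2)


if __name__ == "__main__":
    print(runs_per_dollar())      # whole runs per $1
    print(estimated_cost(100))    # estimated USD for 100 runs
```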

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 60 seconds, although the prediction time varies significantly with the inputs.

Readme

This AI lip-reads videos. It lets you get a transcript from any video without needing sound.

Works with UPLOADED VIDEOS ONLY. If you provide the URL of an online video, it will not work. Only ONE PERSON’S FACE should be VISIBLE at a time.
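A minimal sketch of sending a local file, assuming the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set. The input name `video` is an assumption, not something this page confirms, so check the model's API schema before relying on it. The `lip_read` helper and its `run` parameter are hypothetical and exist only to keep the example easy to try offline.

```python
from pathlib import Path

MODEL_REF = "basord/lip-reading-ai-vsr"  # model name from this page; a version suffix may be needed
INPUT_KEY = "video"  # ASSUMPTION: the real input name may differ, check the model schema


def lip_read(path, run=None):
    """Upload a local video file to the model and return whatever it outputs."""
    # The page states that URLs of online videos do not work, so reject them early.
    if str(path).lower().startswith(("http://", "https://")):
        raise ValueError("Upload a local file; URLs of online videos are not supported")
    if run is None:
        import replicate  # pip install replicate

        run = replicate.run
    # Passing an open binary file handle makes the client upload the file itself.
    with Path(path).open("rb") as video_file:
        return run(MODEL_REF, input={INPUT_KEY: video_file})
```

Calling `lip_read("clip.mp4")` would then upload the file and return the model's output; the exact shape of that output is not documented here.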

It supports MP4, MOV, MKV and WebM video files. Videos must be between 2 and 40 seconds long, with a maximum resolution of 1080p.
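These limits can be checked before uploading. The sketch below is illustrative only: `check_video` is a hypothetical helper that expects you to supply the duration and frame size yourself (for example from a tool such as ffprobe), and it assumes "1080p" means at most 1920x1080 in either orientation, which the page does not spell out.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}
MIN_SECONDS, MAX_SECONDS = 2.0, 40.0
# ASSUMPTION: "1080p" is read as at most 1920x1080, landscape or portrait.
MAX_LONG_SIDE, MAX_SHORT_SIDE = 1920, 1080


def check_video(path, duration_s, width, height):
    """Return a list of problems; an empty list means the stated limits are met."""
    problems = []
    if Path(path).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported file type (use MP4, MOV, MKV or WebM)")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append("duration must be between 2 and 40 seconds")
    if max(width, height) > MAX_LONG_SIDE or min(width, height) > MAX_SHORT_SIDE:
        problems.append("resolution exceeds 1080p")
    return problems


if __name__ == "__main__":
    print(check_video("clip.mp4", duration_s=12.5, width=1280, height=720))
```

This only covers the format limits; it cannot tell whether exactly one face is visible.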

For the best results, your videos should meet the following criteria:

  • The speaker’s face should be well-lit and clearly visible, so record in good lighting conditions where possible.
  • Both frontal and profile views of the speaker’s face work, but in a profile view at least half of the lips must be visible.
  • Avoid videos where the speaker’s mouth is obscured (by masks, hands, or objects).
  • The closer the camera is to the speaker’s face, the better (while keeping the full face in frame).
  • Only one person’s face should be visible at a time. This last criterion is mandatory.

Meeting these conditions will significantly improve the accuracy of the transcription.