basord / lip-reading-ai-vsr

Lip Read silent videos with AI

  • Public
  • 87 runs
  • L40S

Input

file (required)

Video file to transcribe
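
As a rough illustration of supplying this input programmatically, the sketch below assumes the `replicate` Python client and its `replicate.run` call. The input key `file` is taken from this page. The helper name `transcribe` and the injectable `run` parameter are hypothetical, and the model reference may need a `:<version>` suffix depending on the client version.

```python
from typing import Any, Callable, Optional

# Assumption: some client versions require "owner/name:<version>" instead.
MODEL_REF = "basord/lip-reading-ai-vsr"


def transcribe(video_path: str, run: Optional[Callable[..., Any]] = None) -> Any:
    """Send a local video file to the model and return its raw output.

    `run` is a hypothetical injection point so the call can be stubbed;
    by default it falls back to `replicate.run`.
    """
    if run is None:
        # Imported lazily so the sketch can be loaded without the package.
        import replicate

        run = replicate.run
    # Open in binary mode and pass the handle: the model expects an
    # uploaded file, not the URL of an online video.
    with open(video_path, "rb") as video:
        return run(MODEL_REF, input={"file": video})
```

Calling `transcribe("clip.mp4")` would require the `replicate` package and an API token in the environment. Passing a stub as `run` lets the upload path be exercised offline.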

Output

{"status": "success", "transcript": "HEARS PEOPLE WHO ARE TAKING TIME OUT OF THEIR LIFE JUST COME DOWN AND ACTUALLY PRODUCE THINGS AND SO IT WAS A REALLY COOL ENVIRONMENT"}
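
The output above is a JSON object with a `status` and a `transcript` field. A minimal sketch of reading it follows. The function name `extract_transcript` is hypothetical, and since `"success"` is the only status value shown on this page, treating every other value as a failure is an assumption.

```python
import json
from typing import Any, Dict, Union


def extract_transcript(raw: Union[str, Dict[str, Any]]) -> str:
    """Return the transcript from the model output, given as JSON text or a dict."""
    result = json.loads(raw) if isinstance(raw, str) else raw
    # Assumption: anything other than "success" signals a failed run.
    if result.get("status") != "success":
        raise ValueError(f"transcription did not succeed: {result!r}")
    return result["transcript"]
```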
Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

This AI lip-reads videos. It produces a transcript from a video without needing sound.

Works with UPLOADED VIDEOS ONLY. If you provide the URL of an online video, it will not work. Only ONE PERSON'S FACE should be visible at a time.

It supports MP4, MOV, MKV, and WebM video files. Videos must be between 2 and 40 seconds long, with a maximum resolution of 1080p.
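
These limits can be checked before uploading. The sketch below is only an illustration: the function name `check_video` is hypothetical, the duration and frame size are assumed to have been read beforehand with a tool of your choice, the 2 and 40 second bounds are assumed to be inclusive, and "1080p" is interpreted as a limit on the shorter side of the frame.

```python
from pathlib import Path
from typing import List

ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}
MIN_SECONDS, MAX_SECONDS = 2.0, 40.0  # assumption: bounds are inclusive
MAX_SHORT_SIDE = 1080  # assumption: "1080p" limits the shorter frame side


def check_video(path: str, duration_s: float, width: int, height: int) -> List[str]:
    """Return a list of constraint violations; an empty list means the video looks acceptable."""
    problems = []
    # Compare the extension case-insensitively against the supported containers.
    if Path(path).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format (expected MP4, MOV, MKV or WebM)")
    if not (MIN_SECONDS <= duration_s <= MAX_SECONDS):
        problems.append("duration must be between 2 and 40 seconds")
    # Using the shorter side lets portrait 1080x1920 clips pass as well.
    if min(width, height) > MAX_SHORT_SIDE:
        problems.append("resolution exceeds 1080p")
    return problems
```

For example, `check_video("clip.mp4", 10.0, 1920, 1080)` returns an empty list, while a 4K clip would be flagged for resolution.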

For the best results, your videos should meet the following criteria:

  • The speaker’s face should be well-lit and clearly visible
  • Both frontal and profile views of the speaker’s face work. For a profile view, at least half of the lips should be visible.
  • Ideally, record in good lighting conditions
  • Avoid videos where the speaker’s mouth is obscured (by masks, hands, or objects)
  • The closer the camera is to the speaker’s face, the better (while keeping the full face in frame)
  • Only one person's face should be visible at a time. This last criterion is mandatory.

Meeting these conditions will significantly improve the accuracy of the transcription.