basord/lip-reading-ai-vsr | Run with an API on Replicate

Readme

This AI lip-reads videos: it produces a transcript of what is said in a video without needing any sound.

It works with UPLOADED VIDEOS ONLY; if you give the URL of an online video, it will not work. Only ONE PERSON’S FACE should be VISIBLE at a time.

It supports MP4, MOV, MKV and WebM video files. Videos must be between 2 and 40 seconds long, with a maximum resolution of 1080p.
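Because a video that breaks these limits will not work, it can help to check a file before uploading it. The sketch below is a hypothetical helper (`check_video` is not part of the model or of Replicate's API) that applies the rules above: an uploaded file rather than a URL, one of the four listed formats judged by file extension, a length of 2 to 40 seconds, and a resolution of at most 1080p. Reading "1080p" as "no larger than 1920x1080 in either orientation" is an assumption, and the duration and dimensions have to be obtained separately, for example with a tool such as ffprobe.

```python
from pathlib import Path

# Formats listed in the readme, compared case-insensitively by file extension.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}
MIN_SECONDS = 2.0
MAX_SECONDS = 40.0
# Assumption: "maximum 1080p" means at most 1920x1080, landscape or portrait.
MAX_SHORT_SIDE = 1080
MAX_LONG_SIDE = 1920


def check_video(path: str, duration_s: float, width: int, height: int) -> list[str]:
    """Hypothetical pre-upload check; returns a list of problems (empty = looks OK).

    duration_s, width and height must be read from the file beforehand.
    """
    problems: list[str] = []
    # The model only accepts uploaded files, not links to online videos.
    if path.lower().startswith(("http://", "https://")):
        problems.append("URLs are not accepted; upload the video file itself")
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format {suffix or '(none)'}")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append(
            f"duration {duration_s:g}s is outside {MIN_SECONDS:g}-{MAX_SECONDS:g}s"
        )
    # Sort so that portrait (e.g. 1080x1920) and landscape clips are treated alike.
    short_side, long_side = sorted((width, height))
    if short_side > MAX_SHORT_SIDE or long_side > MAX_LONG_SIDE:
        problems.append(f"resolution {width}x{height} exceeds 1080p")
    return problems


if __name__ == "__main__":
    print(check_video("interview.mp4", 12.5, 1920, 1080))  # → []
    print(check_video("clip.avi", 55, 3840, 2160))
```

The second call reports three problems (unsupported format, too long, resolution above 1080p). Returning every problem at once, rather than stopping at the first, lets you fix a clip in a single pass.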

For best results, your videos should meet the following criteria:

The speaker’s face should be well-lit and clearly visible
Both frontal and profile views of the speaker’s face work, but in a profile view at least half of the lips should be visible.
Ideally, record in good lighting conditions
Avoid videos where the speaker’s mouth is obscured (by masks, hands, or objects)
The closer the camera is to the speaker’s face, the better (while keeping the full face in frame)
Only one person’s face should be visible at a time. This last criterion is mandatory.

Meeting these conditions will significantly improve the accuracy of the transcription.

A full tutorial on how to use Lip Reading AI on Replicate is available here.

Model created over 1 year ago
