Whisper-Timestamped Transcription Model (Large V3)
Overview
This model provides speech recognition with word-level timestamps using the whisper-timestamped library and Whisper Large V3. It transcribes audio files and returns precise start and end times for each word.
Features
- Uses Whisper Large V3 for state-of-the-art speech recognition
- Efficient and accurate word-level timestamps
- Voice Activity Detection (VAD) to improve transcription accuracy
- Confidence scores for each word
- Detection of speech disfluencies
- Support for multiple languages
- Options for transcription or translation to English
Usage
To use this model, provide an audio file. The model will process the audio and return a JSON object containing the transcription with detailed timing information for segments and individual words.
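As a sketch of what consuming that JSON object might look like, the snippet below flattens segments into a word-level timeline. The field names (`segments`, `words`, `start`, `end`, `confidence`) follow whisper-timestamped's documented output format, but the sample values are illustrative, not real model output:

```python
# Illustrative sample of a whisper-timestamped-style result; the values are
# made up, but the structure (segments -> words with start/end/confidence)
# follows the library's documented output format.
sample_result = {
    "text": " Hello world.",
    "language": "en",
    "segments": [
        {
            "id": 0,
            "start": 0.0,
            "end": 1.2,
            "text": " Hello world.",
            "confidence": 0.95,
            "words": [
                {"text": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.97},
                {"text": "world.", "start": 0.6, "end": 1.2, "confidence": 0.93},
            ],
        }
    ],
}

def word_timeline(result):
    """Flatten all segments into (word, start, end, confidence) tuples."""
    return [
        (w["text"], w["start"], w["end"], w["confidence"])
        for seg in result["segments"]
        for w in seg["words"]
    ]

for text, start, end, conf in word_timeline(sample_result):
    print(f"{start:5.2f}-{end:5.2f}s  {text}  (confidence {conf:.2f})")
```

The per-word confidence scores make it easy to, for example, flag low-confidence words for human review before using the transcript downstream.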
For detailed information on input parameters and output format, please refer to the model’s input/output specifications on this page.
About
This model is hosted on Replicate and uses whisper-timestamped, an extension of OpenAI’s Whisper, with the Whisper Large V3 model. For more information about whisper-timestamped, visit the GitHub repository.