Transcribe speech
Transcribe audio to text in multiple languages.
Our pick: Incredibly Fast Whisper
For most needs, use vaibhavs10/incredibly-fast-whisper. It really is fast (10x quicker than the original Whisper), cheap, and accurate, and it supports tons of languages.
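As a rough illustration of calling this model, here is a minimal sketch using the Replicate Python client. The `audio` and `language` input names are assumptions, so check the model's input schema before relying on them. Community models may also need a pinned version, written as `owner/name:version`. The helper names `build_input` and `transcribe` are hypothetical.

```python
# Model identifier taken from the text above. A version suffix
# ("owner/name:version") may be required; none is pinned here.
MODEL = "vaibhavs10/incredibly-fast-whisper"


def build_input(audio_url, language=None):
    """Assemble the input payload.

    The field names "audio" and "language" are assumptions about the
    model's schema, not confirmed by this page.
    """
    payload = {"audio": audio_url}
    if language:
        payload["language"] = language
    return payload


def transcribe(audio_url, language=None):
    """Run the model on Replicate and return its raw output."""
    # Imported lazily so the helper above works without the client installed.
    # Requires `pip install replicate` and the REPLICATE_API_TOKEN env variable.
    import replicate

    return replicate.run(MODEL, input=build_input(audio_url, language))
```

Calling `transcribe("https://example.com/audio.mp3")` with a valid API token would send the request. The URL here is only a placeholder.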
For speaker labels: WhisperX
Need to label speakers or get word-level timestamps? victor-upmeet/whisperx has you covered. It is slightly more expensive than incredibly-fast-whisper, but still very fast and useful.
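Once you have speaker labels, a common next step is to merge consecutive segments from the same speaker into readable turns. The sketch below assumes each segment is a dict with `speaker`, `start`, `end`, and `text` keys. That shape is an assumption for illustration, and the model's actual output may differ. `merge_speaker_turns` is a hypothetical helper.

```python
def merge_speaker_turns(segments):
    """Collapse consecutive segments by the same speaker into one turn.

    Assumes each segment looks like:
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.2, "text": "..."}
    """
    turns = []
    for seg in segments:
        # Segments without a label are grouped under a placeholder name.
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg["text"].strip()
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = seg["end"]
        else:
            # Speaker changed: start a new turn.
            turns.append(
                {"speaker": speaker, "start": seg["start"], "end": seg["end"], "text": text}
            )
    return turns


# Made-up sample data, for illustration only.
sample = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0, "text": " Hello there."},
    {"speaker": "SPEAKER_00", "start": 1.0, "end": 2.0, "text": " How are you?"},
    {"speaker": "SPEAKER_01", "start": 2.0, "end": 3.0, "text": " Fine, thanks."},
]
for turn in merge_speaker_turns(sample):
    print(f'{turn["speaker"]}: {turn["text"]}')
```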
Translation: SeamlessM4T
To translate speech between languages, cjwbw/seamless_communication is your friend.
This unified model handles multiple tasks on its own, so you don't need a separate model for each:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Text-to-speech translation (T2ST)
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)
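To make the task list concrete, here is a small sketch that picks one of the five abbreviations above from the input and output modality. `pick_task` is a hypothetical helper, and the strings the model actually accepts for its task parameter may be named differently, so treat the return values as labels rather than API inputs.

```python
# Maps (source modality, target modality) to a translation task abbreviation.
TRANSLATION_TASKS = {
    ("speech", "speech"): "S2ST",
    ("speech", "text"): "S2TT",
    ("text", "speech"): "T2ST",
    ("text", "text"): "T2TT",
}


def pick_task(source, target, same_language=False):
    """Return the task abbreviation for the given modalities.

    same_language=True means no translation is wanted. In the list above,
    only speech-to-text has such a task (ASR).
    """
    if same_language:
        if (source, target) == ("speech", "text"):
            return "ASR"
        raise ValueError("only speech-to-text has a same-language task (ASR)")
    try:
        return TRANSLATION_TASKS[(source, target)]
    except KeyError:
        raise ValueError(f"unsupported modality pair: {source!r} to {target!r}")


print(pick_task("speech", "text"))                      # S2TT
print(pick_task("speech", "text", same_language=True))  # ASR
```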
Featured models
victor-upmeet / whisperx
Accelerated transcription, word-level timestamps and diarization with whisperX large-v3
vaibhavs10 / incredibly-fast-whisper
whisper-large-v3, incredibly fast, powered by Hugging Face Transformers! 🤗
cjwbw / seamless_communication
SeamlessM4T—Massively Multilingual & Multimodal Machine Translation
Recommended models
openai / whisper
Convert speech in audio to text
thomasmol / whisper-diarization
⚡️ Fast audio transcription | whisper large-v3 | speaker diarization | word & sentence level timestamps | prompt | hotwords
nvidia / parakeet-rnnt-1.1b
🗣️ Nvidia + Suno.ai's speech-to-text conversion with high accuracy and efficiency 📝
adidoes / whisperx-video-transcribe
ASR from video URL based on whisperx using large-v2 model