Transcribe speech
Transcribe audio to text in multiple languages.
Our pick: Incredibly Fast Whisper
For most needs, use vaibhavs10/incredibly-fast-whisper. It really is fast (10x faster than the original Whisper), cheap, and accurate, and it supports a wide range of languages.
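As a rough sketch of what calling this model from Python might look like: the helper below builds an input payload and hands it to a `run` callable such as `replicate.run` from the `replicate` client. The input keys `audio`, `task`, and `language` are assumptions, not confirmed by this page, so check the model's API schema first. The helper names `build_input` and `transcribe` are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

# Model name taken from the text above. Depending on the client version,
# a ":<version>" suffix may need to be appended (assumption).
MODEL = "vaibhavs10/incredibly-fast-whisper"


def build_input(audio_url: str, language: Optional[str] = None) -> Dict[str, Any]:
    """Build the input payload.

    The keys "audio", "task" and "language" are assumptions; verify them
    against the model's published input schema.
    """
    payload: Dict[str, Any] = {"audio": audio_url, "task": "transcribe"}
    if language is not None:
        payload["language"] = language
    return payload


def transcribe(run: Callable[..., Any], audio_url: str,
               language: Optional[str] = None) -> Any:
    """Call the model through `run` (e.g. `replicate.run`) and return its raw output."""
    return run(MODEL, input=build_input(audio_url, language))
```

With the `replicate` package installed and `REPLICATE_API_TOKEN` set, `transcribe(replicate.run, "https://example.com/talk.mp3")` would send the request. Passing `run` in as an argument keeps the helper easy to test without network access.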
For speaker labels: WhisperX
Need to label speakers or get word-level timestamps? victor-upmeet/whisperx has you covered. It is slightly more expensive than incredibly-fast-whisper but still very fast and useful.
Translation: SeamlessM4T
To translate speech between languages, cjwbw/seamless_communication is your friend.
This unified model handles several tasks that would otherwise require separate models:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Text-to-speech translation (T2ST)
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)
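The five tasks above differ only in input modality, output modality, and whether translation is involved. A minimal sketch of that mapping follows. The `pick_task` helper is hypothetical and only returns the abbreviation from the list; how the model itself expects the task to be selected is not stated here.

```python
from typing import Dict, Tuple

# (input modality, output modality, translate?) -> task abbreviation from the list above.
TASKS: Dict[Tuple[str, str, bool], str] = {
    ("speech", "speech", True): "S2ST",
    ("speech", "text", True): "S2TT",
    ("text", "speech", True): "T2ST",
    ("text", "text", True): "T2TT",
    ("speech", "text", False): "ASR",  # same-language transcription
}


def pick_task(source: str, target: str, translate: bool = True) -> str:
    """Return the task abbreviation for a modality pair (hypothetical helper)."""
    key = (source, target, translate)
    if key not in TASKS:
        raise ValueError(f"no task for {source} -> {target} (translate={translate})")
    return TASKS[key]
```

For example, `pick_task("speech", "text")` selects speech-to-text translation, while `pick_task("speech", "text", translate=False)` selects plain speech recognition.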
Featured models

victor-upmeet / whisperx
Accelerated transcription, word-level timestamps and diarization with whisperX large-v3

vaibhavs10 / incredibly-fast-whisper
whisper-large-v3, incredibly fast, powered by Hugging Face Transformers! 🤗

cjwbw / seamless_communication
SeamlessM4T—Massively Multilingual & Multimodal Machine Translation
Recommended models

thomasmol / whisper-diarization
⚡️ Blazing fast audio transcription with speaker diarization | Whisper Large V3 Turbo | word & sentence level timestamps | prompt

openai / whisper
Convert speech in audio to text

nvidia / parakeet-rnnt-1.1b
🗣️ Nvidia + Suno.ai's speech-to-text conversion with high accuracy and efficiency 📝

adidoes / whisperx-video-transcribe
ASR from video URL based on whisperx using large-v2 model

m1guelpf / whisper-subtitles
Generate subtitles from an audio file, using OpenAI's Whisper model.