Transcribe audio to text in multiple languages.
For most needs, use vaibhavs10/incredibly-fast-whisper. It really is fast (10x quicker than original Whisper), cheap, accurate, and supports tons of languages.
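As a rough sketch of what calling this model from Python might look like, the snippet below uses the Replicate Python client's `replicate.run`. The input field names (`audio`, `language`) and the helper names `build_input` and `transcribe` are assumptions for illustration; check the model page for the actual input schema, and note that some client versions may require a full `owner/name:version` identifier.

```python
import os
from typing import Optional

# Model identifier taken from the text above; a ":<version>" suffix copied from
# the model page may be needed depending on the client version (assumption).
MODEL = "vaibhavs10/incredibly-fast-whisper"


def build_input(audio_url: str, language: Optional[str] = None) -> dict:
    """Assemble the input payload. Field names are assumed, not confirmed."""
    payload = {"audio": audio_url}
    if language:
        payload["language"] = language
    return payload


def transcribe(audio_url: str, language: Optional[str] = None):
    """Run the model and return whatever output it produces."""
    # Imported lazily so build_input() works without the client installed.
    import replicate

    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError("Set REPLICATE_API_TOKEN before calling transcribe().")
    return replicate.run(MODEL, input=build_input(audio_url, language))
```

Calling `transcribe("https://example.com/interview.mp3")` would send the request; the shape of the returned value depends on the model, so inspect it before relying on specific keys.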
Need to label speakers or get word-level timestamps? victor-upmeet/whisperx has you covered. It's slightly more expensive than incredibly-fast-whisper but still very fast.
You can also check out our Speaker Diarization collection for models that can identify speakers from audio and video.
To translate speech between languages, cjwbw/seamless_communication is your friend.
This unified model handles multiple tasks without relying on separate models.
Featured models


openai/gpt-4o-transcribe
A speech-to-text model that uses GPT-4o to transcribe audio
Updated 1 week, 2 days ago
31.2K runs


victor-upmeet/whisperx
Accelerated transcription, word-level timestamps and diarization with whisperX large-v3
Updated 1 year, 2 months ago
4.9M runs


vaibhavs10/incredibly-fast-whisper
whisper-large-v3, incredibly fast, powered by Hugging Face Transformers! 🤗
Updated 1 year, 9 months ago
19.6M runs
Recommended Models
If speed is your top priority, vaibhavs10/incredibly-fast-whisper and openai/gpt-4o-transcribe are among the fastest models in the speech-to-text collection. They’re designed for low-latency transcription, which makes them ideal for live or near real-time scenarios like voice notes, quick interviews, or interactive applications.
Keep in mind that faster models may not include advanced features like speaker labeling or word-level timestamps.
openai/whisper is a reliable general-purpose option that works well with clean audio and single-speaker recordings. It offers multilingual support and solid accuracy for most everyday transcription needs.
If you need more structure—like timestamps or speaker labels—victor-upmeet/whisperx adds those capabilities without a massive jump in runtime.
For clear recordings like lectures, podcasts, or voice memos, vaibhavs10/incredibly-fast-whisper or openai/whisper are great choices. They deliver accurate transcripts quickly and handle common accents well.
If your audio includes multiple speakers—like team meetings, interviews, or panel discussions—victor-upmeet/whisperx is your best bet. It adds speaker diarization and word-level timestamps so you can keep track of who said what.
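To make "who said what" concrete, here is a minimal sketch that turns diarized segments into a readable transcript. The segment shape (a list of dicts with `start`, `speaker`, and `text` keys) and the function name `format_transcript` are assumptions for illustration; the actual output schema of a given model may differ, so adapt the keys to what the model page documents.

```python
from typing import Dict, List


def format_transcript(segments: List[Dict]) -> str:
    """Merge consecutive segments by the same speaker into one line each."""
    lines: List[str] = []
    current_speaker = None
    for seg in segments:
        # Fall back to a placeholder if a segment carries no speaker label.
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg["text"].strip()
        if speaker == current_speaker and lines:
            # Same speaker keeps talking: append to the previous line.
            lines[-1] += " " + text
        else:
            # New speaker turn: start a line stamped with the segment start time.
            lines.append(f"[{seg['start']:.1f}s] {speaker}: {text}")
            current_speaker = speaker
    return "\n".join(lines)
```

Each speaker turn becomes one line prefixed with the time the turn began, which is usually enough for meeting notes or interview write-ups.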
If you need multilingual transcription or built-in translation, cjwbw/seamless_communication is a strong option. It can also handle more complex audio scenarios like mixed-language conversations.
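The guidance above can be summarized as a small decision helper. This is only a sketch of the recommendations in this section: the function name `pick_model` and its flags are hypothetical, and the priority order (translation first, then speaker or timestamp needs) is one reasonable reading rather than an official rule.

```python
def pick_model(
    needs_translation: bool = False,
    needs_speakers: bool = False,
    needs_word_timestamps: bool = False,
) -> str:
    """Map transcription requirements to a model named in this collection."""
    if needs_translation:
        # Speech translation between languages.
        return "cjwbw/seamless_communication"
    if needs_speakers or needs_word_timestamps:
        # Speaker diarization and word-level timestamps.
        return "victor-upmeet/whisperx"
    # Default: fast, inexpensive general-purpose transcription.
    return "vaibhavs10/incredibly-fast-whisper"
```

For example, `pick_model(needs_speakers=True)` returns `"victor-upmeet/whisperx"`, matching the advice for meetings and interviews.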
Most models produce plain text transcripts. Some, such as victor-upmeet/whisperx, also include word-level timestamps and speaker labels.
You can package your own model with Cog and push it to Replicate. This lets you control how it’s run, updated, and shared, whether you’re adapting an open-source model or deploying a fine-tuned one.
Many models in the speech-to-text collection allow commercial use, but licenses vary. Some models have conditions or attribution requirements, so always check the model page before using transcripts in commercial projects.
Recommended Models


openai/gpt-4o-mini-transcribe
A speech-to-text model that uses GPT-4o mini to transcribe audio
Updated 1 week, 2 days ago
6.4K runs


thomasmol/whisper-diarization
⚡️ Blazing fast audio transcription with speaker diarization | Whisper Large V3 Turbo | word & sentence level timestamps | prompt
Updated 8 months, 4 weeks ago
3.4M runs


openai/whisper
Convert speech in audio to text
Updated 11 months, 2 weeks ago
142.4M runs

nvidia/parakeet-rnnt-1.1b
🗣️ Nvidia + Suno.ai's speech-to-text conversion with high accuracy and efficiency 📝
Updated 1 year, 10 months ago
18.9K runs


adidoes/whisperx-video-transcribe
ASR from video URL based on whisperx using large-v2 model
Updated 2 years, 1 month ago
19.6K runs


cjwbw/seamless_communication
SeamlessM4T—Massively Multilingual & Multimodal Machine Translation
Updated 2 years, 2 months ago
90K runs


daanelson/whisperx
Accelerated transcription of audio using WhisperX
Updated 2 years, 4 months ago
90.9K runs


m1guelpf/whisper-subtitles
Generate subtitles from an audio file, using OpenAI's Whisper model.
Updated 3 years, 1 month ago
73.9K runs