Transcribe speech

Transcribe audio to text in multiple languages.

Our pick: Incredibly Fast Whisper

For most needs, use vaibhavs10/incredibly-fast-whisper. It lives up to its name (roughly 10x faster than the original Whisper), and it is also cheap, accurate, and multilingual.

For speaker labels: WhisperX

Need to label speakers or get word-level timestamps? victor-upmeet/whisperx has you covered. It costs slightly more than incredibly-fast-whisper but is still very fast.
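
The choice between the two transcription models can be sketched as a small lookup. This is only an illustration: the function name pick_transcription_model and its two flags are hypothetical, and only the model identifiers come from the text above.

```python
# Model identifiers exactly as named in the text above.
DEFAULT_MODEL = "vaibhavs10/incredibly-fast-whisper"
SPEAKER_MODEL = "victor-upmeet/whisperx"


def pick_transcription_model(speaker_labels: bool = False,
                             word_timestamps: bool = False) -> str:
    """Hypothetical helper: return the model suggested for the stated needs."""
    # Speaker labels or word-level timestamps point to WhisperX.
    if speaker_labels or word_timestamps:
        return SPEAKER_MODEL
    # Everything else falls back to the general-purpose pick.
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(pick_transcription_model())
    print(pick_transcription_model(speaker_labels=True))
```

The returned identifier can then be passed to whichever client you use to run the model.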

Translation: SeamlessM4T

To translate speech between languages, cjwbw/seamless_communication is your friend.

This unified model handles all of the following tasks itself, with no need for a separate model per task:

  • Speech-to-speech translation (S2ST)
  • Speech-to-text translation (S2TT)
  • Text-to-speech translation (T2ST)
  • Text-to-text translation (T2TT)
  • Automatic speech recognition (ASR)
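
As a rough sketch, the five tasks above can be read as combinations of input type, output type, and whether translation is involved. The table and the task_code helper below are hypothetical names used for illustration; they are not part of the model's API, and only the task abbreviations come from the list above.

```python
# (input modality, output modality, translate?) -> task abbreviation.
# The abbreviations mirror the list above; the tuple keys are an assumption
# made for this illustration.
TASKS = {
    ("speech", "speech", True): "S2ST",
    ("speech", "text", True): "S2TT",
    ("text", "speech", True): "T2ST",
    ("text", "text", True): "T2TT",
    ("speech", "text", False): "ASR",
}


def task_code(source: str, target: str, translate: bool = True) -> str:
    """Hypothetical helper: look up the task abbreviation for a combination."""
    key = (source, target, translate)
    if key not in TASKS:
        raise ValueError(f"no listed task for {key!r}")
    return TASKS[key]


if __name__ == "__main__":
    print(task_code("speech", "text"))                    # translation to text
    print(task_code("speech", "text", translate=False))   # plain transcription
```

Note that plain transcription (ASR) is the only listed task without a translation step, which is why it is the only entry keyed with translate set to False.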