Transcribe any audio file with speaker diarization
Easily create transcriptions with speaker labels and timestamps (diarization). Uses Whisper and pyannote under the hood. Performance and accuracy will keep improving.
How to use 🪄
- input a file as a base64 string or a file URL (must be a direct, public link to the file)
- input the filename, including the file extension
- provide the number of speakers
- optionally give a prompt to improve the accuracy of the transcript
- the other inputs are only used if you provide the file in chunks
- hit submit and wait!
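The steps above amount to building a request payload. Here is a minimal sketch in Python; the input names (`file_string`, `file_url`, `filename`, `num_speakers`, `prompt`) are assumptions for illustration, so check the model's actual input schema:

```python
import base64
import json

def build_payload(path=None, url=None, num_speakers=2, prompt=""):
    """Build the model input: either a base64 string or a direct file URL."""
    payload = {
        "filename": (path or url).rsplit("/", 1)[-1],  # must include the extension
        "num_speakers": num_speakers,
        "prompt": prompt,  # e.g. names or jargon that appear in the audio
    }
    if path:
        # local file -> base64 string
        with open(path, "rb") as f:
            payload["file_string"] = base64.b64encode(f.read()).decode("utf-8")
    else:
        # must be a direct, public link to the file
        payload["file_url"] = url
    return payload

# URL variant; no local file needed
print(json.dumps(
    build_payload(url="https://example.com/interview.mp3",
                  num_speakers=2,
                  prompt="Alice interviews Bob"),
    indent=2))
```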
Need an easier interface to use this model?
Head over to 🎙️ Audiogest, a web app I made that uses this model. There you can upload any audio file, get the transcription produced by this model, and generate useful summaries!
No file URLs or base64 strings needed!
Or support me here: Buy me a coffee. And support these fantastic developers and researchers 🙏:
- https://github.com/guillaumekln/faster-whisper
- https://github.com/m-bain/whisperx
- https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb
Model description
Uses faster-whisper for transcription, and pyannote with the "speechbrain/spkrec-ecapa-voxceleb" speaker-embedding model for speaker diarization.
Input is the base64 string of an audio file or a file URL.
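Pipelines like this typically combine the two models by assigning each transcript segment to the diarization turn it overlaps the most. A simplified, self-contained sketch of that merging step (the actual implementation may differ):

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose diarization
    turn overlaps it the most.

    segments: [{"start": s, "end": e, "text": ...}] from the transcriber
    turns:    [{"start": s, "end": e, "speaker": ...}] from the diarizer
    """
    labeled = []
    for seg in segments:
        best, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            # overlap of the intervals [seg.start, seg.end] and [turn.start, turn.end]
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best, best_overlap = turn["speaker"], overlap
        labeled.append({**seg, "speaker": best})
    return labeled

segments = [{"start": 0.0, "end": 2.5, "text": "Hi there."},
            {"start": 2.6, "end": 5.0, "text": "Hello!"}]
turns = [{"start": 0.0, "end": 2.4, "speaker": "SPEAKER_00"},
         {"start": 2.4, "end": 5.2, "speaker": "SPEAKER_01"}]
for seg in assign_speakers(segments, turns):
    print(f'[{seg["speaker"]}] {seg["text"]}')
# → [SPEAKER_00] Hi there.
#   [SPEAKER_01] Hello!
```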
Intended use
Easily transcribe and get speaker labels from any audio format.
Ethical considerations
🤷 Same as for any AI model. Your input is not used for fine-tuning.
Caveats and recommendations
Transcription can take a while, even for short audio clips, because of a possible cold boot.
Recently sped up 4x by switching to faster-whisper.
Diarization is not perfect.