Whisper Diarization
Audio transcribing + diarization pipeline.
⚡️ Super Fast Transcription and Diarization: 2 hour audio in 3 mins
Models used
- Whisper Large v3 (CTranslate2 version, faster-whisper==1.0.3)
- Pyannote audio 3.3.1
Usage
- Used at Audiogest
- Or try at Replicate
- Or deploy it yourself on Replicate (make sure to add your own HuggingFace API key and accept the terms of use of the pyannote models used)
Input
- file_string: str: Either provide a Base64 encoded audio file.
- file_url: str: Or provide a direct audio file URL.
- file: Path: Or provide an audio file.
- group_segments: bool: Group segments of the same speaker that are less than 2 seconds apart. Default is True.
- num_speakers: int: Number of speakers. Leave empty to autodetect. Must be between 1 and 50.
- translate: bool: Translate the speech into English.
- language: str: Language of the spoken words as a language code like ‘en’. Leave empty to auto detect language.
- prompt: str: Vocabulary: provide names, acronyms, and loanwords in a list. Use punctuation for best accuracy. Now also used as the ‘hotwords’ parameter in transcribing.
- offset_seconds: int: Offset in seconds, used for chunked inputs. Default is 0.
- transcript_output_format: str: Specify the format of the transcript output: individual words with timestamps, full text of segments, or a combination of both.
  - Default is both.
  - Options are words_only, segments_only, both.
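As a rough sketch of how the inputs above fit together, the helper below assembles an input payload and checks the documented constraints (num_speakers between 1 and 50, the three transcript_output_format options). The function name build_input is hypothetical and not part of this repository. Requiring exactly one audio source and omitting unset optional fields are choices made for this sketch, not documented model behaviour. The file: Path input is left out for brevity.

```python
import base64
from typing import Any, Dict, Optional

# Option values taken from the Input list above.
TRANSCRIPT_FORMATS = {"words_only", "segments_only", "both"}


def build_input(
    audio_bytes: Optional[bytes] = None,
    file_url: Optional[str] = None,
    num_speakers: Optional[int] = None,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    translate: Optional[bool] = None,
    group_segments: bool = True,
    offset_seconds: int = 0,
    transcript_output_format: str = "both",
) -> Dict[str, Any]:
    """Hypothetical helper: assemble and sanity-check an input payload."""
    # Assumption of this sketch: exactly one audio source is given.
    if (audio_bytes is None) == (file_url is None):
        raise ValueError("provide exactly one of audio_bytes or file_url")
    if num_speakers is not None and not 1 <= num_speakers <= 50:
        raise ValueError("num_speakers must be between 1 and 50")
    if transcript_output_format not in TRANSCRIPT_FORMATS:
        raise ValueError(f"unknown format: {transcript_output_format!r}")

    payload: Dict[str, Any] = {
        "group_segments": group_segments,
        "offset_seconds": offset_seconds,
        "transcript_output_format": transcript_output_format,
    }
    if audio_bytes is not None:
        # file_string expects the audio file as Base64 text.
        payload["file_string"] = base64.b64encode(audio_bytes).decode("ascii")
    else:
        payload["file_url"] = file_url

    # Leave optional fields out entirely so the model can autodetect them.
    optional = {
        "num_speakers": num_speakers,
        "language": language,
        "prompt": prompt,
        "translate": translate,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, build_input(file_url="https://example.com/meeting.mp3", num_speakers=2) (a placeholder URL) returns a dictionary with file_url, num_speakers, and the three defaulted fields, and no language key, so the language would be auto detected.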
Output
- segments: List[Dict]: List of segments with speaker, start and end time.
  - Includes avg_logprob for each segment and probability for each word level segment.
- num_speakers: int: Number of speakers (detected, unless specified in input).
- language: str: Language of the spoken words as a language code like ‘en’ (detected, unless specified in input).
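A minimal sketch of consuming this output: the function below turns the segments list into a speaker-labelled transcript. The key names speaker, start, end, and text are assumptions based on the description above (text in particular is not listed there), and the sample data is made up for illustration.

```python
from typing import Any, Dict, List


def format_transcript(segments: List[Dict[str, Any]]) -> str:
    """Render one line per segment: [start-end] speaker: text."""
    lines = []
    for seg in segments:
        start = float(seg["start"])
        end = float(seg["end"])
        # "text" is an assumed key; fall back to an empty string if absent.
        text = str(seg.get("text", "")).strip()
        lines.append(f"[{start:.2f}-{end:.2f}] {seg['speaker']}: {text}")
    return "\n".join(lines)


# Made-up sample shaped like the documented output.
sample_output = {
    "segments": [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.5, "text": " Hello there."},
        {"speaker": "SPEAKER_01", "start": 1.8, "end": 3.25, "text": " Hi!"},
    ],
    "num_speakers": 2,
    "language": "en",
}

print(format_transcript(sample_output["segments"]))
```

With transcript_output_format set to words_only, segments may not carry full text, so a real consumer would need to adapt the rendering to the format it requested.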
Thanks to