CrisperWhisper
CrisperWhisper is an advanced variant of OpenAI’s Whisper, designed for fast, precise, and verbatim speech recognition with accurate (crisp) word-level timestamps. The original Whisper tends to omit disfluencies and leans toward an "intended" transcription style, writing what the speaker meant to say rather than what was actually said. CrisperWhisper instead aims to transcribe every spoken word exactly as it is, including fillers, pauses, stutters, and false starts.
Key Features
- 🎯 Accurate Word-Level Timestamps: Provides precise timestamps, even around disfluencies and pauses, by utilizing an adjusted tokenizer and a custom attention loss during training.
- 📝 Verbatim Transcription: Transcribes every spoken word exactly as it is, including and differentiating fillers like “um” and “uh”.
- 🔍 Filler Detection: Detects and accurately transcribes fillers.
- 🛡️ Hallucination Mitigation: Minimizes transcription hallucinations to enhance accuracy.
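
To make the features above concrete, here is a minimal sketch of how verbatim, word-level output could be post-processed to locate fillers and long pauses. It does not run the model. It assumes the transcription is already available as a list of word chunks shaped like `{"text": ..., "timestamp": (start, end)}`, which is the shape the Hugging Face `transformers` speech recognition pipeline returns when called with `return_timestamps="word"`. The function name `summarize_disfluencies`, the `min_pause` threshold, and the filler spellings in `FILLERS` are illustrative assumptions, not part of CrisperWhisper itself; adjust the filler set to whatever tokens the model actually emits.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed filler spellings (hypothetical); brackets and case are normalized below.
FILLERS: Set[str] = {"um", "uh"}


def summarize_disfluencies(
    chunks: List[Dict],
    min_pause: float = 0.5,
    fillers: Set[str] = FILLERS,
) -> Tuple[List[Tuple[str, float, float]], List[Tuple[float, float]]]:
    """Return (filler_hits, pauses) from word-level chunks.

    filler_hits: (normalized_token, start, end) for each filler word.
    pauses: (gap_start, gap_end) for each silence of at least min_pause seconds
            between the end of one word and the start of the next.
    """
    filler_hits: List[Tuple[str, float, float]] = []
    pauses: List[Tuple[float, float]] = []
    prev_end: Optional[float] = None

    for chunk in chunks:
        start, end = chunk["timestamp"]
        if start is None or end is None:
            # Skip words with an incomplete timestamp.
            continue

        # Normalize " [UH]" or " um," to "uh" / "um" before matching.
        token = chunk["text"].strip().strip("[].,!?").lower()
        if token in fillers:
            filler_hits.append((token, start, end))

        # A pause is the gap between the previous word's end and this word's start.
        if prev_end is not None and start - prev_end >= min_pause:
            pauses.append((prev_end, start))
        prev_end = end

    return filler_hits, pauses
```

Because the timestamps are meant to stay accurate around disfluencies, the gaps between consecutive words can be read directly as pause durations, and the same chunk list can be filtered to produce a cleaned, non-verbatim transcript when one is needed.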