CrisperWhisper

CrisperWhisper is an advanced variant of OpenAI’s Whisper, designed for fast, precise, verbatim speech recognition with accurate (“crisp”) word-level timestamps. The original Whisper tends to omit disfluencies and produces more of an “intended” transcription style; CrisperWhisper instead aims to transcribe every spoken word exactly as uttered, including fillers, pauses, stutters, and false starts.

Key Features

🎯 Accurate Word-Level Timestamps: Provides precise timestamps, even around disfluencies and pauses, by utilizing an adjusted tokenizer and a custom attention loss during training.
📝 Verbatim Transcription: Transcribes every spoken word exactly as it is, including fillers, and distinguishes between fillers such as “um” and “uh”.
🔍 Filler Detection: Detects and accurately transcribes fillers.
🛡️ Hallucination Mitigation: Minimizes transcription hallucinations to enhance accuracy.
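
To illustrate why verbatim output with word-level timestamps is useful downstream, here is a minimal sketch that locates fillers and pauses in a transcript. It does not call the model. The input format (a list of dicts with `text` and `timestamp` keys), the filler spellings in `FILLERS`, the 0.5 s pause threshold, and the helper names `find_fillers` and `find_pauses` are all assumptions made for this example, not part of CrisperWhisper itself.

```python
from typing import Dict, List, Tuple

# Assumed word-level output shape: one dict per word with its text and
# a (start, end) timestamp in seconds. The real output may differ.
Chunk = Dict[str, object]

# Assumed filler spellings; adjust to whatever the model actually emits.
FILLERS = {"um", "uh"}


def normalize(word: str) -> str:
    """Lowercase a word and drop surrounding whitespace, brackets, and punctuation."""
    return word.strip().strip("[].,!?").lower()


def find_fillers(chunks: List[Chunk]) -> List[Chunk]:
    """Return the chunks whose normalized text is a known filler."""
    return [c for c in chunks if normalize(str(c["text"])) in FILLERS]


def find_pauses(chunks: List[Chunk], min_gap: float = 0.5) -> List[Tuple[float, float]]:
    """Return (start, end) spans of silence of at least min_gap seconds between words."""
    pauses = []
    for prev, cur in zip(chunks, chunks[1:]):
        gap_start = prev["timestamp"][1]  # end of the previous word
        gap_end = cur["timestamp"][0]     # start of the current word
        if gap_end - gap_start >= min_gap:
            pauses.append((gap_start, gap_end))
    return pauses


# Hand-written sample data, not real model output.
example = [
    {"text": "I", "timestamp": (0.0, 0.2)},
    {"text": "um", "timestamp": (0.3, 0.6)},
    {"text": "think", "timestamp": (1.4, 1.8)},
    {"text": "uh", "timestamp": (1.9, 2.1)},
    {"text": "so", "timestamp": (2.2, 2.5)},
]

filler_words = [c["text"] for c in find_fillers(example)]
pause_spans = find_pauses(example)
```

With a transcript that drops disfluencies, neither list could be recovered; the analysis depends on the fillers being present in the text and on the timestamps being tight around them.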

Model created over 1 year ago