# Distil-Whisper
Distil-Whisper is a distilled version of Whisper that is 6 times faster, 49% smaller, and performs to within 1% word error rate (WER) of Whisper on out-of-distribution evaluation sets.
| Model | Params (M) | Rel. Latency | Short-Form WER (%) | Long-Form WER (%) |
|---|---|---|---|---|
| whisper-large-v2 | 1550 | 1.0 | 9.1 | 11.7 |
| distil-large-v2 | 756 | 5.8 | 10.1 | 11.6 |
| distil-medium.en | 394 | 6.8 | 11.1 | 12.4 |
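The distilled checkpoints are intended as drop-in replacements for Whisper in the 🤗 Transformers `automatic-speech-recognition` pipeline. The snippet below is a minimal usage sketch, assuming the `distil-whisper/distil-large-v2` checkpoint from the Hugging Face Hub and a placeholder local file `audio.mp3`; adjust the device, dtype, and chunk length to your setup.

```python
import torch
from transformers import pipeline

# Use GPU + half precision when available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the distilled checkpoint through the ASR pipeline.
pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
    torch_dtype=torch_dtype,
    device=device,
    chunk_length_s=15,  # chunking enables long-form transcription; 15 s is one reasonable window
)

# "audio.mp3" is a placeholder path to any local audio file.
result = pipe("audio.mp3")
print(result["text"])
```

The other checkpoints in the table can be used the same way by swapping the `model` identifier.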
## Acknowledgements
- OpenAI for the Whisper model and original codebase
- Hugging Face 🤗 Transformers for the model integration
- Google’s TPU Research Cloud (TRC) programme for Cloud TPU v4s
## Citation
If you use this model, please consider citing the Distil-Whisper paper:

    @misc{gandhi2023distilwhisper,
          title={Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling},
          author={Sanchit Gandhi and Patrick von Platen and Alexander M. Rush},
          year={2023},
          eprint={2311.00430},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }
And the Whisper paper:

    @misc{radford2022robust,
          title={Robust Speech Recognition via Large-Scale Weak Supervision},
          author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
          year={2022},
          eprint={2212.04356},
          archivePrefix={arXiv},
          primaryClass={eess.AS}
    }