cjwbw / distil-whisper

Distilled version of Whisper

  • Public
  • 275 runs
  • L40S
  • GitHub
  • Paper
  • License

Input

  • file (required): Input audio file
  • string: Choose a model. Default: "distil-whisper/distil-large-v2"
  • boolean: Enable chunked algorithm to transcribe long-form audio files. Default: false
  • integer: Maximum number of new tokens to output. Default: 128
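
To call the model through the API, here is a minimal sketch using the Replicate Python client. It assumes the replicate package is installed, REPLICATE_API_TOKEN is set in the environment, and a local audio.mp3 exists; only the required "file" input shown above is passed, so the other inputs keep their defaults.

import replicate

# Run the latest version of the model; the client uploads the local file.
# "file" is the required audio input from the schema above.
output = replicate.run(
    "cjwbw/distil-whisper",
    input={"file": open("audio.mp3", "rb")},
)
print(output)  # transcription result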

Output

"others will be discontinued and need to be replaced by new benchmark rates."
This output was created using a different version of the model, cjwbw/distil-whisper:d843a073.

Run time and cost

This model costs approximately $0.067 to run on Replicate, or 14 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 69 seconds. The predict time for this model varies significantly based on the inputs.
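
Since the model is open source, you can also query a locally running copy through Cog's HTTP API. The sketch below assumes the container has already been started with Docker on the default port 5000; the image name follows Replicate's r8.im convention, and file inputs are sent as base64 data URIs.

import base64
import requests

# Assumes the container was started with something like:
#   docker run -p 5000:5000 --gpus=all r8.im/cjwbw/distil-whisper
# Cog's HTTP API accepts file inputs as base64-encoded data URIs.
with open("audio.mp3", "rb") as f:
    data_uri = "data:audio/mpeg;base64," + base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:5000/predictions",
    json={"input": {"file": data_uri}},
)
resp.raise_for_status()
print(resp.json()["output"])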

Readme

Distil-Whisper

Distil-Whisper is a distilled version of Whisper that is 6 times faster, 49% smaller, and performs within 1% word error rate (WER) on out-of-distribution evaluation sets.

Model              Params / M   Rel. Latency   Short-Form WER   Long-Form WER
whisper-large-v2   1550         1.0            9.1              11.7
distil-large-v2    756          5.8            10.1             11.6
distil-medium.en   394          6.8            11.1             12.4
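
For reference, here is a minimal sketch of running distil-large-v2 directly with the Hugging Face transformers library, assuming transformers and torch are installed and audio.mp3 exists locally. Passing chunk_length_s enables the same chunked long-form algorithm as the boolean input above.

import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load distil-large-v2 into a speech-recognition pipeline.
pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
    torch_dtype=torch_dtype,
    device=device,
    max_new_tokens=128,  # matches the default max_new_tokens input above
)

# chunk_length_s enables chunked transcription for long-form audio;
# omit it for short clips (under 30 seconds).
result = pipe("audio.mp3", chunk_length_s=15, batch_size=16)
print(result["text"])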


Citation

If you use this model, please consider citing the Distil-Whisper paper:

@misc{gandhi2023distilwhisper,
      title={Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling}, 
      author={Sanchit Gandhi and Patrick von Platen and Alexander M. Rush},
      year={2023},
      eprint={2311.00430},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

And also the Whisper paper:

@misc{radford2022robust,
      title={Robust Speech Recognition via Large-Scale Weak Supervision}, 
      author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
      year={2022},
      eprint={2212.04356},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}