soykertje / whisper

Convert speech in audio to text

  • Public
  • 5.8K runs

Run time and cost

This model runs on NVIDIA T4 GPU hardware. Predictions typically complete within 7 minutes, but prediction time varies significantly depending on the inputs.

Whisper is a general-purpose speech recognition model trained on a large dataset of diverse audio. It is also a multitask model: besides multilingual speech transcription, it can perform speech translation (into English) and language identification.
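To illustrate the three tasks, here is a minimal sketch that uses the upstream open-source `openai-whisper` Python package. It assumes that package (and `ffmpeg`, which it needs for audio decoding) is installed. It does not show this Replicate deployment's own inputs, which may differ. The function name `run_whisper_tasks` is hypothetical.

```python
# Minimal sketch using the open-source `openai-whisper` package (pip install openai-whisper).
# Assumption: this illustrates Whisper itself, NOT the Replicate deployment's input schema.
from typing import Any, Dict


def run_whisper_tasks(audio_path: str, model_name: str = "base") -> Dict[str, Any]:
    """Hypothetical helper: language ID, transcription, and translation for one file."""
    import whisper  # imported lazily so this module loads even without the package

    model = whisper.load_model(model_name)  # downloads the checkpoint on first use

    # Language identification looks at a single 30-second log-Mel spectrogram window.
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    language = max(probs, key=probs.get)

    # task="transcribe" keeps the spoken language; task="translate" outputs English text.
    transcript = model.transcribe(audio_path, task="transcribe")["text"]
    translation = model.transcribe(audio_path, task="translate")["text"]

    return {"language": language, "transcript": transcript, "translation": translation}
```

For example, `run_whisper_tasks("audio.mp3")["language"]` returns a language code such as `"en"`. The sketch calls `transcribe` twice for clarity. That decodes the audio twice, so a real pipeline would usually request only the task it needs.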

This version uses the latest available Whisper release and adds a new input for performing the transcription.