openai / whisper

Convert speech in audio to text

  • Public
  • 74.7M runs
  • T4
  • GitHub
  • Weights
  • Paper
  • License

Run openai/whisper with an API

Input schema

audio (uri)

Audio file.

language (string)

Language spoken in the audio; specify 'auto' for automatic language detection.

Default
"auto"
patience (number)

Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search.

translate (boolean)

Translate the text to English when set to true.

temperature (number)

Temperature to use for sampling.

transcription (string)

Choose the format for the transcription.

Default
"plain text"
initial_prompt (string)

Optional text to provide as a prompt for the first window.

suppress_tokens (string)

Comma-separated list of token IDs to suppress during sampling; '-1' will suppress most special characters except common punctuation.

Default
"-1"
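The `suppress_tokens` value is a plain comma-separated string rather than a JSON array. A minimal sketch of how such a value could be parsed client-side (`parse_suppress_tokens` is a hypothetical helper, not part of the API):

```python
def parse_suppress_tokens(value: str) -> list[int]:
    """Parse a comma-separated list of token IDs.

    The sentinel '-1' (the default) selects the model's built-in
    list of special tokens to suppress.
    """
    return [int(t) for t in value.split(",") if t.strip()]

print(parse_suppress_tokens("-1"))           # [-1]
print(parse_suppress_tokens("50257,50362"))  # [50257, 50362]
```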
logprob_threshold (number)

If the average log probability is lower than this value, treat the decoding as failed.

Default
-1
no_speech_threshold (number)

If the probability of the <|nospeech|> token is higher than this value and the decoding has failed due to `logprob_threshold`, consider the segment as silence.

Default
0.6
condition_on_previous_text (boolean)

If true, provide the previous output of the model as a prompt for the next window. Disabling this may make the text inconsistent across windows, but makes the model less prone to getting stuck in a repetitive failure loop.

Default
true
compression_ratio_threshold (number)

If the gzip compression ratio is higher than this value, treat the decoding as failed.

Default
2.4
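How the three quality thresholds interact can be sketched as below. This paraphrases the behavior the parameter descriptions imply (the function names and signatures are illustrative, not the model's actual source):

```python
def needs_retry(avg_logprob: float, compression_ratio: float,
                logprob_threshold: float = -1.0,
                compression_ratio_threshold: float = 2.4) -> bool:
    # A decode counts as failed if the text is too repetitive
    # (high gzip compression ratio) or the model is too unsure
    # (low average log probability).
    return (compression_ratio > compression_ratio_threshold
            or avg_logprob < logprob_threshold)

def is_silence(no_speech_prob: float, avg_logprob: float,
               no_speech_threshold: float = 0.6,
               logprob_threshold: float = -1.0) -> bool:
    # A failed decode with a confident <|nospeech|> token is
    # treated as silence rather than retried.
    return (no_speech_prob > no_speech_threshold
            and avg_logprob < logprob_threshold)

print(needs_retry(avg_logprob=-1.5, compression_ratio=1.0))  # True
print(is_silence(no_speech_prob=0.9, avg_logprob=-1.5))      # True
```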
temperature_increment_on_fallback (number)

Temperature increment to apply when the decoding fails to meet either of the thresholds above.

Default
0.2
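Together, `temperature` and `temperature_increment_on_fallback` imply a retry schedule: each failed decode raises the temperature by the increment, capped at 1.0. A sketch of that schedule, assuming the behavior of the open-source Whisper CLI:

```python
def temperature_schedule(start: float = 0.0, increment: float = 0.2) -> list[float]:
    """Temperatures tried in order as decoding falls back, up to 1.0."""
    temps = [start]
    while temps[-1] + increment <= 1.0:
        # Round to tame float drift (0.4 + 0.2 -> 0.6000000000000001).
        temps.append(round(temps[-1] + increment, 10))
    return temps

print(temperature_schedule())  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```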

Output schema

segments (unknown)
srt_file (uri)
txt_file (uri)
translation (string)
transcription (string)
detected_language (string)
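Putting the schema together, a minimal sketch of calling the model through the Replicate Python client. The `replicate.run` call is the standard client API; the audio URL is a placeholder, and any unset inputs fall back to the defaults listed above:

```python
# Input payload following the schema above; the audio can be a URL
# or an opened file. This URL is a placeholder, not real data.
input_payload = {
    "audio": "https://example.com/speech.mp3",  # required
    "language": "auto",                         # default
    "translate": False,
    "transcription": "plain text",              # default output format
    "temperature": 0.0,
    "suppress_tokens": "-1",
    "condition_on_previous_text": True,
}

# Requires the `replicate` package and a REPLICATE_API_TOKEN;
# commented out so the sketch stays self-contained.
# import replicate
# output = replicate.run("openai/whisper", input=input_payload)
# print(output["detected_language"])
# print(output["transcription"])

print(sorted(input_payload))
```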