You're looking at a specific version of this model.
villesau/whisper-timestamped:d4417fc3
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.
Field | Type | Default value | Description |
---|---|---|---|
audio_file | string | | Audio file to transcribe |
language | string | auto | Language code (e.g., 'en'), or 'auto' to detect the language automatically |
task | string (enum) | transcribe | Task to perform. Options: transcribe (output in the spoken language), translate (output in English) |
vad | boolean | False | Use Voice Activity Detection (VAD) |
detect_disfluencies | boolean | False | Detect speech disfluencies such as hesitations and filler words |
compute_word_confidence | boolean | True | Compute a confidence score for each word |
temperature | number | 0 | Sampling temperature; 0 disables sampling (greedy or beam-search decoding) |
best_of | integer | | Number of candidates when sampling with non-zero temperature |
beam_size | integer | | Number of beams in beam search; only applicable when temperature is zero |
patience | number | | Optional patience value for beam decoding |
length_penalty | number | | Optional token length penalty coefficient (alpha), as in https://arxiv.org/abs/1609.08144 |
suppress_tokens | string | -1 | Comma-separated list of token IDs to suppress during sampling; -1 suppresses most special characters except common punctuation |
initial_prompt | string | | Optional text to provide as a prompt for the first window |
condition_on_previous_text | boolean | True | Whether to provide the previous window's output as a prompt for the next window |
no_speech_threshold | number | 0.6 | If the no-speech probability exceeds this value and the average log probability is below logprob_threshold, the segment is treated as silence |
compression_ratio_threshold | number | 2.4 | If the gzip compression ratio of the decoded text exceeds this value, the decoding is treated as failed |
logprob_threshold | number | -1 | If the average log probability of the decoded tokens is below this value, the decoding is treated as failed |
verbose | boolean | False | Whether to display the text as it is decoded |

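
The defaults and constraints in the table can be checked on the client before a request is sent. The sketch below assumes this is useful and is not part of the model or of any Replicate library. `build_input` is a hypothetical helper that does four things. It fills in the documented defaults. It rejects unknown fields and invalid `task` values. It checks that `suppress_tokens` parses as integers. It warns, rather than fails, when `beam_size` or `best_of` is set for a temperature at which the table says it does not apply.

```python
import warnings
from typing import Any, Dict

# Defaults transcribed from the input schema table. Fields the table lists
# without a default are only included when the caller sets them.
DEFAULTS: Dict[str, Any] = {
    "language": "auto",
    "task": "transcribe",
    "vad": False,
    "detect_disfluencies": False,
    "compute_word_confidence": True,
    "temperature": 0,
    "suppress_tokens": "-1",
    "condition_on_previous_text": True,
    "no_speech_threshold": 0.6,
    "compression_ratio_threshold": 2.4,
    "logprob_threshold": -1,
    "verbose": False,
}
OPTIONAL_FIELDS = frozenset(
    {"best_of", "beam_size", "patience", "length_penalty", "initial_prompt"}
)
TASKS = ("transcribe", "translate")


def build_input(audio_file: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge caller overrides over the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown input field(s): {sorted(unknown)}")

    payload: Dict[str, Any] = {"audio_file": audio_file, **DEFAULTS, **overrides}

    if payload["task"] not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {payload['task']!r}")

    # Per the table, beam_size applies only at temperature 0 and best_of only
    # when sampling with a non-zero temperature. Warn instead of failing,
    # because the model may simply ignore the inapplicable field.
    if "beam_size" in payload and payload["temperature"] != 0:
        warnings.warn("beam_size is only applicable when temperature is 0")
    if "best_of" in payload and payload["temperature"] == 0:
        warnings.warn("best_of is only applicable when temperature is non-zero")

    # suppress_tokens is a comma-separated string of integer token IDs, e.g. "-1".
    for tok in str(payload["suppress_tokens"]).split(","):
        try:
            int(tok)
        except ValueError:
            raise ValueError(
                "suppress_tokens must be comma-separated integer token IDs"
            ) from None

    return payload
```

With the Replicate Python client, the returned dict could be passed as the `input` argument of `replicate.run`, together with a model reference of the form owner/name:version. The version shown at the top of this page appears to be abbreviated, so the full version ID would need to come from the model page.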
Output schema
The shape of the response you’ll get when you run this model with an API.
The schema only declares that the output is a JSON object. It does not describe the object's fields:
{"title": "Output", "type": "object"}
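
The output schema does not list the object's properties, so client code has to assume a structure. The sketch below assumes, without confirmation from this schema, that the object follows the result layout documented for the whisper_timestamped library. In that layout, a `segments` list holds items that each carry a `words` list. Each word has `text`, `start` and `end` keys, and may also have a `confidence` key. `iter_words` is a hypothetical helper, and the sample data is made up for illustration.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

WordTiming = Tuple[str, float, float, Optional[float]]


def iter_words(result: Dict[str, Any]) -> Iterator[WordTiming]:
    """Yield (text, start, end, confidence) for every word, in order.

    ASSUMPTION: result follows whisper_timestamped's layout,
    result["segments"][i]["words"][j]; this page's schema does not confirm it.
    """
    if not isinstance(result, dict):
        raise TypeError("expected the model output to be a JSON object (dict)")
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            # confidence may be absent, e.g. if compute_word_confidence was False
            yield word["text"], word["start"], word["end"], word.get("confidence")


# Made-up example shaped like the assumed layout (not real model output).
sample = {
    "text": " Hello world.",
    "language": "en",
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": " Hello world.",
            "words": [
                {"text": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.9},
                {"text": "world.", "start": 0.6, "end": 1.2, "confidence": 0.8},
            ],
        }
    ],
}

for text, start, end, confidence in iter_words(sample):
    print(f"{start:5.2f}-{end:5.2f}s  {text}  confidence={confidence}")
```

Using `.get` for `segments`, `words` and `confidence` keeps the helper tolerant of missing keys. If the real output differs from this assumed layout, the lookups would need to change accordingly.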