villesau/whisper-timestamped:05f39529
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.
| Field | Type | Default value | Description |
|---|---|---|---|
| audio_file | string | | Audio file to transcribe |
| language | string | auto | Language code (e.g., 'en') or 'auto' for auto-detect |
| task | None | transcribe | Task to perform |
| vad | boolean | False | Use Voice Activity Detection |
| detect_disfluencies | boolean | False | Detect speech disfluencies |
| compute_word_confidence | boolean | True | Compute word confidence scores |
| temperature | number | 0 | Temperature for sampling |
| best_of | integer | | Number of candidates when sampling with non-zero temperature |
| beam_size | integer | | Number of beams in beam search, only applicable when temperature is zero |
| patience | number | | Optional patience value to use in beam decoding |
| length_penalty | number | | Optional token length penalty coefficient (alpha) as in https://arxiv.org/abs/1609.08144 |
| suppress_tokens | string | -1 | Comma-separated list of token ids to suppress during sampling |
| initial_prompt | string | | Optional text to provide as a prompt for the first window |
| condition_on_previous_text | boolean | True | Whether to condition on previous text |
| no_speech_threshold | number | 0.6 | Threshold for no speech probability |
| compression_ratio_threshold | number | 2.4 | Threshold for compression ratio |
| logprob_threshold | number | -1 | Threshold for average log probability |
| verbose | boolean | False | Whether to display the text being decoded |
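To illustrate how the defaults in the table combine with your own values, here is a minimal sketch of a client-side helper that assembles an input payload. The helper name `build_input`, the constants, and the example URL are hypothetical and not part of the model's API. The two checks on `beam_size` and `best_of` are an assumption drawn only from the field descriptions; the model itself may simply ignore those fields rather than reject them.

```python
from typing import Any

# Defaults copied from the input schema table.
# Fields with no listed default are left out on purpose.
DEFAULTS: dict[str, Any] = {
    "language": "auto",
    "task": "transcribe",
    "vad": False,
    "detect_disfluencies": False,
    "compute_word_confidence": True,
    "temperature": 0,
    "suppress_tokens": "-1",
    "condition_on_previous_text": True,
    "no_speech_threshold": 0.6,
    "compression_ratio_threshold": 2.4,
    "logprob_threshold": -1,
    "verbose": False,
}

# Fields the table lists without a default value.
OPTIONAL_FIELDS = {"best_of", "beam_size", "patience", "length_penalty", "initial_prompt"}


def build_input(audio_file: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides onto the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    payload = {"audio_file": audio_file, **DEFAULTS, **overrides}
    temperature = payload["temperature"]

    # Assumed client-side checks, based on the field descriptions only:
    # beam_size applies at temperature zero, best_of at non-zero temperature.
    if "beam_size" in payload and temperature != 0:
        raise ValueError("beam_size only applies when temperature is zero")
    if "best_of" in payload and temperature == 0:
        raise ValueError("best_of only applies when temperature is non-zero")
    return payload


if __name__ == "__main__":
    # The URL below is a placeholder, not a real audio file.
    print(build_input("https://example.com/audio.wav", language="en", beam_size=5))
```

The resulting dictionary is the kind of value you would pass as the input when calling the model through an API client; only fields you override differ from the table's defaults.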
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{'title': 'Output', 'type': 'object'}
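The schema above only says that the output is an object, so its fields are not documented here. As a hedged sketch, the snippet below assumes the response follows the layout of the upstream whisper-timestamped library (a `segments` list whose entries carry a `words` list with `text`, `start`, `end`, and `confidence`). That layout is an assumption about this deployment, the function name `iter_words` is hypothetical, and the sample data is made up for illustration.

```python
from typing import Any, Iterator


def iter_words(output: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one dict per word, tolerating missing keys.

    Assumes (not confirmed by the schema) a whisper-timestamped style
    layout: output["segments"][i]["words"][j].
    """
    for segment in output.get("segments", []):
        for word in segment.get("words", []):
            yield {
                "text": word.get("text", ""),
                "start": word.get("start"),
                "end": word.get("end"),
                # Present only if word confidence was computed.
                "confidence": word.get("confidence"),
            }


if __name__ == "__main__":
    # Made-up sample in the assumed layout, not real model output.
    sample = {
        "text": "hello world",
        "segments": [
            {
                "start": 0.0,
                "end": 1.0,
                "words": [
                    {"text": "hello", "start": 0.0, "end": 0.4, "confidence": 0.9},
                    {"text": "world", "start": 0.5, "end": 1.0},
                ],
            }
        ],
    }
    for w in iter_words(sample):
        print(w)
```

Using `.get` throughout keeps the loop from failing if the actual response omits a key, which matters because the declared schema gives no guarantees.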