wordscenes/whisper-stable-ts:bd1370b4
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.
| Field | Type | Default value | Description |
|---|---|---|---|
| audio_path | string | | Audio to transcribe or align |
| mode | None | transcribe | Mode: 'transcribe' to generate transcript, 'align' to align provided text |
| text | string | | Text to align with audio (required when mode='align') |
| language | string | en | Language to transcribe |
| denoiser | None | none | The denoiser to use (transcribe mode only). |
| vad | boolean | True | Whether to use Silero VAD to generate timestamp suppression mask. |
| beam_size | integer | 5 | Number of beams in beam search, only applicable when temperature is zero (transcribe mode only). |
| best_of | integer | 5 | Number of candidates when sampling with non-zero temperature (transcribe mode only). |
| regroup | boolean | True | Whether to regroup all words into segments with more natural boundaries. |
| initial_prompt | string | | Text to provide as a prompt for the first window (transcribe mode only). |
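
The input fields can be assembled into a request payload before calling the API. The sketch below is a minimal illustration, not part of the model's own code: `build_input` is a hypothetical helper, the audio URL is a placeholder, and rejecting transcribe-only fields in align mode is a choice made here (the schema does not say whether the model ignores them or fails). Field names and the `text` requirement come from the table.

```python
from typing import Any, Dict, Optional

# Field names taken from the input schema.
KNOWN_FIELDS = {
    "audio_path", "mode", "text", "language", "denoiser", "vad",
    "beam_size", "best_of", "regroup", "initial_prompt",
}
# Fields the schema describes as "transcribe mode only".
TRANSCRIBE_ONLY = {"denoiser", "beam_size", "best_of", "initial_prompt"}


def build_input(audio_path: str, mode: str = "transcribe",
                text: Optional[str] = None, **options: Any) -> Dict[str, Any]:
    """Hypothetical helper: build an input dict that follows the schema."""
    if mode not in ("transcribe", "align"):
        raise ValueError("mode must be 'transcribe' or 'align'")
    if mode == "align" and not text:
        raise ValueError("text is required when mode='align'")

    unknown = set(options) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    # Assumption of this sketch: treat transcribe-only fields as an error
    # in align mode instead of passing them through silently.
    misplaced = TRANSCRIBE_ONLY.intersection(options)
    if mode == "align" and misplaced:
        raise ValueError(f"transcribe-only fields in align mode: {sorted(misplaced)}")

    payload: Dict[str, Any] = {"audio_path": audio_path, "mode": mode, **options}
    if text is not None:
        payload["text"] = text
    return payload


# Placeholder URL; omitted fields fall back to the defaults in the schema.
payload = build_input("https://example.com/audio.wav", language="en", beam_size=5)

# Sending the request is left commented out: it needs network access, an API
# token, and the full version id (this page shows only the prefix bd1370b4).
# import replicate
# output = replicate.run("wordscenes/whisper-stable-ts:<full version id>", input=payload)
```

Only fields passed explicitly end up in the payload, so anything left out relies on the server-side defaults listed in the table.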
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{'title': 'Output', 'type': 'string'}