
wordscenes/whisper-stable-ts:b81c9bb7

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.

Field           Type          Default       Description
audio_path      string        (none)        Audio to transcribe or align.
mode            (not listed)  transcribe    Mode: 'transcribe' to generate a transcript, 'align' to align the provided text.
text            string        (none)        Text to align with the audio (required when mode='align').
language        string        en            Language to transcribe.
denoiser        (not listed)  (none)        The denoiser to use (transcribe mode only).
vad             boolean       True          Whether to use Silero VAD to generate a timestamp suppression mask.
beam_size       integer       5             Number of beams in beam search; only applicable when temperature is zero (transcribe mode only).
best_of         integer       5             Number of candidates when sampling with non-zero temperature (transcribe mode only).
regroup         boolean       True          Whether to regroup all words into segments with more natural boundaries.
initial_prompt  string        (none)        Text to provide as a prompt for the first window (transcribe mode only).
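The table implies a few rules a caller has to respect: unset fields fall back to their defaults, `text` is required when `mode` is 'align', and several fields only matter in transcribe mode. The sketch below checks those rules locally before a request is sent. It is not part of the model or of any Replicate client; `build_input`, `DEFAULTS`, `FIELDS` and `TRANSCRIBE_ONLY` are hypothetical names, and dropping transcribe-only fields in align mode is an assumption (the schema only says they apply to transcribe mode). The resulting dict is the kind of object you would pass as the `input` of a prediction request, for example the `input` argument of the Replicate Python client's `replicate.run`.

```python
from typing import Any, Dict

# Schema defaults from the table above. Fields without a default
# (audio_path, text, denoiser, initial_prompt) are only sent when given.
DEFAULTS: Dict[str, Any] = {
    "mode": "transcribe",
    "language": "en",
    "vad": True,
    "beam_size": 5,
    "best_of": 5,
    "regroup": True,
}

# Every field named in the input schema.
FIELDS = set(DEFAULTS) | {"audio_path", "text", "denoiser", "initial_prompt"}

# Fields the schema marks "(transcribe mode only)".
TRANSCRIBE_ONLY = {"denoiser", "beam_size", "best_of", "initial_prompt"}


def build_input(audio_path: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge caller values over the schema defaults."""
    unknown = set(overrides) - FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload: Dict[str, Any] = {**DEFAULTS, "audio_path": audio_path, **overrides}

    mode = payload["mode"]
    if mode not in ("transcribe", "align"):
        raise ValueError(f"mode must be 'transcribe' or 'align', got {mode!r}")
    if mode == "align":
        if not payload.get("text"):
            raise ValueError("text is required when mode='align'")
        # Assumption: transcribe-only fields have no effect in align mode,
        # so this sketch drops them instead of sending them.
        for key in TRANSCRIBE_ONLY:
            payload.pop(key, None)
    return payload


print(build_input("speech.wav", mode="align", text="hello world"))
# {'mode': 'align', 'language': 'en', 'vad': True, 'regroup': True, 'audio_path': 'speech.wav', 'text': 'hello world'}
```

Validating locally like this surfaces a missing `text` before a remote run is started, rather than after.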

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"title": "Output", "type": "string"}
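The output schema declares only that the result is a single string; the page does not say what format that string uses. The minimal sketch below checks a returned value against that schema and otherwise treats the string as opaque. `check_output` and `OUTPUT_SCHEMA` are hypothetical names, and only the "string" type is handled because that is all this model declares.

```python
import json
from typing import Any

# Output schema from this page, written as JSON.
OUTPUT_SCHEMA = json.loads('{"title": "Output", "type": "string"}')


def check_output(value: Any, schema: dict = OUTPUT_SCHEMA) -> str:
    """Hypothetical helper: confirm a prediction result matches the output schema."""
    if schema.get("type") != "string":
        raise NotImplementedError(f"unsupported schema type: {schema.get('type')!r}")
    if not isinstance(value, str):
        raise TypeError(f"{schema['title']} should be a string, got {type(value).__name__}")
    return value
```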