You're looking at a specific version of this model.
victor-upmeet /whisperx:449341d3
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.
| Field | Type | Default value | Description |
|---|---|---|---|
| audio_file | string | | Audio file |
| language | string | | ISO code of the language spoken in the audio; specify None to perform language detection |
| initial_prompt | string | | Optional text to provide as a prompt for the first window |
| batch_size | integer | 64 | Parallelization of input audio transcription |
| temperature | number | 0 | Temperature to use for sampling |
| vad_onset | number | 0.5 | VAD onset |
| vad_offset | number | 0.363 | VAD offset |
| align_output | boolean | False | Aligns whisper output to get accurate word-level timestamps |
| diarization | boolean | False | Assign speaker ID labels |
| huggingface_access_token | string | | To enable diarization, enter your HuggingFace token (read access). You need to accept the user agreement for the models specified in the README. |
| min_speakers | integer | | Minimum number of speakers if diarization is activated (leave blank if unknown) |
| max_speakers | integer | | Maximum number of speakers if diarization is activated (leave blank if unknown) |
| debug | boolean | False | Print out compute/inference times and memory usage information |
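To make the defaults above concrete, here is a minimal sketch of building an input payload and calling the model with the Replicate Python client. The helper names (`build_whisperx_input`, `transcribe`) are hypothetical, not part of the model's API; the call assumes the `replicate` package is installed, `REPLICATE_API_TOKEN` is set in the environment, and uses the version hash as abbreviated on this page.

```python
# Defaults copied from the input schema table above.
SCHEMA_DEFAULTS = {
    "batch_size": 64,
    "temperature": 0,
    "vad_onset": 0.5,
    "vad_offset": 0.363,
    "align_output": False,
    "diarization": False,
    "debug": False,
}

# Optional fields with no default value in the schema.
OPTIONAL_FIELDS = {
    "language", "initial_prompt", "huggingface_access_token",
    "min_speakers", "max_speakers",
}

def build_whisperx_input(audio_file, **overrides):
    """Merge caller overrides onto the schema defaults, rejecting unknown keys."""
    unknown = set(overrides) - OPTIONAL_FIELDS - set(SCHEMA_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {"audio_file": audio_file, **SCHEMA_DEFAULTS, **overrides}

def transcribe(path):
    # Hypothetical wrapper: requires the `replicate` package and a
    # REPLICATE_API_TOKEN environment variable.
    import replicate
    with open(path, "rb") as f:
        return replicate.run(
            "victor-upmeet/whisperx:449341d3",
            input=build_whisperx_input(f, align_output=True),
        )
```

Note that `diarization=True` additionally requires `huggingface_access_token`, per the table above.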
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
```json
{
  "properties": {
    "detected_language": {"title": "Detected Language", "type": "string"},
    "segments": {"title": "Segments"}
  },
  "required": ["detected_language"],
  "title": "ModelOutput",
  "type": "object"
}
```
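Per the schema, only `detected_language` is guaranteed; `segments` has no declared shape here. The sketch below consumes a response under the assumption (not stated on this page) that `segments` is the usual WhisperX list of dicts with `"start"`, `"end"`, and `"text"` keys:

```python
def summarize_output(output):
    """Return (detected language, joined transcript text) from a model response."""
    # Required by the output schema.
    lang = output["detected_language"]
    # `segments` may be absent or null; its per-item shape is assumed.
    segments = output.get("segments") or []
    text = " ".join(s.get("text", "").strip() for s in segments)
    return lang, text
```

A guarded access pattern like this keeps the consumer working even when `segments` is missing, since the schema does not require it.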