dashed/whisperx-subtitles-replicate:d61d6a06
Input schema
The fields you can pass when running this model through the API. If you omit a field, its default value is used.
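As a rough illustration of how omitted fields fall back to their defaults, the sketch below merges caller-supplied values over the values in the "Default value" column of the schema table. The helper name `build_input` is hypothetical and not part of any client library. The check that diarization needs a token follows the `huggingface_access_token` description; treating it as a hard error on the client side is an assumption.

```python
# Defaults copied from the "Default value" column of the input schema table.
DEFAULTS = {
    "language_detection_min_prob": 0,
    "language_detection_max_tries": 5,
    "batch_size": 64,
    "temperature": 0,
    "vad_onset": 0.5,
    "vad_offset": 0.363,
    "align_output": True,
    "diarization": False,
    "debug": False,
}

# Fields that the table lists without a default value.
NO_DEFAULT = {
    "audio_file",
    "language",
    "initial_prompt",
    "huggingface_access_token",
    "min_speakers",
    "max_speakers",
}


def build_input(**overrides):
    """Hypothetical helper: return the effective input after applying defaults."""
    unknown = set(overrides) - set(DEFAULTS) - NO_DEFAULT
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")

    # Caller-supplied values win; everything else keeps its documented default.
    payload = {**DEFAULTS, **overrides}

    # Assumption: reject diarization without a token before sending the request.
    if payload["diarization"] and not payload.get("huggingface_access_token"):
        raise ValueError("diarization requires huggingface_access_token")
    return payload
```

For example, `build_input(audio_file="https://example.com/talk.mp3", batch_size=16)` would keep `vad_onset` at 0.5 and `align_output` at True while overriding only `batch_size`.
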
Field | Type | Default value | Description |
---|---|---|---|
audio_file | string | | Audio file |
language | string | | ISO code of the language spoken in the audio, specify None to perform language detection |
language_detection_min_prob | number | 0 | If language is not specified, then the language will be detected recursively on different parts of the file until it reaches the given probability |
language_detection_max_tries | integer | 5 | If language is not specified, then the language will be detected following the logic of language_detection_min_prob parameter, but will stop after the given max retries. If max retries is reached, the most probable language is kept. |
initial_prompt | string | | Optional text to provide as a prompt for the first window |
batch_size | integer | 64 | Parallelization of input audio transcription |
temperature | number | 0 | Temperature to use for sampling |
vad_onset | number | 0.5 | VAD onset |
vad_offset | number | 0.363 | VAD offset |
align_output | boolean | True | Aligns whisper output to get accurate word-level timestamps |
diarization | boolean | False | Assign speaker ID labels |
huggingface_access_token | string | | To enable diarization, please enter your HuggingFace token (read). You need to accept the user agreement for the models specified in the README. |
min_speakers | integer | | Minimum number of speakers if diarization is activated (leave blank if unknown) |
max_speakers | integer | | Maximum number of speakers if diarization is activated (leave blank if unknown) |
debug | boolean | False | Print out compute/inference times and memory usage information |
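
The interplay between `language_detection_min_prob` and `language_detection_max_tries` can be sketched as a retry loop. This is only an illustration of the behavior described in the table, not the model's actual code: the `detect` callable, the way the audio is split into `chunks`, and whether the first attempt counts as a try are all assumptions.

```python
def detect_language(chunks, detect, min_prob=0.0, max_tries=5):
    """Illustrative retry loop for language detection.

    chunks: iterable of audio parts (hypothetical; how the file is split is unknown).
    detect: hypothetical callable returning (language_code, probability) for a chunk.
    """
    best_lang, best_prob = None, -1.0
    for tries, chunk in enumerate(chunks, start=1):
        lang, prob = detect(chunk)
        # Remember the most probable language seen so far.
        if prob > best_prob:
            best_lang, best_prob = lang, prob
        # Stop once the threshold is met or the retry budget is used up.
        if prob >= min_prob or tries >= max_tries:
            break
    # With no chunks at all this returns (None, -1.0).
    return best_lang, best_prob
```

With the default `language_detection_min_prob` of 0, a loop like this stops after the first part of the file, since any probability meets the threshold. Raising the threshold trades extra detection passes for a more confident result, capped by `language_detection_max_tries`.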
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{'properties': {'detected_language': {'title': 'Detected Language',
                                      'type': 'string'},
                'segments': {'title': 'Segments'},
                'srt_file': {'format': 'uri',
                             'title': 'Srt File',
                             'type': 'string'},
                'srt_output': {'title': 'Srt Output', 'type': 'string'}},
 'required': ['detected_language', 'srt_output', 'srt_file'],
 'title': 'Output',
 'type': 'object'}
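
A response shaped like this schema could be checked and unpacked as sketched below. The helper `parse_output` is hypothetical. The code assumes that `srt_output` holds standard SubRip text (a cue number, a `start --> end` timing line, then the subtitle text), which the schema itself does not state. The untyped `segments` field is left alone.

```python
import re

# Taken from the 'required' list in the output schema.
REQUIRED = ("detected_language", "srt_output", "srt_file")

_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp):
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_output(output):
    """Hypothetical helper: validate required keys and split srt_output into cues."""
    missing = [key for key in REQUIRED if key not in output]
    if missing:
        raise KeyError(f"missing required field(s): {missing}")

    cues = []
    # SubRip cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", output["srt_output"].strip()):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that does not look like a cue
        start, end = lines[1].split("-->")
        cues.append((_to_seconds(start), _to_seconds(end), "\n".join(lines[2:])))
    return output["detected_language"], cues
```

Each cue comes back as a `(start_seconds, end_seconds, text)` tuple, which is usually easier to post-process than the raw subtitle string.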