romanfurman6/whisperx-multi-chunk:3762efca

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

| Field | Type | Default value | Description |
| --- | --- | --- | --- |
| audio_urls | array | | Array of public audio URLs to process |
| total_duration_seconds | number | | Total duration of the complete audio in seconds |
| chunk_size_seconds | number | | Duration of each chunk in seconds (used for timestamp calculation). The last chunk can be shorter; its duration is calculated from the total duration and the number of chunks. |
| language | string | | ISO code of the language spoken in the audio; specify None to perform language detection |
| language_detection_min_prob | number | 0.7 | Minimum probability for recursive language detection |
| language_detection_max_tries | integer | 5 | Maximum retries for recursive language detection |
| initial_prompt | string | | Optional text prompt for the first window |
| batch_size | integer | 32 | Parallelization of input audio transcription |
| temperature | number | 0.2 | Temperature to use for sampling |
| vad_onset | number | 0.5 | VAD onset threshold |
| vad_offset | number | 0.363 | VAD offset threshold |
| align_output | boolean | False | Whether to align output for word-level timestamps |
| diarization | boolean | False | Whether to perform diarization |
| huggingface_access_token | string | | HuggingFace token for diarization |
| min_speakers | integer | | Minimum number of speakers if diarization is activated |
| max_speakers | integer | | Maximum number of speakers if diarization is activated |
| debug | boolean | True | Print debug information |
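
The `chunk_size_seconds` description says that the last chunk can be shorter and that its length is derived from the total duration and the number of chunks. The page does not show how the model does this internally, so the following is only a minimal Python sketch of one plausible reading: chunks are contiguous, each full chunk lasts `chunk_size_seconds`, and the last one takes the remainder. The helper names `chunk_spans` and `build_input` and the example URLs are hypothetical, not part of the model's API.

```python
def chunk_spans(total_duration_seconds, chunk_size_seconds, n_chunks):
    """Return a (start_offset, duration) pair in seconds for each chunk.

    Assumption: chunks are contiguous, every chunk except the last lasts
    chunk_size_seconds, and the last chunk takes whatever time remains.
    """
    if n_chunks < 1 or chunk_size_seconds <= 0:
        raise ValueError("need at least one chunk and a positive chunk size")
    spans = []
    for i in range(n_chunks):
        start = i * chunk_size_seconds
        if i < n_chunks - 1:
            duration = chunk_size_seconds
        else:
            # Last chunk: remainder of the total duration.
            duration = total_duration_seconds - start
        spans.append((start, duration))
    last_duration = spans[-1][1]
    if last_duration <= 0 or last_duration > chunk_size_seconds:
        raise ValueError("total duration is inconsistent with chunk size and count")
    return spans


def build_input(audio_urls, total_duration_seconds, chunk_size_seconds, **overrides):
    """Assemble an input dict using field names from the table above.

    Fields not passed in `overrides` are left out, so the model's
    documented defaults apply.
    """
    payload = {
        "audio_urls": list(audio_urls),
        "total_duration_seconds": total_duration_seconds,
        "chunk_size_seconds": chunk_size_seconds,
    }
    payload.update(overrides)
    return payload


# Hypothetical example: 125 s of audio split into three files of up to 60 s.
urls = [
    "https://example.com/part-0.mp3",
    "https://example.com/part-1.mp3",
    "https://example.com/part-2.mp3",
]
print(chunk_spans(125, 60, len(urls)))  # [(0, 60), (60, 60), (120, 5)]
payload = build_input(urls, 125, 60, align_output=True)
```

The resulting dict is the kind of value an API client would send as the model input; under the assumption above, timestamps in the second chunk would be shifted by 60 seconds and those in the third by 120 seconds.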

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'properties': {'detected_language': {'title': 'Detected Language',
                                      'type': 'string'},
                'processing_time': {'title': 'Processing Time',
                                    'type': 'number'},
                'segments': {'title': 'Segments'},
                'total_chunks': {'title': 'Total Chunks', 'type': 'integer'}},
 'required': ['segments',
              'detected_language',
              'total_chunks',
              'processing_time'],
 'title': 'Output',
 'type': 'object'}
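
As an illustration of how this schema might be used on the client side, here is a small Python sketch that checks a response for the four required fields and their declared types. The function name `check_output` and the sample values are hypothetical. Because the schema declares no type for `segments`, the sketch only checks that the field is present and assumes nothing about its shape.

```python
# Expected Python types for each required field in the output schema.
# None means the schema declares no type, so only presence is checked.
REQUIRED_FIELDS = {
    "segments": None,
    "detected_language": str,
    "total_chunks": int,
    "processing_time": (int, float),  # JSON "number" covers both
}


def check_output(output):
    """Return a list of problems found in a response dict (empty if none)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in output:
            problems.append(f"missing required field: {name}")
            continue
        if expected is None:
            continue
        value = output[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name}: unexpected type {type(value).__name__}")
    return problems


# Hypothetical response values, for illustration only.
sample = {
    "segments": [],
    "detected_language": "en",
    "total_chunks": 3,
    "processing_time": 12.5,
}
print(check_output(sample))  # []
```

A full JSON Schema validator would be the more thorough choice; this sketch only mirrors the `required` list and the `type` entries shown above.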