maxdudik/whisperx:10993520 – Run with an API on Replicate

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

Field	Type	Default value	Description
audio_file	string		Audio file
language	string		ISO code of the language spoken in the audio. Specify None to perform language detection.
language_detection_min_prob	number	0	If language is not specified, the language is detected recursively on different parts of the file until the detection reaches the given probability.
language_detection_max_tries	integer	5	If language is not specified, the language is detected following the logic of the language_detection_min_prob parameter, but detection stops after the given maximum number of tries. If the maximum is reached, the most probable language is kept.
initial_prompt	string		Optional text to provide as a prompt for the first window
batch_size	integer	64	Batch size used to parallelize transcription of the input audio
temperature	number	0	Temperature to use for sampling
vad_onset	number	0.5	VAD onset
vad_offset	number	0.363	VAD offset
align_output	boolean	False	Aligns whisper output to get accurate word-level timestamps
diarization	boolean	False	Assign speaker ID labels
huggingface_access_token	string		HuggingFace token with read access, required to enable diarization. You need to accept the user agreement for the models specified in the README.
min_speakers	integer		Minimum number of speakers if diarization is activated (leave blank if unknown)
max_speakers	integer		Maximum number of speakers if diarization is activated (leave blank if unknown)
group_segments	boolean	True	Group segments from the same speaker that are less than 2 seconds apart
debug	boolean	False	Print out compute/inference times and memory usage information
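
The table above can be turned into a request payload. The following is a minimal sketch, not part of the model's own code: `build_input`, `DEFAULTS`, and `OPTIONAL_NO_DEFAULT` are hypothetical names, the defaults are copied from the table, and the two checks (a token is required when diarization is on, and min_speakers must not exceed max_speakers) are assumptions drawn from the field descriptions rather than documented server-side behavior. It also assumes audio_file is passed as a URL string.

```python
from typing import Any

# Defaults copied from the input schema table.
DEFAULTS: dict[str, Any] = {
    "language_detection_min_prob": 0,
    "language_detection_max_tries": 5,
    "batch_size": 64,
    "temperature": 0,
    "vad_onset": 0.5,
    "vad_offset": 0.363,
    "align_output": False,
    "diarization": False,
    "group_segments": True,
    "debug": False,
}

# Fields the table lists without a default; they are sent only if provided.
OPTIONAL_NO_DEFAULT = {
    "language",
    "initial_prompt",
    "huggingface_access_token",
    "min_speakers",
    "max_speakers",
}


def build_input(audio_file: str, **overrides: Any) -> dict[str, Any]:
    """Merge caller overrides onto the table defaults and run basic checks."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_NO_DEFAULT
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    payload = {"audio_file": audio_file, **DEFAULTS, **overrides}

    # Assumption: diarization cannot work without a HuggingFace token.
    if payload["diarization"] and not payload.get("huggingface_access_token"):
        raise ValueError("diarization requires huggingface_access_token")

    # Assumption: a speaker range must be ordered when both bounds are given.
    lo, hi = payload.get("min_speakers"), payload.get("max_speakers")
    if lo is not None and hi is not None and lo > hi:
        raise ValueError("min_speakers must not exceed max_speakers")

    return payload
```

The resulting dictionary is the kind of value the Replicate Python client accepts as the `input` argument of `replicate.run`, together with the full version identifier of this model, which is not reproduced here.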

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{'properties': {'detected_language': {'title': 'Detected Language',
                                      'type': 'string'},
                'segments': {'title': 'Segments'}},
 'required': ['detected_language'],
 'title': 'Output',
 'type': 'object'}
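
Since the schema requires only detected_language and leaves segments untyped, a client may want to check a response before using it. Below is a minimal sketch under that reading of the schema; `check_output` is a hypothetical helper, and the sample segment in the usage note is invented for illustration because the schema does not describe what a segment contains.

```python
from typing import Any


def check_output(output: Any) -> tuple[str, Any]:
    """Validate a response against the output schema and unpack it.

    Returns (detected_language, segments). segments is returned as-is,
    or None if absent, because the schema gives it no type.
    """
    if not isinstance(output, dict):
        raise TypeError("output must be an object (dict)")
    if "detected_language" not in output:
        raise KeyError("detected_language is required by the schema")

    language = output["detected_language"]
    if not isinstance(language, str):
        raise TypeError("detected_language must be a string")

    return language, output.get("segments")
```

For example, `check_output({"detected_language": "en", "segments": [{"text": "hello"}]})` returns the language code and the segments value unchanged, while a response without detected_language raises `KeyError`.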