suminhthanh/whisperx-custom:4b17e12b

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

| Field | Type | Default value | Description |
|---|---|---|---|
| audio_file | string | | Audio file |
| url | string | | URL of the audio file (if audio_file is not provided) |
| language | string | vi | ISO code of the language spoken in the audio, specify None to perform language detection |
| language_detection_min_prob | number | 0 | If language is not specified, then the language will be detected recursively on different parts of the file until it reaches the given probability |
| language_detection_max_tries | integer | 5 | If language is not specified, then the language will be detected following the logic of language_detection_min_prob parameter, but will stop after the given max retries. If max retries is reached, the most probable language is kept. |
| initial_prompt | string | | Optional text to provide as a prompt for the first window |
| batch_size | integer | 64 | Parallelization of input audio transcription |
| temperature | number | 0 | Temperature to use for sampling |
| vad_onset | number | 0.5 | VAD onset |
| vad_offset | number | 0.363 | VAD offset |
| align_output | boolean | False | Aligns whisper output to get accurate word-level timestamps |
| diarization | boolean | False | Assign speaker ID labels |
| huggingface_access_token | string | | To enable diarization, please enter your HuggingFace token (read). You need to accept the user agreement for the models specified in the README. |
| min_speakers | integer | | Minimum number of speakers if diarization is activated (leave blank if unknown) |
| max_speakers | integer | | Maximum number of speakers if diarization is activated (leave blank if unknown) |
| debug | boolean | False | Print out compute/inference times and memory usage information |
| keep_audio | boolean | False | Keep the downloaded audio file |
| openai_api_key | string | | OpenAI API key |
| is_get_video_info | boolean | False | Get video info |
| cleanup_voice | boolean | False | Cleanup voice |
| deep_filter | boolean | False | Deep filter |
| chunk_size | integer | 4 | Chunk size |
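
As a rough illustration of how these input fields might be combined, the sketch below builds an input payload and hands it to the `replicate` Python client. The helper names `build_input` and `transcribe` are hypothetical and not part of the model. The rule that at least one of `audio_file` or `url` must be given is an assumption drawn from the `url` description, and the diarization check follows the `huggingface_access_token` description. This page shows only an abbreviated version ID (`4b17e12b`), so the full model reference has to be supplied by the caller.

```python
from typing import Any, Optional


def build_input(audio_file: Optional[str] = None,
                url: Optional[str] = None,
                **fields: Any) -> dict:
    """Build an input payload (hypothetical helper, not part of the model)."""
    if not audio_file and not url:
        # Assumption: the model needs one of the two audio sources.
        raise ValueError("provide audio_file or url")
    if fields.get("diarization") and not fields.get("huggingface_access_token"):
        # Per the field description, diarization needs a HuggingFace token.
        raise ValueError("diarization requires huggingface_access_token")

    # Leave out unset fields so the documented defaults apply on the server.
    payload = {name: value for name, value in fields.items() if value is not None}
    if audio_file:
        payload["audio_file"] = audio_file
    else:
        payload["url"] = url
    return payload


def transcribe(model_ref: str, payload: dict) -> Any:
    """Run the model; model_ref is 'owner/name:<full version id>'."""
    # Third-party client, imported lazily; it expects an API token
    # in the REPLICATE_API_TOKEN environment variable.
    import replicate
    return replicate.run(model_ref, input=payload)


# Example payload only; no request is sent here. The URL is a placeholder.
payload = build_input(url="https://example.com/talk.mp3",
                      language="vi",
                      align_output=True)
```

Omitting unset fields, rather than sending explicit defaults, keeps the request aligned with whatever defaults this model version defines.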

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'properties': {'detected_language': {'title': 'Detected Language',
                                      'type': 'string'},
                'score': {'title': 'Score'},
                'segments': {'title': 'Segments'},
                'srt': {'format': 'uri', 'title': 'Srt', 'type': 'string'},
                'text': {'title': 'Text'},
                'title': {'title': 'Title'},
                'video_id': {'title': 'Video Id'},
                'video_info': {'title': 'Video Info'},
                'view_count': {'title': 'View Count'}},
 'required': ['detected_language', 'srt'],
 'title': 'Output',
 'type': 'object'}
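
To make the schema concrete, here is a minimal sketch of reading a response that follows it. The function name `read_output` and the sample values are made up for illustration. Only `detected_language` and `srt` are required by the schema, and the remaining properties are untyped, so the sketch passes them through without assuming their shape.

```python
from typing import Any

REQUIRED = ("detected_language", "srt")
OPTIONAL = ("score", "segments", "text", "title",
            "video_id", "video_info", "view_count")


def read_output(output: dict) -> dict:
    """Check required keys and fill absent optional keys with None."""
    missing = [key for key in REQUIRED if key not in output]
    if missing:
        raise KeyError(f"missing required output fields: {missing}")

    result: dict = {key: output[key] for key in REQUIRED}
    for key in OPTIONAL:
        # The schema gives no type for these, so they are kept as-is.
        result[key] = output.get(key)
    return result


# Illustrative response; the values are invented, not real model output.
sample = {
    "detected_language": "vi",
    "srt": "https://example.com/out.srt",
    "text": "xin chào",
}
parsed = read_output(sample)
print(parsed["detected_language"], parsed["srt"])
```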