suminhthanh/whisperx-custom:74441eec
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.
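To illustrate how defaults apply, here is a minimal sketch that overlays caller-supplied values on the defaults listed in the table. The helper `build_input` and the `DEFAULTS` and `NO_DEFAULT` names are hypothetical and not part of the model's API. The check that either `audio_file` or `url` is present is an assumption inferred from the `url` description. The service applies defaults itself, so sending them explicitly should not be necessary; the sketch only shows what the effective input looks like.

```python
# Defaults copied from the input schema table on this page.
DEFAULTS = {
    "language": "vi",
    "language_detection_min_prob": 0,
    "language_detection_max_tries": 5,
    "batch_size": 64,
    "temperature": 0,
    "vad_onset": 0.5,
    "vad_offset": 0.363,
    "align_output": False,
    "diarization": False,
    "debug": False,
    "keep_audio": False,
    "is_get_video_info": False,
    "cleanup_voice": False,
    "deep_filter": False,
    "chunk_size": 4,
}

# Fields the table lists without a default value.
NO_DEFAULT = {
    "audio_file", "url", "initial_prompt", "huggingface_access_token",
    "min_speakers", "max_speakers", "openai_api_key",
}


def build_input(**overrides):
    """Hypothetical helper: return the effective input for a run."""
    unknown = set(overrides) - set(DEFAULTS) - NO_DEFAULT
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides}
    # Assumption: one of audio_file or url must be supplied.
    if "audio_file" not in payload and "url" not in payload:
        raise ValueError("provide audio_file or url")
    return payload


effective = build_input(url="https://example.com/talk.mp3", diarization=True)
print(effective["language"], effective["diarization"])
```

Only the fields passed to `build_input` differ from the table; everything else keeps its documented default.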
Field | Type | Default value | Description |
---|---|---|---|
audio_file | string | | Audio file |
url | string | | URL of the audio file (if audio_file is not provided) |
language | string | vi | ISO code of the language spoken in the audio, specify None to perform language detection |
language_detection_min_prob | number | 0 | If language is not specified, then the language will be detected recursively on different parts of the file until it reaches the given probability |
language_detection_max_tries | integer | 5 | If language is not specified, then the language will be detected following the logic of language_detection_min_prob parameter, but will stop after the given max retries. If max retries is reached, the most probable language is kept. |
initial_prompt | string | | Optional text to provide as a prompt for the first window |
batch_size | integer | 64 | Parallelization of input audio transcription |
temperature | number | 0 | Temperature to use for sampling |
vad_onset | number | 0.5 | VAD onset |
vad_offset | number | 0.363 | VAD offset |
align_output | boolean | False | Aligns whisper output to get accurate word-level timestamps |
diarization | boolean | False | Assign speaker ID labels |
huggingface_access_token | string | | To enable diarization, please enter your HuggingFace token (read). You need to accept the user agreement for the models specified in the README. |
min_speakers | integer | | Minimum number of speakers if diarization is activated (leave blank if unknown) |
max_speakers | integer | | Maximum number of speakers if diarization is activated (leave blank if unknown) |
debug | boolean | False | Print out compute/inference times and memory usage information |
keep_audio | boolean | False | Keep the downloaded audio file |
openai_api_key | string | | OpenAI API key |
is_get_video_info | boolean | False | Get video info |
cleanup_voice | boolean | False | Cleanup voice |
deep_filter | boolean | False | Deep filter |
chunk_size | integer | 4 | Chunk size |
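The interaction between `language_detection_min_prob` and `language_detection_max_tries` can be sketched as a retry loop. This is one reading of the two descriptions, not the model's actual implementation: the function `detect_language`, the `parts` list, and the `detect` callback are all hypothetical, and the loop is written iteratively although the description says "recursively".

```python
from typing import Callable, Sequence, Tuple


def detect_language(
    parts: Sequence[object],
    detect: Callable[[object], Tuple[str, float]],
    min_prob: float = 0.0,
    max_tries: int = 5,
) -> Tuple[str, float]:
    """Inspect successive parts of the audio until a detection reaches
    min_prob or max_tries parts have been tried, then return the most
    probable (language, probability) pair seen."""
    best = None
    for part in parts[:max_tries]:
        lang, prob = detect(part)
        if best is None or prob > best[1]:
            best = (lang, prob)
        if prob >= min_prob:
            break  # threshold reached, stop early
    if best is None:
        raise ValueError("no audio parts to inspect")
    return best


# Stub detector standing in for a real language-identification call.
fake_results = {"part0": ("en", 0.4), "part1": ("vi", 0.6), "part2": ("vi", 0.95)}
print(detect_language(["part0", "part1", "part2"], fake_results.get, min_prob=0.9))
```

Under this reading, the default `language_detection_min_prob` of 0 means the first detection always meets the threshold, so raising it is what makes the retry limit relevant.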
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{'properties': {'detected_language': {'title': 'Detected Language',
                                      'type': 'string'},
                'score': {'title': 'Score'},
                'segments': {'title': 'Segments'},
                'srt': {'format': 'uri', 'title': 'Srt', 'type': 'string'},
                'text': {'title': 'Text'},
                'title': {'title': 'Title'},
                'video_id': {'title': 'Video Id'},
                'video_info': {'title': 'Video Info'},
                'view_count': {'title': 'View Count'}},
 'required': ['detected_language', 'srt'],
 'title': 'Output',
 'type': 'object'}
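A response can be checked against this schema with a few lines of code. The sketch below assumes the response has already been decoded into a plain Python dictionary; the function name `check_output` is hypothetical. Only `detected_language` and `srt` are required, and the other properties carry no type in the schema, so they are read with `.get` and left unvalidated.

```python
REQUIRED = ("detected_language", "srt")
OPTIONAL = ("score", "segments", "text", "title",
            "video_id", "video_info", "view_count")


def check_output(output: dict) -> dict:
    """Hypothetical helper: verify the required keys and return a dict
    that contains every schema property (missing optional ones as None)."""
    missing = [key for key in REQUIRED if key not in output]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if not isinstance(output["detected_language"], str):
        raise TypeError("detected_language must be a string")
    # srt is declared as a URI string; some client libraries may wrap
    # file URIs in their own object type, so its type is not checked here.
    result = {key: output[key] for key in REQUIRED}
    for key in OPTIONAL:
        result[key] = output.get(key)
    return result


sample = {"detected_language": "vi", "srt": "https://example.com/out.srt"}
print(check_output(sample)["text"])
```

Treating the untyped properties as optional keeps the check tolerant of runs where, for example, video information was not requested.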