You're looking at a specific version of this model.

vm6eji6m4/whisper-chinese-pro:075c4b06

Input schema

The fields you can use to run this model with an API. If you don’t provide a value for a field, its default value is used.
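The default rule above can be sketched client-side in Python. This is an illustration only: the service applies its own defaults, the helper name `with_defaults` is hypothetical, and the values are copied from the input schema on this page. Fields listed without a default (`audio`, `file_url`, `file_string`) are left out.

```python
from typing import Any

# Defaults as listed in this page's input schema.
# Fields shown without a default (audio, file_url, file_string) are omitted.
DEFAULTS: dict[str, Any] = {
    "language": None,
    "num_speakers": 0,
    "prompt": None,
    "use_builtin_vocab": True,
    "enable_diarization": True,
    "gap_threshold": 1.5,
    "word_timestamps": False,
    "output_format": "srt",
    "hf_token": None,
}


def with_defaults(user_input: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: DEFAULTS overlaid with the caller's values."""
    # In a dict merge, later keys win, so caller-supplied values override defaults.
    return {**DEFAULTS, **user_input}
```

For example, `with_defaults({"num_speakers": 2})` keeps `num_speakers` at 2 and fills `gap_threshold` with 1.5.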

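Each field is listed as name (type; default; limits): description.

audio (string): Audio file (mp3/wav/m4a/mp4). Alternatively, use file_url or file_string.
file_url (string): URL of the audio file, as an alternative to audio. Must be a public HTTPS URL.
file_string (string): Base64-encoded audio, as an alternative to audio.
language (type not specified; default: None): No description provided.
num_speakers (integer; default: 0; max: 10): Number of speakers (1-10); 0 = auto-detect.
prompt (string; default: None): No description provided.
use_builtin_vocab (boolean; default: True): No description provided.
enable_diarization (boolean; default: True): Run speaker diarization (requires hf_token).
gap_threshold (number; default: 1.5; min: 0.1; max: 5): Merge adjacent segments from the same speaker when they are within this many seconds of each other.
word_timestamps (boolean; default: False): Include per-word timestamps and per-word probabilities.
output_format (type not specified; default: srt): Primary output format. JSON segments are always included.
hf_token (string; default: None): No description provided. Required when enable_diarization is True.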
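Before sending a request, a client might check its input against the limits listed above. The sketch below is a hypothetical client-side check, not part of the model's API. It assumes exactly one of `audio`, `file_url`, or `file_string` should be supplied, and it reads the "1-10, 0 = auto-detect" description of `num_speakers` as a 0 to 10 range. Note that `enable_diarization` defaults to True while `hf_token` defaults to None, so a request that relies on both defaults would fail the diarization check.

```python
from typing import Any

AUDIO_SOURCES = ("audio", "file_url", "file_string")


def validate_input(payload: dict[str, Any]) -> list[str]:
    """Check a payload against the documented limits; return problems (empty list if none)."""
    problems: list[str] = []

    # Assumption: audio, file_url and file_string are mutually exclusive alternatives.
    sources = [k for k in AUDIO_SOURCES if payload.get(k)]
    if len(sources) != 1:
        problems.append(f"expected exactly one of {AUDIO_SOURCES}, got {sources}")

    # file_url is documented as a public HTTPS URL.
    url = payload.get("file_url")
    if url and not str(url).startswith("https://"):
        problems.append("file_url must be a public HTTPS URL")

    # num_speakers: 0 means auto-detect, otherwise 1-10 (bools are rejected explicitly).
    n = payload.get("num_speakers", 0)
    if isinstance(n, bool) or not isinstance(n, int) or not 0 <= n <= 10:
        problems.append("num_speakers must be an integer from 0 to 10")

    # gap_threshold: documented min 0.1, max 5 (seconds).
    gap = payload.get("gap_threshold", 1.5)
    if isinstance(gap, bool) or not isinstance(gap, (int, float)) or not 0.1 <= gap <= 5:
        problems.append("gap_threshold must be between 0.1 and 5 seconds")

    # Diarization is documented as requiring hf_token.
    if payload.get("enable_diarization", True) and not payload.get("hf_token"):
        problems.append("enable_diarization requires hf_token")

    return problems
```

An empty list means the payload passes these checks. The service may still reject it for reasons this sketch does not model.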

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"title": "Output", "type": "object"}

The schema specifies only that the output is an object. Its fields are not listed.
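The gap_threshold input describes merging adjacent same-speaker segments before they reach the output. The sketch below shows one way such a merge could work. It is not the model's documented logic. The `Segment` fields (`start`, `end`, `speaker`, `text`) are a hypothetical shape, because the output schema does not list the real fields. The threshold is treated as inclusive, and texts are joined with no separator by default, since Chinese is usually written without spaces between words.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical segment shape; the real output fields are not documented.
    start: float  # seconds
    end: float    # seconds
    speaker: str
    text: str


def merge_segments(
    segments: list[Segment], gap_threshold: float = 1.5, joiner: str = ""
) -> list[Segment]:
    """Merge consecutive same-speaker segments separated by at most gap_threshold seconds."""
    merged: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.speaker == seg.speaker
            and seg.start - prev.end <= gap_threshold
        ):
            # Extend the previous merged segment instead of starting a new one.
            prev.end = max(prev.end, seg.end)
            prev.text = prev.text + joiner + seg.text
        else:
            # Append a copy so the caller's Segment objects are not mutated.
            merged.append(Segment(seg.start, seg.end, seg.speaker, seg.text))
    return merged
```

Raising gap_threshold merges more aggressively. With the default of 1.5 seconds, two segments from the same speaker separated by a 2-second pause stay separate.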