vm6eji6m4/whisper-chinese-pro:075c4b06
Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.
| Field | Type | Default value | Description |
|---|---|---|---|
| audio | string | | Audio file (mp3/wav/m4a/mp4). Or use `file_url` / `file_string`. |
| file_url | string | | Audio file URL (alternative to `audio`). Public HTTPS URL. |
| file_string | string | | Base64-encoded audio (alternative). |
| language | None | | None |
| num_speakers | integer | 0 (Max: 10) | Number of speakers (1-10). 0 = auto-detect. |
| prompt | string | | None |
| use_builtin_vocab | boolean | True | None |
| enable_diarization | boolean | True | Run speaker diarization (requires `hf_token`). |
| gap_threshold | number | 1.5 (Min: 0.1, Max: 5) | Merge adjacent same-speaker segments within this gap (seconds). |
| word_timestamps | boolean | False | Include per-word timestamps and per-word probability. |
| output_format | None | srt | Primary output format (JSON segments always included). |
| hf_token | string | | None |
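
The defaults and bounds in the input schema can be checked locally before a request is sent. The following is a minimal sketch, not part of any client library: `build_input` and `encode_audio` are hypothetical helper names, and the rule that at least one audio source must be present is an assumption drawn from the field descriptions rather than a documented constraint.

```python
import base64

# Defaults copied from the input schema table; fields without a
# documented default (audio, file_url, file_string, language, prompt,
# hf_token) are left out unless the caller supplies them.
DEFAULTS = {
    "num_speakers": 0,
    "use_builtin_vocab": True,
    "enable_diarization": True,
    "gap_threshold": 1.5,
    "word_timestamps": False,
    "output_format": "srt",
}


def encode_audio(raw: bytes) -> str:
    """Base64-encode raw audio bytes for the `file_string` field."""
    return base64.b64encode(raw).decode("ascii")


def build_input(**fields) -> dict:
    """Hypothetical helper: merge caller fields over the documented
    defaults and check the documented bounds."""
    payload = {**DEFAULTS, **fields}

    # Assumption: at least one of the three audio sources is required.
    if not any(payload.get(k) for k in ("audio", "file_url", "file_string")):
        raise ValueError("provide audio, file_url, or file_string")

    # Documented bound: 0 means auto-detect, otherwise 1-10.
    if not 0 <= payload["num_speakers"] <= 10:
        raise ValueError("num_speakers must be between 0 and 10")

    # Documented bounds: Min 0.1, Max 5.
    if not 0.1 <= payload["gap_threshold"] <= 5:
        raise ValueError("gap_threshold must be between 0.1 and 5")

    # The table states that diarization requires hf_token.
    if payload["enable_diarization"] and not payload.get("hf_token"):
        raise ValueError("enable_diarization requires hf_token")

    return payload
```

For example, `build_input(file_url="https://example.com/talk.wav", enable_diarization=False)` returns a payload with every other field at its documented default; the URL here is a placeholder.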
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{'title': 'Output', 'type': 'object'}
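
The output schema is an untyped object, so the shape of the returned segments is not documented here. To illustrate what the `gap_threshold` input describes, the sketch below merges adjacent same-speaker segments whose gap is within the threshold. The segment keys `speaker`, `start`, `end`, and `text` are assumptions made for the example, as is the choice to treat a gap exactly equal to the threshold as mergeable; the model's own implementation may differ.

```python
def merge_segments(segments, gap_threshold=1.5):
    """Merge adjacent segments from the same speaker when the silence
    between them is at most `gap_threshold` seconds.

    `segments` is assumed to be a time-ordered list of dicts with the
    hypothetical keys: speaker, start, end, text.
    """
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        same_speaker = prev is not None and prev["speaker"] == seg["speaker"]
        if same_speaker and seg["start"] - prev["end"] <= gap_threshold:
            # Extend the previous segment instead of starting a new one.
            prev["end"] = seg["end"]
            prev["text"] = prev["text"] + " " + seg["text"]
        else:
            # Copy so the caller's input list is not mutated.
            merged.append(dict(seg))
    return merged
```

A larger threshold yields fewer, longer segments per speaker; a smaller one keeps short pauses as segment boundaries.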