vm6eji6m4/whisper-chinese-pro:d488c22c

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

| Field | Type | Default value | Description |
| --- | --- | --- | --- |
| audio | string | | Audio file (mp3/wav/m4a/mp4). Or use file_url / file_string instead. |
| file_url | string | | Audio file URL (alternative to `audio`). Public HTTP/HTTPS URL. |
| file_string | string | | Base64-encoded audio (alternative to `audio` / `file_url`). |
| language | None | None | |
| num_speakers | integer | 0 | Max: 10. Number of speakers (1-10). Leave 0 for auto-detect. |
| prompt | string | None | |
| enable_diarization | boolean | True | Run speaker diarization (requires hf_token). Set false for ~30% speedup. |
| gap_threshold | number | 1.5 | Min: 0.1. Max: 5. Merge adjacent same-speaker segments within this gap (seconds). |
| word_timestamps | boolean | False | Include per-word timestamps and per-word probability in output. |
| output_format | None | srt | Primary output format (JSON segments always included). |
| hf_token | string | None | |
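
As a rough illustration of how these fields fit together, the sketch below assembles an input payload and checks it against the limits listed above (num_speakers 0 to 10, gap_threshold 0.1 to 5). The function name `build_input` and the `file_bytes` parameter are hypothetical. The "exactly one audio source" rule, the refusal to enable diarization without `hf_token`, and the use of plain Base64 without a data-URI prefix are this sketch's reading of the descriptions, not documented behavior. `language` and `prompt` are left at their defaults because their accepted values are not listed here.

```python
import base64
from typing import Any, Dict, Optional


def build_input(
    audio: Optional[str] = None,
    file_url: Optional[str] = None,
    file_bytes: Optional[bytes] = None,  # hypothetical: raw audio, sent as file_string
    num_speakers: int = 0,
    enable_diarization: bool = True,
    gap_threshold: float = 1.5,
    word_timestamps: bool = False,
    output_format: str = "srt",
    hf_token: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an input dict using the field names from the input schema."""
    # Assumption: the three audio fields are alternatives, so exactly one is set.
    given = [v for v in (audio, file_url, file_bytes) if v is not None]
    if len(given) != 1:
        raise ValueError("provide exactly one of audio, file_url, file_bytes")
    if file_url is not None and not file_url.startswith(("http://", "https://")):
        raise ValueError("file_url must be a public HTTP/HTTPS URL")
    if not 0 <= num_speakers <= 10:
        raise ValueError("num_speakers must be 0 (auto-detect) or 1-10")
    if not 0.1 <= gap_threshold <= 5:
        raise ValueError("gap_threshold must be between 0.1 and 5 seconds")
    # Assumption: the schema says diarization requires hf_token, so fail early.
    if enable_diarization and hf_token is None:
        raise ValueError("enable_diarization requires hf_token")

    payload: Dict[str, Any] = {
        "num_speakers": num_speakers,
        "enable_diarization": enable_diarization,
        "gap_threshold": gap_threshold,
        "word_timestamps": word_timestamps,
        "output_format": output_format,
    }
    if audio is not None:
        payload["audio"] = audio
    if file_url is not None:
        payload["file_url"] = file_url
    if file_bytes is not None:
        # Assumption: plain Base64 text, no "data:" prefix.
        payload["file_string"] = base64.b64encode(file_bytes).decode("ascii")
    if hf_token is not None:
        payload["hf_token"] = hf_token
    return payload
```

The resulting dict is meant to be passed as the `input` of whatever API client you use to run the model; that call is omitted here because it depends on your client and on the full version identifier.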

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'title': 'Output', 'type': 'object'}
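
Because the output schema is only a generic object, the shape of the returned segments is not documented here. To make the effect of the `gap_threshold` input more concrete, the following is a minimal sketch of merging adjacent same-speaker segments whose gap is at most the threshold. The `Segment` fields (`speaker`, `start`, `end`, `text`), the function `merge_segments`, the inclusive comparison, and joining text without a separator are all assumptions for illustration; the model's own merging logic may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Hypothetical segment shape; the real output keys are not documented.
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_segments(segments: List[Segment], gap_threshold: float = 1.5) -> List[Segment]:
    """Merge consecutive segments from the same speaker separated by <= gap_threshold."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.speaker == seg.speaker
            and seg.start - prev.end <= gap_threshold
        ):
            # Extend the previous segment; text is joined without a space,
            # which suits Chinese transcripts (an assumption of this sketch).
            merged[-1] = Segment(
                prev.speaker, prev.start, max(prev.end, seg.end), prev.text + seg.text
            )
        else:
            merged.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return merged
```

With this reading, a larger `gap_threshold` yields fewer, longer segments per speaker, while a value near the 0.1 minimum keeps most pauses as separate segments.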