turian /insanely-fast-whisper-with-video:4f41e902
Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.
| Field | Type | Default value | Description |
|---|---|---|---|
| audio | string | | Audio file. Either this or url must be provided. |
| url | string | | Video URL for yt-dlp to download the audio from. Either this or audio must be provided. |
| task | string (enum) | transcribe | Options: transcribe, translate. Task to perform: transcribe, or translate to another language. |
| language | string | | Optional. Language spoken in the audio; specify None to perform language detection. |
| batch_size | integer | 64 | Number of parallel batches to compute. Reduce if you encounter OOM errors. |
| timestamp | string (enum) | chunk | Options: chunk, word. Whisper supports both chunk-level and word-level timestamps. |
| diarise_audio | boolean | False | Use Pyannote.audio to diarise the audio clips. You will also need to provide hf_token below. |
| hf_token | string | | Provide a hf.co/settings/token for Pyannote.audio to diarise the audio clips. You must first agree to the terms at https://huggingface.co/pyannote/speaker-diarization-3.1 and https://huggingface.co/pyannote/segmentation-3.0. |
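As a sketch of how these fields fit together, the input payload could be built and passed to the Replicate Python client roughly as follows. This is an illustration, not official usage: it assumes the `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set, and the example URL is a placeholder.

```python
# Build the input payload according to the schema above.
# Exactly one of `audio` or `url` must be provided; here we use `url`.
inputs = {
    "url": "https://example.com/some-video",  # any yt-dlp-supported URL (placeholder)
    "task": "transcribe",      # or "translate"
    "batch_size": 64,          # reduce if you hit OOM errors
    "timestamp": "chunk",      # or "word" for word-level timestamps
    "diarise_audio": False,    # set True and supply hf_token to diarise
}

# The actual call (requires network access and an API token):
# import replicate
# output = replicate.run(
#     "turian/insanely-fast-whisper-with-video:4f41e902",
#     input=inputs,
# )
# print(output)
```

Note that `audio` is omitted because `url` is supplied, and `language` is omitted so the model falls back to language detection.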
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema

```json
{"title": "Output"}
```