turian/whisply
Transcribe, translate, annotate and subtitle audio and video files with OpenAI's Whisper ... fast!
Run turian/whisply with an API
Use one of our client libraries to get started quickly. Clicking on a library will take you to the Playground tab where you can tweak different inputs, see the results, and copy the corresponding code to use in your own project.
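For instance, with the Python client (`pip install replicate`), a minimal transcription request looks like the sketch below. The version hash after the colon is a placeholder; copy the current one from the model's API tab. The client reads your API token from the `REPLICATE_API_TOKEN` environment variable.

```python
# Minimal sketch using the official Python client (pip install replicate).
# Assumes REPLICATE_API_TOKEN is set; the version hash after the colon is a
# placeholder -- copy the current one from the model's API tab.
import replicate

output = replicate.run(
    "turian/whisply:<version-hash>",
    input={
        "audio_file": "https://example.com/interview.mp3",  # a URL, or open("file.mp3", "rb")
        "model": "distil-large-v3",
        "language": "en",
    },
)
print(output)  # URI of the result file (see the output schema below)
```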
Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.
Field | Type | Default value | Description |
---|---|---|---|
audio_file | string | | Audio file to transcribe |
language | string | | Language code (e.g., 'en', 'fr', 'de') |
model | string (enum) | distil-large-v3 | Whisper model to use. Options: tiny, tiny-en, base, base-en, small, small-en, distil-small-en, medium, medium-en, distil-medium-en, large, large-v1, large-v2, distil-large-v2, large-v3, distil-large-v3, large-v3-turbo |
subtitle | boolean | False | Generate subtitles (.srt, .vtt) |
sub_length | integer (min: 1) | 5 | Subtitle segment length in words |
translate | boolean | False | Translate to English |
annotate | boolean | False | Enable speaker annotation (requires HF token) |
num_speakers | integer (min: 2) | | Number of speakers to annotate (auto-detection if None) |
hf_token | string | | HuggingFace access token for speaker annotation |
verbose | boolean | False | Print text chunks during transcription |
post_correction | string | | Path to YAML file for post-correction |
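Putting the table together, a payload that asks for subtitles plus speaker annotation could look like the following sketch; the audio URL and Hugging Face token are placeholders, not real values.

```python
# Sketch of a fuller input payload: subtitles plus speaker annotation.
# The audio URL and hf_token value are placeholders.
input_payload = {
    "audio_file": "https://example.com/panel.wav",
    "model": "large-v3-turbo",
    "subtitle": True,       # also produce .srt/.vtt files
    "sub_length": 7,        # words per subtitle segment (minimum 1)
    "annotate": True,       # speaker annotation requires hf_token
    "num_speakers": 3,      # omit to let the model auto-detect (minimum 2)
    "hf_token": "hf_...",   # HuggingFace access token
}
```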
```json
{
"type": "object",
"title": "Input",
"required": [
"audio_file"
],
"properties": {
"model": {
"enum": [
"tiny",
"tiny-en",
"base",
"base-en",
"small",
"small-en",
"distil-small-en",
"medium",
"medium-en",
"distil-medium-en",
"large",
"large-v1",
"large-v2",
"distil-large-v2",
"large-v3",
"distil-large-v3",
"large-v3-turbo"
],
"type": "string",
"title": "model",
"description": "Whisper model to use",
"default": "distil-large-v3",
"x-order": 2
},
"verbose": {
"type": "boolean",
"title": "Verbose",
"default": false,
"x-order": 9,
"description": "Print text chunks during transcription"
},
"annotate": {
"type": "boolean",
"title": "Annotate",
"default": false,
"x-order": 6,
"description": "Enable speaker annotation (requires HF token)"
},
"hf_token": {
"type": "string",
"title": "Hf Token",
"x-order": 8,
"description": "HuggingFace Access token for speaker annotation"
},
"language": {
"type": "string",
"title": "Language",
"x-order": 1,
"description": "Language code (e.g., 'en', 'fr', 'de')"
},
"subtitle": {
"type": "boolean",
"title": "Subtitle",
"default": false,
"x-order": 3,
"description": "Generate subtitles (.srt, .vtt)"
},
"translate": {
"type": "boolean",
"title": "Translate",
"default": false,
"x-order": 5,
"description": "Translate to English"
},
"audio_file": {
"type": "string",
"title": "Audio File",
"format": "uri",
"x-order": 0,
"description": "Audio file to transcribe"
},
"sub_length": {
"type": "integer",
"title": "Sub Length",
"default": 5,
"minimum": 1,
"x-order": 4,
"description": "Subtitle segment length in words"
},
"num_speakers": {
"type": "integer",
"title": "Num Speakers",
"minimum": 2,
"x-order": 7,
"description": "Number of speakers to annotate (auto-detection if None)"
},
"post_correction": {
"type": "string",
"title": "Post Correction",
"format": "uri",
"x-order": 10,
"description": "Path to YAML file for post-correction"
}
}
}
```
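The block above is standard JSON Schema, so a payload can be checked locally before sending a request. A sketch with the `jsonschema` package, using a trimmed copy of the schema above; the payload values are illustrative only.

```python
# Sketch: validate a payload against (a trimmed copy of) the Input schema
# above before calling the API. Catches missing required fields and
# constraint violations such as sub_length < 1.
from jsonschema import ValidationError, validate

input_schema = {
    "type": "object",
    "required": ["audio_file"],
    "properties": {
        "audio_file": {"type": "string", "format": "uri"},
        "sub_length": {"type": "integer", "minimum": 1},
        "num_speakers": {"type": "integer", "minimum": 2},
    },
}

payload = {"audio_file": "https://example.com/talk.mp3", "sub_length": 0}

try:
    validate(instance=payload, schema=input_schema)
    print("payload is valid")
except ValidationError as err:
    print(f"invalid payload: {err.message}")  # "0 is less than the minimum of 1"
```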
Output schema
The shape of the response you’ll get when you run this model with an API.
```json
{
"type": "string",
"title": "Output",
"format": "uri"
}
```
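Because the output is a single URI, collecting the result is an ordinary HTTP download. A sketch with `requests` follows; the URL stands in for whatever your run returned, and since the file format depends on the options you chose, the filename is simply taken from the URL path.

```python
# Sketch: download the file behind the returned URI. The format depends on
# the chosen options (plain transcript, subtitles, ...), so the local
# filename is derived from the URL path.
import os
from urllib.parse import urlparse

import requests

output_url = "https://replicate.delivery/.../output"  # placeholder for the returned URI
resp = requests.get(output_url, timeout=60)
resp.raise_for_status()

filename = os.path.basename(urlparse(output_url).path) or "whisply_output"
with open(filename, "wb") as f:
    f.write(resp.content)
```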