🎙️ JigsawStack Speech-to-Text (STT)
This model wraps the JigsawStack Speech-to-Text API and leverages the powerful Whisper V3 model to transcribe and optionally translate audio/video files.
It supports long files, speaker diarization, and webhook delivery for async processing, making it ideal for meetings, podcasts, interviews, or multilingual content.
🧠 What It Does
You provide a video or audio file (via URL or file_store_key), and the model returns the full transcript. It can optionally:
- Auto-detect language
- Translate to English or any supported language
- Separate different speakers (speaker diarization)
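For illustration, here is a minimal Python sketch of a transcription request. The endpoint path (`/v1/ai/transcribe`) and the `x-api-key` header are assumptions based on typical JigsawStack API conventions, not something confirmed by this document; check the current API docs before relying on them.

```python
# Minimal sketch of a transcription request (endpoint path and header name are assumed).
import requests

API_KEY = "your-jigsawstack-api-key"  # placeholder

response = requests.post(
    "https://api.jigsawstack.com/v1/ai/transcribe",   # assumed endpoint path
    headers={"x-api-key": API_KEY},                   # assumed auth header
    json={"url": "https://example.com/meeting.mp4"},  # public URL to the media file
    timeout=120,
)
response.raise_for_status()
print(response.json())  # response body contains the transcript (exact field names not confirmed here)
```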
📥 Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| `url` | string | No | Public URL to the media file (audio/video) |
| `file_store_key` | string | No | Key to a file stored in JigsawStack's file storage |
| `language` | string | No | Language code to force the transcription language (auto-detect if omitted) |
| `translate` | bool | No | If `true`, translates the transcript into English (or the specified language) |
| `by_speaker` | bool | No | Enables speaker diarization to separate different speakers |
| `webhook_url` | string | No | Webhook URL for async delivery of results |
| `batch_size` | number | No | Controls audio chunking during processing (default: 30, max: 40) |
| `api_key` | string | Yes | Your JigsawStack API key |
📌 You must provide either `url` or `file_store_key`, not both.
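As a further sketch, the optional parameters from the table can be sent in the same request body. The field names follow the table above; the endpoint path and the exact shape of the acknowledgement response are assumptions, not confirmed by this document.

```python
# Sketch of a request using the optional parameters from the table above.
import requests

API_KEY = "your-jigsawstack-api-key"  # placeholder

payload = {
    "file_store_key": "uploads/interview.wav",       # use either url OR file_store_key, not both
    "language": "de",                                # force transcription language (omit to auto-detect)
    "translate": True,                               # translate the transcript into English
    "by_speaker": True,                              # enable speaker diarization
    "webhook_url": "https://example.com/hooks/stt",  # results delivered asynchronously to this URL
    "batch_size": 30,                                # audio chunking (default 30, max 40)
}

response = requests.post(
    "https://api.jigsawstack.com/v1/ai/transcribe",  # assumed endpoint path
    headers={"x-api-key": API_KEY},                  # assumed auth header
    json=payload,
    timeout=120,
)
response.raise_for_status()
# With webhook_url set, the full transcript is delivered to the webhook; the immediate
# response is expected to acknowledge the job (exact shape not confirmed here).
print(response.json())
```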