
πŸŽ™οΈ JigsawStack Speech-to-Text (STT)

This model wraps the JigsawStack Speech-to-Text API and leverages the powerful Whisper V3 model to transcribe and optionally translate audio/video files.

It supports long files, speaker diarization, and webhook delivery for async processing β€” ideal for meetings, podcasts, interviews, or multilingual content.


## 🧠 What It Does

You provide an audio or video file (via URL or file_store_key), and the model returns the full transcript. It can optionally:

- Auto-detect the language
- Translate the transcript into English or any supported language
- Separate different speakers (speaker diarization)
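For example, a call through the Replicate Python client might look like the following minimal sketch. The model slug, media URL, and environment variable name are placeholders rather than values taken from this page; only the input field names come from the Inputs table below.

```python
import os

import replicate  # assumes the official Replicate Python client is installed

# Minimal sketch: transcribe a public audio URL with automatic language
# detection. The model slug and URL are placeholders; substitute this
# model's actual slug and your own media file.
output = replicate.run(
    "jigsawstack/speech-to-text",
    input={
        "url": "https://example.com/interview.mp3",
        "api_key": os.environ["JIGSAWSTACK_API_KEY"],  # your JigsawStack API key
    },
)

print(output)  # the transcript as returned by the model
```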


## πŸ”‘ Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | ❌ No | Public URL to the media file (audio/video) |
| file_store_key | string | ❌ No | Key to a file stored in JigsawStack’s file storage |
| language | string | ❌ No | Language code to force the transcription language (auto-detected if omitted) |
| translate | bool | ❌ No | If true, translates the transcript into English (or the specified language) |
| by_speaker | bool | ❌ No | Enables speaker diarization to separate different speakers |
| webhook_url | string | ❌ No | Webhook URL for async delivery of results |
| batch_size | number | ❌ No | Controls audio chunking during processing (default: 30, max: 40) |
| api_key | string | βœ… Yes | Your JigsawStack API key |

πŸ”Έ You must provide either url or file_store_key, but not both.
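As a sketch of a fuller request, the payload below combines translation, speaker diarization, and webhook delivery, and checks the url/file_store_key rule before calling the model. The slug, storage key, and webhook URL are illustrative placeholders; the field names and defaults match the table above.

```python
import os

import replicate

# Illustrative payload using the input names from the table above.
# Exactly one of "url" or "file_store_key" may be set.
inputs = {
    "file_store_key": "uploads/panel-discussion.mp4",     # placeholder storage key
    # "url": "https://example.com/panel-discussion.mp4",  # ...or a public URL instead
    "translate": True,       # translate the transcript into English
    "by_speaker": True,      # separate speakers (diarization)
    "batch_size": 30,        # default chunk size; maximum is 40
    "webhook_url": "https://example.com/stt-webhook",     # async delivery of results
    "api_key": os.environ["JIGSAWSTACK_API_KEY"],
}

if ("url" in inputs) == ("file_store_key" in inputs):
    raise ValueError("Provide exactly one of 'url' or 'file_store_key'.")

# The slug is a placeholder; use the slug shown on this model's page.
output = replicate.run("jigsawstack/speech-to-text", input=inputs)
```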