Inworld TTS 1.5 Max — high-quality text-to-speech API

Inworld TTS 1.5 Max is Inworld’s flagship text-to-speech model, offering the best balance of quality and speed. With a median latency under 200ms and support for 15 languages, it delivers the most natural, expressive speech of any Inworld model for demanding applications.

Inworld TTS is ranked #1 on Artificial Analysis and costs a fraction of what alternatives charge.

Key features

<200ms median latency: Fast enough for real-time applications
Highest quality: Best expressiveness and naturalness among Inworld models
15 languages: English, Chinese, Japanese, Korean, Russian, Italian, Spanish, Portuguese, French, German, Polish, Dutch, Hindi, Hebrew, and Arabic
Emotion control: Add emotion markups like [happy], [sad], [angry] to control delivery
Non-verbal sounds: Insert [laugh], [sigh], [cough] and other vocalizations
SSML pauses: Use <break time="1s" /> to insert natural pauses
Voice cloning: Use preset voices or bring your own cloned voice ID
Multiple formats: MP3, WAV, OGG Opus, and FLAC output

Preset voices

Voice	Description
`Ashley`	A warm, natural female voice
`Dennis`	Middle-aged man with a smooth, calm and friendly voice
`Alex`	Energetic and expressive mid-range male voice, with a mildly nasal quality
`Darlene`	Soothing, comforting Southern female voice, ideal for bedtime stories and narrations

You can also use custom cloned voice IDs from the Inworld platform. To browse all available voices, use the List Voices API or the TTS Playground.
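
As a rough illustration of choosing between a preset voice and a cloned voice ID, the sketch below assembles a request body and rejects misspelled preset names before anything is sent. The field names `text` and `voice_id`, and the helper `build_tts_input`, are assumptions made for this example rather than the documented API schema; check the API reference for the real parameter names.

```python
import json

# Preset voice names listed in the table above.
PRESET_VOICES = ("Ashley", "Dennis", "Alex", "Darlene")


def build_tts_input(text: str, voice: str = "Ashley", is_cloned_voice: bool = False) -> dict:
    """Build a request body. The keys "text" and "voice_id" are hypothetical."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Cloned voice IDs cannot be checked locally, so only preset names are validated.
    if not is_cloned_voice and voice not in PRESET_VOICES:
        raise ValueError(f"unknown preset voice {voice!r}; expected one of {PRESET_VOICES}")
    return {"text": text, "voice_id": voice}


if __name__ == "__main__":
    print(json.dumps(build_tts_input("Hello there!", voice="Dennis")))
```

Passing `is_cloned_voice=True` skips the preset check, since a cloned voice ID is only known to the platform.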

Audio markups

The model supports rich text markups for expressive speech:

Emotions: [happy], [sad], [angry], [surprised], [fearful], [disgusted]
Delivery styles: [laughing], [whispering]
Non-verbal sounds: [breathe], [clear_throat], [cough], [laugh], [sigh], [yawn]
Pauses: <break time="1s" />, <break time="500ms" />
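
The markups above are plain text inside the input string, so they can be composed and checked with ordinary string handling. The following is a minimal sketch: the helper names `tag`, `pause`, and `unknown_tags` are invented for this example, and where a markup should sit relative to the words it affects is an assumption, not something this page specifies.

```python
import re

# Markup names taken from the lists above.
EMOTIONS = {"happy", "sad", "angry", "surprised", "fearful", "disgusted"}
DELIVERY_STYLES = {"laughing", "whispering"}
NON_VERBAL = {"breathe", "clear_throat", "cough", "laugh", "sigh", "yawn"}
KNOWN_TAGS = EMOTIONS | DELIVERY_STYLES | NON_VERBAL


def tag(name: str) -> str:
    """Return a bracketed markup such as [happy], rejecting names not listed above."""
    if name not in KNOWN_TAGS:
        raise ValueError(f"unsupported markup: {name!r}")
    return f"[{name}]"


def pause(milliseconds: int) -> str:
    """Return a break element, using whole seconds when the duration allows it."""
    if milliseconds <= 0:
        raise ValueError("pause duration must be positive")
    if milliseconds % 1000 == 0:
        return f'<break time="{milliseconds // 1000}s" />'
    return f'<break time="{milliseconds}ms" />'


def unknown_tags(text: str) -> list[str]:
    """List bracketed names in text that are not in the supported set."""
    found = re.findall(r"\[([a-z_]+)\]", text)
    return sorted({name for name in found if name not in KNOWN_TAGS})


if __name__ == "__main__":
    line = f"{tag('happy')} We won! {tag('laugh')} {pause(500)} Let's go again."
    print(line)  # [happy] We won! [laugh] <break time="500ms" /> Let's go again.
    print(unknown_tags("[happy] hi [excited]"))  # ['excited']
```

Checking for unsupported names before sending text helps catch typos that might otherwise be read aloud literally.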

Choosing between Inworld TTS models

TTS 1.5 Max: Best balance of quality and speed (<200ms median latency); choose it when voice quality is the top priority
TTS 1.5 Mini: Ultra-fast (~120ms), most cost-efficient; choose it for high-volume, latency-sensitive applications
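
The trade-off above can be written as a small selection rule. This is only a sketch: the latency figures come from this page, but the `ModelProfile` type, the `choose_model` helper, and the decision rule itself are assumptions for illustration, and the display names used here are not API model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str               # display name from this page, not an API identifier
    approx_latency_ms: int  # approximate latency figure quoted above
    note: str


MAX = ModelProfile("TTS 1.5 Max", 200, "highest quality")
MINI = ModelProfile("TTS 1.5 Mini", 120, "ultra-fast, most cost-efficient")


def choose_model(latency_budget_ms: int, quality_first: bool = True) -> ModelProfile:
    """Pick Max when quality matters and the latency budget allows it, else Mini."""
    if quality_first and latency_budget_ms >= MAX.approx_latency_ms:
        return MAX
    # Mini is the fastest option listed, so it is also the fallback for tight budgets.
    return MINI


if __name__ == "__main__":
    print(choose_model(300).name)
    print(choose_model(150).name)
```

A real decision would also weigh cost per request and measured latency in the target region, neither of which this sketch models.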

Model created 1 month, 2 weeks ago

Model updated 2 weeks ago