inworld/tts-1.5-max

Highest-quality text-to-speech with <200ms latency, emotion control, and 15-language support


Inworld TTS 1.5 Max is Inworld’s flagship text-to-speech model, offering the best balance of quality and speed. With <200ms median latency and support for 15 languages, it delivers the most natural, expressive speech for demanding applications.

Ranked #1 on Artificial Analysis, Inworld TTS delivers natural, expressive speech at a fraction of the cost of alternatives.

Key features

  • <200ms median latency: Fast enough for real-time applications
  • Highest quality: Best expressiveness and naturalness among Inworld models
  • 15 languages: English, Chinese, Japanese, Korean, Russian, Italian, Spanish, Portuguese, French, German, Polish, Dutch, Hindi, Hebrew, and Arabic
  • Emotion control: Add emotion markups like [happy], [sad], [angry] to control delivery
  • Non-verbal sounds: Insert [laugh], [sigh], [cough] and other vocalizations
  • SSML pauses: Use <break time="1s" /> to insert natural pauses
  • Voice cloning: Use preset voices or bring your own cloned voice ID
  • Multiple formats: MP3, WAV, OGG Opus, and FLAC output

Preset voices

  • Ashley: A warm, natural female voice
  • Dennis: Middle-aged man with a smooth, calm, and friendly voice
  • Alex: Energetic and expressive mid-range male voice, with a mildly nasal quality
  • Darlene: Soothing, comforting Southern female voice, ideal for bedtime stories and narrations

You can also use custom cloned voice IDs from the Inworld platform. To browse all available voices, use the List Voices API or the TTS Playground.
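As a rough illustration of how the voice and format options fit together, the sketch below validates a request before it is sent anywhere. Everything named here is an assumption made for illustration: `SpeechRequest`, its field names, and the lowercase format identifiers are hypothetical and are not taken from the Inworld API schema. Only the four preset voice names and the four output formats come from this page.

```python
from dataclasses import dataclass

# Preset voice names as listed on this page.
PRESET_VOICES = {"Ashley", "Dennis", "Alex", "Darlene"}
# Hypothetical identifiers for the four listed output formats
# (MP3, WAV, OGG Opus, FLAC); the real API may spell these differently.
AUDIO_FORMATS = {"mp3", "wav", "ogg_opus", "flac"}


@dataclass(frozen=True)
class SpeechRequest:
    """Hypothetical container for one synthesis request (not the real API schema)."""

    text: str
    voice: str = "Ashley"
    audio_format: str = "mp3"

    def __post_init__(self) -> None:
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.audio_format not in AUDIO_FORMATS:
            raise ValueError(f"unsupported audio format: {self.audio_format!r}")

    @property
    def uses_preset_voice(self) -> bool:
        # Anything outside the preset list is treated as a custom cloned voice ID.
        return self.voice in PRESET_VOICES


request = SpeechRequest(text="[happy] Welcome back!", voice="Darlene", audio_format="wav")
print(request.uses_preset_voice)
```

The voice field is deliberately not restricted to the presets, since custom cloned voice IDs are also accepted.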

Audio markups

The model supports rich text markups for expressive speech:

  • Emotions: [happy], [sad], [angry], [surprised], [fearful], [disgusted]
  • Delivery styles: [laughing], [whispering]
  • Non-verbal sounds: [breathe], [clear_throat], [cough], [laugh], [sigh], [yawn]
  • Pauses: <break time="1s" />, <break time="500ms" />
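To show how these markups combine in a single input string, here is a minimal sketch that composes a script and flags any bracketed tag not listed above. The helper names `pause` and `unknown_tags` are hypothetical, and the duration check assumes whole numbers followed by `s` or `ms`, matching the two examples shown; the model may accept other tags or duration formats.

```python
import re

# Tags copied from the lists above; the model may accept others not listed here.
EMOTIONS = {"happy", "sad", "angry", "surprised", "fearful", "disgusted"}
DELIVERY_STYLES = {"laughing", "whispering"}
NON_VERBAL = {"breathe", "clear_throat", "cough", "laugh", "sigh", "yawn"}
KNOWN_TAGS = EMOTIONS | DELIVERY_STYLES | NON_VERBAL

# Matches any [tag] written in square brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def pause(duration: str) -> str:
    """Return an SSML break tag for a duration such as '1s' or '500ms'."""
    # Assumption: only integer values with an 's' or 'ms' unit are checked here.
    if not re.fullmatch(r"\d+(ms|s)", duration):
        raise ValueError(f"unexpected duration format: {duration!r}")
    return f'<break time="{duration}" />'


def unknown_tags(text: str) -> list[str]:
    """List bracketed tags in text that are not in the documented sets."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in KNOWN_TAGS]


script = f"[happy] We shipped it! [laugh] {pause('500ms')} [whispering] Don't tell anyone yet."
print(script)
print(unknown_tags(script))          # every tag here is documented above
print(unknown_tags("[excited] Hi"))  # flags a tag that is not listed above
```

Checking tags before synthesis is a cheap way to catch typos such as `[laughs]`, which might otherwise be read aloud or ignored.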

Choosing between Inworld TTS models

  • TTS 1.5 Max: Highest quality with <200ms median latency; suited to applications where voice quality is the top priority
  • TTS 1.5 Mini: Ultra-fast (~120ms) and most cost-efficient; suited to high-volume, latency-sensitive applications
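The trade-off above can be expressed as a simple selection rule. This is only a sketch: `choose_model` is a hypothetical helper, the 200 ms threshold reuses the median latency quoted on this page, and a real decision would also weigh cost and measured end-to-end latency.

```python
MAX_MEDIAN_LATENCY_MS = 200  # TTS 1.5 Max: median latency quoted as under 200 ms


def choose_model(latency_budget_ms: float, prioritize_quality: bool = True) -> str:
    """Pick a model name from a latency budget, using the figures quoted above."""
    if prioritize_quality and latency_budget_ms >= MAX_MEDIAN_LATENCY_MS:
        return "TTS 1.5 Max"
    # Tighter budgets, or cost-driven workloads, fall back to the faster model.
    return "TTS 1.5 Mini"


print(choose_model(300))
print(choose_model(150))
print(choose_model(300, prioritize_quality=False))
```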