siegerts/kokoro-tts-82m

Kokoro-82M open-weight TTS. Returns audio + word-level timestamps.

1.3K runs

Kokoro-82M TTS with Word Timestamps

High-quality text-to-speech with Kokoro-82M (v1), ranked #1 among open-source models on the TTS Arena. Returns audio along with word-level timestamps for precise text highlighting and audio/text synchronization.

Features

  • 54 voices across 9 languages
  • Word-level timestamps for English voices, produced natively by the synthesis engine
  • Automatic text splitting for long-form input
  • 24kHz WAV output
  • Adjustable speed (0.1x to 5.0x)

Inputs

Parameter  Type    Default   Description
text       string  (none)    Text to synthesize
voice      string  af_heart  Voice ID (see list below)
speed      float   1.0       Speed multiplier (0.1 to 5.0)
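The three inputs can be checked client-side before a prediction is submitted, which avoids a round trip for an out-of-range speed or empty text. This is a minimal sketch: `build_input` and `KokoroInput` are hypothetical names (not part of any Replicate library), and the only rules enforced are the documented default voice and the 0.1 to 5.0 speed range.

```python
from typing import TypedDict


class KokoroInput(TypedDict):
    """Input payload shape, taken from the Inputs table."""
    text: str
    voice: str
    speed: float


SPEED_MIN, SPEED_MAX = 0.1, 5.0  # documented speed range (inclusive)


def build_input(text: str, voice: str = "af_heart", speed: float = 1.0) -> KokoroInput:
    """Validate arguments and return an input dict (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must be non-empty")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}, got {speed}")
    # Defaults mirror the table: voice=af_heart, speed=1.0
    return {"text": text, "voice": voice, "speed": float(speed)}


print(build_input("Hello world"))
```

The resulting dict is meant to be passed as the prediction input, for example as the `input=` argument of `replicate.run` in Replicate's Python client.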

Output

{
  "audio": "https://...",
  "words": [
    { "text": "Hello", "start": 0.275, "end": 0.6 },
    { "text": "world", "start": 0.65, "end": 1.1 }
  ]
}

Voices

Quality grades are taken from VOICES.md in the Kokoro-82M repository. A grade reflects overall voice quality (A = best, F = lowest).
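To choose a voice programmatically, the letter grades can be mapped to sortable scores. This is a minimal sketch under two assumptions: only the letters that appear in the tables (A, B, C, D, F) are handled, and a "+" or "-" shifts a grade by a fraction of a letter. `grade_score` and `rank_voices` are hypothetical helpers.

```python
from typing import Dict, List

# Letters that appear in the grade tables, best first.
_LETTERS = "ABCDF"


def grade_score(grade: str) -> float:
    """Map a grade such as 'B-' to a number; higher means better quality."""
    letter, modifier = grade[0], grade[1:]
    base = len(_LETTERS) - _LETTERS.index(letter)  # A -> 5, ..., F -> 1
    return base + {"+": 0.3, "-": -0.3, "": 0.0}[modifier]


def rank_voices(grades: Dict[str, str]) -> List[str]:
    """Return voice IDs sorted from best to worst grade."""
    return sorted(grades, key=lambda v: grade_score(grades[v]), reverse=True)


# A few rows from the American English table.
sample = {"am_adam": "F+", "af_heart": "A", "af_nicole": "B-", "am_fenrir": "C+"}
print(rank_voices(sample))  # ['af_heart', 'af_nicole', 'am_fenrir', 'am_adam']
```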

American English

Voice          Gender  Grade
af_heart       F       A
af_bella       F       A-
af_nicole      F       B-
af_aoede       F       C+
af_kore        F       C+
af_sarah       F       C+
af_alloy       F       C
af_nova        F       C
af_sky         F       C-
af_jessica     F       D
af_river       F       D
am_fenrir      M       C+
am_michael     M       C+
am_puck        M       C+
am_echo        M       D
am_eric        M       D
am_liam        M       D
am_onyx        M       D
am_adam        M       F+

British English

Voice          Gender  Grade
bf_emma        F       B-
bf_isabella    F       C
bf_alice       F       D
bf_lily        F       D
bm_fable       M       C
bm_george      M       C
bm_lewis       M       D+
bm_daniel      M       D

French

Voice          Gender  Grade
ff_siwis       F       B-

Hindi

Voice          Gender  Grade
hf_alpha       F       C
hf_beta        F       C
hm_omega       M       C
hm_psi         M       C

Italian

Voice          Gender  Grade
if_sara        F       C
im_nicola      M       C

Japanese

Voice          Gender  Grade
jf_alpha       F       C+
jf_gongitsune  F       C
jf_tebukuro    F       C
jf_nezumi      F       C-
jm_kumo        M       C-

Mandarin Chinese

Voice          Gender  Grade
zf_xiaobei     F       D
zf_xiaoni      F       D
zf_xiaoxiao    F       D
zf_xiaoyi      F       D
zm_yunjian     M       D
zm_yunxi       M       D
zm_yunxia      M       D
zm_yunyang     M       D

Spanish

Voice          Gender
ef_dora        F
em_alex        M
em_santa       M

Brazilian Portuguese

Voice          Gender
pf_dora        F
pm_alex        M
pm_santa       M

Voice naming

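Voice IDs follow the pattern {language}{gender}_{name}. The first letter is the language: a=American English, b=British English, f=French, h=Hindi, i=Italian, j=Japanese, z=Mandarin Chinese, e=Spanish, p=Brazilian Portuguese. The second letter is the gender: f=female, m=male.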
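Because the naming scheme is regular, a voice ID can be decoded without a lookup table of every voice. A minimal sketch: `parse_voice` and `VoiceInfo` are hypothetical names, and the `has_word_timestamps` flag simply encodes the note that word timestamps are available for English voices (prefixes `a` and `b`) only.

```python
from typing import NamedTuple

LANGUAGES = {
    "a": "American English", "b": "British English", "f": "French",
    "h": "Hindi", "i": "Italian", "j": "Japanese", "z": "Mandarin Chinese",
    "e": "Spanish", "p": "Brazilian Portuguese",
}
GENDERS = {"f": "female", "m": "male"}


class VoiceInfo(NamedTuple):
    language: str
    gender: str
    name: str
    has_word_timestamps: bool  # English voices only, per the Notes


def parse_voice(voice_id: str) -> VoiceInfo:
    """Decode a {language}{gender}_{name} voice ID."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"malformed voice ID: {voice_id!r}")
    lang_code, gender_code = prefix
    if lang_code not in LANGUAGES or gender_code not in GENDERS:
        raise ValueError(f"unknown language or gender code in {voice_id!r}")
    return VoiceInfo(
        LANGUAGES[lang_code], GENDERS[gender_code], name, lang_code in ("a", "b")
    )


print(parse_voice("bf_emma"))
```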

Notes

  • Word timestamps are available for English voices only (Kokoro v1 limitation)
  • American English pipeline is preloaded; other languages load on first use
  • Based on Kokoro-82M by hexgrad (Apache 2.0)