Kokoro-82M TTS with Word Timestamps

High-quality text-to-speech using Kokoro-82M (v1), ranked #1 on the TTS Arena for open-source models. Returns audio with word-level timestamps for precise text highlighting and synchronization.

Features

54 voices across 9 languages
Word-level timestamps (English) via native synthesis engine
Automatic text splitting for long-form input
24kHz WAV output
Adjustable speed (0.1x to 5.0x)

Inputs

Parameter	Type	Default	Description
`text`	string	—	Text to synthesize
`voice`	string	`af_heart`	Voice ID (see list below)
`speed`	float	`1.0`	Speed multiplier (0.1 to 5.0)
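
A minimal sketch of how a client might assemble and validate these inputs before sending a request. The helper name `build_input` is hypothetical and not part of the model's API, and the code assumes the documented speed range is inclusive at both ends.

```python
def build_input(text: str, voice: str = "af_heart", speed: float = 1.0) -> dict:
    """Return a request payload using the parameter names from the Inputs table."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Assumption: the documented 0.1 to 5.0 range is inclusive at both ends.
    if not 0.1 <= speed <= 5.0:
        raise ValueError(f"speed must be between 0.1 and 5.0, got {speed}")
    return {"text": text, "voice": voice, "speed": float(speed)}


payload = build_input("Hello world", speed=1.25)
print(payload)
```

Validating on the client side gives an early, readable error instead of a failed synthesis call.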

Output

{
  "audio": "https://...",
  "words": [
    { "text": "Hello", "start": 0.275, "end": 0.6 },
    { "text": "world", "start": 0.65, "end": 1.1 }
  ]
}
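
For text highlighting, a player needs to map the current playback position to the word being spoken. The sketch below shows one way to do that with a binary search. It assumes `start` and `end` are in seconds and that `words` is sorted by `start`, as in the sample above; the function name `word_at` is hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional


def word_at(words: List[Dict], t: float) -> Optional[Dict]:
    """Return the word spoken at time t (seconds), or None during a gap."""
    starts = [w["start"] for w in words]
    # Index of the last word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i]["end"]:
        return words[i]
    return None


words = [
    {"text": "Hello", "start": 0.275, "end": 0.6},
    {"text": "world", "start": 0.65, "end": 1.1},
]
print(word_at(words, 0.3))   # the "Hello" entry
print(word_at(words, 0.62))  # None: falls in the gap between the two words
```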

Voices

Quality grades from VOICES.md. Grade reflects overall voice quality (A = best, F = lowest).

American English

Voice	Gender	Grade
`af_heart`	F	A
`af_bella`	F	A-
`af_nicole`	F	B-
`af_aoede`	F	C+
`af_kore`	F	C+
`af_sarah`	F	C+
`af_alloy`	F	C
`af_nova`	F	C
`af_sky`	F	C-
`af_jessica`	F	D
`af_river`	F	D
`am_fenrir`	M	C+
`am_michael`	M	C+
`am_puck`	M	C+
`am_echo`	M	D
`am_eric`	M	D
`am_liam`	M	D
`am_onyx`	M	D
`am_adam`	M	F+

British English

Voice	Gender	Grade
`bf_emma`	F	B-
`bf_isabella`	F	C
`bf_alice`	F	D
`bf_lily`	F	D
`bm_fable`	M	C
`bm_george`	M	C
`bm_lewis`	M	D+
`bm_daniel`	M	D

French

Voice	Gender	Grade
`ff_siwis`	F	B-

Hindi

Voice	Gender	Grade
`hf_alpha`	F	C
`hf_beta`	F	C
`hm_omega`	M	C
`hm_psi`	M	C

Italian

Voice	Gender	Grade
`if_sara`	F	C
`im_nicola`	M	C

Japanese

Voice	Gender	Grade
`jf_alpha`	F	C+
`jf_gongitsune`	F	C
`jf_tebukuro`	F	C
`jf_nezumi`	F	C-
`jm_kumo`	M	C-

Mandarin Chinese

Voice	Gender	Grade
`zf_xiaobei`	F	D
`zf_xiaoni`	F	D
`zf_xiaoxiao`	F	D
`zf_xiaoyi`	F	D
`zm_yunjian`	M	D
`zm_yunxi`	M	D
`zm_yunxia`	M	D
`zm_yunyang`	M	D

Spanish

Voice	Gender
`ef_dora`	F
`em_alex`	M
`em_santa`	M

Brazilian Portuguese

Voice	Gender
`pf_dora`	F
`pm_alex`	M
`pm_santa`	M

Voice naming

Voice IDs follow the pattern `{language}{gender}_{name}`.

First letter (language): a=American English, b=British English, f=French, h=Hindi, i=Italian, j=Japanese, z=Mandarin Chinese, e=Spanish, p=Brazilian Portuguese.
Second letter (gender): f=female, m=male.
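
The naming scheme can be decoded mechanically. The following is an illustrative sketch only: `parse_voice_id` is a hypothetical helper, and the lookup tables simply restate the prefix letters listed above.

```python
LANGUAGES = {
    "a": "American English", "b": "British English", "f": "French",
    "h": "Hindi", "i": "Italian", "j": "Japanese",
    "z": "Mandarin Chinese", "e": "Spanish", "p": "Brazilian Portuguese",
}
GENDERS = {"f": "female", "m": "male"}


def parse_voice_id(voice: str) -> dict:
    """Split a voice ID such as 'bm_george' into language, gender, and name."""
    prefix, sep, name = voice.partition("_")
    if len(prefix) != 2 or not sep or not name:
        raise ValueError(f"unexpected voice ID format: {voice!r}")
    lang, gender = prefix[0], prefix[1]
    if lang not in LANGUAGES or gender not in GENDERS:
        raise ValueError(f"unknown prefix in voice ID: {voice!r}")
    return {"language": LANGUAGES[lang], "gender": GENDERS[gender], "name": name}


print(parse_voice_id("bm_george"))
```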

Notes

Word timestamps are available for English voices only (Kokoro v1 limitation)
American English pipeline is preloaded; other languages load on first use
Based on Kokoro-82M by hexgrad (Apache 2.0)
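
Because word timestamps are limited to English voices, a client may want to check the voice prefix before relying on `words`. This is a minimal sketch under two assumptions: the `a` and `b` prefixes identify the English voices, as in the naming scheme, and the `words` field may be missing or empty for other languages, since its shape in that case is not specified here. Both function names are hypothetical.

```python
def supports_word_timestamps(voice: str) -> bool:
    """True for American ('a') and British ('b') English voice IDs."""
    return voice[:1] in ("a", "b")


def highlightable_words(voice: str, output: dict) -> list:
    """Return the word timings if the voice supports them, else an empty list."""
    if not supports_word_timestamps(voice):
        return []
    # Assumption: 'words' may be absent or None; treat both as "no timings".
    return output.get("words") or []


sample = {"audio": "https://...", "words": [{"text": "Hello", "start": 0.275, "end": 0.6}]}
print(highlightable_words("af_heart", sample))
print(highlightable_words("jf_alpha", {"audio": "https://..."}))
```

Falling back to an empty list lets the caller play the audio without highlighting instead of failing on a missing key.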
