Kokoro-82M TTS with Word Timestamps
High-quality text-to-speech using Kokoro-82M (v1), ranked #1 on the TTS Arena for open-source models. Returns audio with word-level timestamps for precise text highlighting and synchronization.
Features
54 voices across 9 languages
Word-level timestamps (English) via native synthesis engine
Automatic text splitting for long-form input
24kHz WAV output
Adjustable speed (0.1x to 5.0x)
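Because the output is 24 kHz audio and timestamps are given in seconds, a word's position in the decoded waveform follows from a single multiplication. The sketch below is illustrative only: `seconds_to_sample` and `word_sample_range` are hypothetical helper names, not part of this model's API, and the word dictionary is assumed to have the `start`/`end` shape shown in the Output section.

```python
SAMPLE_RATE_HZ = 24_000  # output sample rate stated in the feature list


def seconds_to_sample(t: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Convert a timestamp in seconds to the nearest sample (frame) index."""
    return round(t * sample_rate)


def word_sample_range(word: dict, sample_rate: int = SAMPLE_RATE_HZ) -> tuple[int, int]:
    """Return (first_sample, end_sample) for a word with 'start' and 'end' in seconds."""
    return (
        seconds_to_sample(word["start"], sample_rate),
        seconds_to_sample(word["end"], sample_rate),
    )


# Example: locate "Hello" inside the decoded audio.
hello = {"text": "Hello", "start": 0.275, "end": 0.6}
first, end = word_sample_range(hello)
```

With the decoded samples in an array, `samples[first:end]` would then isolate that word's audio.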
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | (none) | Text to synthesize |
| voice | string | af_heart | Voice ID (see list below) |
| speed | float | 1.0 | Speed multiplier (0.1 to 5.0) |
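As a rough illustration of the parameters above, the following sketch assembles an input payload and rejects values outside the documented range before any request is made. `build_input` is a hypothetical helper, not something the model provides; how the payload is actually sent depends on the hosting platform's client and is not shown here.

```python
def build_input(text: str, voice: str = "af_heart", speed: float = 1.0) -> dict:
    """Assemble an input payload using the documented defaults.

    Hypothetical client-side check; the server may validate differently.
    """
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")
    if not 0.1 <= speed <= 5.0:
        raise ValueError(f"speed must be between 0.1 and 5.0, got {speed}")
    return {"text": text, "voice": voice, "speed": float(speed)}


payload = build_input("Hello world", voice="bf_emma", speed=1.25)
```

Validating locally is a design choice that surfaces mistakes before a remote call is spent on them.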
Output
{
  "audio": "https://...",
  "words": [
    { "text": "Hello", "start": 0.275, "end": 0.6 },
    { "text": "world", "start": 0.65, "end": 1.1 }
  ]
}
Voices
Quality grades are taken from VOICES.md. Each grade reflects overall voice quality (A = best, F = lowest).
American English
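| Voice | Gender | Grade |
|---|---|---|
| af_heart | F | A |
| af_bella | F | A- |
| af_nicole | F | B- |
| af_aoede | F | C+ |
| af_kore | F | C+ |
| af_sarah | F | C+ |
| af_alloy | F | C |
| af_nova | F | C |
| af_sky | F | C- |
| af_jessica | F | D |
| af_river | F | D |
| am_fenrir | M | C+ |
| am_michael | M | C+ |
| am_puck | M | C+ |
| am_echo | M | D |
| am_eric | M | D |
| am_liam | M | D |
| am_onyx | M | D |
| am_adam | M | F+ |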
British English
| Voice | Gender | Grade |
|---|---|---|
| bf_emma | F | B- |
| bf_isabella | F | C |
| bf_alice | F | D |
| bf_lily | F | D |
| bm_fable | M | C |
| bm_george | M | C |
| bm_lewis | M | D+ |
| bm_daniel | M | D |
French
| Voice | Gender | Grade |
|---|---|---|
| ff_siwis | F | B- |
Hindi
| Voice | Gender | Grade |
|---|---|---|
| hf_alpha | F | C |
| hf_beta | F | C |
| hm_omega | M | C |
| hm_psi | M | C |
Italian
| Voice | Gender | Grade |
|---|---|---|
| if_sara | F | C |
| im_nicola | M | C |
Japanese
| Voice | Gender | Grade |
|---|---|---|
| jf_alpha | F | C+ |
| jf_gongitsune | F | C |
| jf_tebukuro | F | C |
| jf_nezumi | F | C- |
| jm_kumo | M | C- |
Mandarin Chinese
| Voice | Gender | Grade |
|---|---|---|
| zf_xiaobei | F | D |
| zf_xiaoni | F | D |
| zf_xiaoxiao | F | D |
| zf_xiaoyi | F | D |
| zm_yunjian | M | D |
| zm_yunxi | M | D |
| zm_yunxia | M | D |
| zm_yunyang | M | D |
Spanish
ef_dora (F), em_alex (M), em_santa (M)
Brazilian Portuguese
pf_dora (F), pm_alex (M), pm_santa (M)
Voice naming
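Voice IDs follow the pattern {language}{gender}_{name}. The first letter is the language: a=American English, b=British English, f=French, h=Hindi, i=Italian, j=Japanese, z=Mandarin Chinese, e=Spanish, p=Brazilian Portuguese. The second letter is the gender: f=female, m=male.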
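The naming scheme can be decoded mechanically. The sketch below is a minimal illustration built only from the letter codes listed on this page; `parse_voice_id` and its return shape are hypothetical, not part of the model's API.

```python
# Letter codes as listed in the voice naming rule on this page.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "z": "Mandarin Chinese",
    "e": "Spanish",
    "p": "Brazilian Portuguese",
}
GENDERS = {"f": "female", "m": "male"}


def parse_voice_id(voice_id: str) -> dict:
    """Split a voice ID such as 'af_heart' into language, gender, and name."""
    prefix, sep, name = voice_id.partition("_")
    if sep != "_" or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    lang_code, gender_code = prefix
    if lang_code not in LANGUAGES or gender_code not in GENDERS:
        raise ValueError(f"unknown language or gender code in {voice_id!r}")
    return {
        "language": LANGUAGES[lang_code],
        "gender": GENDERS[gender_code],
        "name": name,
    }


info = parse_voice_id("bm_george")
```

Note that the letter f appears in both positions with different meanings: `ff_siwis` is French (first letter) and female (second letter).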
Notes
Word timestamps are available for English voices only (Kokoro v1 limitation)
American English pipeline is preloaded; other languages load on first use
Based on Kokoro-82M by hexgrad (Apache 2.0)
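Since word timestamps are limited to English voices, a caller may want to check the voice before relying on the `words` field. The sketch below infers this from the voice ID prefix, assuming that "English" covers both the American (a) and British (b) voices. `supports_word_timestamps` is a hypothetical helper, and what the model returns in `words` for other languages is not stated on this page.

```python
ENGLISH_PREFIXES = ("a", "b")  # American and British English, per the naming rule


def supports_word_timestamps(voice_id: str) -> bool:
    """Return True when the voice ID's language letter marks an English voice."""
    return voice_id[:1] in ENGLISH_PREFIXES


for voice in ("af_heart", "bf_emma", "jf_alpha"):
    if not supports_word_timestamps(voice):
        # Fall back to highlighting without word-level timing for this voice.
        pass
```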