siegerts/kokoro-tts-82m

Kokoro-82M open-weight TTS. Returns audio + word-level timestamps.

Run time and cost

This model costs approximately $0.020 to run on Replicate, or 50 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 91 seconds, but prediction time varies significantly with the inputs.

Readme

Kokoro-82M TTS with Word Timestamps

High-quality text-to-speech using Kokoro-82M (v1), ranked #1 on the TTS Arena for open-source models. Returns audio with word-level timestamps for precise text highlighting and synchronization.

Features

  • 54 voices across 9 languages
  • Word-level timestamps (English) via native synthesis engine
  • Automatic text splitting for long-form input
  • 24 kHz WAV output
  • Adjustable speed (0.1x to 5.0x)

Inputs

Parameter  Type    Default   Description
text       string  (none)    Text to synthesize
voice      string  af_heart  Voice ID (see list below)
speed      float   1.0       Speed multiplier (0.1 to 5.0)
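
As a rough illustration of how the three inputs fit together, the sketch below builds an input payload and checks the documented speed range before anything is sent. The helper name `build_input` is hypothetical, and rejecting empty text is an assumption of this sketch rather than documented model behavior.

```python
def build_input(text: str, voice: str = "af_heart", speed: float = 1.0) -> dict:
    """Build an input payload using the documented parameter names and defaults."""
    if not text.strip():
        # Assumption of this sketch: empty text is treated as a caller error.
        raise ValueError("text must not be empty")
    if not 0.1 <= speed <= 5.0:
        # The documented speed range is 0.1 to 5.0.
        raise ValueError("speed must be between 0.1 and 5.0")
    return {"text": text, "voice": voice, "speed": float(speed)}


payload = build_input("Hello world", voice="bf_emma", speed=1.25)
print(payload)
```

The resulting dictionary is the kind of value you would pass as the model input when calling the model through a Replicate client; validating locally avoids paying for a run that would fail on an out-of-range speed.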

Output

{
  "audio": "https://...",
  "words": [
    { "text": "Hello", "start": 0.275, "end": 0.6 },
    { "text": "world", "start": 0.65, "end": 1.1 }
  ]
}
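
One way to use the `words` array for text highlighting is to look up which word is being spoken at a given playback time. The sketch below assumes `start` and `end` are in seconds (as the sample values suggest); the helper names `word_at` and `to_samples` are hypothetical, and the sample-index conversion relies on the 24 kHz output rate listed under Features.

```python
from typing import Optional

SAMPLE_RATE = 24_000  # output audio is 24 kHz


def word_at(words: list[dict], t: float) -> Optional[dict]:
    """Return the word whose [start, end) interval contains time t, else None."""
    for word in words:
        if word["start"] <= t < word["end"]:
            return word
    return None


def to_samples(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Convert a timestamp in seconds to the nearest audio sample index."""
    return round(seconds * sample_rate)


# Sample output copied from the example above.
output = {
    "audio": "https://...",
    "words": [
        {"text": "Hello", "start": 0.275, "end": 0.6},
        {"text": "world", "start": 0.65, "end": 1.1},
    ],
}

current = word_at(output["words"], 0.3)
print(current["text"] if current else "(no word at this time)")
```

Times that fall in a gap between words, such as 0.62 s in the sample above, return `None`, so a player can clear the highlight during pauses.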

Voices

Quality grades from VOICES.md. Grade reflects overall voice quality (A = best, F = lowest).

American English

Voice Gender Grade
af_heart F A
af_bella F A-
af_nicole F B-
af_aoede F C+
af_kore F C+
af_sarah F C+
af_alloy F C
af_nova F C
af_sky F C-
af_jessica F D
af_river F D
am_fenrir M C+
am_michael M C+
am_puck M C+
am_echo M D
am_eric M D
am_liam M D
am_onyx M D
am_adam M F+

British English

Voice Gender Grade
bf_emma F B-
bf_isabella F C
bf_alice F D
bf_lily F D
bm_fable M C
bm_george M C
bm_lewis M D+
bm_daniel M D

French

Voice Gender Grade
ff_siwis F B-

Hindi

Voice Gender Grade
hf_alpha F C
hf_beta F C
hm_omega M C
hm_psi M C

Italian

Voice Gender Grade
if_sara F C
im_nicola M C

Japanese

Voice Gender Grade
jf_alpha F C+
jf_gongitsune F C
jf_tebukuro F C
jf_nezumi F C-
jm_kumo M C-

Mandarin Chinese

Voice Gender Grade
zf_xiaobei F D
zf_xiaoni F D
zf_xiaoxiao F D
zf_xiaoyi F D
zm_yunjian M D
zm_yunxi M D
zm_yunxia M D
zm_yunyang M D

Spanish

Voice Gender
ef_dora F
em_alex M
em_santa M

Brazilian Portuguese

Voice Gender
pf_dora F
pm_alex M
pm_santa M

Voice naming

Voice IDs follow the pattern {language}{gender}_{name}.

First letter (language): a=American, b=British, f=French, h=Hindi, i=Italian, j=Japanese, z=Chinese, e=Spanish, p=Portuguese.
Second letter (gender): f=female, m=male.
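
The naming scheme can be decoded mechanically. The following is a minimal sketch, not part of the model's API: `parse_voice` is a hypothetical helper, and the language labels are taken from the section headings above.

```python
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "z": "Mandarin Chinese",
    "e": "Spanish",
    "p": "Brazilian Portuguese",
}
GENDERS = {"f": "female", "m": "male"}


def parse_voice(voice_id: str) -> dict:
    """Split a voice ID such as 'af_heart' into language, gender, and name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"not a valid voice ID: {voice_id!r}")
    lang_code, gender_code = prefix  # first letter, second letter
    if lang_code not in LANGUAGES or gender_code not in GENDERS:
        raise ValueError(f"unknown language or gender code in {voice_id!r}")
    return {
        "language": LANGUAGES[lang_code],
        "gender": GENDERS[gender_code],
        "name": name,
    }


print(parse_voice("af_heart"))
```

Note that `f` means French in the first position and female in the second, so `ff_siwis` decodes to a French female voice.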

Notes

  • Word timestamps are available for English voices only (Kokoro v1 limitation)
  • American English pipeline is preloaded; other languages load on first use
  • Based on Kokoro-82M by hexgrad (Apache 2.0)