This model costs approximately $0.79 to run on Replicate, or 1 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
This model runs on Nvidia T4 GPU hardware.
Predictions typically complete within 59 minutes.
The predict time for this model varies significantly based on the inputs.
Readme
license: apache-2.0
language:
- en
base_model:
- yl4579/StyleTTS2-LJSpeech
pipeline_tag: text-to-speech
Disclaimer
This is a fork of the original Kokoro repo, in order to provide easy inference on Replicate. I am not affiliated with the original Kokoro authors, and this is not an official release of the Kokoro model. Similar to the Huggingface Space, this implementation provides automatic text splitting to support long form text inputs. See the original README below for more details.
Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
Creative Commons Attribution
The following CC BY audio was part of the dataset used to train Kokoro v1.0.