0.5B
Text to convert to speech. Required in both modes.
TTS mode: voice cloning requires a prompt audio file whose voice is mimicked; voice creation generates speech from the specified gender, pitch, and speed parameters.
Default: "voice_creation"
[Voice Cloning] Path to the prompt audio file. Required in voice cloning mode.
[Voice Cloning] Transcript of the prompt audio. Optional, but providing it improves quality.
Default: ""
[Voice Creation] Voice gender. Required in voice creation mode.
Default: "female"
[Voice Creation] Voice pitch level. Required in voice creation mode.
Default: "moderate"
[Voice Creation] Speaking speed. Required in voice creation mode.
Sampling temperature (0.0-1.0). Controls randomness in generation.
Default: 0.8
Top-k sampling parameter. Limits token selection to the k most likely options.
Default: 50
Top-p sampling parameter. Probability threshold for nucleus sampling.
Default: 0.95
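The parameter descriptions above can be turned into a small input builder for voice creation mode. The following is a minimal sketch, not part of the model or the Replicate client: the function name `build_voice_creation_input` is hypothetical, the key names are taken from the example request shown on this page, and the checks on `top_k` and `top_p` are assumptions based on how these sampling parameters are normally defined. The allowed values for gender, pitch, and speed are not listed here, so they are passed through unchecked, and `speed` has no default because none is documented.

```python
from typing import Any, Dict


def build_voice_creation_input(
    text: str,
    speed: str,
    gender: str = "female",
    pitch: str = "moderate",
    temperature: float = 0.8,
    top_k: int = 50,
    top_p: float = 0.95,
) -> Dict[str, Any]:
    """Build an input dict for voice creation mode (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text is required in both modes")
    # The documented range for temperature is 0.0-1.0.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    # Assumption: top_k counts tokens, so it must be a positive integer.
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Assumption: top_p is a probability threshold in (0, 1].
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be greater than 0 and at most 1")
    return {
        "mode": "voice_creation",
        "text": text,
        "gender": gender,
        "pitch": pitch,
        "speed": speed,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
    }
```

The returned dict can be passed as the `input` argument of a prediction request.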
Run this model in Node.js with one line of code:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run jichengdu/spark-tts using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "jichengdu/spark-tts:eac056d8a49570ce3e99ed6efe3ce53527b4e3df4abc9c5471dc640dbb75006b",
  {
    input: {
      mode: "voice_creation",
      text: "白日依山尽,黄河入海流。",
      pitch: "high",
      speed: "low",
      top_k: 50,
      top_p: 0.95,
      gender: "female",
      prompt_text: "",
      temperature: 0.8
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
pip install replicate
import replicate
output = replicate.run(
    "jichengdu/spark-tts:eac056d8a49570ce3e99ed6efe3ce53527b4e3df4abc9c5471dc640dbb75006b",
    input={
        "mode": "voice_creation",
        "text": "白日依山尽,黄河入海流。",
        "pitch": "high",
        "speed": "low",
        "top_k": 50,
        "top_p": 0.95,
        "gender": "female",
        "prompt_text": "",
        "temperature": 0.8
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
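The Python example only prints the output, while the model produces a WAV file. The sketch below shows one way to save that output to disk. The helper name `save_audio` is hypothetical. It assumes the output is either a file-like object with a `read()` method, as newer versions of the Python client return for file outputs, or a URL string, as in the example prediction on this page; check which form your installed client version returns.

```python
from pathlib import Path
from urllib.request import urlopen


def save_audio(output, destination: str) -> Path:
    """Write the model output to `destination` and return the path."""
    if hasattr(output, "read"):
        # Assumption: file-like output object that yields the audio bytes.
        data = output.read()
    else:
        # Otherwise treat the output as a URL and download it.
        with urlopen(str(output)) as response:
            data = response.read()
    path = Path(destination)
    path.write_bytes(data)
    return path


# Usage, after `output = replicate.run(...)`:
# save_audio(output, "generated_speech.wav")
```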
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "eac056d8a49570ce3e99ed6efe3ce53527b4e3df4abc9c5471dc640dbb75006b",
    "input": {
      "mode": "voice_creation",
      "text": "白日依山尽,黄河入海流。",
      "pitch": "high",
      "speed": "low",
      "top_k": 50,
      "top_p": 0.95,
      "gender": "female",
      "prompt_text": "",
      "temperature": 0.8
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
{
  "completed_at": "2025-03-25T04:37:00.603464Z",
  "created_at": "2025-03-25T04:35:50.288000Z",
  "data_removed": false,
  "error": null,
  "id": "nrb36xpsj1rme0cnseatnp4qmw",
  "input": {
    "mode": "voice_creation",
    "text": "白日依山尽,黄河入海流。",
    "pitch": "high",
    "speed": "low",
    "top_k": 50,
    "top_p": 0.95,
    "gender": "female",
    "prompt_text": "",
    "temperature": 0.8
  },
  "logs": "Running voice creation with text: '白日依山尽,黄河入海流。', gender: 'female', pitch: 'high', speed: 'low'\nSetting `pad_token_id` to `eos_token_id`:None for open-end generation.\nGenerated audio shape: (60800,), min: -0.36268576979637146, max: 0.43774864077568054\nSaved generated audio to generated_speech.wav",
  "metrics": {
    "predict_time": 3.566855719,
    "total_time": 70.315464
  },
  "output": "https://replicate.delivery/xezq/ZmtRBVfNDXWYPCLLVsA8XfQfS5IfSBnVhWnKwq8CfhFjdyfGF/generated_speech.wav",
  "started_at": "2025-03-25T04:36:57.036608Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/bcwr-gnth5inxj7pshj6esk4qwd4a2pd72hl25dclsds75ey26qezvvjq",
    "get": "https://api.replicate.com/v1/predictions/nrb36xpsj1rme0cnseatnp4qmw",
    "cancel": "https://api.replicate.com/v1/predictions/nrb36xpsj1rme0cnseatnp4qmw/cancel"
  },
  "version": "eac056d8a49570ce3e99ed6efe3ce53527b4e3df4abc9c5471dc640dbb75006b"
}
Running voice creation with text: '白日依山尽,黄河入海流。', gender: 'female', pitch: 'high', speed: 'low'
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Generated audio shape: (60800,), min: -0.36268576979637146, max: 0.43774864077568054
Saved generated audio to generated_speech.wav
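The timestamps in the prediction record show where the time went. The sketch below splits a prediction into waiting time (from creation to start) and running time (from start to completion). The function names `parse_timestamp` and `timing_breakdown` are hypothetical, and reading the waiting time as queueing plus model startup is an interpretation of the record, not something the API states.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2025-03-25T04:37:00.603464Z'."""
    # Older Python versions reject the trailing "Z", so spell out the UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing_breakdown(prediction: dict) -> dict:
    """Return waiting, running, and total durations in seconds."""
    created = parse_timestamp(prediction["created_at"])
    started = parse_timestamp(prediction["started_at"])
    completed = parse_timestamp(prediction["completed_at"])
    return {
        "waiting_seconds": (started - created).total_seconds(),
        "running_seconds": (completed - started).total_seconds(),
        "total_seconds": (completed - created).total_seconds(),
    }


example = {
    "created_at": "2025-03-25T04:35:50.288000Z",
    "started_at": "2025-03-25T04:36:57.036608Z",
    "completed_at": "2025-03-25T04:37:00.603464Z",
}
print(timing_breakdown(example))
```

For the example record this gives about 66.7 seconds of waiting and about 3.6 seconds of running, which matches the reported `predict_time` of 3.566855719 and `total_time` of 70.315464.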
This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.
This model doesn't have a readme.
This model is cold. You'll get a fast response if the model is warm and already running, and a slower response if the model is cold and starting up.