You're looking at a specific version of this model.

bzikst/higgs-audio-v3-tts-4b:f968dee2

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

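Field: text
Type: string
Default value: Hello, this is Higgs Audio v3 TTS.
Description: Text to synthesize. Supports inline control tokens, e.g. <|prosody:pause|>, <|prosody:long_pause|>, <|emotion:...|>.

Field: reference_audio
Type: string
Default value: (none)
Description: Optional reference audio for zero-shot voice cloning (WAV/MP3).

Field: reference_text
Type: string
Default value: (none)
Description: Transcript of the reference audio (materially improves cloning quality).

Field: voice
Type: string
Default value: default
Description: Preset voice name (ignored when reference_audio is set).

Field: response_format
Type: None
Default value: wav
Description: Output audio format.

Field: temperature
Type: number
Default value: 1
Limits: Max: 2
Description: Sampling temperature.

Field: top_p
Type: number
Default value: (none)
Limits: Max: 1
Description: Top-p (nucleus) sampling. Unset = server default.

Field: top_k
Type: integer
Default value: (none)
Description: Top-k sampling. Unset = server default.

Field: max_new_tokens
Type: integer
Default value: 2048
Limits: Min: 1, Max: 8192
Description: Maximum number of generated multi-codebook steps.

Field: seed
Type: integer
Default value: (none)
Description: Random seed for reproducibility. Unset = random.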
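
As an illustration of how the fields above fit together, here is a minimal Python sketch that assembles an input payload and checks only the limits listed in the schema. The helper name build_input is hypothetical and not part of any official client. Dropping voice when reference_audio is set, and dropping reference_text when it is not, are assumptions based on the field descriptions.

```python
from typing import Any, Dict, Optional


def build_input(
    text: str,
    *,
    reference_audio: Optional[str] = None,
    reference_text: Optional[str] = None,
    voice: str = "default",
    response_format: str = "wav",
    temperature: float = 1.0,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    max_new_tokens: int = 2048,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an input payload for this model version."""
    # Enforce only the limits that the input schema states explicitly.
    if temperature > 2:
        raise ValueError("temperature must be at most 2")
    if top_p is not None and top_p > 1:
        raise ValueError("top_p must be at most 1")
    if not 1 <= max_new_tokens <= 8192:
        raise ValueError("max_new_tokens must be between 1 and 8192")

    payload: Dict[str, Any] = {
        "text": text,
        "response_format": response_format,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }
    if reference_audio is not None:
        # Voice cloning: the preset voice is ignored, so it is left out (assumption).
        payload["reference_audio"] = reference_audio
        if reference_text is not None:
            payload["reference_text"] = reference_text
    else:
        payload["voice"] = voice
    # Leave optional sampling fields unset so the server defaults apply.
    for key, value in (("top_p", top_p), ("top_k", top_k), ("seed", seed)):
        if value is not None:
            payload[key] = value
    return payload


# Inline control tokens go directly into the text field.
example = build_input(
    "Hello.<|prosody:pause|> This is a test.",
    temperature=0.8,
    seed=42,
)
```

Assuming the replicate Python client is installed and configured with an API token, a payload like this could be passed as replicate.run("bzikst/higgs-audio-v3-tts-4b:f968dee2", input=example). The version identifier is copied as it appears on this page and may be abbreviated.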

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"format": "uri", "title": "Output", "type": "string"}
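
Since the output is a single string in URI format, client code mostly needs to check that shape and fetch the audio. The sketch below uses only the Python standard library. The function names are hypothetical, and treating the URI as an http(s) URL is an assumption, since the schema only says "uri".

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def is_valid_output(output: object) -> bool:
    """Return True if output is a string that parses as an http(s) URI."""
    if not isinstance(output, str):
        return False
    parsed = urlparse(output)
    # Assumption: the audio is served over http(s).
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def local_filename(output_uri: str, fallback: str = "output.wav") -> str:
    """Derive a local file name from the last path segment of the URI."""
    name = posixpath.basename(urlparse(output_uri).path)
    return name or fallback


def download(output_uri: str, dest_dir: str = ".") -> str:
    """Fetch the generated audio and return the path it was written to."""
    if not is_valid_output(output_uri):
        raise ValueError("output does not look like an http(s) URI")
    path = os.path.join(dest_dir, local_filename(output_uri))
    with urllib.request.urlopen(output_uri) as response, open(path, "wb") as f:
        f.write(response.read())
    return path
```

The fallback name assumes the default response_format of wav; adjust it if a different format is requested.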