You're looking at a specific version of this model.

zsxkib/hololive-style-bert-vits2:595ac420

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

Each field is listed with its type, its default value (if it has one), and a description.
speaker
Type: string (enum)
Default: EN_MoriCalliope

Options:

EN_MoriCalliope, EN_TakanashiKiara, EN_NinomaeInanis, EN_GawrGura, EN_AmeliaWatson, EN_IRyS, EN_TsukumoSana, EN_CeresFauna, EN_OuroKronii, EN_NanashiMumei, EN_HakosBaelz, EN_ShioriNovella, EN_KosekiBijou, EN_NerissaRavencroft, EN_AiraniIofifteen, EN_KureijiOllie, EN_AnyaMelfissa, EN_VestiaZeta, JP_TokinoSora, JP_HoshimachiSuisei, JP_AZKi, JP_YozoraMel, JP_NatsuiroMatsuri, JP_AkiRosenthal, JP_AkaiHaato, JP_MinatoAqua, JP_NakiriAyame, JP_NekomataOkayu, JP_ShiranuiFlare, JP_ShiroganeNoel, JP_HoushouMarine, JP_TokoyamiTowa, JP_YukihanaLamy, JP_LaplusDarknesss, JP_TakaneLui, JP_HakuiKoyori, JP_SakamataChloe, JP_IchijouRirika

Description: Voice (speaker) to synthesize.
text_input
Type: string
Default: Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!
Description: Text to convert to speech (text-to-voice).
reference_audio_path
Type: string
Description: Path to a reference audio file (voice-to-voice).
line_split
Type: boolean
Default: True
Description: Whether to split the input text at line breaks and synthesize each line separately.
split_interval
Type: number
Default: 0.5
Description: Length of the pause, in seconds, inserted between lines when line_split is True.
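
The following sketch shows how line_split and split_interval can work together. Each non-empty line is synthesized on its own, and the results are joined with split_interval seconds of silence. It is an illustration only. The 44.1 kHz sample rate, the `synthesize_line` stand-in, and the choice to skip blank lines are assumptions, and none of them is part of this model's API.

```python
# Sketch only: how line_split and split_interval could combine per-line audio.
# SAMPLE_RATE and synthesize_line() are hypothetical stand-ins, not this model's API.

SAMPLE_RATE = 44_100  # assumed output rate in Hz


def synthesize_line(line: str) -> list[float]:
    """Hypothetical TTS stand-in: 10 ms of constant signal per character."""
    return [0.1] * (len(line) * SAMPLE_RATE // 100)


def synthesize(text: str, line_split: bool = True, split_interval: float = 0.5) -> list[float]:
    """Synthesize text, optionally line by line with silence between lines."""
    if not line_split:
        return synthesize_line(text)
    # Split at line breaks and skip blank lines (an assumption of this sketch)
    lines = [ln for ln in text.split("\n") if ln.strip()]
    silence = [0.0] * int(split_interval * SAMPLE_RATE)
    audio: list[float] = []
    for i, line in enumerate(lines):
        if i > 0:
            audio.extend(silence)  # pause only between consecutive lines
        audio.extend(synthesize_line(line))
    return audio


clip = synthesize("Hello there!\nThis is a second line.")
print(f"{len(clip) / SAMPLE_RATE:.2f} s")  # prints "0.84 s" (0.12 + 0.5 + 0.22)
```

Because the pause goes only between lines, a single-line input gets no added silence, whatever split_interval is set to.
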
style
Type: string (enum)
Default: Neutral

Options:

Neutral, Normal, Excited, Sana, Baelz1, Baelz2, BaelzShouting, Anya, IofiLoud, Iofi, ZetaSoft, Zeta, ZetaLoud, Ollie, Koyori, Chloe, Lamy, Aqua, Sora, Towa, Suisei, Ayame, Haato, Matsuri, Mel, Aki, Lui, AZKi, Flare, Ririka, Laplus, Noel, Okayu, Marine, Kronii, NerissaLaugh, Calli, Nerissa, Japanese, Happy, Reading, Fauna, Amelia, MumeiLaugh, Shiori, IRyS, Ina, Gura, Mumei, ShioriLaugh, Scared, Angry

Description: Style of speech to use (choices may be limited based on the selected speaker).
style_weight
Type: number
Default: 5
Description: Strength of the selected style; higher values exaggerate it.
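
This model appears to be based on Style-Bert-VITS2. That project applies the style weight by moving away from the mean (neutral) style vector along the direction of the chosen style. Below is a minimal sketch of that interpolation. The two-dimensional vectors are toy values, because real style vectors are larger and come from the model. The formula is modeled on the upstream project, and this page does not confirm it.

```python
# Sketch of style-weight interpolation, modeled on Style-Bert-VITS2:
# result = mean + (style - mean) * weight, applied element-wise.
# The two-dimensional vectors below are toy values, not real style vectors.


def apply_style_weight(mean: list[float], style: list[float], weight: float) -> list[float]:
    """Move from the mean (neutral) vector toward, and past, the chosen style."""
    return [m + (s - m) * weight for m, s in zip(mean, style)]


neutral = [0.0, 1.0]
excited = [0.2, 1.4]
print(apply_style_weight(neutral, excited, 1.0))  # the style vector itself
print(apply_style_weight(neutral, excited, 5.0))  # default weight: 5x further from neutral
```
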
use_tone
Type: boolean
Default: False
Description: Whether to use tone information in the synthesis (Japanese only).
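sdp_ratio
Type: number
Default: 0.2
Description: Mixing ratio between the stochastic duration predictor (SDP) and the deterministic duration predictor; higher values give a more varied rhythm.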
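
In VITS2-style models such as Bert-VITS2, inference blends the log-durations from the two duration predictors using sdp_ratio. The sketch below shows that blend with made-up log-duration values. In the real model these values come from the networks, and noise_scale_w controls how random the SDP branch is. Treat the formula as modeled on upstream inference code, not as confirmed by this page.

```python
# Sketch of how sdp_ratio blends the two duration predictors at inference
# time in VITS2-style models (Bert-VITS2 family). Inputs are per-phoneme
# log-durations; the values used below are made up for illustration.


def mix_log_durations(logw_sdp: list[float], logw_dp: list[float], sdp_ratio: float) -> list[float]:
    """Per phoneme: sdp_ratio * SDP + (1 - sdp_ratio) * DP."""
    return [s * sdp_ratio + d * (1.0 - sdp_ratio) for s, d in zip(logw_sdp, logw_dp)]


logw_sdp = [1.2, 0.4, 2.0]  # stochastic predictor (varies between runs)
logw_dp = [1.0, 0.6, 1.8]   # deterministic predictor
print(mix_log_durations(logw_sdp, logw_dp, 0.2))  # default: mostly deterministic
```
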
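noise_scale
Type: number
Default: 0.6
Description: Scale of the random noise added when sampling the latent representation; higher values make the output more varied.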
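
In VITS-family models, noise_scale multiplies the random term used to sample the latent from its predicted mean and standard deviation. The sketch below uses toy per-dimension values in place of the model's predictions, so it is an illustration of the sampling rule and not this model's code.

```python
import math
import random

# Sketch of VITS-style prior sampling: z = mean + N(0, 1) * exp(log_std) * noise_scale.
# mean and log_std are toy per-dimension values; the real model predicts them.


def sample_latent(mean: list[float], log_std: list[float], noise_scale: float,
                  rng: random.Random) -> list[float]:
    """Draw one latent sample whose spread is scaled by noise_scale."""
    return [m + rng.gauss(0.0, 1.0) * math.exp(ls) * noise_scale
            for m, ls in zip(mean, log_std)]


mean, log_std = [0.5, -0.2, 1.0], [0.0, -1.0, 0.3]
print(sample_latent(mean, log_std, 0.0, random.Random(0)))  # no noise: exactly the mean
print(sample_latent(mean, log_std, 0.6, random.Random(0)))  # default noise_scale
```

With the same random draws, doubling noise_scale doubles how far each sample lands from the mean.
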
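noise_scale_w
Type: number
Default: 0.8
Description: Scale of the noise fed to the stochastic duration predictor; higher values make phoneme durations more varied.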
length_scale
Type: number
Default: 1
Description: Multiplier on phoneme durations; values above 1 slow speech down, values below 1 speed it up.
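
In VITS-style inference, predicted log-durations become whole frames per phoneme, and length_scale stretches them before rounding up. The sketch below uses made-up log-durations and is modeled on that upstream step, not taken from this model.

```python
import math

# Sketch of turning log-durations into frame counts in VITS-style inference:
# frames = ceil(exp(logw) * length_scale) for each phoneme.


def frames_per_phoneme(logw: list[float], length_scale: float = 1.0) -> list[int]:
    """Larger length_scale -> more frames per phoneme -> slower speech."""
    return [math.ceil(math.exp(lw) * length_scale) for lw in logw]


logw = [0.0, 1.0, 1.5]  # made-up log-durations
for scale in (0.8, 1.0, 1.2):
    frames = frames_per_phoneme(logw, scale)
    print(scale, frames, sum(frames))
```

The total frame count grows with length_scale, which is why values above 1 produce longer, slower speech.
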
style_text_weight
Type: number
Default: 0.7
Description: How strongly style_text influences the synthesis (used when use_style_text is True).
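
In Bert-VITS2 and Style-Bert-VITS2, the style (assist) text is applied by pulling each token's BERT feature toward the mean feature of that text. The pull is a linear blend controlled by the weight. The sketch below is modeled on those projects and uses toy two-dimensional features. It is an assumption that this model does exactly the same thing.

```python
# Sketch of style-text blending, modeled on Bert-VITS2 / Style-Bert-VITS2:
# every token's feature for the input text is pulled toward the mean
# feature of style_text by style_text_weight. Features here are toy 2-D lists.


def blend_style_text(text_features: list[list[float]],
                     style_features: list[list[float]],
                     weight: float) -> list[list[float]]:
    """Return (1 - weight) * feature + weight * mean(style features), per token."""
    dim = len(style_features[0])
    style_mean = [sum(tok[d] for tok in style_features) / len(style_features) for d in range(dim)]
    return [[(1.0 - weight) * f + weight * s for f, s in zip(tok, style_mean)]
            for tok in text_features]


text = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical features for the input text
style = [[2.0, 2.0], [4.0, 0.0]]  # hypothetical features for style_text
print(blend_style_text(text, style, 0.7))  # 0.7 is the schema's default weight
```
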
use_style_text
Type: boolean
Default: False
Description: Whether to use style_text as additional guidance for the synthesis.
style_text
Type: string
Description: Additional text to guide the style of the synthesis (applied when use_style_text is True).
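
Unset fields fall back to their defaults, so a request only needs the fields you change. The helper below is hypothetical and is not part of any Replicate library. It checks field names and basic types against this schema before you send a request. It does not check enum membership or value ranges.

```python
from typing import Any

# Defaults copied from the input schema above. reference_audio_path and
# style_text list no default, so only their types are recorded.
DEFAULTS: dict[str, Any] = {
    "speaker": "EN_MoriCalliope",
    "text_input": "Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!",
    "line_split": True,
    "split_interval": 0.5,
    "style": "Neutral",
    "style_weight": 5,
    "use_tone": False,
    "sdp_ratio": 0.2,
    "noise_scale": 0.6,
    "noise_scale_w": 0.8,
    "length_scale": 1,
    "style_text_weight": 0.7,
    "use_style_text": False,
}
NO_DEFAULT_TYPES: dict[str, type] = {"reference_audio_path": str, "style_text": str}


def build_input(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: check field names and types, return only the fields set."""
    payload: dict[str, Any] = {}
    for name, value in overrides.items():
        if name in DEFAULTS:
            expected = type(DEFAULTS[name])
        elif name in NO_DEFAULT_TYPES:
            expected = NO_DEFAULT_TYPES[name]
        else:
            raise ValueError(f"unknown input field: {name!r}")
        if expected in (int, float):
            # "number" fields accept ints and floats, but not booleans
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
            label = "number"
        else:
            ok = isinstance(value, expected)
            label = expected.__name__
        if not ok:
            raise TypeError(f"{name}: expected {label}, got {type(value).__name__}")
        payload[name] = value
    return payload


request = build_input(speaker="JP_MinatoAqua", style="Aqua", length_scale=1.1)
print(request)
```

The returned dict is what you would pass as the input of an API call, for example the `input=` argument of the Replicate Python client's `replicate.run`. Sending only overridden fields keeps the request short and leaves the defaults to the server.
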

Output schema

The shape of the response you’ll get when you run this model with an API: a single string in URI format, pointing to the generated audio.

Schema
{"format": "uri", "title": "Output", "type": "string"}
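
The output is one string formatted as a URI. Below is a small sketch for sanity-checking it and deriving a local file name before you download it. It assumes the URI is an http(s) link and rejects other schemes. The URL in the example is made up.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_filename(output: str) -> str:
    """Check that the output is an absolute http(s) URI and return its file name."""
    parsed = urlparse(output)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URI, got {output!r}")
    # Query strings are ignored; only the last path segment is kept
    return PurePosixPath(parsed.path).name


print(output_filename("https://example.com/some/path/output.wav"))  # hypothetical URL
```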