zsxkib / hololive-style-bert-vits2

🎙️Hololive text-to-speech and voice-to-voice (Japanese🇯🇵 + English🇬🇧)

  • Public
  • 854 runs
  • L40S

Input

string

Default speaker

Default: "EN_MoriCalliope"

string

Text to convert to speech (text-to-voice)

Default: "Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!"

file

Path to a reference audio file (voice-to-voice)

boolean

Whether to split the text into lines for processing

Default: true

number

Interval between splits when line_split is True

Default: 0.5

string

Style of speech to use (choices may be limited based on the selected speaker)

Default: "Neutral"

number

Weight of the style effect

Default: 5

boolean

Whether to use tone information in the synthesis (Japanese only)

Default: false

number

Ratio for speaker-dependent processing

Default: 0.2

number

Scale of noise to add to the synthesis

Default: 0.6

number

Scale of noise for the waveform

Default: 0.8

number

Scale of the length of the synthesized speech

Default: 1

number

Weight of the style text effect

Default: 0.7

boolean

Whether to use additional style text in the synthesis

Default: false

string

Additional text to guide the style of the synthesis

Default: ""
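The inputs listed above can be gathered into a single payload. The following is a minimal sketch, not the model's actual schema: this page shows only descriptions and defaults, and the only parameter name it mentions is `line_split`. Every other field name below is a hypothetical placeholder, so check the model's API schema for the real keys.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TTSInput:
    # NOTE: every field name except `line_split` is a hypothetical placeholder.
    # Only the default values are taken from the input list above.
    speaker: str = "EN_MoriCalliope"
    text: str = (
        "Hello there! This is test audio of a new Hololive "
        "text to speech tool running on Replicate!"
    )
    reference_audio: Optional[str] = None  # voice-to-voice reference file
    line_split: bool = True
    split_interval: float = 0.5
    style: str = "Neutral"
    style_weight: float = 5
    use_tone: bool = False  # described above as Japanese only
    sdp_ratio: float = 0.2
    noise_scale: float = 0.6
    noise_scale_w: float = 0.8
    length_scale: float = 1
    style_text_weight: float = 0.7
    use_style_text: bool = False
    style_text: str = ""

    def to_payload(self) -> dict:
        """Return the inputs as a dict, dropping optional values left unset."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

For example, `TTSInput(text="Good morning!", length_scale=1.2).to_payload()` overrides two values and keeps the remaining defaults.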

Output

Run time and cost

This model costs approximately $0.0024 to run on Replicate, or 416 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 3 seconds.
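The cost figures above follow from simple arithmetic. As a rough sketch, assuming the quoted $0.0024 per run holds for your inputs (the page notes it varies), a batch can be budgeted like this; the function names are illustrative only.

```python
import math

COST_PER_RUN_USD = 0.0024  # approximate per-run figure quoted above


def runs_per_dollar(cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Whole number of runs that fit in one dollar."""
    return math.floor(1 / cost_per_run)


def estimated_cost(runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Approximate total cost in USD for the given number of runs."""
    return runs * cost_per_run
```

`runs_per_dollar()` returns 416, matching the figure above, and `estimated_cost(1000)` comes to roughly $2.40.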

Readme

🎤 Hololive-Style-Bert-VITS2

Follow me on X @zsakib_ for more AI projects and updates!

🌟 Unleash the Power of Virtual Voices

Hololive-Style-Bert-VITS2 is a text-to-speech and voice-to-voice model that generates speech in the style of your favorite Hololive Virtual YouTubers (VTubers). With this model, you can create voice content that captures the unique charm and personality of individual Hololive characters.

🎭 Bring Your Imagination to Life

  • Voice Style Customization: Tailor the generated voice to your preferences by adjusting tone, emotion, and style using the intuitive sliders and settings in the model’s web interface.
  • Multilingual Support: Generate voices in English, Japanese, and Chinese, making it perfect for a wide range of applications and audiences.
  • Seamless Integration: Easily integrate the model into your projects using the provided API endpoints, allowing you to generate voice outputs programmatically.
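Programmatic use goes through Replicate's HTTP API, which creates predictions with a POST to `/v1/predictions`. The sketch below only builds that request and does not send it. The version ID is not given on this page and must be supplied by you, and the input key `text` is a hypothetical name, since the page does not list parameter names.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(
    token: str, version: str, inputs: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request that would create a prediction."""
    body = json.dumps({"version": version, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it. Keeping construction separate from sending makes the payload easy to inspect before spending a run.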

🚀 Powered by Cutting-Edge Technology

Hololive-Style-Bert-VITS2 combines state-of-the-art deep learning techniques to deliver exceptional results:

  • BERT: A transformer-based language model that encodes the input text into contextual features, which help the synthesizer match its delivery to the meaning of the text.
  • VITS2: An advanced text-to-speech model that produces natural-sounding speech with enhanced variability and expressiveness.

🎨 Endless Creative Possibilities

Whether you’re creating content for videos, live streaming, or other multimedia applications, Hololive-Style-Bert-VITS2 opens up a world of possibilities. Customize voice styles and emotions to suit your creative vision and engage your audience like never before.

🙏 Acknowledgments

This model is based on the incredible work by the following individuals:

A special thanks to litagin02 for their efforts in making Style-Bert-VITS2 accessible to Japanese users and providing detailed documentation and tutorials.

🛠️ Explore the Model on Replicate

This model was built using the power of Replicate, a platform that makes it easy to create and share machine learning models. With the intuitive web interface, you can quickly generate high-quality voice outputs by adjusting various input parameters and settings.

Experience the magic of Hololive-Style-Bert-VITS2 and let your virtual voice creations come to life! 🎉✨

Note: Most of the models are a work in progress. They may not sound fully correct. Do no evil.