
zsxkib /tortoise-then-rvc:d1578fbc

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used.

Field Type Default value Description
text
string
The expressiveness of autoregressive transformers is literally nuts! I absolutely adore them.
TorToiSe: Text to speak.
voice_a
string (enum)
random

Options:

angie, cond_latent_example, deniro, freeman, halle, lj, myself, pat2, snakes, tom, train_daws, train_dreams, train_grace, train_lescault, weaver, applejack, daniel, emma, geralt, jlaw, mol, pat, rainbow, tim_reynolds, train_atkins, train_dotrice, train_empire, train_kennard, train_mouse, william, random, custom_voice, disabled

TorToiSe: Selects the voice to use for generation. Use `random` to select a random voice. Use `custom_voice` to use a custom voice.
custom_voice
string
TorToiSe: (Optional) Create a custom voice based on an mp3 file of a speaker. Audio should be at least 15 seconds, only contain one speaker, and be in mp3 format. Overrides the `voice_a` input.
voice_b
string (enum)
disabled

Options:

angie, cond_latent_example, deniro, freeman, halle, lj, myself, pat2, snakes, tom, train_daws, train_dreams, train_grace, train_lescault, weaver, applejack, daniel, emma, geralt, jlaw, mol, pat, rainbow, tim_reynolds, train_atkins, train_dotrice, train_empire, train_kennard, train_mouse, william, random, custom_voice, disabled

TorToiSe: (Optional) Create new voice from averaging the latents for `voice_a`, `voice_b` and `voice_c`. Use `disabled` to disable voice mixing.
voice_c
string (enum)
disabled

Options:

angie, cond_latent_example, deniro, freeman, halle, lj, myself, pat2, snakes, tom, train_daws, train_dreams, train_grace, train_lescault, weaver, applejack, daniel, emma, geralt, jlaw, mol, pat, rainbow, tim_reynolds, train_atkins, train_dotrice, train_empire, train_kennard, train_mouse, william, random, custom_voice, disabled

TorToiSe: (Optional) Create new voice from averaging the latents for `voice_a`, `voice_b` and `voice_c`. Use `disabled` to disable voice mixing.
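As a rough illustration of what "averaging the latents" could mean, the sketch below takes the element-wise mean of the latent vectors for every voice that is not `disabled`. The toy three-dimensional latents and the helper names `active_voices` and `average_latents` are hypothetical; the page does not describe how TorToiSe actually stores or combines its latents.

```python
from statistics import fmean


def active_voices(voice_a, voice_b="disabled", voice_c="disabled"):
    """Return the voices that take part in mixing (anything not 'disabled')."""
    return [v for v in (voice_a, voice_b, voice_c) if v != "disabled"]


def average_latents(latents):
    """Element-wise mean of equally sized latent vectors (illustrative only)."""
    if not latents:
        raise ValueError("need at least one latent vector")
    length = len(latents[0])
    if any(len(vec) != length for vec in latents):
        raise ValueError("all latent vectors must have the same length")
    return [fmean(column) for column in zip(*latents)]


# Hypothetical toy latents; real conditioning latents are far larger.
toy_latents = {"angie": [0.0, 1.0, 2.0], "tom": [2.0, 3.0, 4.0]}

voices = active_voices("angie", "tom")  # voice_c keeps its default, "disabled"
mixed = average_latents([toy_latents[v] for v in voices])
```

With only `voice_a` set, the list has a single entry and the mean is simply that voice's latent, which matches the idea that `disabled` turns mixing off.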
preset
string (enum)
fast

Options:

ultra_fast, fast, standard, high_quality

Which voice preset to use. See the documentation for more information.
seed
integer
0
TorToiSe: Random seed which can be used to reproduce results.
cvvp_amount
number
0

Max: 1

TorToiSe: How much the CVVP model should influence the output. Increasing this can in some cases reduce the likelihood of multiple speakers. Defaults to 0 (disabled).
pre_process_with_rvc
boolean
True
Use Realistic Voice Cloning v2 (RVCv2) to further enhance the voice created by TorToiSe Text-To-Speech.
rvc_model
string (enum)
CUSTOM

Options:

CUSTOM, Squidward, MrKrabs, Plankton, Drake, Vader, Trump, Biden, Obama, Guitar, Voilin

RVC model for a specific voice. If using a custom model, this should match the name of the downloaded model. If a `custom_rvc_model_download_url` is provided, this will be automatically set to the name of the downloaded model.
custom_rvc_model_download_url
string
RVC: (When `pre_process_with_rvc=True`) URL to download a custom RVC model. If provided, the model will be downloaded (if it doesn't already exist) and used for prediction, regardless of the `rvc_model` value.
pitch_change
string (enum)
no-change

Options:

no-change, male-to-female, female-to-male

RVC: (When `pre_process_with_rvc=True`) Adjust pitch of AI vocals. Options: `no-change`, `male-to-female`, `female-to-male`.
index_rate
number
0.5

Max: 1

RVC: (When `pre_process_with_rvc=True`) Control how much of the AI's accent to leave in the vocals.
filter_radius
integer
3

Max: 7

RVC: (When `pre_process_with_rvc=True`) If >=3: apply median filtering to the harvested pitch results.
rms_mix_rate
number
0.25

Max: 1

RVC: (When `pre_process_with_rvc=True`) Control how much to use the original vocal's loudness (0) or a fixed loudness (1).
pitch_detection_algorithm
string (enum)
rmvpe

Options:

rmvpe, mangio-crepe

RVC: (When `pre_process_with_rvc=True`) Best option is rmvpe (clarity in vocals), then mangio-crepe (smoother vocals).
crepe_hop_length
integer
128
RVC: (When `pre_process_with_rvc=True`) When `pitch_detection_algorithm` is set to `mangio-crepe`, this controls how often it checks for pitch changes in milliseconds. Lower values lead to longer conversions and higher risk of voice cracks, but better pitch accuracy.
protect
number
0.33

Max: 0.5

RVC: (When `pre_process_with_rvc=True`) Control how much of the original vocals' breath and voiceless consonants to leave in the AI vocals. Set to 0.5 to disable.
main_vocals_volume_change
number
0
RVC: (When `pre_process_with_rvc=True`) Control volume of main AI vocals. Use -3 to decrease the volume by 3 decibels, or 3 to increase the volume by 3 decibels.
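To get a feel for what a decibel value means in practice, the sketch below converts a change in decibels to a linear amplitude factor using the standard relation gain = 10^(dB/20). The helper name `db_to_gain` is hypothetical, and the page does not say how the model applies the change internally; this only shows the arithmetic behind values such as -3 or 3.

```python
def db_to_gain(db: float) -> float:
    """Convert a volume change in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


quieter = db_to_gain(-3)   # roughly 0.71 of the original amplitude
unchanged = db_to_gain(0)  # exactly 1.0, the default
louder = db_to_gain(3)     # roughly 1.41 times the original amplitude
```

Under this relation, -3 and 3 are reciprocal factors, so applying one after the other restores the original level.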
backup_vocals_volume_change
number
0
RVC: (When `pre_process_with_rvc=True`) Control volume of backup AI vocals.
instrumental_volume_change
number
0
RVC: (When `pre_process_with_rvc=True`) Control volume of the background music/instrumentals.
pitch_change_all
number
0
RVC: (When `pre_process_with_rvc=True`) Change pitch/key of background music, backup vocals and AI vocals in semitones. Reduces sound quality slightly.
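A shift in semitones corresponds to a frequency ratio of 2^(n/12) in equal temperament. The sketch below shows that arithmetic only; `semitone_ratio` is a hypothetical helper, and how the model performs the actual pitch shift is not described on this page.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


A4_HZ = 440.0  # concert pitch, used here only as a familiar reference

up_an_octave = A4_HZ * semitone_ratio(12)     # 880.0 Hz
down_an_octave = A4_HZ * semitone_ratio(-12)  # 220.0 Hz
no_change = A4_HZ * semitone_ratio(0)         # 440.0 Hz, the default
```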
reverb_size
number
0.15

Max: 1

RVC: (When `pre_process_with_rvc=True`) The larger the room, the longer the reverb time.
reverb_wetness
number
0.2

Max: 1

RVC: (When `pre_process_with_rvc=True`) Level of AI vocals with reverb.
reverb_dryness
number
0.8

Max: 1

RVC: (When `pre_process_with_rvc=True`) Level of AI vocals without reverb.
reverb_damping
number
0.7

Max: 1

RVC: (When `pre_process_with_rvc=True`) Absorption of high frequencies in the reverb.
output_format
string (enum)
mp3

Options:

mp3, wav

RVC: (When `pre_process_with_rvc=True`) wav for best quality and large file size, mp3 for decent quality and small file size.
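Before sending a request, it can help to check a payload against the limits listed above. The sketch below is one possible client-side check, not part of the model: `check_input` is a hypothetical helper, the `Max` values and enum options are copied from the table, and no lower bounds are enforced because the table does not list any.

```python
# Upper bounds copied from the "Max" entries in the input schema.
MAX_VALUES = {
    "cvvp_amount": 1, "index_rate": 1, "filter_radius": 7,
    "rms_mix_rate": 1, "protect": 0.5, "reverb_size": 1,
    "reverb_wetness": 1, "reverb_dryness": 1, "reverb_damping": 1,
}

# A few of the enum fields and their listed options.
ENUMS = {
    "preset": {"ultra_fast", "fast", "standard", "high_quality"},
    "pitch_change": {"no-change", "male-to-female", "female-to-male"},
    "pitch_detection_algorithm": {"rmvpe", "mangio-crepe"},
    "output_format": {"mp3", "wav"},
}


def check_input(payload):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name, limit in MAX_VALUES.items():
        if name in payload and payload[name] > limit:
            problems.append(f"{name}={payload[name]} exceeds Max {limit}")
    for name, allowed in ENUMS.items():
        if name in payload and payload[name] not in allowed:
            problems.append(f"{name}={payload[name]!r} is not one of {sorted(allowed)}")
    return problems


# Fields left out of the payload fall back to their defaults on the server side.
payload = {"text": "Hello there.", "voice_a": "tom", "preset": "fast", "protect": 0.33}
problems = check_input(payload)  # [] for this payload
```

A dictionary like `payload` is what would be passed as the input of a prediction request once it passes the check.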

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'items': {'format': 'uri', 'type': 'string'},
 'title': 'Output',
 'type': 'array',
 'x-cog-array-type': 'iterator'}
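The schema describes an array of URI strings that is produced as an iterator, so a client may receive the items one at a time. The sketch below shows one way to consume such an output defensively; `collect_uris` is a hypothetical helper, and the generator with example.com URLs merely stands in for a real response.

```python
from urllib.parse import urlparse


def collect_uris(output):
    """Consume an iterable of output items and return them as a list of URI strings."""
    uris = []
    for item in output:
        # Each item should be a string with a URI scheme such as https.
        if not isinstance(item, str) or not urlparse(item).scheme:
            raise ValueError(f"expected a URI string, got {item!r}")
        uris.append(item)
    return uris


def fake_output():
    """Stand-in for the model's iterator output (placeholder URLs)."""
    yield "https://example.com/output-0.mp3"
    yield "https://example.com/output-1.mp3"


uris = collect_uris(fake_output())
```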