
adirik/styletts2:53fd5081

Input

*string

Text to convert to speech

file

Reference speech to copy style from

number
(minimum: 0, maximum: 1)

Determines the timbre of the speaker. Only used for long text inputs or when a reference speaker is provided. Use lower values to sample the style from the previous or reference speech rather than from the text.

Default: 0.3

number
(minimum: 0, maximum: 1)

Determines the prosody of the speaker. Only used for long text inputs or when a reference speaker is provided. Use lower values to sample the style from the previous or reference speech rather than from the text.

Default: 0.7
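
One way to picture the two weights above is as a blend between a style taken from the text and a style taken from the previous or reference speech. The sketch below assumes the weight is a plain linear interpolation coefficient, which this page does not state; `blend_style` and the two-element style vectors are hypothetical and only illustrate why lower values lean toward the reference.

```python
def blend_style(text_style, ref_style, weight):
    """Linearly interpolate between a text-derived and a reference-derived style.

    Assumption (not confirmed by this page): weight = 1 uses only the text
    style, weight = 0 uses only the previous/reference speech style.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if len(text_style) != len(ref_style):
        raise ValueError("style vectors must have the same length")
    return [weight * t + (1.0 - weight) * r
            for t, r in zip(text_style, ref_style)]


# Hypothetical toy vectors: first slot is "text", second slot is "reference".
text_style = [1.0, 0.0]
ref_style = [0.0, 1.0]

timbre = blend_style(text_style, ref_style, 0.3)   # timbre default: mostly reference
prosody = blend_style(text_style, ref_style, 0.7)  # prosody default: mostly text
```

Under this assumption, the defaults keep the voice quality close to the reference while letting the text drive most of the rhythm and intonation.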

integer
(minimum: 0, maximum: 50)

Number of diffusion steps

Default: 10

number
(minimum: 0, maximum: 5)

Embedding scale. Use higher values for more pronounced emotion.

Default: 1

integer

Seed for reproducibility

Default: 0
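
The ranges and defaults listed above can be checked before a prediction is submitted. The following is a minimal sketch, not the model's own validation code. The field labels did not survive on this page, so the names used here (`text`, `reference`, `alpha`, `beta`, `diffusion_steps`, `embedding_scale`, `seed`) are assumptions, as is the helper `build_input`; only the types, limits, and defaults come from the listing.

```python
# Limits and defaults copied from the input listing; field names are assumed.
RANGES = {
    "alpha": (0.0, 1.0),            # timbre weight
    "beta": (0.0, 1.0),             # prosody weight
    "diffusion_steps": (0, 50),
    "embedding_scale": (0.0, 5.0),
}
DEFAULTS = {
    "alpha": 0.3,
    "beta": 0.7,
    "diffusion_steps": 10,
    "embedding_scale": 1.0,
    "seed": 0,
}
INTEGER_FIELDS = {"diffusion_steps", "seed"}


def build_input(text, reference=None, **overrides):
    """Return an input dict with defaults applied and ranges enforced."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be a non-empty string")

    params = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in DEFAULTS:
            raise KeyError(f"unknown parameter: {name}")
        params[name] = value

    for name in INTEGER_FIELDS:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(params[name], bool) or not isinstance(params[name], int):
            raise TypeError(f"{name} must be an integer")

    for name, (low, high) in RANGES.items():
        if not low <= params[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    payload = {"text": text, **params}
    if reference is not None:
        payload["reference"] = reference  # e.g. an open file handle or a URL
    return payload


payload = build_input("Hello there.", diffusion_steps=20, embedding_scale=2.0)
```

The resulting dictionary could then be passed as the input to whichever client is used to run the model; out-of-range values fail locally instead of after a request has been sent.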

Output
