nyxynyx / f5-tts

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching. Voice cloning.

  • Public
  • 19.4K runs
  • A100 (80GB)

Input

  • Text to generate speech from (string, required)
  • Reference audio for voice cloning (file, required)
  • Reference text (string, optional)
  • Automatically remove silences? (boolean, default: true)
  • Custom split words, comma separated (string, default: "")
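The custom split words input is a single comma-separated string. If you want to check that value on the client before sending it, the sketch below turns it into a list of words. It is an assumption, not documented behavior: the page does not say how the model treats surrounding whitespace or empty entries, or how it uses the words, and `parse_split_words` is a hypothetical helper name.

```python
def parse_split_words(raw: str) -> list[str]:
    """Turn a comma-separated string such as "and, but, so" into a list of words.

    Assumption: surrounding whitespace is trimmed and empty entries
    (e.g. from ",," or the default "") are dropped.
    """
    return [word.strip() for word in raw.split(",") if word.strip()]


print(parse_split_words("and, but ,, so"))  # ['and', 'but', 'so']
print(parse_split_words(""))                # [] (the default: no custom split words)
```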

Output

This output was created using a different version of the model, nyxynyx/f5-tts:43e8a5da.

Run time and cost

This model costs approximately $0.046 to run on Replicate, or 21 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 33 seconds, but prediction time varies significantly with the inputs.
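As a rough illustration of the quoted figures, the sketch below turns the approximate $0.046 per run and the typical 33-second prediction time into a budget estimate. It assumes predictions run one after another and each one matches the quoted figures. It does not model concurrency, queueing, or cold starts, and the helper names are hypothetical.

```python
import math

# Approximate figures quoted on the model page; real values vary with the inputs.
COST_PER_RUN_USD = 0.046
TYPICAL_SECONDS_PER_RUN = 33


def runs_per_dollar(cost_per_run: float) -> int:
    """Whole predictions that fit in one dollar at the given per-run cost."""
    return math.floor(1.0 / cost_per_run)


def estimate_batch(num_runs: int,
                   cost_per_run: float = COST_PER_RUN_USD,
                   seconds_per_run: float = TYPICAL_SECONDS_PER_RUN) -> tuple[float, float]:
    """Rough (cost in USD, wall-clock minutes) for running predictions sequentially."""
    cost = round(num_runs * cost_per_run, 2)
    minutes = num_runs * seconds_per_run / 60
    return cost, minutes


print(runs_per_dollar(COST_PER_RUN_USD))  # 21 (1 / 0.046 is about 21.7)
print(estimate_batch(100))                # (4.6, 55.0)
```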

Readme

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

F5-TTS: a Diffusion Transformer with ConvNeXt V2, with faster training and inference.
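The readme names flow matching but does not describe the formulation. As an illustration only, here is a toy, framework-free sketch of a generic conditional flow-matching setup with a straight (linear) path from noise to data. This is an assumption, not F5-TTS's actual code, path definition, or sampler, and every function name is hypothetical. In a real model, a trained network (here, the Diffusion Transformer) would predict the velocity from the noisy sample, the time, and conditioning such as the text and reference audio. The sketch replaces that network with an "oracle" velocity to show that a straight path is integrated exactly.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path x_t = (1 - t) * x0 + t * x1."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target u = x1 - x0 for the linear path (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]


def fm_loss(predicted, target):
    """Mean squared error between a predicted velocity and the target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


def euler_sample(x0, velocity_fn, steps=32):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
x1 = [0.3, -1.2, 0.8]                      # stand-in for one data sample (e.g. a few mel values)
x0 = [random.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
t = random.random()
x_t = interpolate(x0, x1, t)               # a network would see x_t and t (plus conditioning)
u = target_velocity(x0, x1)
print(fm_loss(u, u))                       # 0.0: a perfect predictor has zero loss

# With the oracle velocity, Euler integration from the noise lands on the data sample.
sample = euler_sample(x0, lambda x, t: u)
print(max(abs(a - b) for a, b in zip(sample, x1)))  # tiny floating-point error
```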