Readme
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
F5-TTS is a Diffusion Transformer with ConvNeXt V2, offering faster training and inference. It supports voice cloning from a short reference audio clip.
Run this model in Node.js with one line of code:
npm install replicate
Then set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import Replicate from "replicate";
const replicate = new Replicate({
auth: process.env.REPLICATE_API_TOKEN,
});
Run nyxynyx/f5-tts using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
"nyxynyx/f5-tts:e0e48acce40cb39931ed5f1b04e21492bdcf2eb0a0f96842a5e537531e86389b",
{
input: {
gen_text: "When something is important enough, you do it even if the odds are not in your favor.",
ref_audio: "https://replicate.delivery/pbxt/Lo5PhtzOHIpE658sLaFoyibIHDYcJIngl5NaJ74dDkMYPwms/elon_musk_with_tucker_carlson_extract_02.mp3",
remove_silence: true,
custom_split_words: ""
}
}
);
// To access the file URL:
console.log(output.url()); //=> "http://example.com"
// To write the file to disk (this model outputs a WAV file):
import { writeFile } from "node:fs/promises";
await writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
pip install replicate
Then set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import replicate
Run nyxynyx/f5-tts using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
"nyxynyx/f5-tts:e0e48acce40cb39931ed5f1b04e21492bdcf2eb0a0f96842a5e537531e86389b",
input={
"gen_text": "When something is important enough, you do it even if the odds are not in your favor.",
"ref_audio": "https://replicate.delivery/pbxt/Lo5PhtzOHIpE658sLaFoyibIHDYcJIngl5NaJ74dDkMYPwms/elon_musk_with_tucker_carlson_extract_02.mp3",
"remove_silence": True,
"custom_split_words": ""
}
)
print(output)
To learn more, take a look at the guide on getting started with Python.
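The Python example above only prints the result. A hypothetical helper for saving it to disk is sketched below; depending on the client version, `output` may be a plain delivery URL string or a file-like object exposing `read()`, so this sketch handles both cases (the `save_output` name and the dual-case handling are assumptions, not part of the official client API).

```python
import urllib.request


def save_output(output, path="output.wav"):
    """Save a Replicate prediction result to a local file."""
    if isinstance(output, str):
        # Older clients return the delivery URL as a string.
        urllib.request.urlretrieve(output, path)
    else:
        # Newer clients return a file-like object with read().
        with open(path, "wb") as f:
            f.write(output.read())
    return path
```

For this model the saved file is the generated speech, so a `.wav` extension is appropriate.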
First, set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run nyxynyx/f5-tts using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-H "Prefer: wait" \
-d $'{
"version": "e0e48acce40cb39931ed5f1b04e21492bdcf2eb0a0f96842a5e537531e86389b",
"input": {
"gen_text": "When something is important enough, you do it even if the odds are not in your favor.",
"ref_audio": "https://replicate.delivery/pbxt/Lo5PhtzOHIpE658sLaFoyibIHDYcJIngl5NaJ74dDkMYPwms/elon_musk_with_tucker_carlson_extract_02.mp3",
"remove_silence": true,
"custom_split_words": ""
}
}' \
https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
{
"completed_at": "2024-10-16T20:52:48.345532Z",
"created_at": "2024-10-16T20:51:29.313000Z",
"data_removed": false,
"error": null,
"id": "zh3nnncnm5rj60cjjwctdx4fjg",
"input": {
"gen_text": "When something is important enough, you do it even if the odds are not in your favor.",
"ref_audio": "https://replicate.delivery/pbxt/Lo5PhtzOHIpE658sLaFoyibIHDYcJIngl5NaJ74dDkMYPwms/elon_musk_with_tucker_carlson_extract_02.mp3",
"remove_silence": true,
"custom_split_words": ""
},
"logs": "Generating: When something is important enough, you do it even if the odds are not in your favor.\n[*] Converting reference audio...\n[+] Converted reference audio.\nNo reference text provided, transcribing reference audio...\n/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/transformers/models/whisper/generation_whisper.py:496: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.\nwarnings.warn(\nYou have passed task=transcribe, but also have set `forced_decoder_ids` to [[1, None], [2, 50360]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of task=transcribe.\nPassing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.\nThe attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\n[+] Finished transcription\n[+] Reference text: And he's writing a book now, which hopefully he'll publish soon, which is about suicidal empathy. Where you have so much empathy, you're actually suiciding society.\n[*] Forming batches...\n[+] Number of batches: 1\n------ Batch 1 -------------------\nWhen something is important enough, you do it even if the odds are not in your favor.\n--------------------------------------\n0%| | 0/1 [00:00<?, ?it/s]Building prefix dict from the default dictionary ...\nDumping model to file cache /tmp/jieba.cache\nLoading model cost 0.479 seconds.\nPrefix dict has been built successfully.\n/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/vocos/pretrained.py:70: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. 
It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.\nstate_dict = torch.load(model_path, map_location=\"cpu\")\n100%|██████████| 1/1 [00:04<00:00, 4.34s/it]\n100%|██████████| 1/1 [00:04<00:00, 4.34s/it]\n[*] Removing silence...\n[*] Removed silence\n[*] Saving output.wav\n[*] Saved output.wav",
"metrics": {
"predict_time": 8.395652482,
"total_time": 79.032532
},
"output": "https://replicate.delivery/yhqm/AGocwpDHbnLWPtVvYrLzVqkdEY5G8yBr8uV6UXLt3JHoZ35E/output.wav",
"started_at": "2024-10-16T20:52:39.949880Z",
"status": "succeeded",
"urls": {
"get": "https://api.replicate.com/v1/predictions/zh3nnncnm5rj60cjjwctdx4fjg",
"cancel": "https://api.replicate.com/v1/predictions/zh3nnncnm5rj60cjjwctdx4fjg/cancel"
},
"version": "43e8a5da484777eeeb2b780cb4daaf01d1cf6b82b4f644aaddce4e7f70c68811"
}
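The timestamps in the prediction object above let you break down where the time went. A minimal sketch using only the standard library (timestamp values copied from the example prediction):

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # Replicate timestamps are ISO 8601 with a trailing "Z" (UTC).
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


prediction = {
    "created_at": "2024-10-16T20:51:29.313000Z",
    "started_at": "2024-10-16T20:52:39.949880Z",
    "completed_at": "2024-10-16T20:52:48.345532Z",
}

# Time spent waiting before the model started (queue + cold boot).
queue_time = (parse_ts(prediction["started_at"])
              - parse_ts(prediction["created_at"])).total_seconds()
# Time spent actually running the prediction.
compute_time = (parse_ts(prediction["completed_at"])
                - parse_ts(prediction["started_at"])).total_seconds()

print(f"queued for {queue_time:.1f}s, computed in {compute_time:.1f}s")
```

The computed durations match the reported metrics: `compute_time` equals `predict_time` (about 8.4 s), and the remainder of `total_time` (about 79 s) was spent waiting for the cold model to start.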
This output was created using a different version of the model, nyxynyx/f5-tts:43e8a5da.
This model costs approximately $0.046 to run on Replicate, or 21 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 33 seconds. The predict time for this model varies significantly based on the inputs.
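The "21 runs per $1" figure follows directly from the per-run estimate, as this small check shows (the rounding direction is an assumption; actual billing depends on measured run time):

```python
cost_per_run = 0.046                     # approximate USD per run, from above
runs_per_dollar = int(1 / cost_per_run)  # 1 / 0.046 ≈ 21.7, rounded down to 21

print(runs_per_dollar)
```

Because billing tracks compute time, short inputs on a warm model can cost noticeably less than this average, while cold starts push total wall-clock time well past the 33-second typical prediction.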