ttsds/amphion_maskgct
The MaskGCT model by Amphion.
Prediction
ttsds/amphion_maskgct:619089fed0a4ac2651ca4a30734f4e1c3d7f67e21a282d56e6f2bfba187e1a28
ID: hp5d3hh0nnrmc0cmjkt9kev744
Status: Succeeded
Source: Web
Hardware: L40S
Input
- text
- With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.
- language
- en
- speaker_reference
- https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav
{ "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.", "language": "en", "speaker_reference": "https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav" }
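The JSON above is the complete input object for this prediction. The hypothetical helper `build_maskgct_input` below is a minimal Python sketch that assembles the same object. Treating `text`, `language`, and `speaker_reference` as required and `text_reference` as optional is an assumption drawn only from the examples on this page (only the example for version 7bd535... passes `text_reference`), so check the model's schema before relying on it.

```python
from typing import Optional


def build_maskgct_input(
    text: str,
    language: str,
    speaker_reference: str,
    text_reference: Optional[str] = None,
) -> dict:
    """Assemble an input dict using the field names shown on this page (hypothetical helper)."""
    if not text.strip():
        # An empty prompt has nothing to synthesize.
        raise ValueError("text must not be empty")
    payload = {
        "text": text,                            # sentence to synthesize
        "language": language,                    # "en" and "zh" appear in the examples
        "speaker_reference": speaker_reference,  # URL of the voice sample to imitate
    }
    # Assumption: text_reference (a transcript of the reference audio) is optional,
    # so the key is added only when a transcript is supplied.
    if text_reference is not None:
        payload["text_reference"] = text_reference
    return payload


example = build_maskgct_input(
    "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
    "en",
    "https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav",
)
```

The resulting dict can be passed as `input=` to `replicate.run` or embedded in an HTTP request body.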
Install Replicate’s Node.js client library:
npm install replicate
Import and set up the client:
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run ttsds/amphion_maskgct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
import { writeFile } from "node:fs/promises";

const output = await replicate.run(
  "ttsds/amphion_maskgct:619089fed0a4ac2651ca4a30734f4e1c3d7f67e21a282d56e6f2bfba187e1a28",
  {
    input: {
      text: "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
      language: "en",
      speaker_reference: "https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav"
    }
  }
);

// To access the file URL:
console.log(output.url());

// To write the generated audio (a .wav file) to disk:
await writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
pip install replicate
Import the client:
import replicate
Run ttsds/amphion_maskgct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "ttsds/amphion_maskgct:619089fed0a4ac2651ca4a30734f4e1c3d7f67e21a282d56e6f2bfba187e1a28",
    input={
        "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
        "language": "en",
        "speaker_reference": "https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav"
    }
)
print(output)
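The `print(output)` call only shows where the result lives; the model's output is a WAV file. The sketch below saves it locally. It assumes that recent versions of the Replicate Python client return a file-like object with a `read()` method while older versions return the file URL as a string, and handles both cases. `save_audio` is a hypothetical helper, not part of the client library.

```python
import urllib.request
from pathlib import Path


def save_audio(output, path: str) -> Path:
    """Write a Replicate file output to `path` and return the path (hypothetical helper)."""
    target = Path(path)
    if hasattr(output, "read"):
        # Assumed newer-client behavior: a file-like object whose read() returns bytes.
        target.write_bytes(output.read())
    else:
        # Assumed older-client behavior: the output is the file URL as a string.
        with urllib.request.urlopen(str(output)) as response:
            target.write_bytes(response.read())
    return target
```

Typical use would be `save_audio(output, "output.wav")` right after the `replicate.run` call.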
To learn more, take a look at the guide on getting started with Python.
Run ttsds/amphion_maskgct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "619089fed0a4ac2651ca4a30734f4e1c3d7f67e21a282d56e6f2bfba187e1a28",
    "input": {
      "text": "With tenure, Suzie\'d have all the more leisure for yachting, but her publications are no good.",
      "language": "en",
      "speaker_reference": "https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav"
    }
  }' \
  https://api.replicate.com/v1/predictions
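For readers who prefer not to install a client library, the curl call can be reproduced with Python's standard library. The endpoint, headers, and body layout are copied from the curl command; `build_prediction_request` and `create_prediction` are illustrative names, not part of any Replicate SDK.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(version: str, model_input: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Same header as the curl example: ask the API to wait for the result.
            "Prefer": "wait",
        },
    )


def create_prediction(version: str, model_input: dict) -> dict:
    """Send the request and decode the prediction JSON; needs REPLICATE_API_TOKEN."""
    request = build_prediction_request(version, model_input, os.environ["REPLICATE_API_TOKEN"])
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting construction from sending keeps the request easy to inspect without a network connection or an API token.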
To learn more, take a look at Replicate’s HTTP API reference docs.
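The curl example sends `Prefer: wait`, which asks the API to hold the connection open until the prediction finishes. As we understand Replicate's HTTP API docs, that wait is capped (around 60 seconds), so a slow start such as the first prediction on this page (`total_time` of about 551 s) would come back while the prediction is still running. The sketch below then polls the prediction's `urls.get` link, which appears in each Output JSON on this page, until it reaches a terminal status. `wait_for_prediction` and its injected `fetch_json` callable are hypothetical; in practice `fetch_json` would send an authenticated GET request and decode the JSON body. The terminal statuses listed are the ones we know of; add others if the API reports them.

```python
import time
from typing import Callable

# Statuses after which a prediction no longer changes (as we understand the API).
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    prediction: dict,
    fetch_json: Callable[[str], dict],
    interval: float = 2.0,
    timeout: float = 900.0,
) -> dict:
    """Poll prediction["urls"]["get"] until the status is terminal or the timeout expires."""
    deadline = time.monotonic() + timeout
    while prediction["status"] not in TERMINAL_STATUSES:
        if time.monotonic() > deadline:
            raise TimeoutError(f"prediction {prediction['id']} is still {prediction['status']}")
        time.sleep(interval)
        prediction = fetch_json(prediction["urls"]["get"])
    return prediction
```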
Output
{ "completed_at": "2025-01-23T21:06:06.281635Z", "created_at": "2025-01-23T20:56:54.701000Z", "data_removed": false, "error": null, "id": "hp5d3hh0nnrmc0cmjkt9kev744", "input": { "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.", "language": "en", "speaker_reference": "https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav" }, "logs": "Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.\npredict semantic shape torch.Size([1, 354])", "metrics": { "predict_time": 13.097411221, "total_time": 551.580635 }, "output": "https://replicate.delivery/xezq/IgUpjMAipjrZMxgYvuWrDZyMZXuf9P7fZIT7u8PoS4e9JMQoA/test_pred.wav", "started_at": "2025-01-23T21:05:53.184224Z", "status": "succeeded", "urls": { "stream": "https://stream.replicate.com/v1/files/bsvm-2ovyohyfgt6g7itwxzmchyzvcamljh7vwvhprg2jtaaowaw57uwq", "get": "https://api.replicate.com/v1/predictions/hp5d3hh0nnrmc0cmjkt9kev744", "cancel": "https://api.replicate.com/v1/predictions/hp5d3hh0nnrmc0cmjkt9kev744/cancel" }, "version": "619089fed0a4ac2651ca4a30734f4e1c3d7f67e21a282d56e6f2bfba187e1a28" }
Logs:
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.
predict semantic shape torch.Size([1, 354])
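The timestamps and metrics in this prediction record are consistent with each other: `completed_at` minus `started_at` matches `predict_time` (about 13.1 s), and `completed_at` minus `created_at` matches `total_time` (about 551.6 s). That leaves roughly 538.5 s between creation and the start of inference, presumably spent waiting for hardware or a cold boot (the record does not say which). The sketch below computes that breakdown from the record's own fields; `timing_breakdown` is an illustrative name.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # The API timestamps end in "Z"; replacing it keeps fromisoformat happy before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing_breakdown(prediction: dict) -> dict:
    """Split a prediction's wall-clock time into waiting and running phases (seconds)."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "waiting_s": (started - created).total_seconds(),
        "running_s": (completed - started).total_seconds(),
        "total_s": (completed - created).total_seconds(),
    }


# Values copied from the Output JSON above.
prediction = {
    "created_at": "2025-01-23T20:56:54.701000Z",
    "started_at": "2025-01-23T21:05:53.184224Z",
    "completed_at": "2025-01-23T21:06:06.281635Z",
    "metrics": {"predict_time": 13.097411221, "total_time": 551.580635},
}
print(timing_breakdown(prediction))
```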
Prediction
ttsds/amphion_maskgct:619089fed0a4ac2651ca4a30734f4e1c3d7f67e21a282d56e6f2bfba187e1a28
ID: te9rf2255nrme0cmjm2r17pxw8
Status: Succeeded
Source: Web
Hardware: L40S
Input
- text
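- 张飞爱吃包子,李白游览华山,奇珍异兽满山坡。 (English: "Zhang Fei loves eating steamed buns, Li Bai tours Mount Hua, and rare treasures and strange beasts cover the hillside.")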
- language
- zh
- speaker_reference
- https://replicate.delivery/pbxt/MNEVKLUa63PiSeSVLrvmHw23VgiLJmVb5o27f62PDRCHV2tA/icl_20.wav
{ "text": "张飞爱吃包子,李白游览华山,奇珍异兽满山坡。", "language": "zh", "speaker_reference": "https://replicate.delivery/pbxt/MNEVKLUa63PiSeSVLrvmHw23VgiLJmVb5o27f62PDRCHV2tA/icl_20.wav" }
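This example sends non-ASCII (Chinese) text. When building the JSON body yourself in Python rather than typing it into a curl command, `json.dumps` escapes such characters as `\uXXXX` sequences by default; passing `ensure_ascii=False` keeps them readable, in which case the string should be encoded as UTF-8 before sending. Both forms are valid JSON and decode to the same string, so the choice only affects readability and payload size.

```python
import json

# Input copied from the JSON above.
zh_input = {
    "text": "张飞爱吃包子,李白游览华山,奇珍异兽满山坡。",
    "language": "zh",
    "speaker_reference": "https://replicate.delivery/pbxt/MNEVKLUa63PiSeSVLrvmHw23VgiLJmVb5o27f62PDRCHV2tA/icl_20.wav",
}

# Default: non-ASCII characters become \uXXXX escapes, so the output is pure ASCII.
escaped = json.dumps(zh_input)

# ensure_ascii=False keeps the characters as-is; encode to UTF-8 for the request body.
readable = json.dumps(zh_input, ensure_ascii=False)
body = readable.encode("utf-8")
```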
Install Replicate’s Node.js client library:
npm install replicate
Import and set up the client:
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run ttsds/amphion_maskgct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
import { writeFile } from "node:fs/promises";

const output = await replicate.run(
  "ttsds/amphion_maskgct:619089fed0a4ac2651ca4a30734f4e1c3d7f67e21a282d56e6f2bfba187e1a28",
  {
    input: {
      text: "张飞爱吃包子,李白游览华山,奇珍异兽满山坡。",
      language: "zh",
      speaker_reference: "https://replicate.delivery/pbxt/MNEVKLUa63PiSeSVLrvmHw23VgiLJmVb5o27f62PDRCHV2tA/icl_20.wav"
    }
  }
);

// To access the file URL:
console.log(output.url());

// To write the generated audio (a .wav file) to disk:
await writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
pip install replicate
Import the client:
import replicate
Run ttsds/amphion_maskgct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "ttsds/amphion_maskgct:619089fed0a4ac2651ca4a30734f4e1c3d7f67e21a282d56e6f2bfba187e1a28",
    input={
        "text": "张飞爱吃包子,李白游览华山,奇珍异兽满山坡。",
        "language": "zh",
        "speaker_reference": "https://replicate.delivery/pbxt/MNEVKLUa63PiSeSVLrvmHw23VgiLJmVb5o27f62PDRCHV2tA/icl_20.wav"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
Run ttsds/amphion_maskgct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "619089fed0a4ac2651ca4a30734f4e1c3d7f67e21a282d56e6f2bfba187e1a28",
    "input": {
      "text": "张飞爱吃包子,李白游览华山,奇珍异兽满山坡。",
      "language": "zh",
      "speaker_reference": "https://replicate.delivery/pbxt/MNEVKLUa63PiSeSVLrvmHw23VgiLJmVb5o27f62PDRCHV2tA/icl_20.wav"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
{ "completed_at": "2025-01-23T21:21:29.670638Z", "created_at": "2025-01-23T21:15:38.157000Z", "data_removed": false, "error": null, "id": "te9rf2255nrme0cmjm2r17pxw8", "input": { "text": "张飞爱吃包子,李白游览华山,奇珍异兽满山坡。", "language": "zh", "speaker_reference": "https://replicate.delivery/pbxt/MNEVKLUa63PiSeSVLrvmHw23VgiLJmVb5o27f62PDRCHV2tA/icl_20.wav" }, "logs": "Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.\nBuilding prefix dict from the default dictionary ...\nDumping model to file cache /tmp/jieba.cache\nLoading model cost 0.394 seconds.\nPrefix dict has been built successfully.\n/Amphion/./models/tts/maskgct/g2p/g2p/chinese_model_g2p.py:100: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:245.)\nbatch_label_starts = torch.tensor(batch_label_starts, dtype=torch.long)\npredict semantic shape torch.Size([1, 395])", "metrics": { "predict_time": 16.411000301, "total_time": 351.513638 }, "output": "https://replicate.delivery/xezq/P7vyZn4hBdqCOd1MdbCeYXLRmUYFzY4mplQPm208ELpsJDEKA/test_pred.wav", "started_at": "2025-01-23T21:21:13.259638Z", "status": "succeeded", "urls": { "stream": "https://stream.replicate.com/v1/files/bsvm-vklmjg4dvn36zwmmmuuhkfil7qqdvdwcap34h3iahume4lhkbijq", "get": "https://api.replicate.com/v1/predictions/te9rf2255nrme0cmjm2r17pxw8", "cancel": "https://api.replicate.com/v1/predictions/te9rf2255nrme0cmjm2r17pxw8/cancel" }, "version": "619089fed0a4ac2651ca4a30734f4e1c3d7f67e21a282d56e6f2bfba187e1a28" }
Logs:
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.394 seconds.
Prefix dict has been built successfully.
/Amphion/./models/tts/maskgct/g2p/g2p/chinese_model_g2p.py:100: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:245.)
batch_label_starts = torch.tensor(batch_label_starts, dtype=torch.long)
predict semantic shape torch.Size([1, 395])
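Each log ends with a `predict semantic shape` line: `[1, 354]` for the English sentence and `[1, 395]` for the Chinese one. MaskGCT generates speech in two stages, first semantic tokens and then acoustic tokens, and this line appears to report the shape of the predicted semantic-token tensor as (batch, tokens). The page does not document it, so treat the second number as an unverified, rough proxy for output length. The hypothetical parser below extracts the dimensions from a prediction's `logs` string.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "predict semantic shape torch.Size([1, 395])".
SHAPE_PATTERN = re.compile(r"predict semantic shape torch\.Size\(\[(\d+(?:,\s*\d+)*)\]\)")


def semantic_shape(logs: str) -> Optional[Tuple[int, ...]]:
    """Return the tensor dimensions from the 'predict semantic shape' log line, if present."""
    match = SHAPE_PATTERN.search(logs)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split(","))
```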
Prediction
ttsds/amphion_maskgct:7bd535bef57f4ea7e45e45c73fb2fda847b8ebd27df6c9550f5ba1a1742a66f5
ID: p7g6hdc2ynrma0cmp5sa22ng4g
Status: Succeeded
Source: Web
Hardware: L40S
Input
- text
- With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.
- language
- en
- text_reference
- and keeping eternity before the eyes, though much
- speaker_reference
- https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav
{ "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.", "language": "en", "text_reference": "and keeping eternity before the eyes, though much", "speaker_reference": "https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav" }
Install Replicate’s Node.js client library:
npm install replicate
Import and set up the client:
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run ttsds/amphion_maskgct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
import { writeFile } from "node:fs/promises";

const output = await replicate.run(
  "ttsds/amphion_maskgct:7bd535bef57f4ea7e45e45c73fb2fda847b8ebd27df6c9550f5ba1a1742a66f5",
  {
    input: {
      text: "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
      language: "en",
      text_reference: "and keeping eternity before the eyes, though much",
      speaker_reference: "https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav"
    }
  }
);

// To access the file URL:
console.log(output.url());

// To write the generated audio (a .wav file) to disk:
await writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
pip install replicate
Import the client:
import replicate
Run ttsds/amphion_maskgct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "ttsds/amphion_maskgct:7bd535bef57f4ea7e45e45c73fb2fda847b8ebd27df6c9550f5ba1a1742a66f5",
    input={
        "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
        "language": "en",
        "text_reference": "and keeping eternity before the eyes, though much",
        "speaker_reference": "https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
Run ttsds/amphion_maskgct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "7bd535bef57f4ea7e45e45c73fb2fda847b8ebd27df6c9550f5ba1a1742a66f5",
    "input": {
      "text": "With tenure, Suzie\'d have all the more leisure for yachting, but her publications are no good.",
      "language": "en",
      "text_reference": "and keeping eternity before the eyes, though much",
      "speaker_reference": "https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
{ "completed_at": "2025-01-29T09:49:59.555311Z", "created_at": "2025-01-29T09:44:19.445000Z", "data_removed": false, "error": null, "id": "p7g6hdc2ynrma0cmp5sa22ng4g", "input": { "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.", "language": "en", "text_reference": "and keeping eternity before the eyes, though much", "speaker_reference": "https://replicate.delivery/pbxt/MNDu8UJR7zB1dZHG3UOPCD5B4crZunv2j32UsTd3Qd5PdG1R/example.wav" }, "logs": "predict semantic shape torch.Size([1, 354])", "metrics": { "predict_time": 14.402907103, "total_time": 340.110311 }, "output": "https://replicate.delivery/xezq/ySYi7BRpdOZUCpRFbpuiJCK4vHFJf1yLhmhiwWub0meHv6JUA/test_pred.wav", "started_at": "2025-01-29T09:49:45.152404Z", "status": "succeeded", "urls": { "stream": "https://stream.replicate.com/v1/files/bsvm-tactxvuchns4qload3bl375w6a5ejagr6dededtipgtttp7e7wnq", "get": "https://api.replicate.com/v1/predictions/p7g6hdc2ynrma0cmp5sa22ng4g", "cancel": "https://api.replicate.com/v1/predictions/p7g6hdc2ynrma0cmp5sa22ng4g/cancel" }, "version": "7bd535bef57f4ea7e45e45c73fb2fda847b8ebd27df6c9550f5ba1a1742a66f5" }
Logs:
predict semantic shape torch.Size([1, 354])
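Comparing the three logs: the two predictions without `text_reference` print a Hugging Face Transformers notice about multilingual Whisper, while this one, which supplies `text_reference`, prints only the semantic-shape line. That pattern suggests, though the page does not confirm it, that the model transcribes `speaker_reference` with Whisper when no transcript is given. This prediction also uses a different model version, so the difference may have another cause. The heuristic below encodes that reading; the function name and its return labels are made up for illustration.

```python
# Substring of the Transformers notice seen in the logs of the first two predictions.
WHISPER_NOTICE = "transcription using a multilingual Whisper"


def reference_transcript_source(prediction: dict) -> str:
    """Guess how the reference transcript was obtained (heuristic, unverified)."""
    if "text_reference" in prediction.get("input", {}):
        return "supplied"
    if WHISPER_NOTICE in (prediction.get("logs") or ""):
        # Inferred only from the log notice, not from any documented behavior.
        return "whisper"
    return "unknown"
```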