lee101 / guided-text-to-speech
Guided Text to Speech Generator
Prediction
lee101/guided-text-to-speech:02976eb16df7499807d1383125f1ba12809acadae834e08eeeeff3217062fd98
ID: k1wsdf5z39rg80cfpvr9befaxw
Status: Succeeded
Source: Web
Hardware: T4
Input
- voice: A female speaker with a slightly low-pitched, quite monotone voice delivers her words at a slightly faster-than-average pace in a confined space with very clear audio.
- prompt: hi whats the weather?

{
  "voice": "A female speaker with a slightly low-pitched, quite monotone voice delivers her words at a slightly faster-than-average pace in a confined space with very clear audio.",
  "prompt": "hi whats the weather?"
}
Install Replicate’s Node.js client library:

npm install replicate
Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run lee101/guided-text-to-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "lee101/guided-text-to-speech:02976eb16df7499807d1383125f1ba12809acadae834e08eeeeff3217062fd98",
  {
    input: {
      voice: "A female speaker with a slightly low-pitched, quite monotone voice delivers her words at a slightly faster-than-average pace in a confined space with very clear audio.",
      prompt: "hi whats the weather?"
    }
  }
);

// To access the file URL:
console.log(output.url());
//=> "http://example.com"

// To write the file to disk (the model returns audio, so use an audio
// extension, and await the promise-based writeFile):
await fs.promises.writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate
Import the client:

import replicate
Run lee101/guided-text-to-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "lee101/guided-text-to-speech:02976eb16df7499807d1383125f1ba12809acadae834e08eeeeff3217062fd98",
    input={
        "voice": "A female speaker with a slightly low-pitched, quite monotone voice delivers her words at a slightly faster-than-average pace in a confined space with very clear audio.",
        "prompt": "hi whats the weather?"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
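The Python snippet above only prints the output; to keep the generated speech you will usually want to write it to a file. Here is a minimal sketch of a saving helper. It assumes the client hands back either a plain URL string or a file-like object with a `read()` method (which one you get can vary by client version); `save_audio` is an illustrative helper, not part of the replicate library.

```python
import urllib.request


def save_audio(output, path):
    """Persist a prediction output to `path`.

    Assumption: `output` is either a URL string or a
    file-like object exposing .read().
    """
    if isinstance(output, str):
        # Output is a URL: download the audio file.
        urllib.request.urlretrieve(output, path)
    else:
        # Output is file-like: copy its bytes to disk.
        with open(path, "wb") as f:
            f.write(output.read())
    return path
```

Usage: `save_audio(output, "speech.wav")` after the `replicate.run(...)` call above.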
Run lee101/guided-text-to-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lee101/guided-text-to-speech:02976eb16df7499807d1383125f1ba12809acadae834e08eeeeff3217062fd98",
    "input": {
      "voice": "A female speaker with a slightly low-pitched, quite monotone voice delivers her words at a slightly faster-than-average pace in a confined space with very clear audio.",
      "prompt": "hi whats the weather?"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
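The cURL call above posts a JSON body to `https://api.replicate.com/v1/predictions`. If you are assembling that request from code rather than the shell, the body can be built like this; `make_prediction_payload` is a hypothetical helper written for illustration, not part of any client library.

```python
import json


def make_prediction_payload(version, voice, prompt):
    """Build the JSON body expected by POST /v1/predictions,
    mirroring the cURL example above."""
    return json.dumps({
        "version": version,
        "input": {"voice": voice, "prompt": prompt},
    })
```

Send the resulting string with an `Authorization: Bearer $REPLICATE_API_TOKEN` header, exactly as the cURL example does.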
Output
{
  "completed_at": "2024-05-26T22:09:51.380672Z",
  "created_at": "2024-05-26T22:06:21.978000Z",
  "data_removed": false,
  "error": null,
  "id": "k1wsdf5z39rg80cfpvr9befaxw",
  "input": {
    "voice": "A female speaker with a slightly low-pitched, quite monotone voice delivers her words at a slightly faster-than-average pace in a confined space with very clear audio.",
    "prompt": "hi whats the weather?"
  },
  "logs": "Using the model-agnostic default `max_length` (=2580) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.\nCalling `sample` directly is deprecated and will be removed in v4.41. Use `generate` or a custom generation loop instead.\n--- Logging error ---\nTraceback (most recent call last):\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py\", line 1110, in emit\nmsg = self.format(record)\n^^^^^^^^^^^^^^^^^^^\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py\", line 953, in format\nreturn fmt.format(record)\n^^^^^^^^^^^^^^^^^^\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py\", line 687, in format\nrecord.message = record.getMessage()\n^^^^^^^^^^^^^^^^^^^\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py\", line 377, in getMessage\nmsg = msg % self.args\n~~~~^~~~~~~~~~~\nTypeError: not all arguments converted during string formatting\nCall stack:\nFile \"<string>\", line 1, in <module>\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/spawn.py\", line 122, in spawn_main\nexitcode = _main(fd, parent_sentinel)\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/spawn.py\", line 135, in _main\nreturn self._bootstrap(parent_sentinel)\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/process.py\", line 314, in _bootstrap\nself.run()\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/cog/server/worker.py\", line 179, in run\nself._loop()\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/cog/server/worker.py\", line 211, in _loop\nself._predict(ev.payload)\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/cog/server/worker.py\", line 221, in _predict\nresult = predict(**payload)\nFile \"/src/predict.py\", line 15, in predict\nsample_rate, audio_arr = gen_tts(prompt, voice)\nFile \"/src/parlerlib.py\", line 91, in gen_tts\ngeneration = model.generate(\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/torch/utils/_contextlib.py\", line 115, in decorate_context\nreturn func(*args, **kwargs)\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/parler_tts/modeling_parler_tts.py\", line 2608, in generate\noutputs = self.sample(\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/generation/utils.py\", line 2584, in sample\nreturn self._sample(*args, **kwargs)\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/generation/utils.py\", line 2730, in _sample\nlogger.warning_once(\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/utils/logging.py\", line 329, in warning_once\nself.warning(*args, **kwargs)\nMessage: '`eos_token_id` is deprecated in this function and will be removed in v4.41, use `stopping_criteria=StoppingCriteriaList([EosTokenCriteria(eos_token_id=eos_token_id)])` instead. Otherwise make sure to set `model.generation_config.eos_token_id`'\nArguments: (<class 'FutureWarning'>,)",
  "metrics": {
    "predict_time": 7.724552,
    "total_time": 209.402672
  },
  "output": "https://replicate.delivery/pbxt/3O8BpeJ7v1T9ciiflflbB6ZVHuJrcog1lUspjYNn5GzfSZhLB/tmpy4i_v3pi.wav",
  "started_at": "2024-05-26T22:09:43.656120Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/k1wsdf5z39rg80cfpvr9befaxw",
    "cancel": "https://api.replicate.com/v1/predictions/k1wsdf5z39rg80cfpvr9befaxw/cancel"
  },
  "version": "02976eb16df7499807d1383125f1ba12809acadae834e08eeeeff3217062fd98"
}
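The prediction record above carries everything you need programmatically: the `status`, the `output` file URL, and timing under `metrics` (the logged deprecation warnings come from the model's internals and do not affect the result). A small sketch of pulling those fields out of a parsed response; the field names are taken from the JSON shown, but `summarize_prediction` itself is a hypothetical helper.

```python
def summarize_prediction(prediction):
    """Extract the commonly used fields from a prediction dict,
    as returned by the Replicate HTTP API."""
    return {
        "id": prediction["id"],
        "status": prediction["status"],
        "output": prediction["output"],
        "predict_time": prediction["metrics"]["predict_time"],
    }
```

With the response above, this yields the `.wav` URL and a 7.7-second model run time.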
Prediction
lee101/guided-text-to-speech:fc0617a394340824a7dd1aa78f76e92c061449abd48e67ee9dbe30a6448c8be2
Input
- voice: A male speaker with a low-pitched narrator story voice, expressive energetic voice delivers words fast pace in a open space with very clear audio.
- prompt: What are you wanting to chat about today?

{
  "voice": "A male speaker with a low-pitched narrator story voice, expressive energetic voice delivers words fast pace in a open space with very clear audio.",
  "prompt": "What are you wanting to chat about today?"
}
Install Replicate’s Node.js client library:

npm install replicate
Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run lee101/guided-text-to-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "lee101/guided-text-to-speech:fc0617a394340824a7dd1aa78f76e92c061449abd48e67ee9dbe30a6448c8be2",
  {
    input: {
      voice: "A male speaker with a low-pitched narrator story voice, expressive energetic voice delivers words fast pace in a open space with very clear audio.",
      prompt: "What are you wanting to chat about today?"
    }
  }
);

// To access the file URL:
console.log(output.url());
//=> "http://example.com"

// To write the file to disk (the model returns audio, so use an audio
// extension, and await the promise-based writeFile):
await fs.promises.writeFile("output.mp3", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate
Import the client:

import replicate
Run lee101/guided-text-to-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "lee101/guided-text-to-speech:fc0617a394340824a7dd1aa78f76e92c061449abd48e67ee9dbe30a6448c8be2",
    input={
        "voice": "A male speaker with a low-pitched narrator story voice, expressive energetic voice delivers words fast pace in a open space with very clear audio.",
        "prompt": "What are you wanting to chat about today?"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
Run lee101/guided-text-to-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lee101/guided-text-to-speech:fc0617a394340824a7dd1aa78f76e92c061449abd48e67ee9dbe30a6448c8be2",
    "input": {
      "voice": "A male speaker with a low-pitched narrator story voice, expressive energetic voice delivers words fast pace in a open space with very clear audio.",
      "prompt": "What are you wanting to chat about today?"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
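The `Prefer: wait` header in the cURL example asks the API to hold the connection open until the prediction finishes. Without it, the POST returns immediately and you poll the prediction's `urls.get` endpoint until the status is terminal. A polling sketch follows; the fetch function is passed in so it works with any HTTP client, and `wait_for_prediction` is an illustrative helper, not a library function.

```python
import time

# Terminal statuses per the prediction lifecycle.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(fetch, interval=1.0, timeout=300.0):
    """Poll `fetch` (a callable returning the prediction dict,
    e.g. an authenticated GET of the prediction's `urls.get`)
    until it reaches a terminal status or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch()
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not finish in time")
        time.sleep(interval)
```

Passing a callable keeps the helper independent of whether you use `requests`, `urllib`, or the official client underneath.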
Output
{
  "completed_at": "2024-05-30T22:32:48.981541Z",
  "created_at": "2024-05-30T22:28:06.535000Z",
  "data_removed": false,
  "error": null,
  "id": "my3t4g8q0xrg80cfsefbt2n48r",
  "input": {
    "voice": "A male speaker with a low-pitched narrator story voice, expressive energetic voice delivers words fast pace in a open space with very clear audio.",
    "prompt": "What are you wanting to chat about today?"
  },
  "logs": "Using the model-agnostic default `max_length` (=2580) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.\nCalling `sample` directly is deprecated and will be removed in v4.41. Use `generate` or a custom generation loop instead.\n--- Logging error ---\nTraceback (most recent call last):\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py\", line 1110, in emit\nmsg = self.format(record)\n^^^^^^^^^^^^^^^^^^^\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py\", line 953, in format\nreturn fmt.format(record)\n^^^^^^^^^^^^^^^^^^\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py\", line 687, in format\nrecord.message = record.getMessage()\n^^^^^^^^^^^^^^^^^^^\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py\", line 377, in getMessage\nmsg = msg % self.args\n~~~~^~~~~~~~~~~\nTypeError: not all arguments converted during string formatting\nCall stack:\nFile \"<string>\", line 1, in <module>\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/spawn.py\", line 122, in spawn_main\nexitcode = _main(fd, parent_sentinel)\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/spawn.py\", line 135, in _main\nreturn self._bootstrap(parent_sentinel)\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/process.py\", line 314, in _bootstrap\nself.run()\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/cog/server/worker.py\", line 179, in run\nself._loop()\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/cog/server/worker.py\", line 211, in _loop\nself._predict(ev.payload)\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/cog/server/worker.py\", line 221, in _predict\nresult = predict(**payload)\nFile \"/src/predict.py\", line 14, in predict\nsample_rate, audio_arr = gen_tts(prompt, voice)\nFile \"/src/parlerlib.py\", line 90, in gen_tts\ngeneration = model.generate(\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/torch/utils/_contextlib.py\", line 115, in decorate_context\nreturn func(*args, **kwargs)\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/parler_tts/modeling_parler_tts.py\", line 2608, in generate\noutputs = self.sample(\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/generation/utils.py\", line 2584, in sample\nreturn self._sample(*args, **kwargs)\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/generation/utils.py\", line 2730, in _sample\nlogger.warning_once(\nFile \"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/utils/logging.py\", line 329, in warning_once\nself.warning(*args, **kwargs)\nMessage: '`eos_token_id` is deprecated in this function and will be removed in v4.41, use `stopping_criteria=StoppingCriteriaList([EosTokenCriteria(eos_token_id=eos_token_id)])` instead. Otherwise make sure to set `model.generation_config.eos_token_id`'\nArguments: (<class 'FutureWarning'>,)",
  "metrics": {
    "predict_time": 23.162743,
    "total_time": 282.446541
  },
  "output": "https://replicate.delivery/czjl/Dkba34f2BQzjVCHHa2uVbIjye7UvnlJeXrcyMdtEMmJhEWzlA/tmpqr0mdm_w.mp3",
  "started_at": "2024-05-30T22:32:25.818798Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/my3t4g8q0xrg80cfsefbt2n48r",
    "cancel": "https://api.replicate.com/v1/predictions/my3t4g8q0xrg80cfsefbt2n48r/cancel"
  },
  "version": "fc0617a394340824a7dd1aa78f76e92c061449abd48e67ee9dbe30a6448c8be2"
}
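Note the gap between `predict_time` (23.16 s) and `total_time` (282.45 s) in the metrics above: the difference is time spent queued and cold-booting before the model actually ran. A one-line sketch for computing it from a prediction's metrics dict (`queue_seconds` is an illustrative helper):

```python
def queue_seconds(metrics):
    """Time spent queued/booting: total wall time minus model run time."""
    return metrics["total_time"] - metrics["predict_time"]
```

For the prediction above this comes to roughly 259 seconds, so most of the wait was cold start rather than inference.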
Generated inUsing the model-agnostic default `max_length` (=2580) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation. Calling `sample` directly is deprecated and will be removed in v4.41. Use `generate` or a custom generation loop instead. --- Logging error --- Traceback (most recent call last): File "/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py", line 1110, in emit msg = self.format(record) ^^^^^^^^^^^^^^^^^^^ File "/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py", line 953, in format return fmt.format(record) ^^^^^^^^^^^^^^^^^^ File "/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py", line 687, in format record.message = record.getMessage() ^^^^^^^^^^^^^^^^^^^ File "/root/.pyenv/versions/3.11.9/lib/python3.11/logging/__init__.py", line 377, in getMessage msg = msg % self.args ~~~~^~~~~~~~~~~ TypeError: not all arguments converted during string formatting Call stack: File "<string>", line 1, in <module> File "/root/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main exitcode = _main(fd, parent_sentinel) File "/root/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/spawn.py", line 135, in _main return self._bootstrap(parent_sentinel) File "/root/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/cog/server/worker.py", line 179, in run self._loop() File "/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/cog/server/worker.py", line 211, in _loop self._predict(ev.payload) File "/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/cog/server/worker.py", line 221, in _predict result = predict(**payload) File "/src/predict.py", line 14, in predict sample_rate, audio_arr = gen_tts(prompt, voice) File "/src/parlerlib.py", line 90, in gen_tts generation = model.generate( File 
"/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/parler_tts/modeling_parler_tts.py", line 2608, in generate outputs = self.sample( File "/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/generation/utils.py", line 2584, in sample return self._sample(*args, **kwargs) File "/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/generation/utils.py", line 2730, in _sample logger.warning_once( File "/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/transformers/utils/logging.py", line 329, in warning_once self.warning(*args, **kwargs) Message: '`eos_token_id` is deprecated in this function and will be removed in v4.41, use `stopping_criteria=StoppingCriteriaList([EosTokenCriteria(eos_token_id=eos_token_id)])` instead. Otherwise make sure to set `model.generation_config.eos_token_id`' Arguments: (<class 'FutureWarning'>,)
Want to make some of these yourself?
Run this model