suno-ai / bark

🔊 Text-Prompted Generative Audio Model

Public
299.6K runs
T4

Input

prompt

string

Input prompt

Default: "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."

history_prompt

string

History choice for audio cloning; choose one from the provided list.

custom_history_prompt

file

Provide your own .npz file with a history choice for audio cloning. When set, this overrides the history_prompt setting.

text_temp

number

generation temperature (1.0 more diverse, 0.0 more conservative)

Default: 0.7

waveform_temp

number

generation temperature (1.0 more diverse, 0.0 more conservative)

Default: 0.7

output_full

boolean

return full generation as a .npz file to be used as a history prompt

Default: false
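The inputs listed above map directly onto the `input` object sent with a prediction request. A minimal sketch of a helper that assembles that object follows. `build_bark_input` is a hypothetical name, not part of any Replicate library, and the 0.0 to 1.0 temperature bounds are an assumption inferred from the parameter descriptions rather than a documented limit.

```python
def build_bark_input(prompt, history_prompt=None, text_temp=0.7,
                     waveform_temp=0.7, output_full=False):
    """Assemble the input dict for suno-ai/bark (hypothetical helper)."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")

    # Assumption: temperatures are meaningful only in [0.0, 1.0],
    # as the descriptions "1.0 more diverse, 0.0 more conservative" suggest.
    for name, value in (("text_temp", text_temp), ("waveform_temp", waveform_temp)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    payload = {
        "prompt": prompt,
        "text_temp": text_temp,
        "waveform_temp": waveform_temp,
        "output_full": bool(output_full),
    }
    # history_prompt is optional; leave it out unless a choice was made.
    if history_prompt is not None:
        payload["history_prompt"] = history_prompt
    return payload


print(build_bark_input("Hello, my name is Suno.", history_prompt="announcer"))
```

Validating on the client side is only a convenience here; the model's own schema remains the authority on accepted values.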

Run this model in Node.js with one line of code:

npx create-replicate --model=suno-ai/bark

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run suno-ai/bark using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "suno-ai/bark:b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787",
  {
    input: {
      prompt: "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe.",
      text_temp: 0.7,
      output_full: false,
      waveform_temp: 0.7,
      history_prompt: "announcer"
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run suno-ai/bark using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "suno-ai/bark:b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787",
    input={
        "prompt": "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe.",
        "text_temp": 0.7,
        "output_full": False,
        "waveform_temp": 0.7,
        "history_prompt": "announcer"
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
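Printing the output only shows a reference to the generated audio, so a common next step is saving it to disk. The sketch below assumes the output is either a URL string (as in the example prediction on this page) or a file-like object with a `read()` method, which newer versions of the Python client may return. `save_output` is a hypothetical helper, and other output shapes are deliberately rejected rather than guessed at.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(output, path):
    """Write a prediction output to `path` and return it as a Path.

    Hypothetical helper: handles a URL string or a file-like object.
    """
    target = Path(path)
    if isinstance(output, str):
        # Assumption: a string output is a downloadable URL.
        with urllib.request.urlopen(output) as response, target.open("wb") as fh:
            shutil.copyfileobj(response, fh)
    elif hasattr(output, "read"):
        # File-like object: read all bytes and write them out.
        target.write_bytes(output.read())
    else:
        raise TypeError(f"unsupported output type: {type(output).__name__}")
    return target
```

With the `output` value from the snippet above, usage would be `save_output(output, "audio.wav")`.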

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run suno-ai/bark using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "suno-ai/bark:b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787",
    "input": {
      "prompt": "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe.",
      "text_temp": 0.7,
      "output_full": false,
      "waveform_temp": 0.7,
      "history_prompt": "announcer"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
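The request above asks the API to wait for the result, but predictions for this model can take several minutes, so the response may come back before the prediction has finished. In that case the prediction's `urls.get` endpoint can be polled until it reaches a final state. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the set of terminal status values (`succeeded`, `failed`, `canceled`) is an assumption to verify against the HTTP API reference.

```python
import json
import time
import urllib.request

# Assumption: these are the statuses after which a prediction no longer changes.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def fetch_prediction(get_url, token):
    """GET the prediction JSON from its `urls.get` endpoint."""
    request = urllib.request.Request(
        get_url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_prediction(get_url, token, fetch=fetch_prediction,
                        interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the prediction reaches a terminal state or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch(get_url, token)
        if prediction.get("status") in TERMINAL_STATES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction still {prediction.get('status')!r}")
        sleep(interval)
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without network access; in normal use they can be left at their defaults, with `get_url` taken from the `urls.get` field of the initial response.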

Output

{
  "completed_at": "2023-04-25T22:18:59.625638Z",
  "created_at": "2023-04-25T22:11:26.774980Z",
  "data_removed": false,
  "error": null,
  "id": "ngk3yp5omvdsvkcdoljxs2m4ra",
  "input": {
    "prompt": "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."
  },
  "logs": "0%|          | 0/100 [00:00<?, ?it/s]\n  1%|          | 1/100 [00:00<00:27,  3.65it/s]\n  3%|▎         | 3/100 [00:00<00:13,  7.12it/s]\n  5%|▌         | 5/100 [00:00<00:10,  8.90it/s]\n  7%|▋         | 7/100 [00:00<00:09,  9.98it/s]\n  9%|▉         | 9/100 [00:00<00:08, 10.54it/s]\n 11%|█         | 11/100 [00:01<00:08, 11.09it/s]\n 13%|█▎        | 13/100 [00:01<00:07, 11.16it/s]\n 15%|█▌        | 15/100 [00:01<00:07, 11.32it/s]\n 17%|█▋        | 17/100 [00:01<00:07, 11.66it/s]\n 19%|█▉        | 19/100 [00:01<00:06, 11.69it/s]\n 21%|██        | 21/100 [00:01<00:06, 12.05it/s]\n 23%|██▎       | 23/100 [00:02<00:06, 12.17it/s]\n 25%|██▌       | 25/100 [00:02<00:06, 11.89it/s]\n 27%|██▋       | 27/100 [00:02<00:06, 11.64it/s]\n 29%|██▉       | 29/100 [00:02<00:06, 11.51it/s]\n 31%|███       | 31/100 [00:02<00:05, 11.61it/s]\n 33%|███▎      | 33/100 [00:02<00:05, 11.88it/s]\n 35%|███▌      | 35/100 [00:03<00:05, 11.82it/s]\n 37%|███▋      | 37/100 [00:03<00:05, 11.75it/s]\n 39%|███▉      | 39/100 [00:03<00:05, 11.63it/s]\n 41%|████      | 41/100 [00:03<00:05, 11.64it/s]\n 43%|████▎     | 43/100 [00:03<00:04, 11.79it/s]\n 45%|████▌     | 45/100 [00:04<00:04, 11.94it/s]\n 47%|████▋     | 47/100 [00:04<00:04, 11.83it/s]\n 49%|████▉     | 49/100 [00:04<00:04, 11.89it/s]\n 51%|█████     | 51/100 [00:04<00:04, 11.70it/s]\n 53%|█████▎    | 53/100 [00:04<00:04, 11.56it/s]\n 55%|█████▌    | 55/100 [00:04<00:03, 11.64it/s]\n 57%|█████▋    | 57/100 [00:05<00:03, 11.63it/s]\n 59%|█████▉    | 59/100 [00:05<00:03, 11.38it/s]\n 61%|██████    | 61/100 [00:05<00:03, 11.38it/s]\n 63%|██████▎   | 63/100 [00:05<00:03, 11.07it/s]\n 65%|██████▌   | 65/100 [00:05<00:03, 11.19it/s]\n 67%|██████▋   | 67/100 [00:05<00:02, 11.28it/s]\n 69%|██████▉   | 69/100 [00:06<00:02, 11.08it/s]\n 71%|███████   | 71/100 [00:06<00:02, 11.11it/s]\n 73%|███████▎  | 73/100 [00:06<00:02, 10.91it/s]\n 75%|███████▌  | 75/100 [00:06<00:02, 10.78it/s]\n 77%|███████▋  | 77/100 [00:06<00:02, 10.83it/s]\n 79%|███████▉  | 79/100 [00:07<00:01, 10.86it/s]\n 81%|████████  | 81/100 [00:07<00:01, 10.64it/s]\n 83%|████████▎ | 83/100 [00:07<00:01, 10.67it/s]\n 85%|████████▌ | 85/100 [00:07<00:01, 10.57it/s]\n 87%|████████▋ | 87/100 [00:07<00:01, 10.34it/s]\n 89%|████████▉ | 89/100 [00:08<00:01, 10.39it/s]\n 91%|█████████ | 91/100 [00:08<00:00, 10.10it/s]\n 93%|█████████▎| 93/100 [00:08<00:00, 10.06it/s]\n100%|██████████| 100/100 [00:08<00:00, 20.32it/s]\n100%|██████████| 100/100 [00:08<00:00, 11.69it/s]\n  0%|          | 0/36 [00:00<?, ?it/s]\n  3%|▎         | 1/36 [00:00<00:23,  1.48it/s]\n  6%|▌         | 2/36 [00:01<00:22,  1.48it/s]\n  8%|▊         | 3/36 [00:02<00:22,  1.45it/s]\n 11%|█         | 4/36 [00:02<00:22,  1.43it/s]\n 14%|█▍        | 5/36 [00:03<00:21,  1.41it/s]\n 17%|█▋        | 6/36 [00:04<00:22,  1.36it/s]\n 19%|█▉        | 7/36 [00:05<00:21,  1.34it/s]\n 22%|██▏       | 8/36 [00:05<00:21,  1.30it/s]\n 25%|██▌       | 9/36 [00:06<00:21,  1.25it/s]\n 28%|██▊       | 10/36 [00:07<00:21,  1.23it/s]\n 31%|███       | 11/36 [00:08<00:21,  1.19it/s]\n 33%|███▎      | 12/36 [00:09<00:20,  1.15it/s]\n 36%|███▌      | 13/36 [00:10<00:20,  1.13it/s]\n 39%|███▉      | 14/36 [00:11<00:19,  1.12it/s]\n 42%|████▏     | 15/36 [00:12<00:18,  1.11it/s]\n 44%|████▍     | 16/36 [00:13<00:18,  1.10it/s]\n 47%|████▋     | 17/36 [00:14<00:17,  1.10it/s]\n 50%|█████     | 18/36 [00:14<00:16,  1.10it/s]\n 53%|█████▎    | 19/36 [00:15<00:15,  1.09it/s]\n 56%|█████▌    | 20/36 [00:16<00:14,  1.08it/s]\n 58%|█████▊    | 21/36 [00:17<00:13,  1.08it/s]\n 61%|██████    | 22/36 [00:18<00:12,  1.08it/s]\n 64%|██████▍   | 23/36 [00:19<00:11,  1.09it/s]\n 67%|██████▋   | 24/36 [00:20<00:11,  1.08it/s]\n 69%|██████▉   | 25/36 [00:21<00:10,  1.08it/s]\n 72%|███████▏  | 26/36 [00:22<00:09,  1.08it/s]\n 75%|███████▌  | 27/36 [00:23<00:08,  1.08it/s]\n 78%|███████▊  | 28/36 [00:24<00:07,  1.08it/s]\n 81%|████████  | 29/36 [00:25<00:06,  1.08it/s]\n 83%|████████▎ | 30/36 [00:26<00:05,  1.08it/s]\n 86%|████████▌ | 31/36 [00:26<00:04,  1.08it/s]\n 89%|████████▉ | 32/36 [00:27<00:03,  1.08it/s]\n 92%|█████████▏| 33/36 [00:28<00:02,  1.08it/s]\n 94%|█████████▍| 34/36 [00:29<00:01,  1.08it/s]\n 97%|█████████▋| 35/36 [00:30<00:00,  1.08it/s]\n100%|██████████| 36/36 [00:31<00:00,  1.08it/s]\n100%|██████████| 36/36 [00:31<00:00,  1.14it/s]",
  "metrics": {
    "predict_time": 44.949506,
    "total_time": 452.850658
  },
  "output": "https://replicate.delivery/pbxt/HuWYFtJyyH50BxruGu1XfUleB3kC2NfbTy2fmHbeEwKS6BsGC/audio.wav",
  "started_at": "2023-04-25T22:18:14.676132Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/ngk3yp5omvdsvkcdoljxs2m4ra",
    "cancel": "https://api.replicate.com/v1/predictions/ngk3yp5omvdsvkcdoljxs2m4ra/cancel"
  },
  "version": "f23937d7c80b3c0f06c5a01ec55154388647292cb9398bd7d117678bc930791a"
}
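The two `metrics` values in the prediction above can be reproduced from its timestamps: `predict_time` is the span from `started_at` to `completed_at`, and `total_time` is the span from `created_at` to `completed_at`. The sketch below recomputes them, along with the time spent before the prediction started. Attributing that gap to queueing and model start-up is an interpretation, not something the response states.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    # Replacing 'Z' keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Timestamps copied from the example prediction above.
created = parse_ts("2023-04-25T22:11:26.774980Z")
started = parse_ts("2023-04-25T22:18:14.676132Z")
completed = parse_ts("2023-04-25T22:18:59.625638Z")

wait_seconds = (started - created).total_seconds()
predict_seconds = (completed - started).total_seconds()
total_seconds = (completed - created).total_seconds()

print(f"waiting before start: {wait_seconds:.6f} s")    # 407.901152
print(f"predict_time:         {predict_seconds:.6f} s")  # 44.949506
print(f"total_time:           {total_seconds:.6f} s")    # 452.850658
```

In this example almost seven of the roughly seven and a half minutes passed before the model began generating audio.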

Generated in 45.0 seconds

This output was created using a different version of the model, suno-ai/bark:f23937d7.

Run time and cost

This model costs approximately $0.075 to run on Replicate, or 13 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
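The "13 runs per $1" figure follows from the per-run cost by division and rounding down. A small sketch of that arithmetic, useful for rough budgeting, is shown here. The function names are hypothetical, and the per-run cost is the approximate figure quoted above, which varies with inputs.

```python
import math

COST_PER_RUN_USD = 0.075  # approximate figure quoted above; varies with inputs


def runs_per_dollar(cost_per_run):
    """Whole number of runs that fit in one dollar."""
    return math.floor(1 / cost_per_run)


def estimated_cost(runs, cost_per_run=COST_PER_RUN_USD):
    """Rough total cost in dollars for a number of runs."""
    return runs * cost_per_run


print(runs_per_dollar(COST_PER_RUN_USD))   # 13
print(f"${estimated_cost(100):.2f}")       # $7.50
```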

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 6 minutes. The predict time for this model varies significantly based on the inputs.

Readme

🐶 Bark

Original repo: https://github.com/suno-ai/bark

Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. The model can also produce nonverbal sounds such as laughing, sighing, and crying. To support the research community, we are providing access to pretrained model checkpoints ready for inference.

🙏 Appreciation

nanoGPT for a dead-simple and blazing fast implementation of GPT-style models
EnCodec for a state-of-the-art implementation of a fantastic audio codec
AudioLM for very related training and inference code
Vall-E, AudioLM and many other ground-breaking papers that enabled the development of Bark

© License

Bark is licensed under the MIT License.

Please contact us at bark@suno.ai to request access to a larger version of the model.