prunaai/dia-1.6b – Run with an API on Replicate

prunaai / dia-1.6b

Public
1.7K runs
A100 (80GB)

Input

text

string (required)

[S1] It's on Replicate!!! Oh fire! Oh my goodness! What's the procedure? What do we do people? The Dia text-to-speech model — now Pruna-optimized — just dropped on Replicate!!

[S2] Oh my god! Okay… it's happening. Everybody stay calm!

[S1] What's the procedure…

[S2] Everybody stay fricking calm!!!... Everybody fudging calm down!!!!!

[S1] Yes! Yes! Let's try it out at prunaai/dia-1.6b (laughs) — powered up and made leaner with Pruna!

[S2] (whispers) try it now… (whispers) turbocharged by Pruna…

Input text for dialogue generation. Use [S1], [S2] to indicate different speakers and (description) in parentheses for non-verbal cues e.g., (laughs), (whispers).
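
As a minimal sketch of the format described above, the following Python helper assembles a script from (speaker, line) pairs. The helper name `build_dialogue`, the restriction to speakers 1 and 2, and the blank line between turns (mirroring the example input) are illustrative assumptions, not part of the model's API.

```python
def build_dialogue(turns):
    """Join (speaker_number, line) pairs into [S1]/[S2]-tagged text.

    Hypothetical helper: non-verbal cues such as (laughs) are simply
    left inside the line and passed through unchanged.
    """
    parts = []
    for speaker, line in turns:
        if speaker not in (1, 2):
            # Assumption: only the two tags named in the description are used.
            raise ValueError("this sketch only handles speakers 1 and 2")
        parts.append(f"[S{speaker}] {line.strip()}")
    # Blank line between turns, as in the example input.
    return "\n\n".join(parts)


script = build_dialogue([
    (1, "Let's try it out at prunaai/dia-1.6b (laughs)"),
    (2, "(whispers) try it now"),
])
print(script)
```

The resulting string can be used as the value of the `text` input.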

max_new_tokens

integer

(minimum: 500, maximum: 4096)

Controls the length of generated audio. Higher values create longer audio. (86 tokens ≈ 1 second of audio).

Default: 3072
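
The token-to-duration rule of thumb can be turned into a quick estimate. This is a rough sketch based only on the "86 tokens ≈ 1 second" figure and the documented bounds; the function names are hypothetical, and actual audio length may differ.

```python
TOKENS_PER_SECOND = 86  # approximate rate from the parameter description
MIN_TOKENS, MAX_TOKENS = 500, 4096  # documented bounds for max_new_tokens


def tokens_for_seconds(seconds):
    """Estimate max_new_tokens for a target duration, clamped to the bounds."""
    tokens = round(seconds * TOKENS_PER_SECOND)
    return max(MIN_TOKENS, min(MAX_TOKENS, tokens))


def seconds_for_tokens(tokens):
    """Estimate the audio length (in seconds) that a token budget allows."""
    return tokens / TOKENS_PER_SECOND


print(tokens_for_seconds(20))              # 1720
print(round(seconds_for_tokens(3072), 1))  # 35.7
print(round(seconds_for_tokens(4096), 1))  # 47.6
```

By this estimate the default of 3072 tokens allows roughly 36 seconds of audio, and the maximum of 4096 roughly 48 seconds.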

cfg_scale

number

(minimum: 1, maximum: 5)

Controls how closely the audio follows your text. Higher values (3-5) follow text more strictly; lower values may sound more natural but deviate more.

Default: 3

temperature

number

(minimum: 0.1, maximum: 2)

Controls randomness in generation. Higher values (1.3-2.0) increase variety; lower values (0.1-0.9) make output more consistent and predictable.

Default: 1.3

top_p

number

(minimum: 0.1, maximum: 1)

Controls diversity of word choice. Higher values include more unusual options. Most users shouldn't need to adjust this parameter.

Default: 0.95

cfg_filter_top_k

integer

(minimum: 10, maximum: 100)

Technical parameter for filtering audio generation tokens. Higher values allow more diverse sounds; lower values create more consistent audio.

Default: 35

speed_factor

number

(minimum: 0.5, maximum: 1.5)

Adjusts playback speed of the generated audio. Values below 1.0 slow the audio down, values above 1.0 speed it up, and 1.0 keeps the original speed.

Default: 0.94

seed

integer

Random seed for reproducible results. Use the same seed value to get the same output for identical inputs. Leave blank for random results each time.

Default: -1
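
The ranges and defaults listed above can be checked on the client before a request is sent. The sketch below is one possible approach, not part of Replicate's client library; `PARAM_SPECS` and `build_input` are hypothetical names, and the values are copied from the parameter list.

```python
# (minimum, maximum, default) for each numeric parameter listed above.
PARAM_SPECS = {
    "max_new_tokens": (500, 4096, 3072),
    "cfg_scale": (1, 5, 3),
    "temperature": (0.1, 2, 1.3),
    "top_p": (0.1, 1, 0.95),
    "cfg_filter_top_k": (10, 100, 35),
    "speed_factor": (0.5, 1.5, 0.94),
}


def build_input(text, **overrides):
    """Return an input dict with defaults filled in and ranges checked."""
    if not text:
        raise ValueError("text is required")
    unknown = set(overrides) - set(PARAM_SPECS) - {"seed"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # seed has no documented range; -1 is the documented default.
    params = {"text": text, "seed": overrides.get("seed", -1)}
    for name, (low, high, default) in PARAM_SPECS.items():
        value = overrides.get(name, default)
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        params[name] = value
    return params


inputs = build_input("[S1] Hello there. [S2] Hi!", temperature=0.8)
print(inputs["temperature"], inputs["max_new_tokens"])  # 0.8 3072
```

The returned dict has the same shape as the `input` argument used in the API examples on this page.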

Run this model in Node.js with one line of code:

npx create-replicate --model=prunaai/dia-1.6b

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run prunaai/dia-1.6b using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "prunaai/dia-1.6b:5e364f84db4cd1916990138229e14179036a1bfcf20a39c4b0f3214e33a5c48a",
  {
    input: {
      seed: -1,
      text: "[S1] It's on Replicate!!! Oh fire! Oh my goodness! What's the procedure? What do we do people? The Dia text-to-speech model — now Pruna-optimized — just dropped on Replicate!!\n\n[S2] Oh my god! Okay… it's happening. Everybody stay calm!\n\n[S1] What's the procedure…\n\n[S2] Everybody stay fricking calm!!!... Everybody fudging calm down!!!!!\n\n[S1] Yes! Yes! Let's try it out at prunaai/dia-1.6b (laughs) — powered up and made leaner with Pruna!\n\n[S2] (whispers) try it now… (whispers) turbocharged by Pruna…",
      top_p: 0.95,
      cfg_scale: 3,
      temperature: 1.3,
      speed_factor: 0.94,
      max_new_tokens: 3072,
      cfg_filter_top_k: 35
    }
  }
);

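// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (the model returns audio, not an image;
// the ".wav" extension is an assumption, so check the model's schema):
import { writeFile } from "node:fs/promises";
await writeFile("output.wav", output);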

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run prunaai/dia-1.6b using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "prunaai/dia-1.6b:5e364f84db4cd1916990138229e14179036a1bfcf20a39c4b0f3214e33a5c48a",
    input={
        "seed": -1,
        "text": "[S1] It's on Replicate!!! Oh fire! Oh my goodness! What's the procedure? What do we do people? The Dia text-to-speech model — now Pruna-optimized — just dropped on Replicate!!\n\n[S2] Oh my god! Okay… it's happening. Everybody stay calm!\n\n[S1] What's the procedure…\n\n[S2] Everybody stay fricking calm!!!... Everybody fudging calm down!!!!!\n\n[S1] Yes! Yes! Let's try it out at prunaai/dia-1.6b (laughs) — powered up and made leaner with Pruna!\n\n[S2] (whispers) try it now… (whispers) turbocharged by Pruna…",
        "top_p": 0.95,
        "cfg_scale": 3,
        "temperature": 1.3,
        "speed_factor": 0.94,
        "max_new_tokens": 3072,
        "cfg_filter_top_k": 35
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run prunaai/dia-1.6b using Replicate’s HTTP API with curl. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "prunaai/dia-1.6b:5e364f84db4cd1916990138229e14179036a1bfcf20a39c4b0f3214e33a5c48a",
    "input": {
      "seed": -1,
      "text": "[S1] It\'s on Replicate!!! Oh fire! Oh my goodness! What\'s the procedure? What do we do people? The Dia text-to-speech model — now Pruna-optimized — just dropped on Replicate!!\\n\\n[S2] Oh my god! Okay… it\'s happening. Everybody stay calm!\\n\\n[S1] What\'s the procedure…\\n\\n[S2] Everybody stay fricking calm!!!... Everybody fudging calm down!!!!!\\n\\n[S1] Yes! Yes! Let\'s try it out at prunaai/dia-1.6b (laughs) — powered up and made leaner with Pruna!\\n\\n[S2] (whispers) try it now… (whispers) turbocharged by Pruna…",
      "top_p": 0.95,
      "cfg_scale": 3,
      "temperature": 1.3,
      "speed_factor": 0.94,
      "max_new_tokens": 3072,
      "cfg_filter_top_k": 35
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
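
For readers who prefer Python's standard library over curl, the same request can be sketched as follows. The code only builds the request object and does not send it; `build_prediction_request` is a hypothetical helper name, and only `text` is supplied, on the assumption that the other inputs fall back to their documented defaults.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = "prunaai/dia-1.6b:5e364f84db4cd1916990138229e14179036a1bfcf20a39c4b0f3214e33a5c48a"


def build_prediction_request(text, token):
    """Build (but do not send) a POST request mirroring the curl example."""
    body = {"version": VERSION, "input": {"text": text}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",  # same header as in the curl example
        },
        method="POST",
    )


token = os.environ.get("REPLICATE_API_TOKEN", "<paste-your-token-here>")
request = build_prediction_request("[S1] Hello! [S2] Hi there.", token)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would perform the actual call and requires a valid API token.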

Output

Generated in 20.5 seconds

Run time and cost

This model costs approximately $0.040 to run on Replicate, or 25 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 29 seconds. The predict time for this model varies significantly based on the inputs.
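
The per-run figure above translates into a simple budget estimate. This sketch assumes a flat $0.040 per run, which the text notes is only approximate and varies with inputs; the function names are hypothetical.

```python
COST_PER_RUN_USD = 0.040  # approximate per-run cost quoted above


def estimate_cost(runs, cost_per_run=COST_PER_RUN_USD):
    """Estimate total spend in USD for a number of runs."""
    return runs * cost_per_run


def runs_for_budget(budget_usd, cost_per_run=COST_PER_RUN_USD):
    """Estimate how many whole runs fit within a budget."""
    # Round first to absorb floating-point noise before truncating.
    return int(round(budget_usd / cost_per_run, 6))


print(runs_for_budget(1))             # 25
print(round(estimate_cost(1000), 2))  # 40.0
```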

Readme

This model doesn't have a readme.