prunaai / dia-1.6b

  • Public
  • 1.7K runs
  • A100 (80GB)

Input

*string

Input text for dialogue generation. Use [S1] and [S2] to mark different speakers, and put non-verbal cues in parentheses, e.g., (laughs), (whispers).
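
As an illustration of the tagging convention above, here is a minimal sketch that assembles a script from speaker turns. The helper name build_script is hypothetical and not part of the model; the sketch only assumes the [S1]/[S2] tags and parenthesised cues described in this field.

```python
def build_script(turns):
    """Join (speaker_number, line) pairs into one tagged script string."""
    parts = []
    for speaker, line in turns:
        # Only the two speaker tags named in the field description are handled.
        if speaker not in (1, 2):
            raise ValueError("this sketch only handles speakers 1 and 2")
        parts.append(f"[S{speaker}] {line.strip()}")
    return " ".join(parts)


script = build_script([
    (1, "Did you hear the news?"),
    (2, "I did. (laughs) I still can't believe it."),
])
print(script)
```

Non-verbal cues are simply left inline in the spoken line, so they pass through unchanged.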

integer
(minimum: 500, maximum: 4096)

Controls the length of the generated audio. Higher values create longer audio (86 tokens ≈ 1 second of audio).

Default: 3072
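
The token-to-duration rule of thumb above can be turned into a small conversion. This is a rough sketch that assumes the 86 tokens per second figure holds across the whole range; the function names are illustrative only.

```python
TOKENS_PER_SECOND = 86            # approximate rate stated for this field
MIN_TOKENS, MAX_TOKENS = 500, 4096  # documented minimum and maximum


def tokens_for_seconds(seconds):
    """Estimate the token setting for a target duration, clamped to the allowed range."""
    tokens = round(seconds * TOKENS_PER_SECOND)
    return max(MIN_TOKENS, min(MAX_TOKENS, tokens))


def seconds_for_tokens(tokens):
    """Estimate the audio duration in seconds for a given token setting."""
    return tokens / TOKENS_PER_SECOND


print(tokens_for_seconds(10))               # 860
print(round(seconds_for_tokens(3072), 1))   # 35.7, the default setting
print(round(seconds_for_tokens(4096), 1))   # 47.6, the maximum setting
```

Under this estimate the default of 3072 corresponds to roughly 36 seconds of audio, and the maximum to a little under 48 seconds.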

number
(minimum: 1, maximum: 5)

Controls how closely the audio follows your text. Higher values (3-5) follow text more strictly; lower values may sound more natural but deviate more.

Default: 3

number
(minimum: 0.1, maximum: 2)

Controls randomness in generation. Higher values (1.3-2.0) increase variety; lower values (0.1-0.9) make output more consistent and predictable.

Default: 1.3

number
(minimum: 0.1, maximum: 1)

Controls diversity of word choice. Higher values include more unusual options. Most users shouldn't need to adjust this parameter.

Default: 0.95

integer
(minimum: 10, maximum: 100)

Technical parameter for filtering audio generation tokens. Higher values allow more diverse sounds; lower values create more consistent audio.

Default: 35

number
(minimum: 0.5, maximum: 1.5)

Adjusts the playback speed of the generated audio. Values below 1.0 slow the audio down, values above 1.0 speed it up, and 1.0 keeps the original speed.

Default: 0.94

integer

Random seed for reproducible results. Use the same seed value to get the same output for identical inputs. Leave blank for random results each time.

Default: -1
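
The documented ranges and defaults for the numeric settings can be checked before submitting a run. The following is a sketch only: the page as captured here does not show the field names, so every key below is a placeholder invented for illustration and should not be read as the model's real input schema.

```python
# Documented (minimum, maximum, default) for each numeric setting.
# The key names are placeholders invented for this sketch.
LIMITS = {
    "length_tokens": (500, 4096, 3072),
    "guidance": (1, 5, 3),
    "temperature": (0.1, 2, 1.3),
    "top_p": (0.1, 1, 0.95),
    "top_k": (10, 100, 35),
    "speed": (0.5, 1.5, 0.94),
}


def resolve_settings(overrides=None):
    """Start from the documented defaults and apply range-checked overrides."""
    settings = {name: default for name, (_, _, default) in LIMITS.items()}
    for name, value in (overrides or {}).items():
        if name not in LIMITS:
            raise KeyError(f"unknown setting: {name}")
        low, high, _ = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        settings[name] = value
    return settings


print(resolve_settings({"guidance": 4, "speed": 1.0}))
```

Rejecting out-of-range values locally avoids spending a paid run on a request that would fail validation.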

Output


Run time and cost

This model costs approximately $0.040 to run on Replicate, or 25 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
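
The per-run figure above converts to a budget estimate with simple arithmetic. This sketch assumes the approximate $0.040 per run applies uniformly, which the page itself notes is not guaranteed since cost varies with inputs.

```python
COST_PER_RUN_USD = 0.040  # approximate figure quoted above


def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Number of runs one dollar buys at the given per-run cost."""
    return round(1 / cost_per_run)


def estimated_cost(runs, cost_per_run=COST_PER_RUN_USD):
    """Approximate total cost in USD for a number of runs."""
    return runs * cost_per_run


print(runs_per_dollar())                # 25
print(f"{estimated_cost(100):.2f}")     # 4.00
```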

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 29 seconds. The predict time for this model varies significantly based on the inputs.

Readme

This model doesn't have a readme.