Input

Run this model in Node.js with one line of code:

npx create-replicate --model=declare-lab/tango

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run declare-lab/tango using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "declare-lab/tango:740e4f5e59bd3b871c9e5b4efbff7ded516d40aa6abf4e95fd5e8dd149b7bc3f",
  {
    input: {
      model: "tango2",
      steps: 100,
      prompt: "Quiet speech and then an airplane flying away",
      guidance: 3
    }
  }
);

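// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the generated audio to disk (the file name is only an example).
// fs.promises.writeFile returns a promise, so no callback is needed:
await fs.promises.writeFile("output.wav", output);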

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run declare-lab/tango using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "declare-lab/tango:740e4f5e59bd3b871c9e5b4efbff7ded516d40aa6abf4e95fd5e8dd149b7bc3f",
    input={
        "model": "tango2",
        "steps": 100,
        "prompt": "Quiet speech and then an airplane flying away",
        "guidance": 3
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
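
The Python snippet above only prints the output. The following is a minimal sketch for saving the generated audio to disk. It assumes the client returns either a file-like object exposing `read()` (recent versions of the `replicate` client appear to do this) or a plain URL string. The helper name `save_output` and the file name `tango-output.wav` are hypothetical and not part of Replicate's documentation.

```python
from pathlib import Path
from urllib.request import urlopen


def save_output(output, path):
    """Write a prediction output to `path` and return the number of bytes written.

    Assumption: `output` is either a file-like object with a `read()` method
    or a URL string pointing at the generated file.
    """
    if hasattr(output, "read"):
        # File-like output: read the bytes directly.
        data = output.read()
    elif isinstance(output, str):
        # URL output: download the file behind the URL.
        with urlopen(output) as response:
            data = response.read()
    else:
        raise TypeError(f"Unsupported output type: {type(output).__name__}")
    Path(path).write_bytes(data)
    return len(data)
```

With the `output` value from the snippet above, this could be called as `save_output(output, "tango-output.wav")`.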

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run declare-lab/tango using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "declare-lab/tango:740e4f5e59bd3b871c9e5b4efbff7ded516d40aa6abf4e95fd5e8dd149b7bc3f",
    "input": {
      "model": "tango2",
      "steps": 100,
      "prompt": "Quiet speech and then an airplane flying away",
      "guidance": 3
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
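
For readers who prefer to call the HTTP API without a client library, the curl command above can be sketched with Python's standard library. The endpoint, headers, version string, and input fields are copied from the curl example. The function name `build_prediction_request` and its default arguments are hypothetical, chosen only for illustration.

```python
import json
from urllib.request import Request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = (
    "declare-lab/tango:"
    "740e4f5e59bd3b871c9e5b4efbff7ded516d40aa6abf4e95fd5e8dd149b7bc3f"
)


def build_prediction_request(token, prompt, steps=100, guidance=3, model="tango2"):
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "version": VERSION,
        "input": {
            "model": model,
            "steps": steps,
            "prompt": prompt,
            "guidance": guidance,
        },
    }
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Prefer": "wait",  # same header the curl example sends
    }
    return Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the response body with `json.load`. Reading the token from `os.environ["REPLICATE_API_TOKEN"]` keeps it out of the source code.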

Output

Generated in 12.4 seconds

Run time and cost

This model costs approximately $0.081 to run on Replicate, or 12 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
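
The relationship between the per-run cost and the runs-per-dollar figure is simple arithmetic. The sketch below treats the quoted $0.081 as a fixed average, which is an assumption, since the actual cost varies with the inputs. The function names are hypothetical.

```python
import math

# Approximate per-run cost quoted above. Actual cost varies with the inputs.
COST_PER_RUN_USD = 0.081


def runs_per_budget(budget_usd, cost_per_run_usd=COST_PER_RUN_USD):
    """Return the whole number of runs that fit in the given budget."""
    return math.floor(budget_usd / cost_per_run_usd)


def estimated_cost(runs, cost_per_run_usd=COST_PER_RUN_USD):
    """Return the estimated cost in USD for a number of runs, rounded to cents."""
    return round(runs * cost_per_run_usd, 2)
```

For example, `runs_per_budget(1)` gives 12, matching the "12 runs per $1" figure, and `estimated_cost(100)` gives 8.1.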

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 83 seconds, but the prediction time varies significantly with the inputs.

Readme

TANGO: Text to Audio using iNstruction-Guided diffusiOn

🎵 🔥 We are releasing Tango 2, built upon Tango for text-to-audio generation. Tango 2 was initialized with the Tango-full-ft checkpoint and underwent alignment training using DPO on Audio-alpaca, a pairwise text-to-audio preference dataset. The model is available for download and can be tried in a demo. The trainer is available in the tango2 directory. 🎶

Description

TANGO is a latent diffusion model (LDM) for text-to-audio (TTA) generation. From textual prompts, TANGO can generate realistic audio, including human sounds, animal sounds, natural and artificial sounds, and sound effects. We use the frozen instruction-tuned LLM Flan-T5 as the text encoder and train a UNet-based diffusion model for audio generation. TANGO performs comparably to current state-of-the-art TTA models on both objective and subjective metrics, despite its LDM being trained on a dataset 63 times smaller. We release our model, training and inference code, and pre-trained checkpoints for the research community.

🎵 🔥 We are also making Audio-alpaca available. Audio-alpaca is a pairwise preference dataset containing about 15k (prompt, audio_w, audio_l) triplets, where, for a given textual prompt, audio_w is the preferred generated audio and audio_l is the undesirable audio. Audio-alpaca is available for download. Tango 2 was trained on Audio-alpaca.
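
To make the triplet structure concrete, here is a minimal sketch of how one (prompt, audio_w, audio_l) record could be represented in code. The class name `PreferenceTriplet`, the helper `to_preference_pairs`, and the use of file paths for the audio fields are assumptions made for illustration. They do not describe the actual storage format of Audio-alpaca.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferenceTriplet:
    """One pairwise preference record (hypothetical representation)."""

    prompt: str   # textual prompt
    audio_w: str  # path to the preferred ("winner") audio, assumed to be a file path
    audio_l: str  # path to the undesirable ("loser") audio, assumed to be a file path


def to_preference_pairs(triplets):
    """Return (prompt, chosen, rejected) tuples for preference-based training."""
    return [(t.prompt, t.audio_w, t.audio_l) for t in triplets]
```

A record such as `PreferenceTriplet("A dog barking", "dog_good.wav", "dog_bad.wav")` (made-up paths) would map to the tuple `("A dog barking", "dog_good.wav", "dog_bad.wav")`, with the preferred audio in the second position.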

Citation

Please consider citing the following articles if you found our work useful:

@misc{majumder2024tango,
      title={Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization}, 
      author={Navonil Majumder and Chia-Yu Hung and Deepanway Ghosal and Wei-Ning Hsu and Rada Mihalcea and Soujanya Poria},
      year={2024},
      eprint={2404.09956},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
@article{ghosal2023tango,
  title={Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model},
  author={Ghosal, Deepanway and Majumder, Navonil and Mehrish, Ambuj and Poria, Soujanya},
  journal={arXiv preprint arXiv:2304.13731},
  year={2023}
}

Acknowledgement

We borrow the code in audioldm and audioldm_eval from the AudioLDM repositories. We thank the AudioLDM team for open-sourcing their code.
