Input

audio

*file

Input audio

model_name

string

Choose a model

Default: "htdemucs"

stem

string

Only separate audio into the chosen stem and others (no_stem).

clip_mode

string

Strategy for avoiding clipping: rescaling entire signal if necessary (rescale) or hard clipping (clamp).

Default: "rescale"
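The difference between the two strategies can be illustrated in plain Python. This is only a minimal sketch on a list of float samples, assuming full scale is 1.0; the helper names `rescale` and `clamp` are hypothetical and this is not the model's actual implementation.

```python
def rescale(samples, limit=1.0):
    """Scale the whole signal by one gain, only if its peak exceeds `limit`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= limit:
        return list(samples)
    gain = limit / peak
    return [s * gain for s in samples]


def clamp(samples, limit=1.0):
    """Hard-clip each sample to [-limit, limit]; in-range samples are untouched."""
    return [max(-limit, min(limit, s)) for s in samples]


signal = [0.5, -2.0, 1.0]
print(rescale(signal))  # every sample is multiplied by the same gain
print(clamp(signal))    # only the out-of-range sample changes
```

Rescaling preserves the waveform shape at the cost of lowering the overall level, while clamping keeps the level but distorts any peaks that exceed the limit.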

shifts

integer

Number of random shifts for equivariant stabilization. Increases separation time but improves quality for Demucs. 10 was used in the original paper.

Default: 1
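The idea behind random shifts can be sketched as follows: the input is delayed by a random offset, the model is applied, the delay is undone, and the predictions are averaged. This is an assumption-laden illustration, not the Demucs code; `apply_with_shifts`, `toy_model`, and `max_shift` are hypothetical names.

```python
import random


def apply_with_shifts(model, samples, shifts=1, max_shift=4, seed=0):
    """Average `model` outputs over `shifts` randomly delayed copies of the input."""
    rng = random.Random(seed)
    total = [0.0] * len(samples)
    for _ in range(shifts):
        offset = rng.randint(0, max_shift)
        # Delay the input by `offset` samples using zero padding.
        shifted = [0.0] * offset + list(samples)
        out = model(shifted)
        # Undo the delay so each prediction lines up with the original input.
        aligned = out[offset:offset + len(samples)]
        total = [t + a for t, a in zip(total, aligned)]
    return [t / shifts for t in total]


def toy_model(x):
    # Hypothetical stand-in for a separation model: halves every sample.
    return [0.5 * v for v in x]


print(apply_with_shifts(toy_model, [1.0, 2.0, 3.0], shifts=4))
```

Because the model runs once per shift, the separation time in this sketch grows linearly with `shifts`, which matches the trade-off described above.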

overlap

number

Overlap between the splits.

Default: 0.25
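To make the overlap value concrete, here is a minimal sketch of how a long signal might be cut into overlapping chunks. The `segment` length and the helper name `split_offsets` are assumptions made for illustration; the chunk size the model actually uses is not stated here.

```python
def split_offsets(length, segment, overlap=0.25):
    """Return (start, end) sample ranges of chunks that overlap by `overlap`."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Each new chunk starts (1 - overlap) of a segment after the previous one.
    stride = max(1, int((1 - overlap) * segment))
    return [(start, min(start + segment, length))
            for start in range(0, length, stride)]


print(split_offsets(length=10, segment=4, overlap=0.25))
```

With `overlap=0.25` each chunk shares a quarter of its length with the next one. A typical way to combine the chunk outputs is to blend them in the shared regions, which is assumed here rather than shown.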

mp3_bitrate

integer

Bitrate of converted mp3

Default: 320

float32

boolean

Save wav output as float32 (2x bigger).

Default: false
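The "2x bigger" figure follows from the bytes used per sample. The sketch below assumes that the non-float WAV output is 16-bit PCM, stereo, at 44.1 kHz; these values and the helper `wav_data_bytes` are illustrative assumptions, and the small WAV header is ignored.

```python
def wav_data_bytes(seconds, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Size of the raw audio data in an uncompressed WAV file (header excluded)."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


int16_size = wav_data_bytes(10, bytes_per_sample=2)    # 16-bit PCM
float32_size = wav_data_bytes(10, bytes_per_sample=4)  # 32-bit float
print(int16_size, float32_size, float32_size / int16_size)
```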

output_format

string

Choose the output format

Default: "mp3"

Run this model in Node.js with one line of code:

npx create-replicate --model=cjwbw/demucs

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run cjwbw/demucs using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "cjwbw/demucs:25a173108cff36ef9f80f854c162d01df9e6528be175794b81158fa03836d953",
  {
    input: {
      stem: "vocals",
      audio: "https://replicate.delivery/pbxt/J6Quo9VPU210JJB9HS97ThWUxT7iax8PWiP7FD5f3bg2G6AY/test1.mp3",
      shifts: 1,
      float32: false,
      overlap: 0.25,
      clip_mode: "rescale",
      model_name: "htdemucs",
      mp3_bitrate: 320,
      output_format: "mp3"
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run cjwbw/demucs using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "cjwbw/demucs:25a173108cff36ef9f80f854c162d01df9e6528be175794b81158fa03836d953",
    input={
        "stem": "vocals",
        "audio": "https://replicate.delivery/pbxt/J6Quo9VPU210JJB9HS97ThWUxT7iax8PWiP7FD5f3bg2G6AY/test1.mp3",
        "shifts": 1,
        "float32": False,
        "overlap": 0.25,
        "clip_mode": "rescale",
        "model_name": "htdemucs",
        "mp3_bitrate": 320,
        "output_format": "mp3"
    }
)

print(output)

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run cjwbw/demucs using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "cjwbw/demucs:25a173108cff36ef9f80f854c162d01df9e6528be175794b81158fa03836d953",
    "input": {
      "stem": "vocals",
      "audio": "https://replicate.delivery/pbxt/J6Quo9VPU210JJB9HS97ThWUxT7iax8PWiP7FD5f3bg2G6AY/test1.mp3",
      "shifts": 1,
      "float32": false,
      "overlap": 0.25,
      "clip_mode": "rescale",
      "model_name": "htdemucs",
      "mp3_bitrate": 320,
      "output_format": "mp3"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

bass

drums

other

vocals

{
  "completed_at": "2023-07-02T13:16:03.793953Z",
  "created_at": "2023-07-02T13:09:45.628835Z",
  "data_removed": false,
  "error": null,
  "id": "dc65harbsxzisn5qzyc5ekhfrm",
  "input": {
    "audio": "https://replicate.delivery/pbxt/J6Quo9VPU210JJB9HS97ThWUxT7iax8PWiP7FD5f3bg2G6AY/test1.mp3",
    "shifts": 1,
    "overlap": 0.25,
    "clip_mode": "rescale",
    "model_name": "htdemucs",
    "mp3_bitrate": 320,
    "output_format": "mp3"
  },
  "logs": "0%|                                                                                   | 0.0/11.7 [00:00<?, ?seconds/s]\n 50%|█████████████████████████████████████                                     | 5.85/11.7 [00:01<00:01,  3.19seconds/s]\n100%|██████████████████████████████████████████████████████████████████████████| 11.7/11.7 [00:02<00:00,  6.59seconds/s]\n100%|██████████████████████████████████████████████████████████████████████████| 11.7/11.7 [00:02<00:00,  5.68seconds/s]",
  "metrics": {
    "predict_time": 14.642236,
    "total_time": 378.165118
  },
  "output": {
    "bass": "https://replicate.delivery/pbxt/xS2oNA7iL0rzLpKVzafqakkr1fT6p2RdgfWz8hJzpE3jUlXiA/bass.mp3",
    "drums": "https://replicate.delivery/pbxt/OZduILkg6lYgEd2Dq02z4u0GlZWZxTCjipGp2VBAssokq8SE/drums.mp3",
    "other": "https://replicate.delivery/pbxt/aoDOOSdliPIzPd7fqCM0MXRH1anPeJp14NcqmUPpCyGTqyLRA/other.mp3",
    "piano": null,
    "guitar": null,
    "vocals": "https://replicate.delivery/pbxt/QmkyLa6ikf0AfUObCIO1M6hEaYVoIekVZdZiwLMRu6aiUlXiA/vocals.mp3"
  },
  "started_at": "2023-07-02T13:15:49.151717Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/dc65harbsxzisn5qzyc5ekhfrm",
    "cancel": "https://api.replicate.com/v1/predictions/dc65harbsxzisn5qzyc5ekhfrm/cancel"
  },
  "version": "abf8fe28e407afa6d8e41e86a759caccc0af8e49c3c68016006b62cb0968441e"
}
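The fields of this response can be post-processed with the standard library alone. The sketch below copies a few values from the prediction above, derives the queue, predict, and total times from the timestamps, and lists the stems that were actually produced (entries whose value is not null). The variable names are illustrative.

```python
from datetime import datetime

# Fields copied from the example prediction above.
prediction = {
    "created_at": "2023-07-02T13:09:45.628835Z",
    "started_at": "2023-07-02T13:15:49.151717Z",
    "completed_at": "2023-07-02T13:16:03.793953Z",
    "output": {
        "bass": "https://replicate.delivery/pbxt/xS2oNA7iL0rzLpKVzafqakkr1fT6p2RdgfWz8hJzpE3jUlXiA/bass.mp3",
        "drums": "https://replicate.delivery/pbxt/OZduILkg6lYgEd2Dq02z4u0GlZWZxTCjipGp2VBAssokq8SE/drums.mp3",
        "other": "https://replicate.delivery/pbxt/aoDOOSdliPIzPd7fqCM0MXRH1anPeJp14NcqmUPpCyGTqyLRA/other.mp3",
        "piano": None,
        "guitar": None,
        "vocals": "https://replicate.delivery/pbxt/QmkyLa6ikf0AfUObCIO1M6hEaYVoIekVZdZiwLMRu6aiUlXiA/vocals.mp3",
    },
}


def parse_ts(value):
    # Timestamps end in "Z" (UTC) and carry microseconds.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


created = parse_ts(prediction["created_at"])
started = parse_ts(prediction["started_at"])
completed = parse_ts(prediction["completed_at"])

queue_time = (started - created).total_seconds()      # waiting before the run
predict_time = (completed - started).total_seconds()  # matches metrics.predict_time
total_time = (completed - created).total_seconds()    # matches metrics.total_time

# Keep only the stems that have a URL; piano and guitar are null for htdemucs.
stems = sorted(name for name, url in prediction["output"].items() if url)

print(queue_time, predict_time, total_time)
print(stems)
```

In this example most of the total time was spent waiting for the run to start rather than in the prediction itself.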

Generated in 14.7 seconds

This output was created using a different version of the model, cjwbw/demucs:abf8fe28.

Run time and cost

This model costs approximately $0.023 to run on Replicate, or 43 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
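The "runs per $1" figure is simply the reciprocal of the per-run cost, rounded down. A small sketch of that arithmetic, using the approximate price quoted above (the helper names are hypothetical, and real costs vary with the inputs):

```python
import math

COST_PER_RUN_USD = 0.023  # approximate figure quoted above


def runs_per_dollar(cost_per_run):
    """Whole number of runs that fit into one dollar."""
    return math.floor(1 / cost_per_run)


def estimated_cost(runs, cost_per_run=COST_PER_RUN_USD):
    """Rough cost in USD for a batch of runs, rounded to cents."""
    return round(runs * cost_per_run, 2)


print(runs_per_dollar(COST_PER_RUN_USD))  # 43
print(estimated_cost(100))
```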

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 101 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Demucs Music Source Separation

Demucs is a state-of-the-art music source separation model, currently capable of separating drums, bass, and vocals from the rest of the accompaniment. Demucs is based on a U-Net convolutional architecture inspired by [Wave-U-Net][waveunet]. The v4 version features [Hybrid Transformer Demucs][htdemucs], a hybrid spectrogram/waveform separation model using Transformers. It is based on [Hybrid Demucs][hybrid_paper] (also provided in this repo), with the innermost layers replaced by a cross-domain Transformer Encoder. This Transformer uses self-attention within each domain and cross-attention across domains. The model achieves an SDR of 9.00 dB on the MUSDB HQ test set. Moreover, when using sparse attention kernels to extend its receptive field and per-source fine-tuning, we achieve a state-of-the-art 9.20 dB of SDR.

Samples are available on our sample page. Check out [our paper][htdemucs] for more information. It has been trained on the [MUSDB HQ][musdb] dataset plus an extra training dataset of 800 songs. This model separates drums, bass, vocals, and other stems for any song.

As Hybrid Transformer Demucs is brand new, it is not activated by default. You can activate it in the usual commands described hereafter with -n htdemucs_ft. The single, non-fine-tuned model is provided as -n htdemucs, and the retrained baseline as -n hdemucs_mmi. The Sparse Hybrid Transformer model described in our paper is not provided, as it requires custom CUDA code that is not ready for release yet. We are also releasing an experimental 6-source model that adds a guitar and a piano source. Quick testing seems to show okay quality for guitar, but a lot of bleeding and artifacts for the piano source.

The list of pre-trained models is:
- htdemucs: first version of Hybrid Transformer Demucs. Trained on MusDB + 800 songs. Default model.
- htdemucs_ft: fine-tuned version of htdemucs, separation will take 4 times more time but might be a bit better. Same training set as htdemucs.
- htdemucs_6s: 6 sources version of htdemucs, with piano and guitar being added as sources. Note that the piano source is not working great at the moment.
- hdemucs_mmi: Hybrid Demucs v3, retrained on MusDB + 800 songs.
- mdx: trained only on MusDB HQ, winning model on track A at the [MDX][mdx] challenge.
- mdx_extra: trained with extra training data (including MusDB test set), ranked 2nd on the track B of the [MDX][mdx] challenge.
- mdx_q, mdx_extra_q: quantized version of the previous models. Smaller download and storage but quality can be slightly worse.
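When choosing a `model_name`, it can help to know which output stems to expect. The lookup below is a sketch built from the list above, assuming that every model except htdemucs_6s produces the four standard stems (drums, bass, other, vocals); `MODEL_STEMS` and `expected_stems` are hypothetical names, not part of any API.

```python
FOUR_STEMS = ("drums", "bass", "other", "vocals")

# Assumption: only the 6-source model adds guitar and piano.
MODEL_STEMS = {
    "htdemucs": FOUR_STEMS,
    "htdemucs_ft": FOUR_STEMS,
    "htdemucs_6s": FOUR_STEMS + ("guitar", "piano"),
    "hdemucs_mmi": FOUR_STEMS,
    "mdx": FOUR_STEMS,
    "mdx_extra": FOUR_STEMS,
    "mdx_q": FOUR_STEMS,
    "mdx_extra_q": FOUR_STEMS,
}


def expected_stems(model_name):
    """Return the stems a model is expected to produce, or raise for unknown names."""
    try:
        return MODEL_STEMS[model_name]
    except KeyError:
        raise ValueError(f"unknown model_name: {model_name!r}") from None


print(expected_stems("htdemucs"))
print(expected_stems("htdemucs_6s"))
```

This is consistent with the example prediction, where the piano and guitar outputs are null for htdemucs.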

Schema representing the structure of Hybrid Transformer Demucs, with a dual U-Net structure: one branch for the temporal domain and one branch for the spectral domain. There is a cross-domain Transformer between the Encoders and Decoders.

How to cite

@inproceedings{rouard2022hybrid,
  title={Hybrid Transformers for Music Source Separation},
  author={Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
  booktitle={ICASSP 23},
  year={2023}
}

@inproceedings{defossez2021hybrid,
  title={Hybrid Spectrogram and Waveform Source Separation},
  author={D{\'e}fossez, Alexandre},
  booktitle={Proceedings of the ISMIR 2021 Workshop on Music Source Separation},
  year={2021}
}
