Run time and cost

This model costs approximately $0.095 per run on Replicate, or about 10 runs per $1, though this varies depending on your inputs. It is also open source, and you can run it on your own computer with Docker.

This model runs on NVIDIA A100 (80GB) GPU hardware. Predictions typically complete within 68 seconds, but the prediction time varies significantly with the inputs.
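As a rough budgeting aid, the arithmetic behind the figures above can be sketched in a few lines of Python. The constants are simply the numbers quoted on this page; the function names are hypothetical, and the implied per-second rate is only a back-of-the-envelope derivation, not an official price.

```python
# Figures quoted on this page; real cost and run time vary with the inputs.
COST_PER_RUN_USD = 0.095
TYPICAL_SECONDS = 68


def runs_per_dollar(cost_per_run: float) -> float:
    """How many runs one dollar buys at the given per-run cost."""
    return 1.0 / cost_per_run


def implied_rate_per_second(cost_per_run: float, seconds: float) -> float:
    """Per-second rate implied by a per-run cost and a typical run time."""
    return cost_per_run / seconds


def estimate_batch_cost(n_tracks: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Approximate cost of separating n_tracks, assuming typical inputs."""
    return n_tracks * cost_per_run


if __name__ == "__main__":
    print(f"{runs_per_dollar(COST_PER_RUN_USD):.1f} runs per $1")
    print(f"${implied_rate_per_second(COST_PER_RUN_USD, TYPICAL_SECONDS):.5f} per second (implied)")
    print(f"${estimate_batch_cost(100):.2f} for 100 tracks (estimate)")
```

Because run time depends heavily on the input, treat any batch estimate from this sketch as a starting point rather than a quote.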

Readme

Usage

Choose any or all of the instruments in a track to isolate. The model outputs the isolated instrument track along with a merged track containing the rest of the audio.
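To make the output layout concrete, here is a minimal sketch of how an "isolated instrument plus merged rest" pair could be built from already-separated stems. It is an illustration only, not this model's actual code: `two_stem_split` is a hypothetical helper, and audio is represented as plain lists of mono samples for simplicity (a real pipeline would use arrays or tensors).

```python
from typing import Dict, List, Tuple

Stem = List[float]  # mono samples; a stand-in for real audio arrays


def two_stem_split(stems: Dict[str, Stem], target: str) -> Tuple[Stem, Stem]:
    """Return (isolated target stem, sample-wise sum of every other stem)."""
    if target not in stems:
        raise KeyError(f"unknown stem: {target!r}")
    isolated = list(stems[target])
    others = [samples for name, samples in stems.items() if name != target]
    if not others:
        # Nothing else in the mix: the "rest" track is silence.
        return isolated, [0.0] * len(isolated)
    # Mix the remaining stems back together by adding them sample by sample.
    rest = [sum(frame) for frame in zip(*others)]
    return isolated, rest


if __name__ == "__main__":
    separated = {
        "vocals": [0.1, 0.2],
        "drums": [0.3, 0.0],
        "bass": [0.0, -0.1],
    }
    vocals, accompaniment = two_stem_split(separated, "vocals")
    print(vocals, accompaniment)
```

Summing the remaining stems is the simplest way to produce the merged track; it assumes all stems share the same length and sample rate.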

Expected runtime (after startup): 1 minute.

Algorithm

Demucs is a state-of-the-art music source separation model, currently capable of separating drums, bass, piano, guitar, and vocals from the rest of the accompaniment. Demucs is based on a U-Net convolutional architecture inspired by Wave-U-Net. Version 4 features Hybrid Transformer Demucs, a hybrid spectrogram/waveform separation model using Transformers. It is based on Hybrid Demucs (also provided in this repo), with the innermost layers replaced by a cross-domain Transformer Encoder. This Transformer uses self-attention within each domain and cross-attention across domains. The model achieves a signal-to-distortion ratio (SDR) of 9.00 dB on the MUSDB HQ test set.
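For readers unfamiliar with the metric, the sketch below computes SDR in its basic form, 10 * log10(||reference||^2 / ||reference - estimate||^2), so a higher value means less distortion. This is only the textbook definition under simplifying assumptions (mono signals as plain lists, a hypothetical `sdr_db` helper, a small `eps` to avoid division by zero); published benchmark figures such as the one above are typically produced with a dedicated evaluation toolkit and aggregated across sources and tracks, so this function will not reproduce them exactly.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-10) -> float:
    """Basic signal-to-distortion ratio in dB between a reference stem and an estimate."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true source signal.
    signal_energy = sum(r * r for r in reference)
    # Energy of the error between the true source and the separated output.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


if __name__ == "__main__":
    true_stem = [1.0, -1.0, 1.0, -1.0]
    separated = [0.9, -0.9, 0.9, -0.9]  # 10% amplitude error on every sample
    print(f"SDR: {sdr_db(true_stem, separated):.2f} dB")
```

In this toy example the error energy is one hundredth of the signal energy, which corresponds to roughly 20 dB.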

Sample track

Song - Cobie Sample
Artist - JBlanked
Source - Free Music Archive
License - CC BY-NC-ND