retrocirce/zero_shot_audio_source_separation | Run with an API on Replicate

Run time and cost

This model costs approximately $0.0054 to run on Replicate, or 185 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
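The relationship between the per-run price and the runs-per-dollar figure is simple arithmetic. The sketch below shows it in Python; the function names are illustrative, and the price is the approximate figure quoted above, so real costs will differ with your inputs.

```python
import math

# Approximate per-run price quoted on this page; actual cost varies with inputs.
COST_PER_RUN_USD = 0.0054


def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that one US dollar covers at the given price."""
    return math.floor(1 / cost_per_run)


def estimated_cost(n_runs, cost_per_run=COST_PER_RUN_USD):
    """Rough total in US dollars for n_runs predictions at a flat per-run price."""
    return n_runs * cost_per_run


if __name__ == "__main__":
    print(runs_per_dollar())
    print(round(estimated_cost(1000), 2))
```

Treat the result as a budgeting estimate only, since prediction time (and therefore cost) depends on the length of the audio you submit.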

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 25 seconds. The predict time for this model varies significantly based on the inputs.

Readme

A demo for the official GitHub repository of the paper "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data" (AAAI 2022). Related links: short introduction video, full presentation video, authors' website.

This model lets you separate any source from an audio track. For example, if you have a jazz song with a clarinet part in it, you can extract the clarinet by showing the model a clarinet sound sample.

The inputs are a mixture audio file to separate and a sample of the target source, which serves as the query. The output is the source track extracted from the mixture.
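A minimal sketch of sending those two inputs through the Replicate Python client might look like the following. It assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set in the environment. The input names `mix_file` and `query_file` are assumptions, not confirmed by this page, and the version part of `MODEL_REF` is a placeholder; check the model's API schema on Replicate for the real values.

```python
from pathlib import Path

# Placeholder: copy the full "owner/name:version" reference from the model page.
MODEL_REF = "retrocirce/zero_shot_audio_source_separation:<version-id>"


def separate(mixture_path, query_path, model_ref=MODEL_REF, run=None):
    """Send a mixture and a query sample to the model and return its output.

    The input names `mix_file` and `query_file` are assumptions; check the
    model's API schema for the names it actually expects.
    """
    if run is None:
        # Imported lazily so this module loads without the client installed.
        import replicate
        run = replicate.run

    mixture, query = Path(mixture_path), Path(query_path)
    for path in (mixture, query):
        if not path.is_file():
            raise FileNotFoundError(f"audio file not found: {path}")

    # File inputs are passed to the client as open binary handles.
    with mixture.open("rb") as mix_fh, query.open("rb") as query_fh:
        return run(model_ref, input={"mix_file": mix_fh, "query_file": query_fh})
```

Calling `separate("jazz_mix.wav", "clarinet_sample.wav")` (hypothetical file names) would return whatever the model emits for the extracted source. The `run` parameter exists only so the call can be replaced with a stub when testing without network access.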

Citing

@inproceedings{zsasp-ke2022,
  author = {Ke Chen* and Xingjian Du* and Bilei Zhu and Zejun Ma and Taylor Berg-Kirkpatrick and Shlomo Dubnov},
  title = {Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data},
  booktitle = {{AAAI} 2022}
}

@inproceedings{htsat-ke2022,
  author = {Ke Chen and Xingjian Du and Bilei Zhu and Zejun Ma and Taylor Berg-Kirkpatrick and Shlomo Dubnov},
  title = {HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection},
  booktitle = {{ICASSP} 2022}
}