retrocirce / zero_shot_audio_source_separation

Zero-shot sound separation by arbitrary query samples

Run time and cost

This model runs on NVIDIA T4 GPU hardware. Predictions typically complete within 6 minutes, although prediction time varies significantly with the inputs.

Readme

A demo of the model from the official GitHub repository for the paper "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data" (AAAI 2022). Related material: a short introduction video, a full presentation video, and the authors' website.

This model separates an arbitrary target source from an audio track, given an example of that source. For example, if you have a jazz song that contains a clarinet, you can extract the clarinet by showing the model a clarinet sound sample.

The model takes two inputs: the mixture audio to separate and a sample of the target source as the query. The output is the target source's track extracted from the mixture.
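The query-conditioned interface above (mixture plus query in, one extracted track out) can be illustrated with a deliberately naive stand-in. This is a minimal sketch, not the authors' method: the released model relies on learned networks (see the papers cited below), whereas here the "query embedding" is only the query's average magnitude spectrum and the "separator" is a fixed per-frequency mask derived from it. It works only when the target and the rest of the mix occupy different frequency bands. The functions `stft`, `istft` and `separate_by_query`, and all parameter values (1024-point Hann window, hop of 256, 16 kHz synthetic tones), are hypothetical choices made for this illustration, assuming NumPy is available.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """Complex STFT with a Hann window; returns an array of shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    x = np.pad(x, (n_fft, n_fft))  # zero-pad so the signal edges get full frame coverage
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def istft(spec: np.ndarray, length: int, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """Weighted overlap-add inverse of stft(); trims the result back to `length` samples."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1) * window  # synthesis window
    out = np.zeros(n_fft + hop * (frames.shape[0] - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)  # undo the squared-window weighting
    return out[n_fft:n_fft + length]  # drop the padding added in stft()


def separate_by_query(mixture: np.ndarray, query: np.ndarray,
                      n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """Toy query-based separation (hypothetical): mask the mixture with the query's spectrum."""
    mix_spec = stft(mixture, n_fft, hop)
    # Crude stand-in for a learned query embedding: the query's mean magnitude spectrum.
    profile = np.abs(stft(query, n_fft, hop)).mean(axis=1)
    mask = profile / (profile.max() + 1e-8)  # per-frequency gain in [0, 1]
    return istft(mix_spec * mask[:, None], len(mixture), n_fft, hop)


# Demo: a 440 Hz tone plays the role of the "clarinet", a 3 kHz tone is the rest of the mix.
sr = 16000
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * 440 * t)
other = np.sin(2 * np.pi * 3000 * t)
mixture = target + other
# The query is a separate, shorter recording of the target source with a different phase.
query = np.sin(2 * np.pi * 440 * t[:8000] + 1.0)

estimate = separate_by_query(mixture, query)
print("correlation with target:", np.corrcoef(estimate, target)[0, 1])
print("correlation with other source:", np.corrcoef(estimate, other)[0, 1])
```

Because the query is a separate recording (different length and phase), the mask has to generalize from the example rather than copy part of the mixture, which mirrors the zero-shot setup. A real instrument whose spectrum overlaps other sources would defeat this fixed filter, which is why the actual model learns the separator instead.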

Citing

@inproceedings{zsasp-ke2022,
  author = {Ke Chen* and Xingjian Du* and Bilei Zhu and Zejun Ma and Taylor Berg-Kirkpatrick and Shlomo Dubnov},
  title = {Zero-shot Audio Source Separation via Query-based Learning from Weakly-labeled Data},
  booktitle = {{AAAI} 2022}
}

@inproceedings{htsat-ke2022,
  author = {Ke Chen and Xingjian Du and Bilei Zhu and Zejun Ma and Taylor Berg-Kirkpatrick and Shlomo Dubnov},
  title = {HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection},
  booktitle = {{ICASSP} 2022}
}