retrocirce / zero_shot_audio_source_separation

Zero-shot sound separation by arbitrary query samples

Run time and cost

This model costs approximately $0.015 to run on Replicate, or 66 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on NVIDIA T4 GPU hardware. Predictions typically complete within 68 seconds, but the predict time varies significantly with the inputs.
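As a rough budgeting aid, the figures above can be turned into a back-of-the-envelope estimate. The sketch below is only illustrative: the constant and function names are hypothetical, the two numbers are the approximate values quoted on this page, and it assumes runs execute one after another with no cold-start or queueing overhead.

```python
import math
from typing import Tuple

# Approximate figures quoted on this page; real values vary with the inputs.
COST_PER_RUN_USD = 0.015
TYPICAL_SECONDS_PER_RUN = 68


def runs_per_dollar(cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Whole number of runs that fit in one US dollar."""
    return math.floor(1.0 / cost_per_run)


def estimate_batch(n_runs: int) -> Tuple[float, float]:
    """Return (estimated cost in USD, estimated minutes) for n_runs sequential runs."""
    cost = round(n_runs * COST_PER_RUN_USD, 2)
    minutes = n_runs * TYPICAL_SECONDS_PER_RUN / 60
    return cost, minutes
```

Under these assumptions, `estimate_batch(100)` comes to about $1.50 and roughly 113 minutes of sequential predict time.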

Readme

A demo of the official GitHub repository for the paper “Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data”, in AAAI 2022. The authors also provide a short introduction video, a full presentation video, and a website.

This model lets you separate any source from a soundtrack. For example, if you have a jazz song with a clarinet part in it, you can extract the clarinet by showing the model a clarinet sound sample.

The inputs are a mixture audio file to separate and a sample of the target source to use as the query. The output is the source track extracted from the mixture.

Citing

@inproceedings{zsasp-ke2022,
  author = {Ke Chen* and Xingjian Du* and Bilei Zhu and Zejun Ma and Taylor Berg-Kirkpatrick and Shlomo Dubnov},
  title = {Zero-shot Audio Source Separation via Query-based Learning from Weakly-labeled Data},
  booktitle = {{AAAI} 2022}
}

@inproceedings{htsat-ke2022,
  author = {Ke Chen and Xingjian Du and Bilei Zhu and Zejun Ma and Taylor Berg-Kirkpatrick and Shlomo Dubnov},
  title = {HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection},
  booktitle = {{ICASSP} 2022}
}