yuguochencuc / db-aiat

Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement

Run time and cost

This model runs on NVIDIA T4 GPU hardware.

Readme

DB-AIAT: A Dual-branch attention-in-attention transformer for single-channel speech enhancement

Abstract

Curriculum learning has begun to thrive in the speech enhancement area; it decouples the original spectrum estimation task into multiple easier sub-tasks to achieve better performance. Motivated by this, we propose a dual-branch attention-in-attention transformer-based module, dubbed DB-AIAT, to handle coarse- and fine-grained regions of the spectrum in parallel. From a complementary perspective, a magnitude masking branch is proposed to estimate the overall spectral magnitude, while a complex refining branch is designed to compensate for the missing complex spectral details and implicitly derive phase information. Within each branch, we propose a novel attention-in-attention transformer-based module to replace conventional RNNs and temporal convolutional networks for temporal sequence modeling. Specifically, the proposed attention-in-attention transformer consists of adaptive temporal-frequency attention transformer blocks and an adaptive hierarchical attention module, which together capture long-term time-frequency dependencies and aggregate global hierarchical contextual information. Experimental results on the VoiceBank + Demand dataset show that DB-AIAT yields state-of-the-art performance (e.g., 3.31 PESQ, 95.6% STOI and 10.79 dB SSNR) over previous advanced systems with a relatively light model size (2.81 M parameters).
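
To make the dual-branch composition concrete, below is a minimal PyTorch sketch, not the authors' implementation: ToyTFAttention, ToyBranch and DualBranchSketch are hypothetical stand-ins for the real attention-in-attention transformer blocks in the GitHub repository, and the adaptive hierarchical attention module is omitted. It shows the coarse/fine split described above: the magnitude branch applies a bounded mask to the noisy magnitude (reusing the noisy phase), while the complex branch adds a real/imaginary residual that compensates for the lost spectral details.

import torch
import torch.nn as nn


class ToyTFAttention(nn.Module):
    # Toy stand-in for an adaptive temporal-frequency attention block:
    # self-attention along the time axis, then along the frequency axis.
    def __init__(self, channels, heads=4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.freq_attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, T, F) feature map over time frames T and frequency bins F.
        b, c, t, f = x.shape
        # Attend over time independently for each frequency bin.
        xt = x.permute(0, 3, 2, 1).reshape(b * f, t, c)
        xt = self.time_attn(xt, xt, xt)[0]
        x = x + xt.reshape(b, f, t, c).permute(0, 3, 2, 1)
        # Attend over frequency independently for each time frame.
        xf = x.permute(0, 2, 3, 1).reshape(b * t, f, c)
        xf = self.freq_attn(xf, xf, xf)[0]
        return x + xf.reshape(b, t, f, c).permute(0, 3, 1, 2)


class ToyBranch(nn.Module):
    # One branch: embed, attend over time and frequency, project out.
    def __init__(self, in_ch, out_ch, channels=32):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, channels, kernel_size=1)
        self.attn = ToyTFAttention(channels)
        self.proj = nn.Conv2d(channels, out_ch, kernel_size=1)

    def forward(self, x):
        return self.proj(self.attn(self.embed(x)))


class DualBranchSketch(nn.Module):
    # Magnitude masking branch (coarse) + complex refining branch (fine).
    def __init__(self):
        super().__init__()
        self.mag_branch = ToyBranch(in_ch=1, out_ch=1)   # gain for |X|
        self.cplx_branch = ToyBranch(in_ch=2, out_ch=2)  # (real, imag) residual

    def forward(self, real, imag):
        # real, imag: (B, 1, T, F) parts of the noisy complex spectrogram.
        mag = torch.sqrt(real ** 2 + imag ** 2 + 1e-8)
        phase = torch.atan2(imag, real)
        # Coarse estimate: bounded mask on the magnitude, noisy phase reused.
        mask = torch.sigmoid(self.mag_branch(mag))
        coarse_real = mask * mag * torch.cos(phase)
        coarse_imag = mask * mag * torch.sin(phase)
        # Fine estimate: a complex residual restores lost spectral detail
        # and implicitly corrects the phase.
        res_real, res_imag = self.cplx_branch(torch.cat([real, imag], 1)).chunk(2, 1)
        return coarse_real + res_real, coarse_imag + res_imag


if __name__ == "__main__":
    model = DualBranchSketch()
    noisy_real = torch.randn(1, 1, 100, 161)  # 100 frames x 161 frequency bins
    noisy_imag = torch.randn(1, 1, 100, 161)
    enh_real, enh_imag = model(noisy_real, noisy_imag)
    print(enh_real.shape, enh_imag.shape)     # torch.Size([1, 1, 100, 161]) each

The paper combines the two branch outputs in a similar additive coarse-plus-residual fashion; the layer sizes and attention blocks here are simplified placeholders, so consult the repository for the actual architecture and hyperparameters.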

Comparison with SOTA

[Figure: comparison of DB-AIAT with previous state-of-the-art systems]

Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@article{yu2021dual,
  title={Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement},
  author={Yu, Guochen and Li, Andong and Wang, Yutian and Guo, Yinuo and Wang, Hui and Zheng, Chengshi},
  journal={arXiv preprint arXiv:2110.06467},
  year={2021}
}