This paper proposes MHANet, a novel network for auditory attention detection (AAD). The architecture combines multi-scale temporal features with spatial distribution features to capture long- and short-range spatiotemporal dependencies simultaneously. On the KUL dataset, it achieves state-of-the-art (SOTA) decoding accuracy of 95.6% within an extremely short 0.1-second decision window, outperforming the previous best model by 6.4%. The model is also highly efficient: it needs only 0.02 M trainable parameters, three times fewer than the most advanced model.
Lu Li, Cunhang Fan, Hongyu Zhang, Jingjing Zhang, Xiaoke Yang, Jian Zhou, Zhao Lv. MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection. In IJCAI 2025.

- Please download the AAD dataset for training.
- The public KUL dataset, DTU dataset and AVED dataset are used in this paper.
- Python 3.12
- Install the dependencies: pip install -r requirements.txt
- Modify the run settings in config.py
- Use main.py to train and test the model
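
The 0.1-second decision window mentioned above means each EEG trial is cut into very short segments, and the model makes one attention decision per segment. The following is a minimal sketch of that segmentation, not the preprocessing used in this repository: the function name `window_bounds`, the 128 Hz sampling rate, and the 60-second trial length are all assumptions made for illustration.

```python
def window_bounds(n_samples, fs, window_s, overlap=0.0):
    """Return (start, end) sample indices of fixed-length decision windows.

    n_samples: number of samples in one trial
    fs:        sampling rate in Hz (assumed value, not taken from the paper)
    window_s:  decision window length in seconds
    overlap:   fraction of each window shared with the next one, in [0, 1)
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    window_len = int(round(fs * window_s))
    if window_len < 1:
        raise ValueError("decision window is shorter than one sample")
    # Hop between consecutive window starts; at least one sample.
    step = max(1, int(round(window_len * (1.0 - overlap))))
    # Count only windows that fit entirely inside the trial.
    n_windows = max(0, (n_samples - window_len) // step + 1)
    return [(i * step, i * step + window_len) for i in range(n_windows)]


# Hypothetical trial: 60 s of EEG at an assumed 128 Hz, 0.1 s windows, no overlap.
bounds = window_bounds(n_samples=60 * 128, fs=128, window_s=0.1)
print(len(bounds), bounds[0], bounds[-1])  # 590 (0, 13) (7657, 7670)
```

Each pair can then slice a (samples, channels) array, for example `eeg[start:end]`. At 128 Hz a 0.1-second window holds only 13 samples, which illustrates how little temporal context the model sees per decision.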