Adaptive sparse attention module based on reciprocal nearest neighbors
27 June 2024
Zhonggui Sun, Can Zhang, Mingzhu Zhang
Abstract

The attention mechanism has become a crucial technique for deep feature representation in computer vision tasks. Using a similarity matrix, it enhances each feature point with global context from the network's feature map. However, indiscriminately using all of this information can introduce irrelevant content, which inevitably hampers performance. In response, many related studies have applied sparsification, a common information-filtering strategy. Unfortunately, their filtering processes often lack reliability and adaptability. To address this issue, we first define an adaptive-reciprocal nearest neighbors (A-RNN) relationship. It gains flexibility in identifying neighbors by learning adaptive thresholds, and it ensures the reliability of those neighbors by introducing a reciprocity mechanism. We then use A-RNN to rectify the similarity matrix in the conventional attention module. To treat non-local and local information separately, the implementation introduces two blocks: the non-local sparse constraint block, which uses A-RNN to sparsify non-local information, and the local sparse constraint block, which uses adaptive thresholds to sparsify local information. The result is an adaptive sparse attention (ASA) module that inherits the flexibility and reliability of A-RNN. To validate the ASA module, we use it to replace the attention module in NLNet and conduct experiments on the semantic segmentation benchmarks Cityscapes, ADE20K, and PASCAL VOC 2012. With the same backbone (ResNet101), our ASA module outperforms the conventional attention module and some of its state-of-the-art variants.
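The abstract does not give the exact A-RNN formulas, so the following is only a minimal sketch of the general idea in plain Python. It assumes dot-product similarity and a hypothetical per-row threshold that interpolates between the row mean and the row maximum through a parameter `tau` (fixed here, whereas the paper learns its thresholds). Position j is kept for query i only if each clears the other's threshold, which is the reciprocity check; `reciprocal=False` gives a threshold-only variant, loosely analogous to how the local block is described. All function names are hypothetical.

```python
import math


def similarity(x):
    """Dot-product similarity matrix S[i][j] = <x_i, x_j>."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in x] for xi in x]


def adaptive_thresholds(s, tau):
    """Hypothetical per-row threshold: mean + tau * (max - mean).

    In the paper the thresholds are learned; here tau is a fixed stand-in.
    """
    out = []
    for row in s:
        mean = sum(row) / len(row)
        out.append(mean + tau * (max(row) - mean))
    return out


def neighbor_mask(s, tau, reciprocal=True):
    """Boolean mask of retained similarity entries."""
    n = len(s)
    t = adaptive_thresholds(s, tau)
    # j is a candidate neighbor of i if S[i][j] clears row i's threshold
    cand = [[s[i][j] >= t[i] for j in range(n)] for i in range(n)]
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            # reciprocity: i must also be a candidate neighbor of j
            keep = cand[i][j] and (cand[j][i] if reciprocal else True)
            row.append(keep or i == j)  # always keep self so no row is empty
        mask.append(row)
    return mask


def sparse_attention(x, tau=0.5, reciprocal=True):
    """Softmax attention restricted to the masked (retained) positions."""
    s = similarity(x)
    mask = neighbor_mask(s, tau, reciprocal)
    dim = len(x[0])
    out = []
    for i in range(len(x)):
        kept = [j for j in range(len(x)) if mask[i][j]]
        m = max(s[i][j] for j in kept)  # subtract the max for numerical stability
        w = {j: math.exp(s[i][j] - m) for j in kept}
        z = sum(w.values())
        # masked-out positions contribute nothing to the aggregated feature
        out.append([sum(w[j] / z * x[j][d] for j in kept) for d in range(dim)])
    return out, mask


if __name__ == "__main__":
    feats = [[1, 0], [3, 0], [0, 1]]
    _, m_rec = sparse_attention(feats, tau=0.5)
    _, m_thr = sparse_attention(feats, tau=0.5, reciprocal=False)
    print(m_rec[0], m_thr[0])
```

With `feats = [[1, 0], [3, 0], [0, 1]]` and `tau = 0.5`, the large-norm vector at position 1 clears row 0's threshold, but row 0 does not clear row 1's much higher threshold. The reciprocal mask therefore drops that edge (row 0 attends only to itself), while the threshold-only variant keeps it. This illustrates why a reciprocity check can filter out "hub" positions that attract many queries without being genuinely similar to them.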

© 2024 SPIE and IS&T
Zhonggui Sun, Can Zhang, and Mingzhu Zhang "Adaptive sparse attention module based on reciprocal nearest neighbors," Journal of Electronic Imaging 33(3), 033038 (27 June 2024). https://doi.org/10.1117/1.JEI.33.3.033038
Received: 8 January 2024; Accepted: 5 June 2024; Published: 27 June 2024
KEYWORDS
Matrices; Reliability; Image segmentation; Semantics; Computer vision technology; Tunable filters; Convolution
