19 November 2024 Salient object detection based on Pyramid Vision Transformer–gated network
Xiaoli Zhou, Lina Huo, Wei Wang, Peng Hao
Abstract

Vision transformers have recently shown impressive results, significantly outperforming large convolution-based models. We propose a gating network for salient object detection. The Pyramid Vision Transformer serves as the backbone of this network and learns global and local representations through its self-attention mechanism. Multi-level gating units establish cooperation among features at different levels, which recovers more detail in the saliency map and improves the discriminability of the whole network. These gating units also allow valuable context information from the encoder to be transmitted effectively to the decoder. A pyramid pooling module collects high-level semantic information, and a feature aggregation decoder integrates and decodes the semantic information from each level. Experimental results on five challenging benchmark datasets demonstrate that the proposed method performs favorably against current state-of-the-art methods on four evaluation criteria.
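The abstract does not give the form of the gating units, so the following is only a minimal sketch of the general idea of gating encoder features before they reach a decoder. It assumes a sigmoid gate computed per element from an encoder value and the matching decoder value. The function names `gate_unit` and `sigmoid` and the parameters `w_enc`, `w_dec`, and `bias` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, returns a value in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate_unit(encoder_feat, decoder_feat, w_enc=1.0, w_dec=1.0, bias=0.0):
    """Hypothetical gate (an assumption, not the paper's definition).

    For each position, a gate g in (0, 1) is computed from the encoder and
    decoder values, and the encoder value is scaled by g before being
    passed on. Features are flat lists of floats for simplicity.
    """
    if len(encoder_feat) != len(decoder_feat):
        raise ValueError("encoder and decoder features must have equal length")
    gated = []
    for e, d in zip(encoder_feat, decoder_feat):
        g = sigmoid(w_enc * e + w_dec * d + bias)  # how much of e to let through
        gated.append(g * e)
    return gated


if __name__ == "__main__":
    enc = [0.0, 2.0, -1.5]
    dec = [0.0, -2.0, 3.0]
    print(gate_unit(enc, dec))
```

Because the gate lies strictly between 0 and 1, it can only attenuate an encoder value, never amplify it. In a real network the scalar weights would be replaced by learned convolutions over multi-channel feature maps.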

© 2024 SPIE and IS&T
Xiaoli Zhou, Lina Huo, Wei Wang, and Peng Hao "Salient object detection based on Pyramid Vision Transformer–gated network," Journal of Electronic Imaging 33(6), 063021 (19 November 2024). https://doi.org/10.1117/1.JEI.33.6.063021
Received: 6 June 2024; Accepted: 30 October 2024; Published: 19 November 2024