Deep learning methods have demonstrated excellent performance in semantic segmentation tasks across various scenarios. However, their performance remains inadequate for real-time applications such as road-scene understanding, which has prompted extensive research into real-time semantic segmentation. We propose a network with a two-branch architecture, a prevalent framework in this field. We first build an efficient backbone from our proposed U-like residual block to enhance multi-scale feature extraction. In addition, many two-branch networks rely on complex structures to fuse the feature maps of the two branches at the end of the network. To simplify this process, we design a lightweight fusion mechanism that uses attention to guide feature fusion. We call this module dual-guided attention. The module processes the two branches in parallel: each branch uses a fully connected layer with few neurons to compute the correlation of its flattened feature map, so that attention and feature fusion are performed simultaneously. Extensive experiments on the Cityscapes and Cambridge-driving Labeled Video Database (CamVid) datasets demonstrate the effectiveness of our method.
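The fusion step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the layer shapes, activations, or how the attention weights are applied, so the per-channel gating, the ReLU and sigmoid activations, the cross-branch weighting, and all names (`fc_gate`, `dual_guided_fusion`, `init_params`) are assumptions chosen only to show attention and fusion happening in one pass. Plain Python lists stand in for tensors to keep the sketch dependency-free.

```python
import math
import random


def init_params(hw, k, rng):
    """Random weights for one branch's small FC layer (hypothetical layout)."""
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(hw)] for _ in range(k)]
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(k)]
    return w_hidden, w_out


def fc_gate(flat, w_hidden, w_out):
    """Map each flattened channel (length H*W) to one gate in (0, 1).

    Assumed structure: H*W inputs -> k hidden neurons (ReLU) -> 1 output (sigmoid).
    """
    gates = []
    for channel in flat:
        hidden = [max(0.0, sum(w * x for w, x in zip(row, channel)))
                  for row in w_hidden]
        score = sum(w * h for w, h in zip(w_out, hidden))
        gates.append(1.0 / (1.0 + math.exp(-score)))
    return gates


def dual_guided_fusion(detail, context, params_d, params_c):
    """Fuse two flattened feature maps of shape [C][H*W].

    Assumption: each branch is reweighted by the gate computed from the
    other branch, and the two reweighted maps are summed.
    """
    gate_d = fc_gate(detail, *params_d)    # gates computed from the detail branch
    gate_c = fc_gate(context, *params_c)   # gates computed from the context branch
    fused = []
    for d_ch, c_ch, g_d, g_c in zip(detail, context, gate_d, gate_c):
        # Attention weighting and fusion happen in the same expression.
        fused.append([g_c * d + g_d * c for d, c in zip(d_ch, c_ch)])
    return fused


# Usage: 3 channels, a 4x4 spatial grid flattened to 16 values, k = 2 neurons.
rng = random.Random(0)
detail = [[rng.random() for _ in range(16)] for _ in range(3)]
context = [[rng.random() for _ in range(16)] for _ in range(3)]
fused = dual_guided_fusion(detail, context,
                           init_params(16, 2, rng), init_params(16, 2, rng))
```

In this sketch the cost of the attention step grows with the number of hidden neurons `k`, which is why a layer with few neurons keeps the fusion lightweight. A real network would learn the weights by backpropagation and operate on batched tensors.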