Semantic scene segmentation has become an important application in computer vision and is an essential part of intelligent transportation systems for complete scene understanding of the surrounding environment. Several methods based on convolutional neural networks have emerged, but they have some problems, including small-scale target loss, inaccurate detailed region segmentation, and boundary category confusion. Using shallow features, we exploit the capabilities of global context information according to the theory of pyramids. A weighted pyramid feature fusion module is constructed to fuse the feature maps of different scales generated by the backbone network, and the proportion of feature fusion is dynamically updated by trainable parameters. After that, a self-attention mechanism is introduced to discover information about spatial channel interdependencies. Finally, the atrous spatial pyramid pooling module of the DeepLabv3+ network is improved by connecting the atrous convolution with different dilation rates at the receptive field. The experimental results show 4.1% mean pixel accuracy and 3.92% mean intersection over union improvements in the proposed method compared with the DeepLabv3+, and the result of semantic segmentation is more accurate. |
ACCESS THE FULL ARTICLE
No SPIE Account? Create one
CITATIONS
Cited by 3 scholarly publications.
Image segmentation
Convolution
Visualization
Image fusion
Intelligence systems
Computer programming
Feature extraction