Semantic segmentation is a fundamental task in remote-sensing image analysis. However, most convolutional neural network (CNN)-based methods produce blurry object boundaries, as convolution operations may inherently fail to retain edge information while expanding the receptive field. Moreover, CNNs struggle to capture subtle relationships between distant parts of an image, a limitation that is particularly detrimental to semantic segmentation, which requires a comprehensive understanding of the global image context. Transformers, by contrast, excel at modeling long-range dependencies. To address these issues, we propose GDBNet, a novel three-branch network with three new modules. The Swin-Transformer-based global (G) branch captures global context information, while the ResNet-based detail (D) and boundary (B) branches process local features and edge information. The three modules are the feature preservation module (FPM), the spatial interaction module (SIM), and the balance feature module (BFM). The FPM maximally retains context information during the downsampling stages of the Swin Transformer in the global branch, whereas the SIM and BFM fuse information from the different branches in a boundary-oriented manner. Experiments show that GDBNet outperforms state-of-the-art methods both quantitatively and qualitatively on the Vaihingen, Potsdam, and WHDLD datasets, reaching 71.87%, 75.64%, and 62.70% mean intersection over union (mIoU), respectively, and producing visibly clearer and more coherent boundaries.
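The abstract does not specify the internals of the branches or the FPM/SIM/BFM modules, so the following minimal PyTorch sketch only illustrates the overall three-branch layout it describes. The class name GDBNetSketch, the channel widths, the stand-in global encoder (a real implementation would use a Swin Transformer), and the concatenation-based fusion standing in for SIM and BFM are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a three-branch segmentation network as described above.
# All module internals are simplified stand-ins; names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class GDBNetSketch(nn.Module):
    def __init__(self, num_classes, width=64):
        super().__init__()
        # Global (G) branch: placeholder for a Swin Transformer encoder.
        self.global_branch = nn.Sequential(
            nn.Conv2d(3, width, 7, stride=4, padding=3),  # stand-in patch embedding
            nn.GELU(),
            nn.Conv2d(width, width, 3, stride=2, padding=1),  # overall stride 8
        )
        # Detail (D) and boundary (B) branches: ResNet-18 stems as stand-ins.
        self.detail_branch = nn.Sequential(*list(resnet18(weights=None).children())[:5])
        self.boundary_branch = nn.Sequential(*list(resnet18(weights=None).children())[:5])
        self.detail_proj = nn.Conv2d(64, width, 1)
        self.boundary_proj = nn.Conv2d(64, width, 1)
        # Placeholder for the boundary-oriented fusion (SIM + BFM in the paper):
        # a simple concat + conv is used here because the abstract gives no details.
        self.fuse = nn.Conv2d(3 * width, width, 3, padding=1)
        self.classifier = nn.Conv2d(width, num_classes, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        g = self.global_branch(x)                        # global context, stride 8
        d = self.detail_proj(self.detail_branch(x))      # local detail, stride 4
        b = self.boundary_proj(self.boundary_branch(x))  # edge cues, stride 4
        g = F.interpolate(g, size=d.shape[-2:], mode="bilinear", align_corners=False)
        logits = self.classifier(self.fuse(torch.cat([g, d, b], dim=1)))
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = GDBNetSketch(num_classes=6)  # e.g., six classes for ISPRS Vaihingen/Potsdam
    out = model(torch.randn(1, 3, 256, 256))
    print(out.shape)  # torch.Size([1, 6, 256, 256])
```

Keeping the detail and boundary branches at stride 4 while the global branch runs at stride 8 mirrors the usual division of labor in multi-branch designs: the high-resolution paths preserve edges and fine structure, and the coarser path supplies context before fusion.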