Convolutional neural networks (CNNs) are the mainstream models for extracting rich features in deep-learning-based studies on cloud detection for remote sensing images. However, because convolutional operations have limited receptive fields, a CNN alone struggles to capture global information and long-range dependencies, which degrades the accuracy of cloud identification in complex scenes. To address this insufficient modeling of global features and long-range dependencies, a new CNN-Transformer network for cloud detection (CNN-TFCD) in remote sensing images is proposed in this paper. The method combines the complementary strengths of CNNs and Transformers to fully mine local spatial details, global contextual information, and long-range dependencies, yielding comprehensive feature representations. A feature cascade mechanism is applied for multi-scale feature fusion, further enhancing the model's representational capability. Evaluations on the Landsat-8 dataset demonstrate that CNN-TFCD achieves accurate and stable cloud detection.
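The paper does not specify the exact fusion design in the abstract, but a coarse-to-fine feature cascade of the kind described can be sketched minimally as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function names (`upsample2x`, `cascade_fuse`) and the nearest-neighbor upsampling choice are assumptions for clarity.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map
    # (an assumed upsampling choice for this sketch).
    return x.repeat(2, axis=1).repeat(2, axis=2)

def cascade_fuse(features):
    # Fuse multi-scale features coarse-to-fine: repeatedly upsample the
    # running fused map and concatenate it, channel-wise, with the
    # features at the next finer scale.
    fused = features[0]  # coarsest scale
    for finer in features[1:]:
        fused = np.concatenate([upsample2x(fused), finer], axis=0)
    return fused

# Toy three-level pyramid, coarse to fine: (C, H, W) per level.
feats = [np.ones((8, 4, 4)), np.ones((4, 8, 8)), np.ones((2, 16, 16))]
out = cascade_fuse(feats)
print(out.shape)  # (14, 16, 16)
```

In a full model, the concatenation would typically be followed by a learned projection (e.g., a 1x1 convolution) to mix CNN and Transformer features at each scale; the sketch keeps only the cascade structure itself.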