Detecting small objects in unmanned aerial vehicle (UAV) images is a vital but demanding task in computer vision due to factors such as small object size, image degradation, and limited computational resources. Most existing algorithms improve detection accuracy by refining feature fusion structures and incorporating single-layer attention modules. However, single-layer attention weights are computed only from the original input, which leaves correlations among cross-layer features insufficiently explored. To remedy these issues, we design a novel multi-scale UAV image object detection model called CDANet. First, to fully exploit the scattered detail features in shallow layers, a cross-layer attention module named the enhanced guided attention module is designed. Second, a simplified depth-wise separable dilated spatial pyramid pooling-fast module is built to overcome the shortcomings of the max-pooling operation during feature fusion. Third, an efficient coordinate guidance module is designed as the basic module for guiding inter-channel information interaction. Fourth, a cross-spatial learning encoder module is designed to encode cross-channel information with an efficient multi-scale attention module. Last, to optimize the learning process by dynamically regulating attention toward diverse objects, an advanced loss function, Wise-IoU v3, is introduced. On the VisDrone2021-DET dataset, the average precision and mean average precision of the proposed method surpass those of YOLOv8 by 3.9% and 4.9%, respectively.
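The Wise-IoU v3 loss mentioned above scales the IoU loss by a gradient-detached, distance-based attention term and a non-monotonic focusing coefficient driven by the "outlier degree" of each box. A minimal single-box sketch follows; the hyperparameter values (`alpha=1.9`, `delta=3.0`) and the `mean_iou_loss` running-mean argument are assumptions based on the original Wise-IoU formulation (Tong et al., 2023), not details taken from this paper.

```python
import math

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def wise_iou_v3(pred, target, mean_iou_loss, alpha=1.9, delta=3.0):
    """Sketch of the Wise-IoU v3 loss for one predicted/target box pair.

    mean_iou_loss is a running mean of the IoU loss over recent batches,
    used to compute the outlier degree beta of this sample.
    """
    l_iou = 1.0 - iou(pred, target)
    # Distance-based attention term R_WIoU: penalizes center-point distance,
    # normalized by the diagonal of the smallest enclosing box
    # (gradient-detached in the original formulation).
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tx, ty = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    r_wiou = math.exp(((px - tx) ** 2 + (py - ty) ** 2) / (wg ** 2 + hg ** 2))
    # Outlier degree beta and the non-monotonic focusing coefficient r:
    # ordinary-quality boxes (beta near delta) get the largest gradient gain,
    # while extreme outliers are down-weighted.
    beta = l_iou / mean_iou_loss
    r = beta / (delta * alpha ** (beta - delta))
    return r * r_wiou * l_iou
```

Because the focusing coefficient is non-monotonic in beta, the loss reduces the influence of both very easy boxes and extreme outliers, which is what lets the detector dynamically shift attention toward ordinary-quality anchors during training.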