Two central problems in high-altitude UAV imagery are the prevalence of small, densely packed targets and interference from complex background noise. In YOLOv5, repeated downsampling weakens the feature representation of small targets, which may even be submerged in the background. The Feature Pyramid Network (FPN) further diminishes small-target detection accuracy because its simple feature concatenation underutilizes multiscale information and introduces extraneous contextual details. To address these problems, we propose a simple and effective improved model, Multiscale Channel Interactive Spatial Perception YOLOv5 (MCISP-YOLOv5). First, we design a multiscale channel interaction spatial perception (MCISP) module, which recalibrates the channel features at each scale by exchanging information across scales, facilitates the flow between the geometric information of shallow features and the semantic information of deeper features, and applies adaptive spatial learning to achieve spatial perception, so that the model focuses better on foreground objects. Second, we replace the traditional upsampling module with the Content-Aware ReAssembly of Features (CARAFE) operator, which strengthens feature representation after repeated downsampling and better recovers detailed information. Finally, we add an additional, shallower feature map as a detection layer in YOLOv5; this supplementary layer improves detection of small objects without degrading performance on targets of other sizes. Extensive experiments on the publicly available VisDrone2019 dataset show that the proposed model achieves substantial performance gains.
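To make the CARAFE idea concrete, the sketch below shows the reassembly step in plain NumPy: each upsampled position is a weighted sum of a k×k neighborhood around its source location, with the weights softmax-normalized. This is a toy illustration, not the paper's implementation; in CARAFE the per-position kernels are predicted by a small convolutional branch from the feature content, whereas here they are passed in as an argument (and random stand-ins are used in the usage example).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def carafe_upsample(feat, kernels, scale=2, k=3):
    """Content-aware reassembly: each output position is a weighted
    sum of a k*k source neighborhood, weights normalized by softmax.
    feat:    (C, H, W) feature map
    kernels: (H*scale, W*scale, k, k) content-predicted kernels
             (here supplied directly; in CARAFE a conv predicts them)
    """
    C, H, W = feat.shape
    pad = k // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros((C, H * scale, W * scale))
    for oy in range(H * scale):
        for ox in range(W * scale):
            sy, sx = oy // scale, ox // scale            # source location
            w = softmax(kernels[oy, ox].ravel())         # normalized kernel
            patch = padded[:, sy:sy + k, sx:sx + k].reshape(C, -1)
            out[:, oy, ox] = patch @ w                   # reassemble
    return out

# Usage: kernels would normally be predicted from `feat`; random here.
feat = np.random.rand(4, 8, 8)
kernels = np.random.rand(16, 16, 3, 3)
up = carafe_upsample(feat, kernels)
assert up.shape == (4, 16, 16)
```

Because the kernel weights sum to one, a constant feature map is upsampled to the same constant, i.e., reassembly is a content-weighted local average rather than fixed bilinear interpolation.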
With the rapid development of deep learning, image super-resolution has made significant progress. However, as model depth increases, training becomes more difficult, and networks fail to capture coarse-grained and fine-grained information simultaneously. To address these issues, we propose Deep Hierarchical Multiscale Attention Networks (DHMA). First, we use a residual-nested-residual structure, consisting of multiple residual groups each composed of multiple residual blocks, to improve the propagation of gradient information and ease the training of deep networks. In addition, we propose a Hierarchical Multiscale Attention (HMA) module to capture both coarse-grained and fine-grained features in deep networks. This method imitates how humans observe a scene: first focusing on salient objects and then examining details. Specifically, we first segment the feature map horizontally into several parts and then perform Global Aware Attention (GAA) learning on each part, where GAA learns global structural information. Next, we use an Adaptive Feature Fusion (AFF) module to fuse the learned information of each part into the current layer's features. Finally, we stack the features of each layer to construct a hierarchical multiscale structure and obtain features at different scales. HMA is a lightweight plug-and-play module that can be applied to existing models. Extensive experiments demonstrate the effectiveness and outstanding performance of DHMA.
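The segment-attend-fuse-stack pipeline described above can be sketched as follows. This is a minimal NumPy illustration under loose assumptions: `global_aware_attention` stands in for GAA with a simple softmax saliency re-weighting, and the residual carry-over between parts stands in for AFF; the paper's learned modules are not reproduced here.

```python
import numpy as np

def global_aware_attention(part):
    """Toy stand-in for GAA: re-weight each spatial position by a
    softmax over its channel-mean response (global saliency)."""
    C, H, W = part.shape
    score = part.mean(axis=0).ravel()
    w = np.exp(score - score.max())
    w = (w / w.sum()).reshape(1, H, W)
    return part * (1 + w * H * W)        # amplify salient positions

def hierarchical_multiscale_attention(feat, n_parts=4):
    """Split the map horizontally, attend to each part, fuse each
    part's signal into the next (toy AFF), and stack the results."""
    parts = np.array_split(feat, n_parts, axis=1)   # horizontal split
    fused, out_parts = None, []
    for p in parts:
        a = global_aware_attention(p)
        if fused is not None and fused.shape == a.shape:
            a = a + 0.5 * fused                     # residual fusion
        fused = a
        out_parts.append(a)
    return np.concatenate(out_parts, axis=1)        # original shape

feat = np.random.rand(3, 8, 8)
out = hierarchical_multiscale_attention(feat)
assert out.shape == feat.shape
```

The output keeps the input's shape, which is what makes such a module plug-and-play: it can be dropped between existing layers without changing the surrounding architecture.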