We propose a dual-branch Siamese network for visual object tracking. The architecture comprises two distinct branches: a shallow network branch that focuses on precise object localization and resistance to interference from similar objects, and a deep network branch that captures abstract semantic features of the object. To enhance localization accuracy, we integrate a multi-scale KFFM into the shallow branch, and we further leverage an attention mechanism to improve the model's robustness. Extensive experiments on three publicly available datasets demonstrate that our method surpasses state-of-the-art tracking algorithms in terms of performance and accuracy. The source code of this work is available online at https://github.com/mbgzwn/SiamDUL.git.
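At the core of Siamese trackers is a cross-correlation between the template (exemplar) feature map and the search-region feature map, whose response peak gives the predicted object location. The following pure-Python sketch illustrates that step only; it is an assumption for illustration, not the authors' SiamDUL code.

```python
# Slide a small template feature map over a larger search-region map;
# the (y, x) of the maximum response is the predicted object position.

def cross_correlate(template, search):
    """Valid 2D cross-correlation of a template over a search map."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            score = sum(template[i][j] * search[y + i][x + j]
                        for i in range(th) for j in range(tw))
            row.append(score)
        response.append(row)
    return response

def peak(response):
    """Location (y, x) of the maximum response value."""
    best = max((v, y, x) for y, r in enumerate(response) for x, v in enumerate(r))
    return best[1], best[2]

# Toy example: a 2x2 "object" pattern embedded at (1, 2) in a 4x5 search map.
template = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0, 0],
          [0, 0, 1, 2, 0],
          [0, 0, 3, 4, 0],
          [0, 0, 0, 0, 0]]
print(peak(cross_correlate(template, search)))  # (1, 2)
```

In a real tracker both maps are deep feature tensors and the correlation runs per channel, but the localization principle is the same.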
In this paper, we propose a high-resolution GAN model for image dehazing in icing meteorological environments that strictly follows a physics-driven scattering strategy. First, sub-pixel convolution enables the model to remove image artifacts and generate high-resolution images. Second, we use a Patch-GAN discriminator, which drives the generator to produce haze-free images by capturing the details and local information of the image. Furthermore, to restore the texture information of hazy images and reduce color distortion, the model is jointly trained with multiple loss functions. Experiments show that the proposed method achieves strong performance for image dehazing in icing weather environments.
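Sub-pixel convolution upscales by producing r*r low-resolution channels and interleaving them into one high-resolution map (the "pixel shuffle" rearrangement), which avoids the checkerboard artifacts of transposed convolution. A minimal pure-Python sketch of that rearrangement, offered as an illustration rather than the paper's implementation:

```python
# Pixel shuffle: r*r feature maps of size H x W are interleaved into
# a single (r*H) x (r*W) map; each channel supplies one sub-pixel offset.

def pixel_shuffle(channels, r):
    """channels: list of r*r feature maps (H x W) -> one (r*H) x (r*W) map."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for c, fmap in enumerate(channels):
        dy, dx = c // r, c % r  # sub-pixel offset encoded by the channel index
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = fmap[y][x]
    return out

# Upscale a 1x1 map by r=2: four channels become the four sub-pixels.
chans = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle(chans, 2))  # [[1, 2], [3, 4]]
```

Because every output pixel is written exactly once from a learned channel, the upsampling is evenly covered, which is why sub-pixel convolution helps suppress artifacts in the generated high-resolution images.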
Object detection based on computer vision is becoming popular for drone-captured images. However, real-time object detection in unmanned aerial vehicle (UAV) scenarios remains a serious challenge for low-end devices. To address this problem, we have improved YOLOv3-tiny in the following aspects. First, the label rewriting problem, caused by the network structure of YOLOv3-tiny and the characteristics of drone-captured datasets, is severe; we increase the size of the predicted feature map to reduce the rate of label rewriting. Second, the features of small targets are weakened in a small feature map, but the context information provided by large receptive fields can improve small-target detection, so we use dilated convolution to expand the receptive field without reducing the size of the feature map. Third, multiscale feature fusion is very helpful for small-target detection; a multidilated module is adopted to merge features from earlier and deeper layers. Finally, a pretraining strategy combined with copy-paste data augmentation is proposed to learn more features from categories with few samples. We evaluated our model on the VisDrone2019-Det test set. Compared with YOLOv3-tiny, it achieves compelling results: a ∼86.1% reduction in model size and a ∼19.2% increase in AP50. Although our model is slower than YOLOv3-tiny, it is 2.96 times faster than YOLOv3. The experimental results verify that our network is more effective than YOLOv3-tiny and more suitable for UAV object detection applications on low-end devices.
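The receptive-field argument above can be made concrete with the standard formulas: a k x k kernel with dilation d acts like a kernel of effective size k + (k - 1)(d - 1), so stacking dilated layers grows the receptive field quickly without any downsampling. The numbers below are illustrative, not taken from the paper:

```python
# Receptive-field arithmetic for stacked convolutions with dilation.

def effective_kernel(k, d):
    """Effective size of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field of a stack of (kernel, dilation, stride) conv layers."""
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s  # stride enlarges the step between output positions
    return rf

# Three stride-1 3x3 layers: plain vs. dilation rates 1, 2, 4.
plain = [(3, 1, 1)] * 3
dilated = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]
print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With all strides equal to 1, the feature map keeps its full resolution while the dilated stack sees more than twice the context, which is the trade-off the multidilated module exploits for small targets.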
YOLOv4-tiny is a lightweight network designed for low-end devices. It applies a global model scaling technique that uses the same scaling method, with few convolution filters, at every stage of the network, resulting in a small receptive field and low accuracy. To solve this problem, we apply different scaling techniques to the shallow and deep stages of the network. For the shallow stage, a Simple-StemBlock scaling module is proposed to simplify the network based on factors such as FLOPs and network fragmentation; the module effectively reduces computation and improves the diversity of features. For the deep stage, considering network depth and hardware resource constraints, a Depth-CSPBlock scaling module is designed to expand the receptive field while keeping computation as low as possible, and a layered residual connection is built into the module to enrich semantic information. In addition, the Mish activation function is adopted in the backbone for higher accuracy. The experimental results show that the proposed method achieves 23.2% AP on MSCOCO test-dev and 66.73% AP50 on the VOC datasets; compared with YOLOv4-tiny, AP and AP50 increase by 3.3% and 3.7%, respectively. On a Jetson TX2, the proposed method reaches 45.6 frames per second, which is 20% faster than YOLOv4-tiny.
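The Mish activation adopted in the backbone is defined as mish(x) = x * tanh(softplus(x)); unlike ReLU it is smooth and allows small negative values through. A standalone stdlib-only sketch (not the authors' code):

```python
import math

def softplus(x):
    # log(1 + e^x), written to stay numerically stable for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

print(mish(0.0))              # 0.0
print(round(mish(1.0), 4))    # ~0.8651
print(round(mish(-1.0), 4))   # small negative value, unlike ReLU's hard zero
```

The non-monotonic negative tail and smooth gradient are the usual explanations for the small accuracy gains Mish brings over ReLU/Leaky ReLU in detection backbones.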