This PDF file contains the front matter associated with SPIE Proceedings Volume 13256, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please sign in to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Computerized Positioning and Target Detection Technique
Robots in optoelectronic imaging applications typically require real-time, accurate recognition and localization of targets, especially in complex environments. Because images obtained by optoelectronic imaging systems may suffer from noise, blurring, or densely packed targets, traditional object detection algorithms can lose semantic information and struggle to detect small targets. By introducing residual deformable convolution structures, the proposed RAFPN enhances feature representation in optoelectronic imaging systems and dynamically adjusts the shape of the convolution kernel to adapt to the non-rigid deformation of the target, thereby better capturing the detailed features of small targets. Experimental results show that RAFPN improves the mAP of the object detection algorithms RetinaNet, Faster R-CNN, and GFL by 1.2%, 0.9%, and 1.1% respectively on the MS COCO dataset, effectively enhancing the performance of object detection in optoelectronic imaging.
In recent years, extensive research has been conducted in the field of remote sensing target detection, focusing on aspects of the model such as the backbone network, neck, detection head, and loss functions, and leading to significant advancements. A crucial factor influencing detection results, however, is the model's capability to extract features from images, particularly for remote sensing images with complex features. The Segment Anything Model (SAM), a prominent large-scale model in computer vision, has recently attracted significant attention. It offers powerful feature extraction capabilities and strong generalization, yielding impressive results across various image types. Nonetheless, its primary application is semantic segmentation, making it unsuitable for direct application to remote sensing rotated object detection. In this paper, we propose a remote sensing rotated object detection method with a fine-tuned Segment Anything Model based on a dual-stage adapter, denoted FSAMDA. We utilize the Adapter to learn task-specific knowledge for remote sensing object detection, while the Mona Adapter simultaneously enhances the model's ability to process visual signals and learns task-specific knowledge. This approach maximizes the powerful feature extraction capabilities of the SAM image encoder. Our proposed FSAMDA method has been validated through numerous experiments, showing state-of-the-art performance on two widely used standard benchmarks, DOTA-v1.0 and FAIR1M-v1.0. Specifically, we achieved 81.35 mAP on DOTA-v1.0 and 48.52 mAP on FAIR1M-v1.0, demonstrating the effectiveness of our approach.
In deep learning, an activation function is a mathematical function that applies a non-linear transformation to its input; it is commonly used in the hidden and output layers of feature extraction networks. Introducing activation functions enhances the non-linear capability of a neural network and improves the model's expressive power and fitting ability. Inspired by trigonometric functions, this study proposes ATLU, an arctangent-based nonlinear activation unit, expressed as f(x) = x(arctan(1.09x) + π/2). This activation function helps address gradient vanishing and exploding in deep neural networks. It is validated on the CIFAR-100 dataset with the VGG11 network as the benchmark.
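The definition above is enough to sketch ATLU directly; the constant 1.09 and the form f(x) = x(arctan(1.09x) + π/2) come from the abstract, while the analytic derivative is worked out here only for illustration:

```python
import numpy as np

def atlu(x):
    """ATLU activation as stated in the abstract: f(x) = x * (arctan(1.09*x) + pi/2)."""
    return x * (np.arctan(1.09 * x) + np.pi / 2)

def atlu_grad(x):
    """Analytic derivative: f'(x) = arctan(1.09x) + pi/2 + 1.09x / (1 + (1.09x)^2)."""
    return np.arctan(1.09 * x) + np.pi / 2 + 1.09 * x / (1.0 + (1.09 * x) ** 2)
```

Note the behavior this form implies: for large positive x, arctan(1.09x) approaches π/2, so f(x) grows roughly like πx; for large negative x the bracket approaches 0, so the output saturates near 0, giving a smooth ReLU-like shape with a nonzero gradient of π/2 at the origin.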
Dataset distillation is often used to create compact datasets that achieve similar training performance, making it a good choice for addressing data storage and training costs. However, existing distillation methods are generally time-intensive and computationally expensive, especially when applied to vision-language tasks. To address this challenge, we propose the Clustering BiTrajectory Matching method, which accelerates existing distillation techniques by 8 times through two strategies: clustering-based sample selection and a bi-trajectory optimization approach. The Clustering BiTrajectory Matching method achieves good accuracy in a multi-modal setting while requiring fewer computational resources and emphasizing efficiency in pre-training. We evaluate the proposed method on the Flickr8k dataset and show that it achieves better efficiency (fewer iterations to reach a target accuracy) while outperforming other coreset selection methods.
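The abstract does not specify the clustering-based sample selection in detail; as a sketch only, one plausible variant runs k-means on sample embeddings and keeps the sample nearest each centroid as a cluster representative:

```python
import numpy as np

def select_coreset(embeddings, k, iters=20, seed=0):
    """Hypothetical clustering-based sample selection: k-means on embeddings,
    then keep the index of the sample closest to each centroid."""
    rng = np.random.default_rng(seed)
    n = len(embeddings)
    centroids = embeddings[rng.choice(n, size=k, replace=False)]
    for _ in range(iters):
        # assign each sample to its nearest centroid
        d = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = embeddings[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # pick the representative (closest sample) of each cluster
    d = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=-1)
    return np.unique(d.argmin(axis=0))
```

The returned indices form a small, diverse subset of the data, which is the intuition behind using clustering to cut down the number of trajectories that must be matched.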
The performance of current deep learning target recognition models decreases significantly when the training data and application data have different feature distributions, and the problem is especially severe in low-resolution scenarios. This paper proposes an infrared-visible cross-domain target recognition method based on an improved CycleGAN to enhance the recognition accuracy, for visible images, of a classifier trained on infrared images under low-resolution conditions. First, we construct an image enhancement module aimed at enhancing the details and overall quality of low-resolution images. Second, the CycleGAN network model is improved with a dual-discriminator adversarial structure; the improved CycleGAN is used to transform visible images into infrared images and reduce the feature gap between the two image types. Finally, we conduct cross-domain target recognition experiments on the DSAIC dataset and compare our method with several other approaches. The results demonstrate that the proposed method effectively enhances the accuracy and generalization capability of the infrared classifier for visible light targets, while retaining high accuracy and generalization capability for infrared targets.
Traditional water-surface garbage cleaning vessels suffer from large size, difficult control, limited functionality, and reliance on human labor. Moreover, traditional vessel obstacle avoidance algorithms such as the A* and DWA algorithms suffer from low efficiency, redundant path planning, and collision risk. For unmanned-vessel path planning in garbage cleaning, an improved, integrated DWA algorithm is proposed to address ineffective collision avoidance when encountering dynamic and static obstacles. Simulation results show that the proposed algorithm can effectively assign collision avoidance responsibility to unmanned vessels in both dynamic and static situations; it reduces abrupt speed changes during navigation, and the planned paths improve operational safety, increase garbage cleaning speed, and reduce the required time.
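The paper's improved, integrated DWA variant is not published in the abstract; as background, the standard DWA core loop it builds on (sample velocity pairs inside the dynamic window, forward-simulate each rollout, score by goal progress, obstacle clearance, and speed) can be sketched as follows, where all weights, limits, and thresholds are illustrative assumptions:

```python
import math

def dwa_step(pose, v, w, goal, obstacles,
             v_max=1.0, w_max=1.0, acc=0.5, dt=0.1, horizon=20):
    """Minimal illustrative DWA step (not the paper's improved variant):
    sample (v, w) pairs reachable within one control cycle, simulate each
    rollout, and score by goal distance, clearance, and forward speed."""
    best, best_cmd = -math.inf, (0.0, 0.0)
    for dv in (-acc * dt, 0.0, acc * dt):          # reachable linear speeds
        for dw in (-acc * dt, 0.0, acc * dt):      # reachable angular speeds
            cv = min(max(v + dv, 0.0), v_max)
            cw = min(max(w + dw, -w_max), w_max)
            x, y, th = pose
            clearance = math.inf
            for _ in range(horizon):               # forward-simulate rollout
                th += cw * dt
                x += cv * math.cos(th) * dt
                y += cv * math.sin(th) * dt
                for ox, oy in obstacles:
                    clearance = min(clearance, math.hypot(x - ox, y - oy))
            if clearance < 0.3:                    # would pass too close: discard
                continue
            goal_dist = math.hypot(goal[0] - x, goal[1] - y)
            score = -goal_dist + 0.1 * min(clearance, 1.0) + 0.05 * cv
            if score > best:
                best, best_cmd = score, (cv, cw)
    return best_cmd
```

Each call returns one velocity command; repeating the call at every control cycle yields the reactive avoidance behavior that the paper's integrated variant refines for mixed dynamic and static obstacles.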
Traditional rural building management relies mostly on manual work, which is not only inefficient but also prone to safety risks. This paper therefore integrates building information extracted with computer vision technology into map data and, combined with real-time data such as meteorology and geology, realizes overall planning and safety management of villages. Through identification and safety evaluation of rural houses, an intelligent early warning system is constructed to provide early warning and risk disposal for villages and towns, thus improving rural disaster prevention, mitigation, and safety management. The F1 scores for farmhouses, granaries, and public facilities are 91.2%, 87.2%, and 91.1% respectively, with an overall F1 score of 89.8%, showing that the model balances precision and recall well. Using computer vision technology, automatic identification and safety evaluation of large numbers of buildings can be realized, improving management efficiency.
In the leather and textile industry, assessing the degree of blackness in dyed samples is one of the most challenging tasks. Current methods for characterizing blackness typically adapt methods for describing whiteness, including the “CIE wavelength method”, “absorbance comparison method”, “ISO whiteness measurement method”, etc., where lower whiteness implies higher blackness. However, these methods lack consistency between quantification and human perception, especially at low levels of whiteness. Through research on the Munsell color system, this paper reveals a stable correspondence between the blackness value of dark objects and the sum of their spectral reflectances. Based on this finding, we measured the spectral reflectance of 324 low-luminance colors in the Munsell system and analyzed the relationship between reflectance and the blackness value. Using the Levenberg-Marquardt algorithm, a model relating reflectance to blackness was established, with a standard deviation range of 0.23-1.10. Validation on dyed leather samples demonstrates that the results align with visual matching using Munsell color chips. Owing to its numerical continuity, the model holds more quantitative comparative value than the Munsell system alone, proving its applicability for characterizing object blackness.
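The abstract does not publish the fitted model's functional form. As an illustration only, the following sketch fits a hypothetical decaying relation between summed reflectance and blackness using a hand-rolled Levenberg-Marquardt loop (damped Gauss-Newton), the same class of algorithm the paper names; the model form, parameter values, and synthetic data are all assumptions:

```python
import numpy as np

def blackness_model(r, p):
    """Hypothetical form (the paper's equation is not published):
    blackness decays smoothly with summed spectral reflectance."""
    a, b, c = p
    return a * np.exp(-b * r) + c

def jacobian(r, p):
    """Partial derivatives of the model w.r.t. (a, b, c)."""
    a, b, c = p
    e = np.exp(-b * r)
    return np.column_stack([e, -a * r * e, np.ones_like(r)])

def levenberg_marquardt(r, y, p0, iters=100, lam=1e-3):
    """Damped Gauss-Newton (Levenberg-Marquardt) on the model above."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        res = y - blackness_model(r, p)
        J = jacobian(r, p)
        A = J.T @ J + lam * np.eye(len(p))      # damped normal equations
        new_p = p + np.linalg.solve(A, J.T @ res)
        if np.sum((y - blackness_model(r, new_p)) ** 2) < np.sum(res ** 2):
            p, lam = new_p, lam * 0.5           # step accepted: relax damping
        else:
            lam *= 2.0                          # step rejected: damp harder
    return p

# synthetic stand-in for the 324 low-luminance Munsell measurements
r_sum = np.linspace(0.5, 15.0, 60)
true_p = (8.0, 0.3, 1.0)
fitted = levenberg_marquardt(r_sum, blackness_model(r_sum, true_p), (5.0, 0.1, 0.0))
```

The accept/reject rule on the damping factor is what distinguishes Levenberg-Marquardt from plain Gauss-Newton and keeps the fit stable from imperfect starting parameters.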
In workpiece contour measurement, when the surface structure of the workpiece is too complex or the texture information is insufficient, it is often difficult to collect enough feature information, which increases the difficulty of subsequent feature matching. This paper therefore investigates a structured light coding technique that adds texture data to the workpiece's appearance and reduces the complexity of later matching. The technique centers on projecting specific patterns onto the surface of the workpiece to be measured; these patterns carry unique information codes, and because the projections contain special data formats, they can be parsed and decoded by a computer program. A combination of Gray code and a four-step phase shift is used for encoding. The resulting absolute phase information provides new matching primitives for subsequent stereo matching between images. To improve decoding efficiency and accuracy, this paper also proposes several processing methods, including image denoising, background phase removal, and periodic misalignment correction. Finally, simulation experiments in MATLAB verify that the resulting image correction is effective.
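The Gray-code plus four-step phase-shift combination can be sketched as follows; the frame convention I_n = A + B·cos(φ + nπ/2) and the three-bit Gray code are illustrative assumptions, not the paper's exact projector setup:

```python
import numpy as np

def wrapped_phase(frames):
    """Four-step phase shift, I_n = A + B*cos(phi + n*pi/2): recover phi mod 2*pi.
    I3 - I1 = 2B*sin(phi) and I0 - I2 = 2B*cos(phi), so arctan2 yields phi."""
    i0, i1, i2, i3 = frames
    return np.mod(np.arctan2(i3 - i1, i0 - i2), 2 * np.pi)

def gray_to_index(bit_planes):
    """Decode a stack of Gray-code bit planes (MSB first) into the period index."""
    idx = bit_planes[0].astype(int)
    out = idx.copy()
    for plane in bit_planes[1:]:
        idx = idx ^ plane.astype(int)   # Gray -> binary, one bit at a time
        out = (out << 1) | idx
    return out

def absolute_phase(frames, bit_planes):
    """Absolute phase = wrapped phase + 2*pi * fringe-period index."""
    return wrapped_phase(frames) + 2 * np.pi * gray_to_index(bit_planes)
```

The Gray-code planes resolve the 2π ambiguity of the wrapped phase, which is exactly why the combined absolute phase can serve as a matching primitive for stereo correspondence.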
In the marine environment, sonar is a crucial tool for navigation and ranging using sound waves, and the recognition and localization of targets in sonar imagery are key to the performance of underwater equipment. In practical applications, fully supervised object detection methods are commonly used to accurately determine the categories and locations of targets in sonar images; however, these methods require laborious annotation of each target's position and category. Addressing this challenge, this paper proposes a weakly supervised target localization method for sonar imagery based on Grad-CAM. The method first employs the Balanced Ensemble Transfer Learning (BETL) algorithm, based on deep transfer learning, for target classification in sonar images. Grad-CAM is then applied to the sonar image classification task, and the generated heat maps are used for target localization. This aims to enhance the accuracy and efficiency of sonar image recognition and localization while reducing the annotation burden.
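Once the activations and class-score gradients of a chosen convolutional layer are available, the Grad-CAM step used above reduces to a few array operations; a library-agnostic sketch:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM on a conv layer: weight each activation map by the spatially
    averaged gradient of the class score, sum the maps, then apply ReLU.
    activations, gradients: arrays of shape (channels, H, W)."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k = GAP of dY/dA_k
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positive evidence
    if cam.max() > 0:
        cam /= cam.max()                              # normalize to [0, 1]
    return cam
```

The resulting heat map is upsampled to image size and thresholded to yield a localization box, which is how a classifier trained only with image-level labels can stand in for a detector.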
After a series of achievements in natural language processing, Transformers have shown promising results upon their introduction to computer vision, particularly with large-scale data. However, when data is insufficient, the performance of Vision Transformers (ViT) often falls short of Convolutional Neural Networks (CNNs), which are capable of capturing intrinsic biases in the data. In this paper, to address the performance disadvantage of ViT on small datasets, we propose a two-stage self-supervised training strategy. We enhance the ViT model by introducing Sequential Overlapping Patch Embedding (SOPE) and Improved Dynamic Aggregation Feed Forward (IDAFF) modules. Applying our approach to both single-block and multi-block ViT models on five commonly used small datasets (CIFAR10, CIFAR100, CINIC10, SVHN, Tiny-ImageNet) shows its efficacy. It narrows the performance gap between ViT and CNNs trained from scratch on small datasets, and in some cases even achieves better classification performance than CNNs. Our code is available at: https://github.com/newer7/vosd.
With the rapid advancement of computer vision and artificial intelligence technologies, target detection has garnered increasing attention. Small target detection remains a significant challenge in the field due to factors such as the limited proportion of small target pixels, sparse semantic information, and susceptibility to interference from complex scenes. To address these challenges, we propose a small target detection algorithm grounded in context information enhancement. This approach leverages symbiotic contextual features between "target and scene" and "target and target" to compensate for the intrinsic information deficiencies of small targets. Furthermore, to enhance the differentiation between small targets and background elements, we employ an adaptive margin classification loss function that guides the detection process towards learning robust discriminative features. Experimental results demonstrate that our method outperforms traditional target detection approaches in small target detection scenarios.
This article proposes a Transformer-YOLO model for identifying hidden dangers posed by construction machinery, addressing the issues of complex transmission channel environments, diverse forms of construction machinery, and large changes in target scale. First, a transformer-based prediction head is adopted in YOLOv5. Then, the CNN module is replaced by DCN-V2. Finally, the model's loss function is optimized to enhance its ability to detect small targets. A dataset is constructed from online monitoring images to validate the model. The results show that the proposed method can effectively identify and raise alerts for construction machinery targets near the transmission channel.
As augmented reality technology evolves, its application fields continue to expand. Multi-source display methods, natural interaction, and rich content give augmented reality the potential to inspire new application models for traditional information systems. As products advance, the analysis and presentation of product spectra also place new requirements on diverse applications. Based on a comprehensive analysis of the display needs of product spectra, this paper proposes a multifaceted presentation framework for product spectra based on mixed reality technology, which supports the comprehensive display of the overall situation, local situations, system capabilities, organization capabilities, and single-equipment capabilities. Compared with traditional display methods, it has significant advantages.
Enhancing river vessel target detection for unmanned surface vessels (USVs) with limited resources has long been a critical technology for the practical application of intelligent ship navigation. However, waves, lighting conditions, and overlapping targets in complex river environments often significantly degrade the detection and identification of river targets, especially small ones. YOLOv8, the latest model in the YOLO series, has been validated for river target detection; accurate detection is essential for unmanned boats to navigate around obstacles and plan subsequent paths effectively. To further improve detection performance in complex river environments, this paper proposes a river target detection method for unmanned boats based on an improved YOLOv8. In the improved YOLOv8, AFPN (Asymptotic Feature Pyramid Network) is used in the neck to strengthen the model's feature fusion capability and reduce the number of parameters, and BRA (Bi-Level Routing Attention) is used in the backbone to better extract features. The improved YOLOv8 accurately identifies targets in complex river environments and improves detection accuracy, and comparative experiments validate the rationality and effectiveness of the improvements.
With the continuous development of artificial intelligence technology, intelligent facility operation and maintenance based on visual recognition technology has become a research hotspot. Against this background, this study discusses the application of visual recognition technology in facility operation and maintenance. First, through analysis of the current state of intelligent facility operation and maintenance, the requirements and challenges of visual-recognition-based approaches are identified. Second, combining deep learning and image recognition technology, an intelligent facility operation and maintenance system is designed, and practical application and verification are carried out. Finally, through analysis of the experimental results, the validity and feasibility of visual-recognition-based intelligent facility operation and maintenance are verified, providing a useful reference for research and practice in this field.
The low cost and modest computational and calibration requirements of monocular cameras make them ideal positioning sensors for mobile robots, albeit at the expense of any direct depth measurement. Solutions proposed for this localization problem fuse pose estimates from convolutional neural networks (CNNs) with pose estimates from geometric constraints on motion to generate accurate predictions of robot trajectories. However, CNN-based pose estimates are not uniformly distributed, causing translation errors in the predicted trajectories. This paper proposes improving these CNN-based pose estimates by propagating a uniform distribution over SE(3) with a particle filter. The particles use the same motion model as the CNN, while their weights are updated from the CNN-based estimates. The results show that while the rotational component of the pose estimate does not consistently improve relative to the CNN-based estimate, the translational component is significantly more accurate. Combined with the superior smoothness of the filtered trajectories, this shows that particle filtering significantly improves the performance of CNN-based localization algorithms.
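The fusion scheme described above can be illustrated with a one-dimensional stand-in for SE(3): particles follow the motion model, and weights come from a Gaussian likelihood centered on the (noisy) CNN pose estimate. The noise levels and the 1-D state are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter(cnn_estimates, motion, n=500, motion_std=0.05, meas_std=0.5):
    """1-D stand-in for the SE(3) filter in the abstract: propagate particles
    with the motion model, weight them by a Gaussian likelihood of the CNN
    pose estimate, report the weighted mean, then resample."""
    particles = np.zeros(n)
    track = []
    for z, u in zip(cnn_estimates, motion):
        particles += u + rng.normal(0.0, motion_std, n)       # predict step
        w = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)  # weight by CNN pose
        w /= w.sum()
        track.append(np.sum(w * particles))                   # filtered pose
        idx = rng.choice(n, size=n, p=w)                      # resample
        particles = particles[idx]
    return np.array(track)
```

Because the motion model constrains the particles while the noisy CNN estimate only reweights them, the filtered trajectory is both smoother and closer to the truth than the raw CNN estimates, mirroring the translation improvement the abstract reports.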
Depression is one of the most common mental health disorders and has been a major focus of research, particularly through the lens of automated diagnostic methods. While many studies have explored magnetic resonance imaging techniques separately, the integration of multiple neuroimaging modalities has received less attention. To address this gap, we introduce a multimodal automatic classification method that leverages both resting-state functional magnetic resonance imaging and structural magnetic resonance imaging. Our approach employs a multi-stream 3D Convolutional Neural Network model to facilitate joint training on diverse features extracted from rs-fMRI and sMRI data. By classifying a combined group of 830 MDD patients and 771 normal controls from the REST-meta-MDD dataset, our model achieves an impressive accuracy of 69.38% using a feature combination of CSF, REHO, and fALFF. This result signifies a notable enhancement in classification performance, contributing valuable insights into the capabilities of multimodal imaging in MDD diagnosis.
Pavement distress detection with unmanned aerial vehicles (UAVs) is a critical task in highway maintenance. Drone-captured images pose challenges such as a broad field of view, small target sizes, and complex backgrounds, while limited-resource platforms preclude the deployment of traditional detection models. To this end, we introduce YOLOv8-EHG, a lightweight, real-time UAV pavement distress detection model built upon an enhanced YOLOv8 framework. Our approach first integrates Efficient Local Attention (ELA) within a High-level Screening-feature Pyramid Network (HSFPN) to form the ELA-HSFPN architecture, replacing the Path Aggregation Network (PAN) in YOLOv8. Subsequently, we develop a lightweight detection head, Detect-T3G. On the RDD2022 dataset, the model achieves an mAP50 of 67.4%, a 0.2% improvement over the original YOLOv8, while reducing model parameters by 46.9% and computational complexity by 41.9%. These improvements facilitate the deployment of drones for real-time pavement distress detection.
To improve the accuracy of millimeter wave radar target detection, this study first briefly introduces the principle of millimeter wave radar, then describes the target tracking framework and commonly used millimeter wave radar filtering algorithms, and finally evaluates target tracking accuracy under different filtering algorithms through training and validation. Building on the tracking characteristics of two filtering methods, the unscented Kalman filter (UKF) and the particle filter (PF), an unscented particle filter (UPF) is studied; the resulting algorithm achieves accurate target tracking and prediction while ensuring particle diversity.
The raw substation equipment point clouds obtained by LiDAR are often partial due to occlusion and viewing-angle limitations, and cannot provide an adequate data base for 3D reconstruction, shape classification, etc. Point cloud completion aims to estimate the full shape of an object from a partial observation. This paper proposes a high-fidelity point cloud completion model based on an encoder-decoder architecture. The proposed model progressively generates a coarse point cloud and then a detailed point cloud. In the encoder, a residual module, ResMLP, is designed using only MLPs; a pyramid-like structure between modules extracts deep features and supports deeper network expansion. In the decoder, the partial input is combined with the coarse point cloud, and enhanced skeleton center points of objects are obtained by extracting key points, which alleviates structural blur in the input. Visualization of the experimental results shows that the proposed model can effectively complete the missing parts of equipment point clouds; some objects are handled well even with 50% missing. The final evaluation metrics CD-L1, CD-L2, and F-Score reach 10.490, 0.362, and 0.601, respectively.
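The ResMLP module is described only at a high level; a minimal per-point forward-pass sketch of an MLP-only residual block (all dimensions and initializations here are illustrative assumptions) might look like:

```python
import numpy as np

def res_mlp_block(x, w1, b1, w2, b2):
    """Sketch of an MLP-only residual block for per-point features:
    two linear layers with a ReLU in between, plus an identity shortcut."""
    h = np.maximum(x @ w1 + b1, 0.0)   # linear expansion + ReLU
    h = h @ w2 + b2                    # linear projection back to input width
    return x + h                       # residual connection

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(16, d))                           # 16 points, d-dim features
w1, b1 = rng.normal(size=(d, 2 * d)) * 0.1, np.zeros(2 * d)
w2, b2 = rng.normal(size=(2 * d, d)) * 0.1, np.zeros(d)
y = res_mlp_block(x, w1, b1, w2, b2)
```

The identity shortcut is what allows such blocks to be stacked into the pyramid-like structure the abstract mentions without degrading gradient flow.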
Rapid prediction of the water content of paddy soil is an important support for studying the adhesion properties of soil-engaging components. In this paper, near-infrared spectroscopy was used to rapidly detect soil moisture content; 123 near-infrared spectra of soil were collected over the range 3999.64 cm⁻¹ to 10000.10 cm⁻¹. The moisture content of the 123 samples was measured by the drying method, and the samples were divided into a calibration set and a prediction set at a ratio of 2:1. The least squares method was used to build water-content prediction models from the full spectrum, segmented spectra, and the characteristic spectra selected by a genetic algorithm. By comparison, 20 characteristic spectra were first selected by the genetic algorithm; adjacent or overlapping characteristic spectra were then eliminated, leaving 10 characteristic spectra to establish the prediction model of red and yellow soil moisture content. The correlation coefficients of the calibration set and the prediction set were 0.9532 and 0.9612, respectively.
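The modeling step can be sketched as ordinary least squares on a subset of bands. The synthetic spectra, the band indices standing in for the GA selection, and the noise level below are all assumptions for illustration, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_bands = 30, 50
X = rng.standard_normal((n_samples, n_bands))      # mock spectra
true_w = np.zeros(n_bands)
true_w[[3, 17, 40]] = [0.8, -0.5, 0.3]             # informative bands
y = X @ true_w + 0.01 * rng.standard_normal(n_samples)  # moisture content

selected = [3, 17, 40]     # stand-in for GA-selected characteristic bands
Xs = np.column_stack([X[:, selected], np.ones(n_samples)])  # add intercept
w, *_ = np.linalg.lstsq(Xs, y, rcond=None)         # least squares fit
pred = Xs @ w
r = np.corrcoef(pred, y)[0, 1]   # correlation coefficient, as in the paper
```

Fitting on only the selected bands keeps the model small and avoids the collinearity of adjacent NIR wavelengths, which is the motivation for pruning overlapping characteristic spectra.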
Aiming at the difficulty of accurately estimating the OFDM channel in wireless communication for high-speed trains, an improved SVD-LMMSE channel estimation method for OFDM systems is proposed. This article first introduces the OFDM system model and then several typical channel estimation methods: the LS, MMSE, LMMSE, and SVD-LMMSE algorithms. On this basis, an SVD-LMMSE channel estimation method based on an improved DCT is proposed. The method uses the DCT and the wavelet transform to decompose the channel autocorrelation matrix, effectively separating the channel noise components; singular value decomposition is then used to reduce the complexity of the high-frequency noise sequences; next, the effective signal is reconstructed to obtain the denoised signal matrix. Finally, simulation tests conducted in MATLAB show that the bit error rate and mean square error performance of the channel estimation method are improved to a certain extent.
Crowd counting and localization are two crucial tasks that provide technical support for crowd analysis. In recent years, P2PNet has emerged as a milestone work in this field, presenting an end-to-end framework that combines the two tasks and exhibits strong performance. However, solely utilizing a CNN to predict the category of reference points neglects the influence of the surrounding environment. To address this issue, we model the reference points as a graph, with each reference point connected to the other reference points within a neighborhood range, and employ a GCN to aggregate the confidence of reference points, thus incorporating important contextual information. Our method is straightforward to implement, requires only a slight increase in model parameters, and is plug-and-play, allowing easy integration into other P2PNet-like methods. Additionally, to assess localization performance more precisely, we devise a new metric called Normalized Mean Offset (NMO). Our method, CA-P2PNet, is evaluated on multiple public datasets; the results consistently surpass other baselines, demonstrating the state-of-the-art (SOTA) performance of our model.
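The confidence-aggregation idea can be sketched as one mean-aggregation layer over a radius graph of reference points. The radius and the toy coordinates are assumptions; CA-P2PNet's actual GCN uses learned weights:

```python
import numpy as np

def aggregate_confidence(points, conf, radius=1.5):
    """Average each reference point's confidence with its neighbours
    within `radius` (a one-layer mean-aggregation GCN sketch).
    Each point is its own neighbour via the zero self-distance."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    adj = (d <= radius).astype(float)          # adjacency with self-loops
    deg = adj.sum(axis=1, keepdims=True)       # node degrees
    return (adj @ conf[:, None] / deg).ravel()

pts = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
conf = np.array([1.0, 0.0, 0.5])
out = aggregate_confidence(pts, conf)
# the two nearby points smooth each other; the isolated one is unchanged
```

This is exactly the contextual effect the abstract describes: a reference point's score is no longer decided in isolation but moderated by its neighbourhood.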
Remote sensing images have a wide range of applications in geological exploration, disaster warning, military reconnaissance, and other fields, and the detection of specific targets in remote sensing images can improve the efficiency of image analysis. However, because remote sensing images are captured from a great height, the target pixel area is small, and the complex image background is unfavorable for detection. To address this problem, a practical feature extraction network for remote sensing images is designed and integrated into the YOLO v7 algorithm. First, a multi-scale nested Vision Transformer model is proposed; compared with the standard Vision Transformer, it computes attention over multiple feature maps at different scales simultaneously and merges the multi-scale features, enhancing the model's perception of global features. Second, a fission-type multi-receptive-field convolution module is proposed, which enhances the processing of local features by grouping feature maps for computation. Finally, a multi-level feature extraction network is designed to combine global and local features in remote sensing images, enriching the network's features. Experimental tests on three datasets, including RSOD, show that the designed algorithm improves the F1-Score by 3.65% on average and the mAP by 3.53% on average compared with YOLO v7.
Network Information Recognition and Image Processing
With the continuous advancement of deep learning technology, text-to-image generation has emerged as a prominent research area. This paper proposes a deep learning-based approach for text-to-image generation, utilizing a generative adversarial network (GAN) model to effectively convert textual descriptions into corresponding visual representations. Specifically, this study introduces a text-image fusion module that comprehensively integrates global and local features, ensuring the coherence of generated images. Experimental validation and comparison with other prevalent methods substantiate the exceptional performance, practicality, and potential applications of the proposed methodology.
In the field of computer vision, the ability to accurately detect and recognise animal features in various environments is an area of growing interest and application. This study presents an advanced cat face detection system utilising the You Only Look Once Version 8 (YOLOv8) architecture, enhanced with TensorRT optimisation for real-time processing. The approach involves a comprehensive data augmentation process to improve detection accuracy across diverse cat breeds and environmental conditions. Performance evaluation is based on quantifiable metrics; the optimised model achieves a notable reduction in inference time from 50.1 ms to 0.9 ms and a decrease in GPU power usage from 77 watts to 63 watts, without compromising accuracy. The accelerated processing speed and reduced power requirements make the system highly suitable for real-time applications, such as pet monitoring systems or behaviour analysis tools, where rapid and accurate detection is paramount. The research highlights the potential of deep learning algorithms in precise animal feature recognition and contributes to the field of computer vision by addressing challenges in small, diverse object detection.
In existing cattle face recognition algorithms, convolutional neural networks (CNNs) are most commonly used, but their processing speed and performance on large data are often limited by the receptive field and the number of network layers. Meanwhile, the vision transformer has achieved great success owing to its high performance and excellent ability to process large data. This paper therefore follows the traditional cattle face recognition pipeline but uses the DETR algorithm for object detection and the Swin Transformer algorithm for image classification, improving running efficiency and the ability to process large data. Comprehensive experiments show that the proposed algorithm outperforms existing traditional models, achieving a good balance between speed and accuracy. Because it leverages the emerging Transformer family to train and test on a large amount of data, the algorithm not only effectively identifies the unique identity of a cow's front face but also performs well on side and rear faces.
Bundle adjustment is the core of the Structure from Motion algorithm and also a very time-consuming part, in which redundant observations and initial parameter values with large errors increase the running time. To improve the efficiency of bundle adjustment, we propose a track selection method based on uniformity, accuracy, coverage, and connectivity criteria. First, we divide the space of tracks into several 3D grids. Second, starting from the grid cell with the largest number of tracks, we eliminate the redundant tracks with low connectivity and low accuracy in each cell while ensuring the connectivity and coverage of the remaining tracks. Finally, we verify the algorithm on a variety of experimental data. The results show that the algorithm can delete a large number of redundant tracks and effectively improve the efficiency of bundle adjustment while preserving accuracy. When the track retention rate is 0.4, the efficiency of bundle adjustment is increased by about 2 times, with a corresponding precision loss of 0.026.
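The grid-based pruning step can be sketched as follows. Scoring tracks by a single number stands in for the paper's connectivity and accuracy criteria, and the cell size and per-cell quota are assumptions:

```python
from collections import defaultdict

def select_tracks(tracks, cell=1.0, keep_per_cell=2):
    """Bin tracks into a 3D grid by position and keep only the
    highest-scoring few per cell, so coverage is preserved while
    redundant tracks in dense cells are pruned."""
    grid = defaultdict(list)
    for t in tracks:
        x, y, z = t["xyz"]
        key = (int(x // cell), int(y // cell), int(z // cell))
        grid[key].append(t)
    kept = []
    for cell_tracks in grid.values():
        cell_tracks.sort(key=lambda t: t["score"], reverse=True)
        kept.extend(cell_tracks[:keep_per_cell])
    return kept

tracks = [
    {"xyz": (0.1, 0.1, 0.1), "score": 5},
    {"xyz": (0.2, 0.3, 0.1), "score": 3},
    {"xyz": (0.4, 0.2, 0.3), "score": 1},  # pruned: 3rd in its cell
    {"xyz": (2.5, 2.5, 2.5), "score": 2},  # kept: alone in its cell
]
kept = select_tracks(tracks)
```

The grid guarantees spatial uniformity: a low-scoring track in an otherwise empty cell survives, while a mediocre track in a crowded cell is dropped.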
In this paper, we propose an underwater simultaneous localization and mapping algorithm based on image information enhancement. By processing underwater images, it reduces the image blurring and contrast reduction caused by environmental factors and improves the performance of visual odometry in underwater environments. On top of the image processing, an acoustic sensor, which is less affected by underwater environmental factors, is introduced to help the system recover target scale and measure absolute distance, strengthening the tracking and matching of feature points and improving the accuracy of state estimation.
With the continuous development of remote sensing imagery in deep learning, this paper proposes a self-attention model called the Dual-Stream Swin Transformer to address the computational and memory requirements that traditional Transformers face when dealing with high-resolution images. Specifically: 1) The Dual-Stream Swin Transformer adopts an innovative approach, decomposing the traditional Transformer encoder layer into smaller building blocks and introducing a shifted-windows mechanism to construct self-attention. 2) Traditional Transformer models require significant computational and storage resources when processing high-resolution images because they perform self-attention over the entire global image; the Swin Transformer significantly reduces the computational and memory requirements by segmenting the image into multiple non-overlapping windows and performing self-attention within each window. 3) Furthermore, the shifted-windows mechanism relates the attention of each window only to its adjacent windows, further limiting the computational complexity. This combination of decomposition and window mechanisms makes the Swin Transformer well suited to high-resolution visual inputs: it achieves high precision while offering higher computational efficiency and lower memory consumption, performing excellently in image classification, object detection, and semantic segmentation tasks. We conducted comparative experiments between this model and other classical network models of the same type. The Dual-Stream Swin Transformer effectively addresses traditional Transformers' computational and memory challenges on high-resolution images through its innovative decomposition and window mechanisms, providing a new solution for efficiently processing large-scale visual data.
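The claimed complexity reduction is easy to quantify: attention cost grows with the square of the token count, so restricting it to M x M windows makes the cost linear in image area. Illustrative FLOP-count arithmetic (the layer sizes below are assumptions, chosen to match a typical Swin first stage):

```python
# Cost of the attention score/value products for one layer, up to
# constant factors: global attention is quadratic in the number of
# tokens H*W, windowed attention is linear in H*W.
H = W = 56      # feature-map side (assumed)
C = 96          # channel dimension (assumed)
M = 7           # window side (assumed)

global_attn = 2 * (H * W) ** 2 * C      # full-image self-attention
window_attn = 2 * (H * W) * M * M * C   # Swin-style windowed attention
ratio = global_attn / window_attn       # = (H*W) / (M*M)
```

For these sizes the windowed form is 64x cheaper, and the gap widens quadratically as resolution grows, which is the whole argument for windowing.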
To solve the problems caused by insufficient training samples in few-shot image recognition, such as low accuracy and slow speed, an improved siamese network model is designed in this paper. Based on the classical siamese network, the lightweight convolutional neural network MobileNet V2, pretrained via transfer learning, is selected as the feature extraction part. Meanwhile, a new activation function is designed. Comparative experiments show that the improved siamese neural network improves both the speed and accuracy of recognition on few-shot datasets.
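The core of any siamese comparison is a shared-weight embedding followed by a distance. A toy sketch, in which a single random ReLU layer stands in for the pretrained MobileNet V2 extractor:

```python
import numpy as np

def embed(x, W):
    """Stand-in feature extractor (the paper uses pretrained
    MobileNet V2); one ReLU layer keeps the sketch self-contained."""
    return np.maximum(W @ x, 0)

def siamese_distance(x1, x2, W):
    """Euclidean distance between shared-weight embeddings: both
    branches use the same W, which is what makes the network siamese."""
    return np.linalg.norm(embed(x1, W) - embed(x2, W))

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 16))        # shared weights
a = rng.standard_normal(16)
d_same = siamese_distance(a, a, W)      # identical inputs -> distance 0
d_diff = siamese_distance(a, rng.standard_normal(16), W)
```

At inference, a query image is compared against one example per class and assigned to the class with the smallest distance, which is why siamese models suit few-shot settings: no per-class retraining is needed.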
Improper rock climbing movements may cause sports injuries, and traditional motion capture analysis methods suffer from insufficient real-time performance and low accuracy of movement evaluation. This paper therefore applies pose recognition technology to rock climbing and studies the recognition and standardness evaluation of climbing movements. First, a climbing action detection algorithm based on an improved YOLOv8-Pose is proposed: MobileNetV3 is used as the backbone feature extraction network to make the model lightweight, and the CBAM attention module is introduced to focus on the important channel information in the network, so that more effective feature information can be extracted in the channel and spatial dimensions and recognition accuracy improved. Then, based on the characteristics of climbing actions, eight joint angle indicators are selected, and the standardness of a climbing action is evaluated using action similarity. Experiments show that the improved detection model is 44.07% smaller than the original model with 51.54% fewer parameters, and that action similarity clearly differentiates the standard action from the action under test, verifying the effectiveness of the algorithm.
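The joint angle indicators mentioned above can be computed from pose keypoints with basic vector geometry. A minimal sketch (2D keypoints for brevity; the specific set of eight angles is the paper's):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by keypoints a-b-c, the kind
    of indicator used to score a climbing pose (e.g. elbow angle from
    shoulder-elbow-wrist)."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cosang = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

ang = joint_angle((0, 1), (0, 0), (1, 0))   # perpendicular limbs
```

Comparing the vector of such angles between a standard action and the action under test (e.g. via cosine similarity) yields the standardness score the abstract describes.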
News video contains a great deal of useful information, and the first task in analyzing it further is layout analysis, which has important research value in the field of news video analysis. The news media is an important medium for spreading important information, and the information it spreads has a significant impact on ordinary people; at present, countries are actively developing their own news media to grasp the initiative in news. Layout analysis and recognition technology based on deep learning is becoming more and more mature: compared with traditional rule-based layout analysis algorithms, deep learning-based algorithms can filter redundant or useless data features, thereby obtaining and utilizing information better. In this paper, we propose NLNet, a novel neural architecture for news video layout analysis that is resource-efficient yet generalizable and scalable; formulate category standards for news video layout elements; and present a news image layout analysis dataset called NewsLayout, containing 20,000 news images. The architecture adopts a lightweight backbone design with fewer parameters, less computation, and a more reasonable structure, ensuring accuracy and recall while improving the inference speed of the network.
In the field of image recognition, convolutional neural networks (CNNs) and fully connected neural networks (FCNNs) are two commonly used deep learning models. For the handwritten digit recognition task, this paper compares the performance of the two methods on the MNIST dataset, theoretically analyzing the construction and operation of each network and evaluating both on the same task and data. The results show that the convolutional neural network is superior to the fully connected neural network in both recognition accuracy and time consumption. This is mainly attributed to the advantages of convolutional neural networks in processing image data: they better retain spatial structure information, reduce the number of parameters, and share weights. In contrast, a fully connected neural network must process many parameters and has difficulty effectively extracting image features. Therefore, for image recognition tasks such as MNIST, using a convolutional neural network is the more effective and efficient method.
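The parameter-count advantage of weight sharing is concrete even for a single layer on a 28x28 MNIST image (the layer widths below are illustrative assumptions, not the paper's exact architectures):

```python
# Parameter counts for a 28x28 grayscale input:
# a 3x3 conv layer with 32 filters shares its 3x3 kernel across all
# spatial positions, while a fully connected layer with 32 units
# needs one weight per input pixel per unit.
conv_params = 3 * 3 * 1 * 32 + 32    # kernel weights + biases = 320
fc_params = 28 * 28 * 32 + 32        # dense weights + biases = 25,120
factor = fc_params / conv_params     # ~78x more parameters in the FC layer
```

Fewer parameters mean less memory, faster training, and less overfitting on small images, which is the quantitative basis for the comparison's outcome.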
With the proliferation of face recognition technology, facial recognition in mobile applications has become commonplace. However, the lack of a liveness detection module may render the system vulnerable to spoofing attacks, potentially compromising user privacy or causing financial losses. Research indicates that deep learning-based liveness detection methods are a viable option, particularly when operating in real time on mobile devices. MobileFaceNet is a convolutional neural network tailored for mobile devices. In this paper, improvements to the fully connected layers and residual modules of the network are proposed, and Fourier spectrum images are used as auxiliary supervised learning targets for network branches, enhancing the accuracy of liveness detection.
TACE is the main non-surgical method for treating intermediate and advanced liver cancer; during the procedure, the surgeon determines the guidewire position under DSA image guidance. Because the guidewire is thin and the image background complex, this paper proposes the GWnet network to enhance guidewire segmentation accuracy. The network first facilitates extraction of the guidewire's elongated tubular features by replacing conventional convolution with a strip-pooling pyramid dispersed attention module (SPSA). Second, an atrous spatial pyramid pooling module (IASPP) incorporating an inverted bottleneck structure is embedded into the network to expand the model's receptive field while reducing feature loss. Finally, a coordinate attention mechanism is introduced to reduce the effect of background noise and improve the model's ability to resolve guidewire targets and their edge details. Experimental results on 4839 anonymized DSA contrast images show that the guidewire segmentation method achieves good results.
Mixed reality (MR) technology combines computer technologies that introduce real-world information into virtual environments and enable interaction between the two within a unified visual space. As a typical application of MR technology, mixed reality sandbox applications significantly enhance three-dimensional situational awareness. Human-computer interaction (HCI), an essential component of MR technology, directly affects the user experience and has been a hot research topic both domestically and internationally in recent years. Introducing gesture interaction into mixed reality sandbox systems can well meet users' needs for continuous observation and editing of three-dimensional objects, providing a real-time, natural interaction experience. Focusing on mixed reality sandbox applications, this paper discusses design principles and evaluation methods for human-computer interaction oriented toward mixed reality sandboxes, proposes a set of general non-contact gesture interaction rules, investigates the key technologies that interaction methods in mixed reality rely on, summarizes the issues to be addressed in the next phase of research on mixed reality sandbox applications, and finally provides a conclusion and outlook for the study of interaction methods in such applications.
This article adopts the bidirectional GRU algorithm to build the classifier; this algorithm, also widely used in the image field, is an upgraded version of the LSTM deep learning algorithm with improved performance and accuracy. LSTM has achieved excellent results in the field of text classification, and the classifier in this article likewise reaches an accuracy of about 85%. Moreover, due to the inherent characteristics of the algorithm, the larger the data volume, the higher the processing accuracy.
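A bidirectional GRU encodes a sequence in both directions and concatenates the final states. A minimal NumPy sketch of the gating equations (the shared weights between directions, the tiny sizes, and the omission of bias terms are simplifying assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, P):
    """One GRU update: update gate z, reset gate r, candidate state."""
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h)
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h)
    h_tilde = np.tanh(P["Wh"] @ x + P["Uh"] @ (r * h))
    return (1 - z) * h + z * h_tilde

def bi_gru(seq, P, hidden=4):
    """Bidirectional encoding: run the sequence forward and backward,
    then concatenate the two final hidden states."""
    hf = np.zeros(hidden)
    hb = np.zeros(hidden)
    for x in seq:
        hf = gru_step(x, hf, P)
    for x in reversed(seq):
        hb = gru_step(x, hb, P)
    return np.concatenate([hf, hb])

rng = np.random.default_rng(3)
P = {k: 0.1 * rng.standard_normal((4, 4))
     for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
feats = bi_gru([rng.standard_normal(4) for _ in range(5)], P)
# feats (length 8) would feed a softmax classification head
```

GRU's two gates replace LSTM's three, which is where the speed advantage the abstract alludes to comes from; the bidirectional pass gives each position context from both ends of the sequence.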
In the intricate environments of steel mills, traditional target detection algorithms relying on a single RGB image can struggle with smoke detection, owing to significant light variations under diverse lighting conditions, the evolving morphology of smoke, and the presence of steam, sparks, and similar gases in the surroundings. With no dedicated algorithm for identifying smoke of the distinctive morphology and color produced in steel mill operations, this paper introduces a dual-stream information network inspired by the structure of YOLOv9, termed DSAIF-Net. DSAIF-Net leverages adaptive feature fusion of RGB and optical flow images to detect smoke amid complex lighting and environmental challenges. To enhance the fusion of static and dynamic features, a weighted bidirectional transfer feature adaptive fusion module (ROBFM) is proposed; ROBFM intelligently merges the original RGB image and the optical flow image, elevating model accuracy while reducing misidentification and false alarms. Additionally, to streamline model parameters and computational load without compromising accuracy, a lightweight attention mechanism (EMiRMBA) is incorporated. To validate the model's efficacy, we curated a comprehensive dual-stream smoke detection dataset from surveillance videos across various steel plant scenarios. The results show that the proposed method achieves superior detection accuracy with the lowest dataset requirements: an accuracy of 0.960, a recall of 0.908, more complete coverage of prediction boxes, and the lowest miss and false detection rates. Moreover, the size of our model is reduced to 85% of the original network model, affirming its efficiency and scalability.
In response to the issue that existing human action recognition models cannot make full use of complementary information from different modalities, this thesis proposes a multi-path attention module (MA), forming the MA-GCN model, and a dual-stream human action recognition model, SRHAR, that fuses skeleton data and RGB data. SRHAR uses the LAF module proposed in this thesis to fuse skeleton features and RGB features; the introduction of the skeleton modality provides the RGB modality with complementary information, resulting in more accurate prediction results. The algorithm prioritizes recognition accuracy over recognition speed and achieves leading accuracy on public datasets.
In modern cities, as the demand for transportation increases, the number of vehicles continues to rise, increasing the demand for multi-level parking garages. Unlike traditional open-air parking lots, multi-level parking garages must consider the weight of different vehicles and require classification management. Traditional weighing sensors typically rely on large-scale equipment and complex construction, resulting in higher costs. This paper takes a purely visual approach, utilizing ResNet18 to construct a vehicle classification method that divides vehicles into six categories. The method serves as an auxiliary system for multi-level parking garages, handling with vision alone tasks that previously required multiple sensors. In the final training evaluation, our approach achieves 97.3% accuracy, 95.5% precision, 95% recall, and a 96.3% F1 score. Additionally, we use TensorRT to accelerate the inference process, keeping the inference time within 2 milliseconds so that inference can complete without the vehicle having to stop.
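The accuracy, precision, recall, and F1 figures above follow the standard definitions. A binary sketch of how they are computed (the paper's task is six-class, where these are typically averaged per class):

```python
def prf(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class, from true/false
    positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# toy labels: one false negative, one false positive
p, r, f = prf([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Reporting all three alongside accuracy matters here because the six vehicle classes are unlikely to be balanced, and accuracy alone would mask poor performance on rare classes.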
Nearest neighbor queries have widespread applications in fields such as computer vision, data mining, pattern recognition, document retrieval, and geographic information systems. The traditional k-nearest neighbor algorithm based on branch-and-bound and octrees requires a priority queue to maintain the result set. However, in large-scale data scenarios the efficiency of the priority queue decreases, leading to a decline in query efficiency; in addition, implementing parallelism on tree-based storage structures such as octrees can be challenging. This paper proposes a parallel k-nearest neighbor query algorithm for multi-core CPU environments based on octrees. The method adopts a divide-and-conquer approach, dividing the dataset into multiple parts and constructing an octree for each part. It then uses parallel max-heaps to compute the distances of pruned data and selects the nearest k elements to complete the k-nearest neighbor query. The method utilizes multiple processors for tree construction and result set maintenance, further accelerating the query. Experimental results show that in large-scale data scenarios, this method achieves significant acceleration compared to sequential query.
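The divide-and-conquer scheme with bounded max-heaps can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the octree pruning is omitted, and the per-partition searches are written sequentially where the paper dispatches them to multiple cores.

```python
import heapq

def knn_partition(points, query, k):
    """k nearest neighbours within one partition, kept in a bounded max-heap.

    heapq is a min-heap, so distances are negated: the root then holds the
    current farthest candidate, which is evicted when a closer point arrives.
    """
    heap = []  # entries: (-squared_distance, point)
    for p in points:
        d = sum((a - b) ** 2 for a, b in zip(p, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif -heap[0][0] > d:
            heapq.heapreplace(heap, (-d, p))
    return heap

def knn_divide_and_conquer(dataset, query, k, parts=4):
    """Divide the data, solve each part independently, merge the k best."""
    chunks = [dataset[i::parts] for i in range(parts)]  # each could run on its own core
    merged = [item for chunk in chunks for item in knn_partition(chunk, query, k)]
    best = heapq.nlargest(k, merged)  # largest (-d) == smallest distance
    return [p for _, p in best]
```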
With the continuous advancement of smart city construction, autonomous driving technology is playing an increasingly important role in urban traffic systems. This study aims to explore the development and optimization of traffic sign recognition algorithms for autonomous vehicles in smart city traffic environments. By comprehensively analyzing current traffic sign recognition technologies, this paper proposes a traffic sign recognition system based on the YOLOv5 algorithm and utilizes the open-source COCO dataset for model training and testing. The images were preprocessed and annotated, employing CSPDarknet53 as the backbone network, which effectively extracts image features through multiple convolutional layers and residual blocks. A deep learning model was trained, capable of recognizing and classifying various traffic signs such as stop, speed limit, and turn signs. The research results indicate that the model demonstrates high average precision (AP) on the COCO dataset, effectively identifying traffic signs of different sizes and angles, even in complex backgrounds, with high robustness. Compared to traditional methods, the recognition accuracy has improved by 15%, and it has a significant advantage in real-time processing.
In the context of deep learning applied to single-image super-resolution, the quality of the reconstructed images is largely contingent upon the intricacy of the convolutional neural networks employed. However, this complexity is limited by the static nature of the receptive fields within these networks. Transformers, distinguished by their self-attention mechanism, are capable of capturing global dependencies, a feature that is beyond the reach of conventional CNNs. However, integrating Transformers into CNN architectures poses a challenge. This paper introduces an innovative solution, a Multi-Dimensional Feature Fusion Image Super-Resolution Network that harnesses the Transformer's global representation capability. By utilizing the self-attention mechanism, our network achieves effective cross-feature extraction and establishes global dependencies throughout the feature map. A dedicated Multi-Dimensional Feature Fusion Module is employed to enhance feature fusion, further improving the reconstruction quality. Empirical evidence from experiments on the Set5 benchmark dataset reveals that our network outperforms existing state-of-the-art methods by 0.22 dB in 4× super-resolution tasks, highlighting the efficacy of our approach.
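The 0.22 dB improvement refers to PSNR, the standard fidelity metric in super-resolution benchmarks. A minimal sketch of its computation (flat pixel lists rather than image tensors, as an illustrative assumption):

```python
import math

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images,
    given here as flat lists of pixel intensities."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images: unbounded PSNR
    return 10.0 * math.log10(peak ** 2 / mse)
```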
This paper proposes a matched filtering edge detection method to overcome the weaknesses of traditional edge extraction methods, such as poor noise resistance, incomplete targets, and missing location information. Through this method, we can more effectively extract the edges of the target image, obtaining extraction results with stronger edge-focusing ability. The method can effectively suppress the background and improve the integrity and accuracy of edge extraction.
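The idea of matched filtering can be illustrated in one dimension: correlating the signal with a template shaped like the feature of interest produces a response that peaks where the feature occurs. This toy sketch is not the paper's 2-D method:

```python
def matched_filter_1d(signal, template):
    """Slide the template over the signal and return the correlation response;
    the peak marks where the signal best matches the template shape."""
    m = len(template)
    return [sum(s * t for s, t in zip(signal[i:i + m], template))
            for i in range(len(signal) - m + 1)]

# A step edge and a step-shaped template: the response peaks at the edge.
signal = [0, 0, 0, 0, 1, 1, 1, 1]
template = [-1, -1, 1, 1]          # matched to a rising step
response = matched_filter_1d(signal, template)
edge_index = response.index(max(response))
```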
With the wide application of deep learning in various fields, ever higher demands are placed on the computational performance and computational cost of models. This paper introduces methods to improve the computational performance and reduce the computational cost of deep learning models, including pruning and mixed precision training. We verify the effectiveness of these methods through experiments and discuss their advantages, disadvantages, and application scenarios. The experimental results show that combining pruning with mixed precision training can not only improve the precision, accuracy, and efficiency of model training, but also improve the performance of the model on multiple datasets, which provides a valuable reference for the training and optimization of more complex deep learning models that require substantial computing power.
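Of the two techniques discussed, pruning is the simpler to illustrate. A minimal sketch of unstructured magnitude pruning (the function name and the flat list-of-weights representation are illustrative assumptions, not the paper's code):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.
    Ties at the threshold may prune slightly more in this simple sketch."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```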
In practical applications, mobile robots or UAVs often need to navigate and localize in dynamic environments, but traditional SLAM algorithms tend to perform poorly there, so dynamic SLAM has become a research hotspot. To address the low positioning accuracy and poor robustness of traditional visual SLAM in dynamic scenes, this paper proposes an improved algorithm based on ORB-SLAM3. While keeping the original framework unchanged, the algorithm adds a new semantic thread that uses Mask R-CNN to segment image frames, extracts keyframes for optimization, removes the feature points of dynamic objects, and retains the feature points of static objects. Finally, comparative experiments on the TUM dataset show that the proposed algorithm outperforms existing algorithms, improving positioning accuracy and robustness.
Semantic segmentation is often used in robots' environment perception and object recognition, which can help robots better understand the surrounding environment and perform correct tasks. In the semantic segmentation of football robots, the semantic segmentation model based on the U-Net model works better in the context of football robot competitions, but there is still room for optimization. In response to the special situational needs of football robots, this paper carries out optimization measures such as data enhancement, adding an attention mechanism, and changing the physical mark extraction model based on the U-Net model, and the overall performance has been improved. On the official data set provided by RoboCup SPL, the accuracy rate finally reached 96.25%.
To address the matching problem caused by the salience differences in spatial features, spectrum and contrast between heterologous images, a heterologous image matching method based on salience region using Q-test and kernel density estimation is proposed in this paper. Firstly, the center pixel point of each sub-region in the image is detected for salience difference with other points in its neighborhood by using the Q-test method. The detected points are defined as Q-points. Secondly, the point with the largest kernel density in all Q-points is calculated by using kernel density estimation function and defined as the M-point. Then, the region size is defined according to the coordinates of M-point, and the salience region is detected. Finally, to comprehensively evaluate the performance of the proposed method, five common types of heterologous images are selected as data sources, including point cloud depth maps, infrared images, electronic navigational maps, synthetic aperture radar images, and night-time light images. Based on the salience regions, comparative experiments are carried out using histogram of absolute phase consistency gradients (HAPCG) and histogram of the orientation of the weighted phase descriptor (HOWP). The results show that the matching performance of the proposed method is better than that of the original algorithm without salience region, especially on the type of electronic navigational map.
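The two statistical building blocks named above can be sketched in scalar form. This is an illustrative simplification, not the paper's 2-D neighborhood procedure: Dixon's Q statistic flags a salience difference, and the kernel density peak plays the role of the M-point.

```python
import math

def dixon_q(values):
    """Dixon's Q statistic for the most extreme value: the gap between it and
    its nearest neighbour, divided by the total range of the sample."""
    s = sorted(values)
    rng = s[-1] - s[0]
    return max((s[1] - s[0]) / rng, (s[-1] - s[-2]) / rng)

def kde_peak(points, bandwidth=1.0):
    """Sample point with the largest Gaussian kernel density estimate,
    analogous to selecting the M-point among the detected Q-points."""
    def density(x):
        return sum(math.exp(-((x - p) ** 2) / (2 * bandwidth ** 2)) for p in points)
    return max(points, key=density)
```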
This study aims to leverage advanced deep learning technologies to facilitate high-accuracy waste classification on mobile devices. We developed a deep neural network model based on ResNet-34, specifically designed for the automatic identification and classification of recyclable waste images. Utilizing a residual learning framework, this model enhances the representational capacity of feature maps, effectively captures deeper features, and maintains information flow, thus preventing the common issue of feature degradation during deep network training. Testing on the TrashNet dataset demonstrated that this model surpasses other common convolutional neural network architectures in multiple performance metrics, including accuracy, precision, recall, and F1 score, achieving a classification accuracy of 86.25% and confirming its efficacy in handling complex waste classification tasks.
Real image editing has been an important research topic in the field of image generation. With the advent of diffusion models, real image editing based on guided diffusion models has become mainstream: the original real image is edited according to different conditions (text, image, sketch, etc.), and the output edited image aligns well with the input conditions. Among these approaches, image-based real image editing can often only migrate the style of a reference image onto the original image, which is not controllable enough and lacks fine-grained image translation ability. In this paper, we propose I2IP, which introduces a text prompt into the image translation process to control it in a fine-grained way, improving the generalization and refinement of the image translation algorithm. Compared to text-based real image editing, I2IP can achieve image style transfer with a reference image; compared to image-based real image editing, I2IP retains the advantage of text-based editing in controlling the editing result.
Tunnel images are affected by the shooting environment and suffer from problems such as uneven light distribution, local occlusion, and heavy noise. Aiming at the overexposure and distortion produced by existing image enhancement algorithms during optimisation, we propose a tunnel image enhancement algorithm, DNO-SCI (Denoising and Overexposure suppression based on Self-Calibrated Illumination). Firstly, based on the SCI model, a noise suppression module based on prior knowledge is added to effectively suppress the noise left by SCI after low-light enhancement. Secondly, overexposure suppression is guided through the Y channel. Finally, a lightweight self-calibrated tunnel construction image enhancement algorithm is proposed in combination with depthwise-separable convolution. Experimental results demonstrate that the proposed algorithm can effectively enhance tunnel construction images with uneven brightness and suppress local overexposure.
With the popularization of mobile devices and the explosion of data, more and more images, voice, text, and other information need machine learning to improve processing efficiency. In order to reduce the dimensionality of the original data, improve the learning rate, and reduce storage costs, researchers have developed a large number of dimensionality reduction algorithms, with great achievements over the past century, including principal component analysis, linear discriminant analysis, independent component analysis, and so on. In the matrix decomposition of images, speech, and text, the negative values obtained by decomposition often have no practical significance. Therefore, Lee et al. proposed non-negative matrix factorization in 1999, adding non-negative constraints to the decomposition process to obtain a base matrix and a coefficient matrix; generally, the low-dimensional base matrix replaces the original data matrix for classification learning. Since non-negative matrix factorization does not produce sufficient sparsity in practice and is by nature unsupervised, some researchers have proposed variants that add sparsity constraints or discriminative information, which reduce storage space and learning time more effectively. Due to the slow convergence rate, others have proposed local decomposition and two-dimensional non-negative matrix factorization, which effectively improve convergence. In order to reduce the influence of each class on the same face portrait, the product of the matrices at the same angle can be used as a class constraint term in the objective function; with two such constraint terms, more multi-angle data can be exploited.
In order to obtain face data in different dimensions, this paper also combines multi-dimensional techniques with Multi-view NMF, designs a multi-dimensional multi-view face recognition method, and provides a detailed implementation. Experiments show that, compared with other techniques of the same kind, the proposed method better exploits multi-dimensional and multi-view data and greatly improves face recognition accuracy.
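The non-negative constraint mentioned above is typically enforced with the multiplicative update rules of Lee et al., which keep both factors non-negative throughout. A minimal single-view sketch (not the paper's multi-dimensional multi-view method; the random seed and iteration count are illustrative assumptions):

```python
import numpy as np

def nmf(V, rank, iters=500, eps=1e-9):
    """Non-negative matrix factorization V ≈ W @ H via the classic
    multiplicative updates; eps guards against division by zero."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], rank))   # non-negative base matrix
    H = rng.random((rank, V.shape[1]))   # non-negative coefficient matrix
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update H, stays non-negative
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update W, stays non-negative
    return W, H
```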
Advanced Technological Innovation and Intelligent Monitoring
The purpose of this study is to investigate the development and evaluation of a high integrity navigation system for vehicular applications, focusing on the fusion of Global Positioning System (GPS) and Inertial Measurement Unit (IMU) data. The study compares the similarities and differences in performance between the Kalman filter (a traditional GPS/IMU integration method) and machine learning models. The experiments are based on KITTI GPS/IMU sequences, and the impact of these methods on the performance of the navigation system is evaluated by introducing different noise levels. First, a Kalman filter is used to fuse the GPS/IMU data and the estimation error of the trajectories is investigated by adjusting the noise level. Second, a machine learning model is introduced to compare its performance under different parameter configurations using random forest regression as an example. In addition, the effects of different parameters on the performance of the two methods are analyzed, which provides an important reference for choosing a suitable navigation system. The results show that the Kalman filter model basically outperforms the machine learning model in terms of mean square error (MSE) and mean absolute error (MAE) of trajectory estimation. However, the random forest regression model performs best after tuning. The paper concludes with a comparison between the Kalman filter and the random forest regression model, emphasizing the robustness and adaptability of the random forest regression model in GPS/IMU fusion. The research results provide insights for the selection and design of navigation systems.
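The Kalman filter baseline can be illustrated with a scalar constant-position model. This is a deliberately minimal stand-in for the paper's GPS/IMU fusion; the noise parameters q and r are illustrative assumptions:

```python
def kalman_1d(measurements, q=1e-3, r=0.5):
    """Scalar Kalman filter: constant-position model with process noise q
    and measurement noise r."""
    x, p = measurements[0], 1.0        # initial state estimate and covariance
    estimates = [x]
    for z in measurements[1:]:
        p = p + q                      # predict: covariance grows by process noise
        k = p / (p + r)                # Kalman gain
        x = x + k * (z - x)            # update with the measurement residual
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```

With a constant true value, the filter's estimate settles between the noisy readings rather than chasing each one, which is the smoothing behaviour compared against the random forest regressor in the study.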
This paper introduces an enhanced YOLOv5 algorithm tailored for real-world traffic sign detection applications. Through the incorporation of Coordinate Attention after the SPPF module of the YOLOv5 backbone, the neck pays more attention to key areas in the image and preserves accurate positional information. Since traffic signs are mostly small targets, the PAN and FPN in the original neck network are upgraded by substituting BiFPN for the previous feature fusion method, improving the algorithm's ability to detect traffic signs of different sizes and specifically targeting improved detection accuracy for small-scale targets. To validate the effectiveness of these modifications, we compared our improved model with other object detection models on the TT100K dataset and conducted ablation experiments. The experimental results revealed that the enhanced algorithm achieved an mAP of 94.2%, surpassing the original YOLOv5 model by 6.9%. The detection speed was 49.2 FPS, meeting real-time requirements.
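The mAP figures quoted above rest on matching predicted boxes to ground truth by intersection-over-union. A minimal sketch of that criterion (corner-coordinate box format assumed):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```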
Instance segmentation is an important task in computer vision with wide applications in autonomous driving. Instance segmentation in autonomous driving aims to associate each pixel in an image with its corresponding object instance, enabling precise segmentation and recognition of different objects. These objects can include pedestrians, vehicles, bicycles, traffic signs, and more. In instance segmentation, common methods include top-down and bottom-up approaches. The top-down approach first performs object detection to generate candidate proposals and then performs pixel-level segmentation on each proposal. It is accurate and flexible, capable of handling objects of different sizes and shapes. However, it has high computational complexity and relies on the accuracy of object detection. The bottom-up approach first performs pixel-level clustering or segmentation and then combines candidate instances to obtain the final segmentation result. It can handle overlapping instances and has lower computational complexity but may not accurately localize and segment instances and may have coarser segmentation granularity. This paper proposes a hybrid model called HISNet that leverages the advantages of both top-down and bottom-up strategies. In the prediction stage, the paper introduces an innovative dual-branch design. One branch is the bounding box aggregation branch, which generates high-dimensional information such as the shape and pose of the bounding boxes based on the FCOS Head. The other branch is the mask decoding branch, which generates mask prediction results. These two branches are fused using the Mask FCN Header. Additionally, the model adopts EfficientNet as the backbone to improve accuracy and inference efficiency, and the Neck module incorporates the MPAFPN network module to enhance feature fusion. With these improvements, HISNet achieves an improvement of approximately 1.95% to 5.0% over the baseline model on the COCO dataset.
This indicates that the proposed model performs better in urban street scenes, further enhancing object detection and segmentation tasks. The HISNet model holds the potential to provide more reliable and efficient solutions for applications in autonomous driving, intelligent transportation, urban security, and other fields.
This study focuses on enhancing the YOLOv5 algorithm for real-time vehicle detection, critical for autonomous driving and surveillance. We improved its performance for box truck detection through advanced attention mechanisms, multi-scale feature fusion, and lightweight design. These enhancements led to notable increases in accuracy metrics (mAP@0.5 from 0.731 to 0.771 and mAP@0.5:0.95 from 0.537 to 0.56) and improved both precision and recall. This demonstrates the optimization's theoretical and practical effectiveness, contributing significantly to autonomous driving and intelligent transportation safety and reliability. The study offers valuable technical insights for deep learning and computer vision, guiding future advancements in vehicle detection and visual recognition.
Machine learning has emerged as a popular technique for generating realistic animations in the video game industry. This paper aims to explore the application of machine learning in creating 2D skeletal animation for body control in Unity. The goal is to develop a model that can accurately predict the movements of the character's skeleton based on input parameters. To achieve this, the approach involves collecting a dataset of motion capture data, selecting an appropriate machine learning algorithm, creating an animation environment in Unity, training the machine learning model, and testing its accuracy. Each step is described in detail, along with an evaluation of the effectiveness of the method. Overall, the paper presents a promising approach to improve the efficiency and realism of 2D skeletal animation in video games.
With the rapid evolution of technology, artificial intelligence has gradually permeated every aspect of our lives, not least in the field of artistic design. In recent years, image style transfer algorithms, as a form of artificial intelligence technology, have garnered widespread attention in the realm of artistic design, particularly in the domain of packaging design. These algorithms can take the style of one piece of artwork, process it through a model, and subsequently apply it to another piece of artwork, infusing the field of packaging design with new vitality through unique creativity and visual effects. This paper will use the design of condom packaging as an example, aiming to design a condom package that conforms to the current aesthetic preferences of domestic college students under the integration of image style transfer algorithms and packaging design. At present, AIDS and sexually transmitted diseases are spreading widely among college students, and the reasonable and safe use of condoms is the most effective and direct prevention method. By improving the design of condom packaging, the usage rate of condoms among domestic college students can be increased, thereby reducing the risk of transmission of AIDS and sexually transmitted diseases. Through the style transfer algorithm, the main images, graphics, and text of the condom packaging are creatively processed and redesigned to meet the current aesthetic preferences of domestic college students and increase their purchase intention and usage frequency. The combination of condom packaging design and artificial intelligence can not only increase the usage rate of condoms in the future, thereby preventing the spread of diseases, but also enhance the practicality and market value of condom packaging, bringing more business opportunities and profits to manufacturers and sellers.
In the field of industrial safety, wearing helmets plays a vital role in protecting workers. Aiming at the misdetection and missed-detection problems of small-target helmet-wearing detection caused by complex backgrounds and varying distances in industrial environments, an improved YOLOv8 safety helmet wearing detection network is proposed. It enhances the capture of details, improves multi-scale feature processing, and improves small-target detection accuracy by introducing the DWR (Dilation-wise Residual) attention module and the ASPP (Atrous Spatial Pyramid Pooling) pyramid pooling. Experiments on the SHWD dataset show that the mAP of the improved network reaches 92.0%, exceeding traditional target detection networks in accuracy, recall, and other key metrics, further improving helmet wearing detection in complex environments and greatly enhancing detection accuracy.
The mouse, a human-computer interaction device used with very high frequency in daily life, is rarely used on ships, mainly because a ship sailing on rivers, lakes, and seas is subject to bumping, which becomes severe in bad weather. Unlike a trackball, a mouse cannot be fixed to the operating table, nor can it be secured and stowed during bumpy conditions. To further improve the crew's experience, this paper introduces a mouse device that can be used and stowed on board ship. The paper describes in detail the design concept and process of the device, covering track recognition technology, photoelectric sensing and processing technology, storage-box fixation technology, and environmental adaptability technology.
Construction site safety has become increasingly important with the rapid development of industrialisation around the world. To enhance the safety management of construction sites and protect the lives of workers, information technology tools such as video surveillance and image detection can be utilised. However, images in complex environments such as hazy and dusty site scenes may be distorted, affecting the accuracy of helmet wear detection. To solve these problems, this paper designs and implements an improved YOLOv8 target detection algorithm that incorporates the AOD-Net dehazing model. Firstly, the AOD-Net dehazing model is used to pre-process the captured images, and adding the ECA attention mechanism to the network helps it better capture and utilize image detail information during training. Adaptive Histogram Equalization (AHE) is then applied to post-process the images, which effectively improves the quantitative indexes PSNR and SSIM, and the processed images form the training dataset.
Second, the improved YOLOv8 is used for helmet target detection. It improves small-scale feature extraction capability by introducing the deformable convolution DCNv2; adds the AUX-head module from YOLOv7 to provide richer gradient information and assist training; replaces the model's original up-sampling with the lightweight up-sampling operator CARAFE, which enlarges the receptive field and reduces computational cost; and replaces the YOLOv8 loss function with SIoU to achieve faster convergence in the training phase. Experimental results show that the accuracy of the improved YOLOv8 network model reaches 0.907, an improvement of 3.047% over the original YOLOv8 model. This provides higher detection accuracy and inference speed without increasing the number of parameters or the detection cost, meeting the requirements of the helmet wearing detection task in dusty and foggy environments.
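The AHE post-processing step builds on classic histogram equalization. The sketch below shows the global variant on a toy grayscale image (AHE applies the same intensity remapping per local tile); it is an illustration of the general technique, not code from the paper.

```python
# Minimal global histogram equalization on a grayscale image, a simplified
# stand-in for the Adaptive Histogram Equalization (AHE) post-processing step.
# AHE would apply this remapping per local tile; this sketch is global.

def equalize(image, levels=256):
    """Remap pixel intensities so the cumulative histogram becomes ~linear."""
    flat = [p for row in image for p in row]
    n = len(flat)
    # Histogram of intensity values.
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution function (CDF).
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    # Classic equalization lookup table:
    # new_v = round((cdf(v) - cdf_min) / (n - cdf_min) * (levels - 1))
    lut = [round((c - cdf_min) / max(n - cdf_min, 1) * (levels - 1)) for c in cdf]
    return [[lut[p] for p in row] for row in image]

# A low-contrast 2x3 image clustered around mid-gray spreads out to full range.
img = [[100, 101, 102], [103, 104, 105]]
out = equalize(img)
```

Spreading intensities this way raises local contrast in hazy frames, which is consistent with the PSNR/SSIM gains the abstract reports for the post-processing stage.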
Advances in automation for the packaging industry necessitate robust quality control measures, particularly in food appearance inspection. This paper introduces PC-Yolo, an advanced defect detection system based on an enhanced version of the YOLOv5 algorithm, tailored for the inspection of ham sausage products. PC-Yolo incorporates Partial Convolution (PConv) to improve the defect detection process, offering substantial gains in speed and accuracy over traditional methods. A custom dataset comprising high-resolution images of ham sausages, with and without defects, was used to train and validate the model. The proposed system efficiently handles variations in defect size through optimal prior box dimensionality and employs a combination of Feature Pyramid Network (FPN) and Path Aggregation Network (PAN) for effective feature fusion. The resultant PC-Yolo model demonstrates superior real-time detection capabilities, with robust performance in complex scenarios, thereby addressing the industry's zero-defect challenge.
With the rapid development of digitalization, cloud platforms have become a key technology supporting remote networking. However, the accompanying cybersecurity issues, particularly the detection of anomalous access activities, pose significant challenges in maintaining platform stability and user data security. This paper proposes a hybrid model that integrates a one-dimensional convolutional neural network (1D-CNN) with the DBSCAN clustering algorithm, aimed at enhancing the detection accuracy of anomalous access behaviors in cloud platforms. Through in-depth analysis and feature extraction of network traffic data, combined with the effective identification of abnormal patterns by the clustering algorithm, this study not only improves detection efficiency but also enhances the model's adaptability to novel anomalous behaviors. Experimental results demonstrate that this model surpasses traditional detection methods in multiple performance metrics such as accuracy, precision, recall, and F1 score, confirming its feasibility and effectiveness in practical application scenarios.
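The anomaly-flagging role of DBSCAN in this pipeline can be illustrated with a toy pass over one-dimensional scores. In the paper's setting the inputs would be features extracted by the 1D-CNN from traffic data; here they are plain floats, and points that end up unassigned (label -1) stand for anomalous accesses. This is a minimal sketch of the algorithm, not the authors' implementation.

```python
# Toy DBSCAN over 1-D feature scores: dense regions become clusters of
# "normal" access sessions; low-density points are labeled -1 (anomalies).

def dbscan_1d(points, eps, min_pts):
    """Return a cluster label per point; -1 marks noise/anomalies."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(len(points)) if abs(points[j] - points[i]) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1           # not enough density: provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = [k for k in range(len(points)) if abs(points[k] - points[j]) <= eps]
            if len(nb) >= min_pts:   # j is itself a core point: keep expanding
                queue.extend(k for k in nb if labels[k] is None)
    return labels

# Dense cluster of normal sessions plus one far-off outlier.
labels = dbscan_1d([0.1, 0.12, 0.11, 0.13, 5.0], eps=0.05, min_pts=3)
```

The key property for this use case is that DBSCAN needs no preset cluster count and leaves isolated points unclustered, which is what lets the hybrid model flag previously unseen access patterns.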
This paper mainly introduces a new type of intelligent water-cleaning robot. The robot control system is built on a Raspberry Pi and an STM32 as the operating platform, and a new communication method is designed for remote control, allowing the robot on the water to be manually operated from a distance with low latency. At the same time, the YOLOv7-tiny detection network used by the vision module for automatic control is given a lightweight modification, reducing the parameter count and computation: the model size is reduced by 20.5% and the FPS on the Raspberry Pi is increased by 41.79%. Experimental verification shows that the machine can effectively clean up water pollutants in both automatic and remote-control modes, and can serve as an effective solution for water pollutant cleaning.
In response to the large number of small infrastructure project reviews, the wide range of professions involved, and the strong timeliness requirements of review, this paper proposes a design scheme for a digital review system for small infrastructure projects based on BIM (Building Information Modeling) technology. It focuses on the five aspects of "human, material, machine, method, and environment" in small infrastructure projects to achieve intelligent three-dimensional review. Based on a rule engine, it realizes one-click extraction of key indicators from review materials, vigorously promotes the standardization of review, enhances the authority and effectiveness of review, effectively accumulates review results into digital assets, and realizes an efficient transformation from manual review to human-machine collaboration.
In order to meet the demand for underground air quality monitoring by the mine frequency-conversion ventilation system, a mine air quality evaluation method based on an improved random forest is proposed, aiming to accurately monitor the air quality inside the mine and provide an important reference for the ventilation system, so as to effectively safeguard safe production and miners' health. Firstly, the sources of pollutants in mine air and their potential hazards to miners' health are analyzed, and five main pollutants, namely carbon monoxide (CO), sulfur dioxide (SO2), hydrogen sulfide (H2S), nitrogen dioxide (NO2), and dust, are selected as the evaluation factors. Secondly, a standard for evaluating underground air quality is established, an air quality evaluation rating system is set up on its basis, and the corresponding dataset is constructed accordingly. In this study, a random forest algorithm improved with AUC values is used to carry out a comprehensive evaluation of mine air quality. The experimental results show that the improved algorithm performs better than the original algorithm, with a minimum generalization error of only 0.0177 and a maximum classification accuracy of 97.72% on the test data. The method achieves reliable evaluation of underground air quality with high robustness and stability, and provides new ideas and methods for the construction of smart mines and the evaluation of mine air quality.
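The AUC values behind the improved random forest can be computed with the rank-statistic form of ROC AUC: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one (ties count one half). How the per-tree AUC values feed back into the forest is not specified in the abstract, so the sketch below shows only the metric itself.

```python
# Pure-Python ROC AUC via the rank-statistic (Wilcoxon) formulation.

def roc_auc(labels, scores):
    """labels: 1 = exceeds the air-quality threshold, 0 = normal."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A scorer that ranks every polluted sample above every clean one gets AUC = 1.0;
# a partially wrong ranking scores lower.
perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
mixed = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

Because AUC is threshold-free, it is a reasonable per-tree quality signal even on the imbalanced pollutant data the paper describes.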
To address the problem of inadequate precision in identifying motor rolling bearing faults, a fault diagnosis method for motor bearings based on MFCC-1D-CNN is presented. The method combines the advantages of Mel-scale frequency cepstral coefficients (MFCC), which have a strong ability to extract features of the spectral energy distribution, and the one-dimensional convolutional neural network (1D-CNN), which has a more lightweight structure. First, the acceleration signal is acquired from an acceleration sensor placed at the motor bearing. Then, MFCC features are extracted from the acceleration signal. Finally, the MFCC features are flattened and input into the 1D-CNN model, which can quickly recognize motor bearing faults. Experimental results show that the accuracy of this method in diagnosing motor bearing faults can reach 99.94%.
Aiming at the problem that rolling bearing fault signals are easily interfered with by strong background noise, we propose a method combining parameter-optimized Variational Mode Decomposition (VMD) and Maximum Correlated Kurtosis Deconvolution (MCKD) for extracting rolling bearing fault features. Firstly, Particle Swarm Optimization (PSO) is applied to optimize the parameters of VMD and select the optimal modal components; then the parameters of MCKD are likewise optimized with PSO, and MCKD is used to strengthen the fault impact components in the optimal component signals; finally, the envelope spectrum is employed to identify the characteristic frequencies of the bearing faults. Experiments show that the proposed method can adaptively enhance the shock component in rolling bearing fault signals and effectively extract rolling bearing fault features under heavy background noise.
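MCKD selects its deconvolution filter by maximizing correlated kurtosis, a statistic that rewards impulses repeating with a given period T. The sketch below computes only the first-order (M = 1) form of that metric, not the full filter update, and the impulse train is a synthetic illustration rather than bearing data.

```python
# First-order correlated kurtosis: CK1(T) = sum_n (y[n]*y[n-T])^2 / (sum_n y[n]^2)^2.
# High values mean the signal contains impulses spaced T samples apart,
# i.e. a periodic fault impact at that candidate fault period.

def correlated_kurtosis(y, T):
    num = sum((y[n] * y[n - T]) ** 2 for n in range(T, len(y)))
    den = sum(v * v for v in y) ** 2
    return num / den

# An impulse every 4 samples scores high at T=4 and zero at a mismatched T=3.
pulse = [1.0, 0.0, 0.0, 0.0] * 4      # period-4 impulse train
ck_match = correlated_kurtosis(pulse, 4)
ck_miss = correlated_kurtosis(pulse, 3)
```

Sweeping T over candidate fault periods and picking the maximizer is the intuition behind tuning MCKD's period parameter, here done via PSO.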
With ongoing research and development in deep learning technologies, an increasing array of deep learning techniques is being applied to single-image and video dehazing. This paper argues that it is essential to achieve faster performance while maintaining frame coherence and overall accuracy. Therefore, this paper introduces a new video dehazing network. The network first constructs a Wavelet U-Net to learn the image's peripheral features and uses the wavelet component to train the network. Subsequently, this paper incorporates the MPG and MSR modules into the network: the MPG module ensures coherence in dehazing effects across frames, while the MSR module makes the restored images visually closer to reality. Our neural network, combining these three parts, performs better than most methods and is among the most suitable for certain scenarios.
The capacitor voltage transformer (CVT) is a measuring device that converts high voltage into a low-voltage signal. It is better than the electromagnetic voltage transformer (PT) in terms of economy and immunity to interference, so it is widely used in stations and substations of 35 kV and above. Currently, CVTs are still mainly inspected by the shutdown-inspection method on a four-year cycle, which cannot meet the demand for intelligent monitoring of key equipment in smart substations. Considering the above problems, this paper investigates a device based on a deep learning prediction algorithm to realize intelligent in-line CVT prediction, which shows good performance in terms of functionality and reliability.
This article aims to design a real-time binocular vision detection system for three target categories, based on the ppdet large object detection model of the PaddlePaddle framework, to improve the recognition capabilities and efficiency of smart classroom education robots. Firstly, an in-depth study was conducted on the theories related to binocular stereo vision, including camera models, binocular ranging principles, binocular camera calibration, and stereo rectification. It then introduces the selection of the hardware platform, image preprocessing for the binocular vision system, FPGA acceleration of the stereo matching algorithm, and a real-time comparison of the stereo vision implementation. Next, expert experience was used to label the facial states of teachers and students in the smart classroom into three categories, the processed data were used to train the ppdet model, and the model was tested. Finally, experimental verification was conducted in a real application scenario with a binocular patrol robot. Experimental results show that the binocular vision system module of the smart classroom inspection robot can effectively reshape visual data to improve target recognition accuracy.
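The binocular ranging principle the system relies on reduces, after calibration and stereo rectification, to triangulation from disparity: Z = f · B / d, with focal length f in pixels, baseline B in meters, and disparity d in pixels. The numbers below are illustrative, not the robot's actual camera parameters.

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d.
# Larger disparity means a closer target; zero disparity means infinity.

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in meters of a matched point pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (point at infinity or mismatch)")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: f = 800 px, baseline 0.12 m.
# A 24-px disparity places the target 4 m away; 48 px places it at 2 m.
z_far = depth_from_disparity(800.0, 0.12, 24.0)
z_near = depth_from_disparity(800.0, 0.12, 48.0)
```

This inverse relationship is why stereo matching quality (here FPGA-accelerated) dominates ranging accuracy at long distances, where disparities shrink toward zero.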
To address the issue of detecting electric vehicle helmet wearing in complex environments where pedestrians and electric vehicles are moving in the same direction, we propose an improved algorithm based on YOLOv5s. To guarantee a lightweight model that enhances accuracy, we replaced the backbone structure of YOLO with FasterNet-T1 and introduced the DSConv module in the neck network of YOLO to reduce model computation without sacrificing accuracy. The experimental results demonstrate that the improved algorithm enhances the mean average precision (mAP) by 3.1% and reduces computation by 7.3% compared to the YOLOv5s algorithm. This improvement ensures higher detection accuracy while reducing computation, making it valuable for certain applications. Additionally, the improved model enhances the generalisation of detection compared to other mainstream detection models, making it applicable to a wider range of detection scenarios.
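The computation savings from depthwise separable convolution (the idea behind the DSConv module named above) follow from a standard parameter-count argument: a k x k convolution costs k·k·c_in·c_out weights, while the separable version pays k·k·c_in (depthwise) plus c_in·c_out (pointwise). The layer sizes below are illustrative, not taken from the paper's network.

```python
# Parameter counts for standard vs. depthwise separable convolution
# (biases ignored for simplicity).

def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def ds_conv_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out   # depthwise + 1x1 pointwise

# A 3x3 layer mapping 128 -> 128 channels shrinks by roughly 8.4x.
std = conv_params(3, 128, 128)
sep = ds_conv_params(3, 128, 128)
ratio = std / sep
```

For 3x3 kernels the saving approaches a factor of k² = 9 as channel counts grow, which is why swapping in such modules cuts computation with little accuracy loss.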
In the context of embedded systems for water-surface garbage collection devices, the existing model is insufficiently lightweight, resulting in suboptimal real-time detection of water-surface garbage. This study introduces a V5-MBCE algorithm specifically designed for water-surface garbage recognition. By substituting the YOLOv5s backbone network with the MobileNetV3 network, the model achieves a lightweight improvement. Moreover, the activation function within the MobileNetV3 network is changed to GELU, enhancing the robustness of model training. To augment the capacity for fusing water-surface garbage features at various scales, the feature fusion network is altered to incorporate a BiFPN structure. Additionally, the inclusion of the CBAM attention mechanism bolsters the model's focus on detection targets. The loss function is refined to EIoU, allowing more precise positioning of the prediction box borders. Experimental results demonstrate that the proposed algorithm attains a detection accuracy of 94.4%, a 1.2 percentage point increase over the original model. Furthermore, the model size is reduced to 7.1 MB, approximately 54% smaller than the original model. Finally, by combining the V5-MBCE model with a binocular ranging algorithm, the detection and collection of water-surface garbage can be realized in the collection device.
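The EIoU loss mentioned above extends plain IoU with penalties on center distance and on width/height mismatch, each normalized by the smallest enclosing box. Below is a standalone sketch of that loss for axis-aligned boxes, an illustration of the technique rather than code from the paper.

```python
# EIoU loss for boxes given as (x1, y1, x2, y2):
# 1 - IoU + center-distance penalty + width penalty + height penalty.

def eiou_loss(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Intersection over union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Smallest enclosing box: sides and squared diagonal.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw * cw + ch * ch
    # Squared center distance and width/height differences.
    rho2 = ((ax1 + ax2 - bx1 - bx2) / 2) ** 2 + ((ay1 + ay2 - by1 - by2) / 2) ** 2
    dw2 = ((ax2 - ax1) - (bx2 - bx1)) ** 2
    dh2 = ((ay2 - ay1) - (by2 - by1)) ** 2
    return 1 - iou + rho2 / c2 + dw2 / (cw * cw) + dh2 / (ch * ch)

loss_same = eiou_loss((0, 0, 2, 2), (0, 0, 2, 2))   # identical boxes: zero loss
loss_off = eiou_loss((0, 0, 2, 2), (1, 0, 3, 2))    # shifted box: positive loss
```

Unlike plain IoU loss, the extra terms keep gradients informative even for non-overlapping boxes, which is what yields the tighter border positioning the abstract claims.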
The stable closure of the gate affects its safety performance and normal operation, so it is necessary to measure the gap distance between the closed gates; however, direct naked-eye observation of gate monitoring images cannot achieve high precision or long-term accuracy. In view of the shortcomings of existing technology, this paper proposes a gate gap detection method based on double-checkerboard calibration, which can accurately measure the distance between the two gates and improve detection accuracy. Compared with traditional methods, the proposed method reduces the dependence on the installation positions of the gate and camera, thereby reducing maintenance costs.
The accurate detection of defects in aircraft rivets plays a crucial role in ensuring aircraft safety. At present, the inspection of dense and diverse aircraft rivet defects mainly relies on manual work, which seriously affects the efficiency of aircraft parts production. To solve this problem, the YOLOv5s-DMSA model is proposed in this paper. Based on YOLOv5s, it makes the following improvements: (1) The DMSA module is introduced in the backbone to enlarge the receptive field and facilitate multi-scale, cross-channel extraction of more comprehensive feature information. (2) A tiny-target detection head is added, specially designed to detect tiny targets such as rivets. The experimental results show that the mAP of the proposed YOLOv5s-DMSA detection method is 7.7 percentage points higher than that of the standard YOLOv5s model.
The detection of surface defects in steel is a crucial step in ensuring the quality of the steel. Traditional defect detection methods suffer from low accuracy. The complex shapes of steel surface defects and generally small target areas significantly affect the accuracy of steel surface defect detection. To address these issues, this paper proposes an improved steel surface defect detection algorithm based on a target detection model. This is achieved by incorporating a lightweight backbone network, a Prior Attention Mechanism module (CPCA), and redesigning the Neck module to enhance the accuracy of steel defect detection. Finally, the proposed defect detection algorithm is validated using the publicly available steel surface defect detection dataset NEU-DET. Experimental results show that the network model proposed in this paper has good detection accuracy, with an average precision of 79.1%, which is a 5% improvement over the original algorithm.
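The average precision figures quoted here come from the standard detection metric: precision over a confidence-ranked list of detections, interpolated to be non-increasing, then integrated over recall. The tiny ranked list below is illustrative; the paper's 79.1% averages this per defect class on NEU-DET.

```python
# Average precision (AP) for one class from detections sorted by confidence.

def average_precision(hits, n_gt):
    """hits: 1/0 per ranked detection (TP/FP); n_gt: number of ground-truth defects."""
    precisions, recalls = [], []
    tp = 0
    for rank, h in enumerate(hits, start=1):
        tp += h
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    # Make precision monotonically non-increasing (right-to-left running max).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the stepwise precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

# Four ranked detections (TP, TP, FP, TP) against four ground-truth defects.
ap = average_precision([1, 1, 0, 1], n_gt=4)
```

Mean AP (mAP) is simply this value averaged over the defect classes, which is the quantity the 5% improvement refers to.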
Drone image recognition plays a significant role in identifying forest fires. This paper applies convolutional neural networks and explores the effects of category weight setting and attention mechanisms, aiming to find a more accurate model for recognizing forest fires. The paper first compares the performance of DenseNet121, InceptionV3, MobileNetV2, and ResNet50, finding that MobileNetV2 performs exceptionally well. Then, based on MobileNetV2, parameter tuning is carried out, settling on the Adam optimizer, a learning rate of 0.001, and random seed 11. Subsequently, the paper explores category weight setting and attention mechanisms, ultimately finding that category weight setting has a significant effect, while the role of attention mechanisms is limited under the circumstances of this paper.
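One common way to set the category weights explored above is inverse-frequency weighting, so a rare "fire" class contributes as much to the loss as the abundant background class. The abstract does not state the paper's exact weighting rule, and the sample counts below are made up for illustration.

```python
# Inverse-frequency class weights, normalized so the mean weight is 1.

def inverse_frequency_weights(counts):
    """counts: number of samples per class -> one loss weight per class."""
    total = sum(counts)
    k = len(counts)
    return [total / (k * c) for c in counts]

# Hypothetical imbalanced set: 900 no-fire images vs. 100 fire images.
# The rare fire class ends up weighted 9x heavier than the background class.
weights = inverse_frequency_weights([900, 100])
```

Such weights are typically passed to the training loss (e.g. a weighted cross-entropy) so that misclassifying a fire image costs proportionally more.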
This research provides an enhanced YOLOv5s algorithm for tunnel worker identification, addressing the challenge posed by the complex and difficult distribution of workers in tunnel environments. The network's feature extraction ability is improved by incorporating a tiny-target detection layer and the Squeeze-and-Excitation (SE) attention mechanism. The introduction of depthwise separable convolution keeps the number of model parameters from growing excessively. To decrease the probability of missed worker detections, the Soft-NMS algorithm is applied to handle the overlapping worker targets that appear in some photos taken during tunnel construction. The experimental results demonstrate that the proposed tunnel worker detection method can successfully detect workers in the tunnel construction environment, with the precision rate of the improved detection model increasing from 90.06% to 94.03%, the recall rate from 88.28% to 92.18%, and the average precision from 83.95% to 86.91%. This is of vital significance to the life safety of tunnel workers.
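The reason Soft-NMS reduces missed detections of overlapping workers can be seen in a small sketch: instead of deleting boxes that overlap the current best detection (which drops genuinely overlapping people), linear Soft-NMS decays their scores by a factor of (1 - IoU). This is a standalone illustration of the technique, not the authors' code; boxes are (x1, y1, x2, y2, score).

```python
# Linear Soft-NMS: overlapping boxes are down-weighted, not discarded.

def iou(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, iou_thresh=0.3, score_thresh=0.1):
    boxes = [list(b) for b in boxes]
    kept = []
    while boxes:
        best = max(boxes, key=lambda b: b[4])
        boxes.remove(best)
        kept.append(tuple(best))
        for b in boxes:
            o = iou(best, b)
            if o > iou_thresh:
                b[4] *= 1.0 - o              # decay instead of discarding
        boxes = [b for b in boxes if b[4] >= score_thresh]
    return kept

# Two overlapping workers plus a distant one: hard NMS would drop the second
# box entirely; Soft-NMS keeps all three, the overlapped one at reduced score.
dets = [(0, 0, 10, 10, 0.9), (4, 0, 14, 10, 0.8), (40, 40, 50, 50, 0.7)]
out = soft_nms(dets)
```

The `score_thresh` floor is what ultimately removes true duplicates: a box overlapping several kept detections is decayed repeatedly until it falls below the threshold.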