This PDF file contains the front matter associated with SPIE Proceedings Volume 13486, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Soft shadows cast by rectangular area lights are difficult to render with physical accuracy in real time. To address this, an approximate rendering algorithm for rectangular area lights is proposed. The algorithm first generates shadow maps that record scene depth from the light source's point of view. It then introduces a model that relates the spatial position of the occluder to the shape of the rectangular area light: it calculates the width of the light in each direction of the rectangle, the average depth around the texel, and the distance from the occluder to the receiver. From these quantities, it determines the spatial filter width of the shadow map corresponding to the visual field attractions, and finally uses this width to filter the shadow map and obtain highly realistic soft shadows. Experimental results show that this method is more effective than related algorithms.
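The abstract does not give the filter-width formula. As a rough illustration only, the sketch below uses the similar-triangles penumbra estimate from percentage-closer soft shadows (PCSS), applied separately to each side of a rectangular light. The function names and the sample values are assumptions made for this example, not the authors' model.

```python
def penumbra_width(light_width, blocker_depth, receiver_depth):
    """Similar-triangles penumbra estimate used by PCSS.

    light_width    : extent of the light along one axis (world units)
    blocker_depth  : average occluder depth as seen from the light
    receiver_depth : depth of the shaded point as seen from the light
    """
    if blocker_depth <= 0.0 or receiver_depth <= blocker_depth:
        return 0.0  # no occluder in front of the receiver: treat as fully lit
    return (receiver_depth - blocker_depth) * light_width / blocker_depth


def rect_light_filter_size(light_size, blocker_depth, receiver_depth):
    """Per-axis filter widths for a rectangular light of size (wx, wy)."""
    wx, wy = light_size
    return (penumbra_width(wx, blocker_depth, receiver_depth),
            penumbra_width(wy, blocker_depth, receiver_depth))


# Hypothetical values: a 2.0 x 0.5 light, occluder at depth 4, receiver at depth 6.
filter_x, filter_y = rect_light_filter_size((2.0, 0.5), 4.0, 6.0)
```

Because the light is wider along x than along y, the estimated penumbra is also wider along x, which is the anisotropy a rectangular light model has to capture.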
To reduce information loss in image fusion, an infrared and visible image fusion method based on fractional-order differentiation is proposed. First, a multi-scale transform decomposes the source images into low-frequency and high-frequency subbands, and a two-scale decomposition further splits the low-frequency subbands into low-frequency base subbands and low-frequency detail subbands. Second, the low-frequency base subbands are fused using a judgment value constructed from the weighted sum of the energy ratio and the standard deviation ratio. For the low-frequency detail subbands and the high-frequency subbands, fractional-order differentiation is introduced, and the fusion rule selects the maximum fractional-order sum of modified Laplacian. Finally, the fused low-frequency base and detail subbands are combined by the inverse two-scale transform to obtain the fused low-frequency subband, and the inverse multi-scale transform is applied to the fused low-frequency and high-frequency subbands to obtain the fused image. Three groups of infrared and visible images are used to verify the effectiveness of the proposed algorithm. In the subjective assessment, the proposed method highlights the infrared target well, retains the detail and texture of the visible image, and achieves a good visual effect. In the objective assessment, the entropy, standard deviation, spatial frequency, and mean gradient of the proposed method are higher than those of the other five methods.
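To make the "maximum sum of modified Laplacian" rule concrete, here is a minimal sketch of the ordinary (integer-order) sum of modified Laplacian with a choose-max fusion rule. It does not implement the fractional-order variant the abstract describes; the window radius, the border clamping, and the function names are assumptions for illustration.

```python
def modified_laplacian(img, r, c):
    """|2I - left - right| + |2I - up - down|, with borders clamped."""
    h, w = len(img), len(img[0])
    up, down = img[max(r - 1, 0)][c], img[min(r + 1, h - 1)][c]
    left, right = img[r][max(c - 1, 0)], img[r][min(c + 1, w - 1)]
    centre = img[r][c]
    return abs(2 * centre - left - right) + abs(2 * centre - up - down)


def sml(img, r, c, radius=1):
    """Sum of modified Laplacian over a (2*radius+1)^2 window."""
    h, w = len(img), len(img[0])
    total = 0.0
    for rr in range(max(r - radius, 0), min(r + radius, h - 1) + 1):
        for cc in range(max(c - radius, 0), min(c + radius, w - 1) + 1):
            total += modified_laplacian(img, rr, cc)
    return total


def fuse_max_sml(band_a, band_b, radius=1):
    """Per coefficient, keep the source whose local SML activity is larger."""
    h, w = len(band_a), len(band_a[0])
    fused = []
    for r in range(h):
        row = []
        for c in range(w):
            use_a = sml(band_a, r, c, radius) >= sml(band_b, r, c, radius)
            row.append(band_a[r][c] if use_a else band_b[r][c])
        fused.append(row)
    return fused


# Toy subbands: one flat, one textured. The textured one should win everywhere.
flat = [[1.0] * 3 for _ in range(3)]
textured = [[0.0, 5.0, 0.0], [5.0, 0.0, 5.0], [0.0, 5.0, 0.0]]
fused = fuse_max_sml(flat, textured)
```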
Multiple object tracking from the perspective of an unmanned aerial vehicle (UAV) is becoming increasingly important and has a wide range of applications. However, conventional multiple object trackers do not fully exploit temporal and spatial information, and they face challenges such as target blurring and variable trajectories caused by the high-speed motion of the UAV. In this paper, we propose STAF (Spatiotemporal Attention Fusion Network), which is based on spatiotemporal multi-head attention and fully integrates information from video sequence frames to enhance target detection. To better handle camera shake, we develop a confidence-based appearance feature update algorithm. The proposed method demonstrates improvements on the VisDrone2019 dataset.
Traditional methods for tomato fruit recognition and detection face challenges in natural environments, including low recognition accuracy and slow processing speeds. To achieve automated and accurate recognition of tomato fruits in complex environments, and thereby facilitate automatic fruit picking, this study proposes a tomato fruit recognition method based on an enhanced YOLOv7 model. The approach modifies the original YOLOv7 model by integrating SENet structures into the ELAN module of the backbone network, which lets the network assess the importance of different channel features. Additionally, a GAM global attention module is incorporated into the neck of the model to improve the network's feature representation capabilities. Experimental validation shows that the improved YOLOv7 model achieves a mean accuracy of 98.9% on the test set. Compared to the original YOLOv7 model, the enhanced version increases mean average precision by 6.2 percentage points. These results indicate that the improved YOLOv7 model offers robust technical support and insights for automated tomato fruit picking.
In multi-label image classification, accurately identifying multiple relevant labels in an image is crucial for applications such as image understanding, automatic annotation, and intelligent search. However, traditional classification methods often struggle to handle both the complex semantics and the fine details of images. To this end, a method combining semantic information and up-sampling techniques is proposed. On the one hand, the graph attention network (GAT) is combined with the differentiable graph pooling module (DiffPool) to enhance the model's multi-faceted understanding of the semantic information in the image. On the other hand, the lightweight up-sampling operator CARAFE is introduced to enhance the model's understanding of image details. By fully utilizing the deep semantic features of the images and recovering image details through up-sampling, the method effectively improves classification accuracy on multi-label images.
The atlantoaxial lateral block fusion device is one of the surgical devices used to treat cervical spine disorders. Atlantoaxial joint space reconstruction is one of the key steps in using the device, but conventional 3D reconstruction of the atlantoaxial joint space suffers from low precision and accuracy and cannot accurately account for its dynamic properties. To address these issues, this work proposes a parallel segmentation reconstruction model. Taking the patient's cervical spine CT data as input, the model reconstructs the atlantoaxial joint gap in 3D through its gap edge detection module and 3D reconstruction module, and outputs a visualized 3D model. In the gap edge detection module, an advanced image segmentation algorithm based on Cxy-Net is adopted to optimize and extract the details of the gap. The average Hausdorff distance (Hd) of this model is 10.5211 mm, the average symmetric surface distance (ASD) is 0.3861 mm, the average surface overlap (So) reaches 90.09%, the average Dice similarity coefficient (Dice) is 0.8834, and the average accuracy (AC) is 0.8914. Compared with conventional modeling, the proposed model improves the accuracy, Dice similarity coefficient, and accuracy by about 15.37%, 8.96%, and 4.84%, respectively.
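Two of the metrics quoted above have standard textbook definitions, sketched below for sets of voxel coordinates. This is a generic illustration: the abstract does not say how its "average" Hausdorff distance is computed, so the classic maximum form is shown, and the toy coordinates are made up.

```python
import math


def dice(mask_a, mask_b):
    """Dice = 2|A & B| / (|A| + |B|) for two sets of voxel coordinates."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks are treated as a perfect match
    return 2.0 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def directed_hausdorff(points_a, points_b):
    """Largest distance from a point in A to its nearest point in B."""
    return max(min(math.dist(p, q) for q in points_b) for p in points_a)


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance: the worse of the two directions."""
    return max(directed_hausdorff(points_a, points_b),
               directed_hausdorff(points_b, points_a))


# Toy example: a predicted segmentation and a reference along one axis.
pred = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
truth = {(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)}
dice_score = dice(pred, truth)        # 2 * 2 / (3 + 4)
hd = hausdorff(pred, truth)           # driven by the voxel at x = 4
```

The brute-force nearest-point search is quadratic in the number of points, which is fine for a sketch but would need a spatial index for full CT volumes.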
EMG signals are the electrical activity generated by muscles during contraction and relaxation, and they contain a wealth of neuromuscular information. Classifying EMG signals is a critical step in using robotics to achieve rehabilitation goals. The Support Vector Machine (SVM) is a powerful machine learning algorithm, but its classification performance is strongly influenced by the choice of its parameters. In this paper, we propose a pattern recognition method in which the SVM parameters are optimized by the Folded Lizard Algorithm (FLO). We compare its finger movement pattern recognition rate with that of a plain SVM, an SVM with parameters optimized by a Genetic Algorithm (GA), and an SVM with parameters optimized by the Sparrow Search Algorithm (SSA), and we validate the proposed method on the NinaPro DB1 dataset. The experimental results show that the recognition rate of the FLO-optimized SVM model is 11.67% higher than that of the unoptimized SVM model and 2.78% higher than that of the SSA-SVM model, which verifies that the proposed method can improve classification accuracy.
Unmanned aerial vehicle (UAV) image localization can be used for UAV self-navigation in global navigation satellite system (GNSS)-denied environments. This study proposes a fast localization method for UAV images in GNSS-denied environments based on image sequence relationships. First, the LightGlue network extracts features from adjacent UAV images and combines these features with the sequence relationship of the UAV images to perform feature matching within the overlapping range. After erroneous matches are eliminated, the affine transformation matrix is calculated to position adjacent UAV images relative to each other. Experiments conducted on multiple image datasets show that the proposed method successfully completes the rapid positioning of UAV images in GNSS-denied environments, with positioning errors of less than 0.5 m, indicating potential for practical applications.
Emotion recognition with EEG signals is one of the hot topics in the fields of human-computer interaction and affective computing. Because EEG signals are inherently nonlinear and nonstationary, most existing emotion recognition models struggle to capture their intricate time-frequency features, which poses daunting challenges for feature extraction and affects recognition accuracy. To address this issue, this paper proposes a novel DFrft-ES model based on the fractional Fourier transform. In the model, the EEG signals are preprocessed by decomposing them into five frequency bands. Then, using the DFrft-EEG method, fractional Fourier transforms of different orders are applied to each frequency band to obtain fractional domain signals of various orders. From the fractional domain signals, features such as PSD, DE, DASM, RASM, and DCAU are extracted to form the emotion feature vectors. To validate the extracted features on the multimodal emotion database DEAP, an SVM classifier is designed and trained. Subsequently, the optimal order and best features are selected based on the training results. The experimental results show that the DE features extracted with the DFrft-EEG method at order 0.2 yield the highest classification accuracy, 95.63%, for the four emotional regions HVHA, LVHA, LVLA, and HVLA on the arousal-valence plane. This demonstrates that the proposed method is robust in emotion classification tasks and can effectively improve emotion recognition accuracy.
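For readers unfamiliar with the feature names, the sketch below shows the definitions commonly used in the EEG emotion recognition literature: differential entropy (DE) under a Gaussian assumption, and the differential and rational asymmetry (DASM, RASM) between a left-hemisphere electrode and its right-hemisphere counterpart. It operates on plain time-domain samples, not on the fractional-domain signals of the paper, and the toy signals are invented for the example.

```python
import math
import statistics


def differential_entropy(samples):
    """DE of a band-limited signal, assuming it is Gaussian:
    0.5 * ln(2 * pi * e * variance)."""
    variance = statistics.pvariance(samples)
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def dasm(left, right):
    """Differential asymmetry: DE(left) - DE(right)."""
    return differential_entropy(left) - differential_entropy(right)


def rasm(left, right):
    """Rational asymmetry: DE(left) / DE(right).
    Undefined when DE(right) is zero; callers should guard against that."""
    return differential_entropy(left) / differential_entropy(right)


# Toy signals: the left channel has twice the amplitude of the right one,
# so its variance is four times larger and DASM equals 0.5 * ln(4) = ln(2).
left = [0.0, 2.0, 0.0, -2.0] * 8
right = [0.0, 1.0, 0.0, -1.0] * 8
asymmetry = dasm(left, right)
```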
Traffic sign recognition and classification is among the most significant technologies used in autonomous driving. Because traffic signs appear at multiple scales and under diverse weather conditions, their recognition accuracy is often insufficient. To address this, an improved traffic sign recognition method based on the YOLO model is proposed. First, the 16 traffic sign classes with the most images in the LISA dataset were selected. Given the strict real-time requirements of traffic sign recognition in unmanned driving, the YOLOv5 model, which offers both efficiency and accuracy within the YOLO series, was used to recognize the images. On this basis, the C3 module in the model was replaced with the C2f module to enhance the integration of traffic sign features and the network's capability to express these features in complex environments, and the EIoU loss function was employed to refine the results. The mAP and other indicators improved, and recognition accuracy was higher. In addition, the paper compares its results with previous work on the LISA dataset and shows improvements in accuracy, recall, and F1-score.
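The EIoU loss mentioned above adds three penalties to the usual IoU term: the normalized distance between box centres, and the normalized differences in width and in height, each measured against the smallest box enclosing both. The following is a minimal scalar sketch of that published formula for boxes given as (x1, y1, x2, y2); the `eps` stabilizer and the sample boxes are assumptions, and a training implementation would operate on tensors instead.

```python
def eiou_loss(box, target, eps=1e-9):
    """EIoU = 1 - IoU + centre term + width term + height term."""
    bx1, by1, bx2, by2 = box
    tx1, ty1, tx2, ty2 = target

    # Intersection over union.
    inter_w = max(0.0, min(bx2, tx2) - max(bx1, tx1))
    inter_h = max(0.0, min(by2, ty2) - max(by1, ty1))
    inter = inter_w * inter_h
    bw, bh = bx2 - bx1, by2 - by1
    tw, th = tx2 - tx1, ty2 - ty1
    union = bw * bh + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(bx2, tx2) - min(bx1, tx1)
    ch = max(by2, ty2) - min(by1, ty1)

    # Squared distance between the two box centres.
    dx = (bx1 + bx2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (by1 + by2) / 2.0 - (ty1 + ty2) / 2.0

    centre_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    width_term = (bw - tw) ** 2 / (cw * cw + eps)
    height_term = (bh - th) ** 2 / (ch * ch + eps)
    return 1.0 - iou + centre_term + width_term + height_term


perfect = eiou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
shifted = eiou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

For the shifted pair the boxes have equal size, so only the IoU and centre terms contribute: 1 - 1/7 + 2/18.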
In general, natural data follow a long-tailed distribution over semantic classes. Existing recognition methods tackle the imbalanced classification problem by designing architectures that focus more on the tail classes without altering the imbalance in the original dataset. In this paper, we propose an augmentation learning strategy that not only alleviates the bias of the classifier towards head classes but also generates a balanced augmented dataset. Our method, called Diffusion-Augmented Learning (DA), can be easily applied to any long-tail recognition model. Its core idea is to leverage the generative capabilities of diffusion models to generate a balanced augmented dataset with high confidence and rich feature information. Subsequently, through a specific sampling strategy, we mix the original dataset with the augmented dataset to tackle long-tail recognition. The purpose of our method is twofold: 1) to utilize the generative capability of diffusion models to add sufficient and reliable augmented data for training, which improves the robustness of the model, and 2) to introduce more augmented data for tail classes through both a probabilistic replacement strategy and an improved diffusion model, which alleviates the model's bias towards head classes. We carried out a series of comparative experiments on the CIFAR10-LT, CIFAR100-LT, and ImageNet-LT datasets. The experimental results demonstrate the superiority of our proposed method and show that the augmented dataset is versatile.
Apple scab, caused by the pathogen Venturia inaequalis, is one of the most prevalent diseases affecting apple trees, significantly reducing both yield and fruit quality. Early detection and diagnosis are critical for the effective management of this disease, which is typically characterized by the appearance of black lesions on the leaves and fruit. In this study, we developed a deep learning-based system for classifying apple scab using Convolutional Neural Networks (CNN) and transfer learning. The system was trained on a dataset of apple leaf images, employing various data augmentation techniques, such as rotation, flipping, and scaling, to improve the model's robustness. Our proposed model achieved high accuracy in distinguishing between healthy leaves and those affected by apple scab, making it a promising tool for precision agriculture and automated disease monitoring. This research offers a potential solution for reducing dependence on manual labor and enhancing early intervention practices in apple orchards.
Edge computing, which offloads data processing and analysis tasks to the edge of the network, has gradually become a critical technology for handling large-scale data and providing low-latency services. Container technology is a lightweight, portable way of packaging software that is well suited to edge computing environments because it simplifies the deployment and management of applications while reducing the dependence on specific hardware. The synergistic use of containers and edge computing has emerged as an ideal approach for achieving efficient and reliable service delivery. When a new container instance is deployed, the corresponding image must be downloaded from a remote image registry. However, edge nodes typically have limited storage resources, which means that storing a large number of images is not feasible. In edge environments with limited network resources, downloading the required images significantly increases container startup time, affecting the quality of service. Current research often focuses on container startup and image distribution but frequently overlooks the management of image storage space at edge nodes. We propose an Image Cleaning method based on Minimum Layer Affinity (ICMLA), a new strategy for managing image storage space in edge computing environments. To evaluate the proposed method, a simulated cloud-edge cluster designed to emulate a real-world operational setting was established, and a series of comparative experiments was run. Experimental results show that, compared with typical image replacement algorithms, ICMLA can reduce the total Pod startup time by up to 13.31% and increase the hit count by up to 56.67%.
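The abstract does not define "layer affinity", so the sketch below shows only one plausible reading: score each cached image by how many bytes of its layers are shared with other cached images, and evict the image with the lowest score, since deleting it frees the most space while disturbing the fewest shared layers. The cache layout, image names, layer digests, and sizes are all hypothetical.

```python
def layer_affinity(image, cache):
    """Bytes of this image's layers that other cached images also use.

    cache maps an image name to a dict of {layer_digest: size_in_mb}.
    """
    shared = 0
    for digest, size in cache[image].items():
        used_elsewhere = any(digest in layers
                             for other, layers in cache.items()
                             if other != image)
        if used_elsewhere:
            shared += size
    return shared


def pick_victim(cache):
    """Choose the cached image with the minimum layer affinity."""
    return min(cache, key=lambda name: layer_affinity(name, cache))


# Hypothetical node cache: two images share a base layer, one shares nothing.
cache = {
    "web:1": {"base": 80, "nginx": 40},
    "api:1": {"base": 80, "python": 120},
    "db:1": {"alpine": 5, "postgres": 200},
}
victim = pick_victim(cache)
```

A real policy would presumably also weigh recency or expected reuse; this sketch isolates the layer-sharing idea only.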
To fully harness the capabilities of computer graphics and image processing technologies and improve the quality of visual communication design, this paper presents a suite of methods. First, the fundamental principles of primary colors are used to formulate a series of vector functions, which allow error distances to be calculated and a visual communication zoning model to be established. Subsequently, by extracting the relevant feature information from graphics and images, we encode coefficient-constrained features and determine the corresponding hyperplanes to obtain the parameters for visual communication. Finally, design image expression algorithms are applied to enhance the effectiveness of visual communication image representation. The empirical findings indicate that these visual communication design methods are reliable and feasible, improving image recognition accuracy across four image categories of varying complexity.
Most current multi-view clustering methods focus only on the information of a single view, ignore the correlation between views, and fail to fully explore the potential structure of the data. To address this problem, a multi-view clustering method combining dynamic fusion and non-negative matrix factorization is proposed. First, we extract low-dimensional feature representations for each view using non-negative matrix factorization and enhance feature consistency across different views through contrastive learning. Second, we apply a continual learning strategy to dynamically update the model, adapting to changes in new view data and improving model robustness and adaptability. Finally, we use a dynamic fusion strategy to weight and integrate features from all views, obtaining a common feature representation for multi-view data. Experimental results demonstrate that our proposed method significantly outperforms existing methods in clustering performance on several public datasets.
A brain-computer interface (BCI) extracts EEG signals by specific means and decodes them with signal processing algorithms to help people with motor disabilities interact with the outside world through external devices. To improve the pattern recognition rate of motor imagery EEG signals, a feature fusion method that connects a convolutional neural network (CNN) and a recurrent neural network (RNN) in series is proposed, and two different RNNs are used for experimental comparison. The proposed method is validated using the BCI Competition IV 2a dataset, and the experimental results show that it can effectively improve multi-class classification accuracy.
Existing MOOC review sentiment classification methods do not fully utilize the local context information associated with an aspect, and they ignore the connection between local and global contexts, so the modeled features lack the information connection between aspect and context. In this paper, we propose a model that incorporates Local Context Focus (LCF) and a Bi-Directional Gated Recurrent Unit (Bi-GRU). First, the BERT model is used to dynamically encode course reviews. Next, global semantic features are extracted using the Bi-GRU model to strengthen the connection between the preceding and following texts. Then, the LCF model based on multi-head self-attention is used to obtain local contextual features, which are concatenated with the global semantic features. Finally, the Softmax function is used to output the classification results. The accuracies of the proposed model on the three MOOC course review datasets reach 97.96%, 96.76%, and 94.16%, respectively, which are 0.70%, 0.42%, and 0.03% higher than those of the second-best baseline model. The proposed model improves the effectiveness of MOOC course review sentiment classification and provides a useful reference for the optimization and improvement of MOOC courses.
To address the decreased robustness and accuracy of traditional visual SLAM systems in dynamic environments, this paper proposes a visual SLAM method that integrates FastestDet object detection to enhance performance. First, dynamic objects are detected and classified by FastestDet. A depth map-based clustering algorithm is then used to segment the static background from highly dynamic objects. Epipolar geometry constraints are applied to eliminate dynamic points from low-dynamic objects, retaining only static points for pose estimation, thus improving accuracy. Validation on the TUM dataset shows that, compared to ORB-SLAM3, the proposed method reduces the root mean square error of absolute trajectory error by an average of 94.55% in highly dynamic sequences, with notable improvements in low-dynamic sequences as well. These results demonstrate that the proposed method significantly enhances both robustness and accuracy in visual SLAM under dynamic environments.
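The epipolar check used to reject moving points can be illustrated with the standard point-to-epipolar-line distance: a static point matched between two frames should lie close to the epipolar line predicted by the fundamental matrix. The sketch below is generic geometry, not the paper's implementation; the 1-pixel threshold and the example fundamental matrix (a pure sideways camera translation, for which epipolar lines are image rows) are assumptions.

```python
import math


def epipolar_distance(F, p1, p2):
    """Pixel distance from p2 (frame 2) to the epipolar line F @ p1.

    F is a 3x3 fundamental matrix as nested lists; p1 and p2 are (x, y).
    Assumes the line is not degenerate (a and b are not both zero).
    """
    x1 = (p1[0], p1[1], 1.0)
    a, b, c = (sum(F[i][j] * x1[j] for j in range(3)) for i in range(3))
    return abs(a * p2[0] + b * p2[1] + c) / math.hypot(a, b)


def is_dynamic(F, p1, p2, threshold=1.0):
    """Flag a match as dynamic when it violates the epipolar constraint."""
    return epipolar_distance(F, p1, p2) > threshold


# Skew-symmetric matrix of the translation (1, 0, 0): matches must keep their row.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
static_match = is_dynamic(F, (10.0, 20.0), (15.0, 20.0))   # same row
moving_match = is_dynamic(F, (10.0, 20.0), (15.0, 23.0))   # 3 px off the line
```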
Artificial Intelligence (AI) is increasingly used in cognitive health assessments, with the Clock Drawing Test (CDT) being an effective cognitive evaluation tool. However, the complexity of CDT image structures, high subjectivity, and the lack of specialized cognitive health assessment datasets for specific populations pose significant challenges for feature learning and model construction using this method. To address these issues, we propose a fine-grained multi-task learning approach (MLCDT) for AI-assisted diagnosis of cognitive health using CDT. MLCDT integrates image pre-training models with a multi-task learning framework to capture fine-grained features of CDT images and constructs a final diagnostic support model through scientifically designed tasks. Experiments using real data from cognitive health assessments in a neurology department at a hospital validate the effectiveness of MLCDT in handling fine-grained tasks and aiding cognitive disorder assessments.
Facial attribute transfer has recently become an important research direction in the field of image processing, but improving the quality of the final synthesized faces remains a persistent challenge. Therefore, an improved face attribute transfer model based on generative adversarial networks (GAN) is proposed in this paper. The PatchGAN structure is introduced to train the network model, and the discriminator is composed entirely of convolutional layers, which encourages the model to place greater emphasis on the details of the generated images. The experimental results show that the proposed method significantly improves the details and overall quality of the generated images.
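A PatchGAN discriminator judges overlapping image patches, not the whole image, and the patch size is simply the receptive field of its final convolution. The sketch below computes that receptive field from a list of (kernel size, stride) pairs. The layer list is the widely used 70x70 configuration from the pix2pix work and is an assumption here, since the abstract does not state the layers used.

```python
def receptive_field(layers):
    """Receptive field of one output unit.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    Walks backwards from a single output unit to the input.
    """
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


# Assumed configuration: three stride-2 and two stride-1 convolutions, all 4x4.
PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
patch_size = receptive_field(PATCHGAN_70)
```

Each output score therefore depends on a 70x70 input patch, which is why such a discriminator penalizes local texture errors more than global structure.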
Automatic electrocardiogram (ECG) recognition is an important topic in medical diagnosis. However, traditional deep learning models usually require large computational resources to achieve high-precision recognition, which makes them unsuitable for resource-constrained edge devices. To tackle this challenge, a lightweight ECG recognition method based on a binarized neural network (BNN) is proposed in this paper. It reduces computational cost by binarizing the network weights and introducing a scaling factor to approximate a full-precision model. However, once the model parameters are binarized, slight numerical changes near the threshold may cause large output fluctuations, so an improved batch normalization is proposed that increases the stability of the model by adding a bias to the sample features. During model training, we selected Adam as the optimizer, which converges faster and yields higher model accuracy than the original SGD. We conducted a comprehensive experimental analysis, including ECG image and ECG sequence recognition, on the MIT-BIH arrhythmia dataset. The experimental results show that the proposed BNN model maintains high recognition accuracy while improving model efficiency compared to traditional floating-point neural networks. The method provides an effective solution for low-power, real-time ECG recognition on edge devices.
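One common way to binarize weights with a scaling factor, used in XNOR-Net-style networks, is to replace each weight vector w by alpha * sign(w), where alpha is the mean absolute value of w. The abstract does not say which scheme the authors use, so the sketch below should be read as an illustration of that general idea; the sample weights are made up.

```python
def binarize(weights):
    """Approximate real-valued weights by alpha * sign(w).

    alpha is the mean absolute weight, which minimizes the squared error
    between the original weights and the scaled binary ones.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def binary_dot(alpha, signs, inputs):
    """Dot product with binarized weights: only sign flips and one multiply."""
    return alpha * sum(s * x for s, x in zip(signs, inputs))


weights = [0.5, -1.5, 1.0, -1.0]
alpha, signs = binarize(weights)
output = binary_dot(alpha, signs, [1.0, 1.0, 1.0, 1.0])
```

The instability the abstract mentions is visible here: a weight at 0.001 and one at -0.001 binarize to opposite signs, so a tiny change near zero flips the sign of that weight's contribution to the output.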
In the field of computer vision and image processing, texture provides critical visual clues about the composition of internal regions of an image. This paper proposes a novel texture image classification method using persistent homology (PH) theory from topological data analysis (TDA) to extract topological information of texture images at different scales and dimensions. We convert texture images into grayscale images and use their pixel intensity values as filtration thresholds, tracking changes in 0-dimensional and 1-dimensional persistent homology classes under different thresholds to generate persistence diagrams of the texture images. The persistence images are then converted into feature vectors. Our method is validated with 10-fold cross-validation using a random forest model on the Kylberg dataset and KTH-TIPS dataset, achieving accuracies of 99.84% and 87.90%, respectively.
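As a rough illustration of using pixel intensities as filtration thresholds, the sketch below computes the finite 0-dimensional persistence pairs of a small grayscale image with a union-find structure. It assumes a sublevel-set filtration with 4-connectivity and is not the authors' pipeline; 1-dimensional classes and the conversion to feature vectors are omitted, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]


def zero_dim_persistence(image: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Finite (birth, death) pairs of connected components.

    Pixels enter in order of increasing intensity. When two components
    meet, the younger one dies (elder rule). The oldest component never
    dies and is omitted, as are pairs with zero persistence.
    """
    rows, cols = len(image), len(image[0])
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent: Dict[Pixel, Pixel] = {}
    birth: Dict[Pixel, int] = {}

    def find(x: Pixel) -> Pixel:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs: List[Tuple[int, int]] = []
    for value, r, c in order:
        node = (r, c)
        parent[node] = node
        birth[node] = value
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbor = (r + dr, c + dc)
            if neighbor not in parent:
                continue  # outside the image or not yet in the filtration
            a, b = find(node), find(neighbor)
            if a == b:
                continue
            if birth[a] < birth[b]:
                a, b = b, a  # make `a` the younger component
            if birth[a] < value:
                pairs.append((birth[a], value))
            parent[a] = b  # the older root survives
    return sorted(pairs)
```

For an image with four dark corners separated by a brighter cross, three of the four components die when the cross enters the filtration, which gives three finite pairs.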
To deal with low-texture scenes, many visual simultaneous localization and mapping (SLAM) methods have introduced line features and plane features to provide additional structural information for stable frame tracking. These methods typically use a fixed feature type or feature combination, such as point-line features or point-line-plane features, throughout the entire camera pose estimation process. However, simply increasing the number of feature types introduces more noise sources, which reduces the accuracy of camera pose estimation and therefore of localization. To solve this problem, this paper proposes an RGB-D visual odometry method that automatically selects among different feature modes. It adaptively selects the type of features for frame-to-frame tracking based on the numbers of features extracted from the current frames. The proposed method is evaluated on the TUM-RGBD dataset and achieves better trajectory accuracy than other algorithms.
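The idea of choosing a feature mode from the feature counts can be sketched as below. The abstract does not give the selection rule, so the thresholds, the fallback order, and the function name are all illustrative assumptions rather than the paper's method.

```python
def select_feature_mode(
    n_points: int,
    n_lines: int,
    n_planes: int,
    min_points: int = 100,  # hypothetical threshold
    min_lines: int = 20,    # hypothetical threshold
) -> str:
    """Pick the smallest feature combination that looks sufficient.

    Assumed rule: use points alone when they are plentiful, add lines
    when points are scarce, and add planes only when lines are scarce
    too and at least one plane was extracted.
    """
    if n_points >= min_points:
        return "point"
    if n_lines >= min_lines or n_planes == 0:
        return "point-line"
    return "point-line-plane"
```

Using the smallest sufficient combination reflects the motivation stated above: each extra feature type is also an extra noise source.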
This paper proposes a depth map estimation method based on Deformable Convolutional Neural Networks (DCNNs). Traditional single image depth estimation methods often struggle to accurately capture irregular shapes and details in complex scenes, resulting in suboptimal spatial resolution and object boundary reconstruction. To address these challenges, we introduce deformable convolution modules that enable convolutional kernels to adaptively adjust their sampling positions, thereby enhancing the model's capability to handle complex geometric structures and dynamic deformations. Deformable convolution features high adaptability, superior handling of complex scenes, and enhanced boundary clarity. The position offsets of convolutional kernels are adaptively adjusted through learning, allowing the model to flexibly handle features of various shapes and scales, thus excelling in capturing irregular shapes and complex details. In scenes with abundant details and deformations, deformable convolution can extract features more accurately, significantly improving depth estimation accuracy. Additionally, by finely adjusting sampling points, deformable convolution effectively reduces blurring and distortion at depth map boundaries, excelling in reconstructing small objects and complex edges. Experimental results demonstrate that the proposed method significantly outperforms existing approaches in terms of depth estimation accuracy and boundary clarity, particularly excelling in complex scenes and small object reconstruction. Our research highlights the extensive application potential of deformable convolution in depth map estimation, providing robust support for future computer vision tasks.
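To make the adaptive sampling positions concrete, here is a minimal single-channel sketch of one output value of a 3x3 deformable convolution. In a real network the offsets are predicted by a learned layer; here they are passed in directly. The function names and the zero-padding choice are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Grid = Sequence[Sequence[float]]


def bilinear_sample(feature: Grid, y: float, x: float) -> float:
    """Sample a 2D feature map at a fractional position (zero outside)."""
    height, width = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < height and 0 <= xx < width:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                total += weight * feature[yy][xx]
    return total


def deformable_conv_at(
    feature: Grid,
    kernel: Grid,
    offsets: Sequence[Sequence[Tuple[float, float]]],
    cy: int,
    cx: int,
) -> float:
    """One output value of a 3x3 deformable convolution centered at (cy, cx).

    offsets[i][j] = (dy, dx) shifts the sampling point of kernel tap (i, j)
    away from its regular grid position.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            out += kernel[i][j] * bilinear_sample(
                feature, cy + i - 1 + dy, cx + j - 1 + dx
            )
    return out
```

With all offsets set to zero this reduces to an ordinary 3x3 convolution, which is a convenient sanity check.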
Powdery mildew is an important factor affecting wheat yield and global food security, and precise grading of wheat powdery mildew images is an important means of computer-aided precision control. This article uses an improved Swin Unet method to segment images of wheat powdery mildew and to calculate the areas of leaf lesions and healthy regions. Pixel statistics on the segmentation results give the ratio of lesion area to leaf area, from which the severity of wheat powdery mildew can be accurately evaluated. The method can accurately segment wheat powdery mildew in complex backgrounds, providing a scientific basis for automatic grading and early prevention of wheat powdery mildew in complex field environments.
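The pixel-statistics step can be sketched as a simple count over a segmentation mask. The label encoding (0 for background, 1 for healthy leaf, 2 for lesion) and the function name are assumptions; the abstract does not specify them, and no grading thresholds are implied.

```python
from typing import Sequence


def lesion_ratio(
    mask: Sequence[Sequence[int]],
    healthy_label: int = 1,  # assumed encoding
    lesion_label: int = 2,   # assumed encoding
) -> float:
    """Ratio of lesion pixels to all leaf pixels (healthy plus lesion)."""
    healthy = sum(list(row).count(healthy_label) for row in mask)
    lesion = sum(list(row).count(lesion_label) for row in mask)
    leaf = healthy + lesion
    if leaf == 0:
        raise ValueError("mask contains no leaf pixels")
    return lesion / leaf
```

Background pixels are excluded from the denominator so that the ratio does not depend on how much of the image the leaf occupies.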
In recent years, Graph Convolutional Networks (GCNs) have gained attention in the field of action recognition. However, existing methods can only extract simple spatiotemporal features of individual joints and fail to capture comprehensive spatiotemporal information of the entire human body, with limitations in modeling short-term spatiotemporal information. To address these issues, this paper proposes a Graph Convolutional Network method with short-term spatiotemporal information fusion and attention. This method learns temporal features through a short-term spatiotemporal feature fusion module, enhances the temporal representation of action features by combining human spatiotemporal information, and improves spatial skeleton information through keypoint attention modeling. Finally, multi-scale temporal convolution is used for long-term information exchange, and fusion of four-stream scores is employed for classification prediction. Experimental results demonstrate that this method outperforms existing approaches on the NTU RGB+D and NTU RGB+D120 datasets.
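Score-level fusion of several streams is commonly done as a weighted sum of per-class scores followed by an argmax. The sketch below assumes that scheme; the stream names and weights are placeholders, since the abstract does not say which four streams are fused or how they are weighted.

```python
from typing import Dict, List, Sequence, Tuple


def fuse_scores(
    stream_scores: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> Tuple[List[float], int]:
    """Weighted sum of per-class scores across streams, then argmax."""
    n_classes = len(next(iter(stream_scores.values())))
    fused = [0.0] * n_classes
    for name, scores in stream_scores.items():
        for k, score in enumerate(scores):
            fused[k] += weights[name] * score
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted
```

A usage example with four placeholder streams and equal weights:

```python
scores = {
    "joint": [0.6, 0.4],
    "bone": [0.3, 0.7],
    "joint_motion": [0.5, 0.5],
    "bone_motion": [0.2, 0.8],
}
fused, label = fuse_scores(scores, {name: 1.0 for name in scores})
```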
Pedestrian re-identification has witnessed rapid development in recent years, with particular attention given to the challenges posed by complex conditions, including pedestrian re-identification in low-light environments, such as nighttime scenarios. Through our experiments, it was discovered that even with image enhancement in low-light conditions, satisfactory results were not achieved. To address this challenge, this paper introduces a Multi-scale Feature Attention Extraction (MFAE) network.
With the rapid development of technology and the widespread application of deep learning, computer vision technology has become an important tool for helping visually impaired people. Traditional visual assistance systems are often limited by recognition accuracy and real-time performance, while deep learning, with its powerful feature extraction and learning capabilities, provides new ideas for the development of visual assistance applications. This application builds on existing deep learning models and deploys object detection algorithms on mobile devices for user convenience; the system was tested in real environments. Its intelligent voice interaction function improves the user experience. The application is relatively full-featured: users can select images from the album or take photos with the camera as input, and it also supports real-time content detection and output. Voice input and output greatly simplify operation and help users obtain results and make judgments faster. This application provides a new visual aid tool for visually impaired individuals, making their lives more convenient.
This paper explores a method for monitoring and identifying forest fires using drone remote sensing technology combined with the U-Net model and its improved attention mechanism. Traditional forest fire prevention measures are costly and have limited coverage, whereas drone technology offers an efficient and flexible solution for early detection of forest fires. In this study, the IEEE flame dataset was used to train and validate the improved U-Net model through high-resolution, multi-angle drone image data. Experimental results indicate that the U-Net model, augmented with channel attention and spatial attention mechanisms, performs excellently in complex backgrounds and high-resolution images, significantly enhancing the accuracy of fire area recognition and segmentation. The findings demonstrate that this method has notable advantages in identifying fire locations and accurately assessing fire conditions, providing robust technical support for the early warning and prevention of forest fires.
As one of the largest and most comprehensive robotics competitions both domestically and internationally, the RoboCup competition has attracted an increasing number of university students to participate. The core of vision systems for soccer robots lies in semantic segmentation. In recent years, with the rapid development of deep learning techniques, image segmentation methods based on Convolutional Neural Networks (CNNs) have made significant progress. The U-Net model, featuring a contracting path, bottleneck, and expansive path, has been used to achieve precise segmentation and has demonstrated good performance in image segmentation tasks. However, for images with complex backgrounds and diverse targets, traditional U-Net models often struggle to achieve ideal segmentation results. This paper proposes an image segmentation model based on the ResNet101 and U-Net architecture to address the visual semantic challenges faced by soccer robots. Through a comparison of multiple models and optimizers, ResNet101 is ultimately selected as the feature extractor, and Nadam is chosen as the optimizer. This approach combines the encoder-decoder structure of U-Net with skip connections to capture multi-scale feature information in images. To further enhance the model's performance, data augmentation is incorporated, and an attention mechanism is introduced into specific layers of ResNet101, achieving significant improvements in target segmentation accuracy and generalization capability.
The delineation of bone tumor boundaries is a critical issue in the field of medical image segmentation due to the unique positioning of these tumors and the complexity of the associated surgical procedures. In recent years, researchers have made significant strides in determining tumor boundaries through computer vision methods. Doctors can achieve precise localization of bone tumors using the U-Net network or its improved variants. However, tumor boundary segmentation models based on single-modal images struggle to capture the complete characteristics of the tumor. In this paper, we propose a multi-modal image fusion network to achieve more accurate segmentation of tumor boundaries. Experimental results show that the accuracy of the segmentation results can be improved by about 6.8% when using fused images for segmentation. This image processing pipeline is therefore of great significance for improving the accuracy of bone tumor boundary demarcation and the efficiency of clinical diagnosis.
The reliability of bolted connections is directly related to the safe operation of transmission lines. Therefore, this article proposes a tower bolt looseness identification method based on laser vibration measurement technology and time inversion. The bolt tightening force coefficient is calculated under different usage conditions to obtain the maximum static friction force between angle steels. Changes in tension are analyzed, and the mean and standard deviation of the bolt tensile stress are calculated. The stress distribution is then obtained using laser vibration measurement and time inversion. Given assumed values for the Poisson's ratio and elastic modulus of the power tower bolts, the relationship between guided wave propagation speed and axial force coefficient is analyzed, a nut locking structure is constructed, and the axial force coefficient of the bolts is obtained, completing the identification of bolt looseness. The experimental results show that the accuracy of this method in identifying loose bolt samples is close to 100%. This method can provide stronger guarantees for the safe operation of power towers.
3D point clouds acquired with advanced scanning equipment are used in a variety of applications such as automated vehicles, 3D modeling, and cultural heritage preservation. Due to limitations of the devices, point cloud data is often contaminated with noise. This paper focuses on denoising point clouds for downstream tasks. We adopt a self-attention mechanism to enhance feature extraction in the encoding phase. A layer normalization method is proposed to address the gradient explosion that occurs during the iterative process of point cloud denoising. Experimental results demonstrate that our method reduces the CD and P2M errors compared with previous methods. Visualization results demonstrate that our method notably enhances the smoothness of the denoised point clouds.
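In point cloud denoising work, CD usually denotes the Chamfer distance between the denoised and the clean point sets. The sketch below shows one common variant, the sum of the two mean squared nearest-neighbor distances; the abstract does not say which variant or normalization the paper uses, so treat this as an assumption.

```python
from typing import Sequence

Point = Sequence[float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance using squared Euclidean distances.

    Brute force, O(len(a) * len(b)); fine for small illustrative inputs.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        total = 0.0
        for s in src:
            total += min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
        return total / len(src)

    return one_way(a, b) + one_way(b, a)
```

The distance is zero when the two sets coincide and grows as points drift away from their nearest counterparts.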
Existing image captioning algorithms primarily rely on visual features and the partial descriptions generated so far to predict subsequent words. However, these visual features often lack contextual or detailed object information, leading to captions that may not accurately describe the content of the image. To address these issues, we propose an image captioning approach that integrates low-level grid features, segmentation features, and high-level fusion features. This method supplements visual information with segmentation features, incorporating them alongside grid features in the encoder. To enhance the model's ability to capture visual context, we introduce memory-augmented attention within the existing IILN module. In the decoding phase, we simultaneously use low-level grid features, segmentation features, and high-level visual semantic fusion features to learn multi-level representations of image regions and semantic relationships. Experimental results demonstrate that our approach generates more precise descriptions, achieving a competitive CIDEr score of 136.5 on the MS COCO "Karpathy" offline test split.
In high-density crowds, a unique visual motion effect called the stop-and-go wave occurs, which can evolve into trampling and compression incidents. However, few computational models have been reported for stop-and-go wave perception from the perspective of computer vision. On the basis of the neural structures of locusts’ vision systems and the frequency resonance properties of the interneurons (INs) in locusts’ brains, this paper investigates a bio-inspired visual neural network (sgWPNN) for stop-and-go wave perception in high-density crowd scenes. The proposed sgWPNN first visually stabilizes the image frames, then converts the spatial intensity information to the frequency domain to extract the crowd’s motion cues, and finally outputs membrane potential impulses to identify the occurrence of a stop-and-go wave in the field of view. Videos of high-density crowd activities filmed in two scenarios demonstrate the effectiveness of sgWPNN in perceiving the stop-and-go wave effect in moving crowd scenes. This work discusses biologically inspired processing of dynamic visual information in crowd activity perception, which can provide new ideas for crowd activity detection and behavior analysis.
Medical disputes are a kind of disharmonious interpersonal relationship caused by different positions, views, and ways of thinking between doctors and patients. In recent years, they have shown an upward trend and have become a major issue affecting social stability and hospital development. This paper draws on the research process and results of Henkel et al. (2020) in the service industry, trains a model on the CREMA-D dataset to measure the accuracy of the algorithm, and designs an experimental process for using AI to address problems in doctor-patient relationships. Emotions in the CREMA-D dataset are divided into six types: anger, disgust, fear, happy, neutral, and sad. Using natural language processing and LSTM to process emotions in speech data provides help and guidance for doctors in regulating patients' emotions, and helps doctors use AI-integrated tools to regulate their own emotions and improve the happiness index of both parties.
Semantic segmentation is a significant and demanding task in computer vision, and it has gained increasing attention worldwide. This article delivers an in-depth analysis of vision-based semantic segmentation approaches for 3D point cloud data. It investigates the emergence and development of semantic segmentation both domestically and internationally, and outlines its historical evolution and various branches, emphasizing recent advancements driven by deep learning techniques. Despite notable progress, challenges persist, including handling variability in object shapes and sizes, computational costs, and robustness against different conditions. This survey aims to evaluate and synthesize current research, identifying strengths and weaknesses of traditional and modern methods and highlighting potential future research directions. By presenting a comprehensive analysis of methodologies, datasets, and evaluation metrics, the study offers valuable information on the implementation and performance of different segmentation approaches and guides researchers towards suitable techniques for a range of applications.
This paper introduces a lightweight neural network (DL) for facial expression recognition that uses dilated convolution, which does not increase the parameter count, to enable efficient global feature extraction. Incorporating Leaky ReLU mitigates gradient vanishing and enhances training robustness. Experiments show a 71% and 52% reduction in parameters for DL_Vgg19 and DL_ResNet34, respectively, with minimal accuracy loss, validating the method's efficiency.
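The point that dilation widens the receptive field without adding parameters can be seen in a one-dimensional sketch. This is a generic illustration of dilated convolution, not the network from the paper; the function names are hypothetical.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Input span covered by a single dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(
    signal: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """Valid-mode 1D dilated convolution (cross-correlation form)."""
    span = receptive_field(len(kernel), dilation)
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]
```

A three-tap kernel has three parameters whether the dilation is 1 or 2, but with dilation 2 each output value depends on a window of five input samples instead of three.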
The generation of vector graphics from fine-grained images using artificial intelligence has become an important task in edge extraction. In this paper, we take Qiang embroidery images as an example because they contain fine-grained edges, which makes them well suited to the study of image processing and pattern recognition. We first pre-process the image with improved adaptive median filtering (IAMF) to reduce noise. Then, Xception, which is based on convolutional neural networks, is used for edge detection and extraction. Results show that, after denoising and edge extraction, the shape characteristics of Qiang embroidery images can be clearly identified. The extracted edges can then be converted into vector graphics for digital preservation and further artistic reinterpretation. The use of Xception effectively addresses the extraction of Qiang embroidery patterns as two-dimensional vector images, offering a practical reference for preserving related intangible cultural heritage.
In nighttime pedestrian detection tasks, objects in dark areas can be detected more easily with infrared images. However, the detailed features and contours of objects are usually blurry in infrared images. Additionally, due to interference from heat sources, background objects with infrared radiation similar to that of pedestrians may overlap with them, confusing detection models. To address these issues, we propose a Mixed Local Channel Attention based YOLOv8 model (YOLOv8-MLCA) to detect pedestrians in infrared images. First, we add a Mixed Local Channel Attention (MLCA) module to YOLO's feature extraction backbone network. MLCA combines local and global information from the channel and spatial dimensions to enhance pedestrian details and contour features. To further address the problem of blurred pedestrian boundaries in infrared images, we propose a Minimum Points Distance based loss, MPDLoss, for bounding box regression during model training. We conducted comparative experiments on the LLVIP dataset. Among the 5 baseline models in the experiment, the proposed YOLOv8-MLCA achieved the highest accuracy. We also conducted a comprehensive ablation analysis to explore the performance of MLCA and MPDLoss. The experimental results validated the effectiveness of the proposed improvements.
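A bounding box loss based on minimum point distance can be sketched as follows. This assumes MPDLoss follows the published MPDIoU formulation, in which the IoU is penalized by the squared distances between the two boxes' top-left corners and between their bottom-right corners, each normalized by the squared image diagonal. The abstract does not give the formula, so this is an assumption rather than the paper's definition.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), top-left and bottom-right


def mpdiou_loss(pred: Box, gt: Box, img_w: float, img_h: float) -> float:
    """1 - MPDIoU for two axis-aligned boxes in pixel coordinates."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Plain IoU.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared corner distances, normalized by the squared image diagonal.
    diag_sq = img_w ** 2 + img_h ** 2
    d1_sq = (px1 - gx1) ** 2 + (py1 - gy1) ** 2
    d2_sq = (px2 - gx2) ** 2 + (py2 - gy2) ** 2

    return 1.0 - (iou - d1_sq / diag_sq - d2_sq / diag_sq)
```

Unlike plain IoU, the corner terms still provide a useful signal when the boxes do not overlap at all, since the loss keeps decreasing as the corners move closer.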
The production process of modern tobacco machinery is highly precise and complex, and unpredictable failures in the production line can lead to economic losses for tobacco companies. Artificial intelligence online monitoring technology can provide real-time fault alarms for tobacco production lines without affecting their overall operation, improving the operational efficiency of tobacco enterprises and in turn generating economic benefits. The fundamental process of visual inspection of cigarette package appearance quality is to classify the inspection image in order to distinguish qualified from unqualified packages; judging "qualified" and "unqualified" packages accurately is the key to improving accuracy and reducing the false reject rate. Analysis of the collected images of strip cigarettes shows certain feature differences between qualified and unqualified images. To identify abnormal strip cigarettes effectively, machine learning-based detection methods can be used to analyze the image feature information, extract the feature parameters, establish a cigarette packaging classification model, and identify and reject defective strip cigarette products. The main content of this work is to extract image features by combining the wavelet transform and the grayscale covariance matrix, train a support vector machine and a BP neural network on the cigarette training sample set, and then classify the test sample set to verify classification performance and compare the two classifiers.
This article focuses on key endogenous security technologies and, based on the network endogenous security defense concept of "structure determines security" in mimetic defense, studies heterogeneous redundancy construction, heterogeneous technology, and redundant execution technology for IoT access gateways. First, the sources of heterogeneity are analyzed. Based on the timing at which heterogeneity is introduced, heterogeneous technologies are divided into three types (compile time, pre-runtime, and runtime), and each type is studied separately. We then conduct in-depth research on process-level heterogeneous redundant execution technology, namely multi-variant execution. To solve several problems of process-level redundant execution, we study resource isolation technology on the Linux platform and combine it with container technology to form a container-level redundant execution solution for power IoT access gateways.
To address the complexity of open-circuit fault diagnosis of soft-switching inverters, a dual-branch fault diagnosis method (IDRSCNN-ACNN) is proposed, which combines a one-dimensional convolutional neural network (1-D CNN) with an improved deep residual shrinkage network (IDRSN) and a two-dimensional convolutional neural network (2-D CNN) with an attention mechanism. This method combines the ability of the 1-D CNN to extract primitive features from the original time series with the ability of the 2-D CNN to extract high-dimensional features from images, so it can mine more effective spatial features for inverter fault diagnosis. In addition, the method integrates the improved deep residual shrinkage network (IDRSN) and a coordinate attention (CA) mechanism to improve performance. First, the input current data on the power side of the soft-switching inverter is expanded by sliding window overlapping sampling, and the expanded data is converted into time-dependent Markov images using the Markov transition field. Second, a parallel dual-branch IDRSCNN-ACNN fault diagnosis model is designed. Then, the original time series data and the Markov images are used as the inputs of the 1-D branch and the 2-D branch, respectively, to train the model. Finally, a Softmax classifier is employed for fault classification. Experimental results show the method’s efficacy in classifying mixed-noise data across 79 fault types. Ablation experiments and comparisons with some traditional fault diagnosis models show that IDRSCNN-ACNN has better fault diagnosis performance.
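The two data preparation steps, overlapping sliding-window sampling and conversion to a Markov transition field (MTF), can be sketched as below. The sketch uses equal-width amplitude bins for simplicity, whereas MTF implementations often use quantile bins; the window length, stride, bin count, and function names are illustrative assumptions and not values from the paper.

```python
from typing import List, Sequence


def sliding_windows(
    series: Sequence[float], window: int, stride: int
) -> List[List[float]]:
    """Overlapping windows; a stride smaller than the window gives overlap."""
    return [
        list(series[i:i + window])
        for i in range(0, len(series) - window + 1, stride)
    ]


def markov_transition_field(series: Sequence[float], n_bins: int) -> List[List[float]]:
    """MTF[i][j] = probability of moving from the bin of sample i to the bin of sample j."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    states = [min(int((v - lo) / width), n_bins - 1) for v in series]

    # First-order transition counts between consecutive samples.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1

    # Row-normalize the counts into transition probabilities.
    trans = []
    for row in counts:
        total = sum(row)
        trans.append([c / total if total else 0.0 for c in row])

    return [[trans[si][sj] for sj in states] for si in states]
```

Each window yields one square image whose side equals the window length, so the time ordering of the samples is preserved along both image axes.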
In response to the weakness of traditional 3D point cloud object detection algorithms in detecting small targets with low accuracy, an improved PointPillars method based on spatial attention mechanism is proposed. After the formation of the pseudo-image by the pillar feature network, a GE (Gather-Excite) module is added. This module first allows the feature map to obtain spatial information through Gather, and then matches this spatial information with the input through Excite to produce attention information, using feature context to enhance the expressive power of convolutional neural networks. Finally, the improved algorithm is verified using the public dataset KITTI. The experimental results show that this method can accurately detect small targets such as pedestrians and cyclists, and also has significant effects on car categories. Compared with the benchmark model, the 3D average precision (AP) of the three target categories under medium detection difficulty conditions has been increased by 0.71%, 1.23%, and 2.15%, respectively, proving the effectiveness of the proposed improvement.
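The Gather and Excite steps described above can be illustrated on a single channel. The sketch below uses the parameter-free form of the idea (average pooling as Gather, nearest-neighbour upsampling plus a sigmoid gate as Excite); whether the paper uses this form or a learned gather operator is not stated in the abstract, and the `extent` parameter and function name are assumptions.

```python
import math

def gather_excite(feature_map, extent):
    """Apply a parameter-free Gather-Excite gate to one channel (H x W list).

    Gather: average-pool over extent x extent blocks to collect context.
    Excite: broadcast each pooled value back over its block, squash it with
    a sigmoid, and rescale the input element-wise.
    """
    h, w = len(feature_map), len(feature_map[0])
    gh, gw = math.ceil(h / extent), math.ceil(w / extent)
    gathered = [[0.0] * gw for _ in range(gh)]
    for gi in range(gh):
        for gj in range(gw):
            block = [feature_map[i][j]
                     for i in range(gi * extent, min((gi + 1) * extent, h))
                     for j in range(gj * extent, min((gj + 1) * extent, w))]
            gathered[gi][gj] = sum(block) / len(block)
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            context = gathered[i // extent][j // extent]
            gate = 1.0 / (1.0 + math.exp(-context))
            row.append(feature_map[i][j] * gate)
        out.append(row)
    return out

gated = gather_excite([[1.0, -1.0], [1.0, -1.0]], extent=2)
```

With a block mean of zero the gate is 0.5 everywhere, so every activation is halved; regions with stronger average response receive a gate closer to 1.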
To address the issues of slow detection speed and low accuracy in detecting fall behavior among workshop personnel, a deep learning-based fall detection method is proposed. First, the system architecture for detecting fall behavior of workshop personnel is designed. Then, the fall detection algorithm based on YOLOv5 is introduced, including the construction of the network structure and the training of the detection model. Additionally, to facilitate viewing the detection results, a visualization interface for the results is designed. Finally, system testing is conducted through experiments. The results show that, compared to several other common object detection algorithms, the YOLOv5-based fall detection method offers advantages such as fast speed, high accuracy, low cost, and good stability. It can be applied to monitor fall behavior in workshops, helping to reduce safety accidents during production processes.
Adverse environmental conditions such as haze, rain, and dust can significantly impair visibility for vehicles in motion, leading to blurred vision. Traditional de-fogging techniques often encounter limitations, including reliance on prior assumptions, color distortion, and inadequate real-time performance. Consequently, traffic lights may remain hard to distinguish in images de-fogged by these methods, which hinders target recognition in autonomous driving systems. This study introduces a convolutional neural network that fuses deep and shallow features using the Efficient Channel Attention mechanism. Building on the atmospheric scattering model, shallow features are extracted from foggy images by convolutional layers, while deep features are obtained using parallel multi-scale convolution kernels. The shallow and deep features are integrated through a series of fusion steps, enhanced by the Efficient Channel Attention mechanism. Ultimately, a transmittance map corresponding to the foggy image is derived through nonlinear regression, which allows the fog-free image to be recovered and improves the recognition capabilities of the YOLOv8 model. Experimental results demonstrate that this approach effectively identifies traffic light targets in simulations of vehicles navigating traffic intersections. Furthermore, the quality of fog removal achieved by this method surpasses that of other algorithms, as evidenced by both subjective evaluations and objective analyses.
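The recovery step rests on the standard atmospheric scattering model, I(x) = J(x)·t(x) + A·(1 − t(x)), where I is the observed foggy image, J the fog-free scene, t the transmittance map, and A the atmospheric light. The sketch below inverts that model for a single-channel image once t is known. It assumes a scalar A and a lower clamp `t_min` on the transmittance, neither of which is specified in the abstract, and it stands in for the network only in the sense that it consumes a transmittance map however that map was estimated.

```python
def add_haze(scene, transmission, airlight):
    """Forward model: I = J * t + A * (1 - t), applied per pixel."""
    return [[j * t + airlight * (1.0 - t) for j, t in zip(row_j, row_t)]
            for row_j, row_t in zip(scene, transmission)]

def recover_scene(hazy, transmission, airlight, t_min=0.1):
    """Invert the model: J = (I - A) / max(t, t_min) + A.

    t_min (an assumed safeguard) keeps very small transmittance values from
    amplifying noise; results are clipped to the [0, 1] intensity range.
    """
    recovered = []
    for row_i, row_t in zip(hazy, transmission):
        row = []
        for i, t in zip(row_i, row_t):
            j = (i - airlight) / max(t, t_min) + airlight
            row.append(min(1.0, max(0.0, j)))
        recovered.append(row)
    return recovered

scene = [[0.2, 0.8], [0.5, 0.0]]
transmission = [[0.5, 0.9], [0.3, 0.7]]
hazy = add_haze(scene, transmission, airlight=1.0)
restored = recover_scene(hazy, transmission, airlight=1.0)
```

Round-tripping a synthetic scene through both functions returns the original intensities up to floating-point error, which is a quick sanity check on any estimated transmittance map.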
With the rapid development of deep learning technology, object detection, an important task in computer vision, has been widely applied in the ROS (Robot Operating System) robot field. ROS, as an open-source robot operating platform, provides a favorable operating environment for object detection algorithms. This article studies improvement strategies for the real-time object detection algorithm RT-DETR on small devices, aiming to provide a better solution for robotics. Through an application analysis of RT-DETR, we found that detection accuracy, efficiency, and the ability to detect small objects all suffer when device performance is limited. These issues are particularly prominent when the camera is out of focus or the image is ghosted. To improve detection accuracy on small edge devices, we propose a dynamic and irregular deformable convolution kernel strategy. To improve efficiency, we propose an enhanced non-linear network structure that achieves greater non-linear capability with fewer parameters. Finally, we combine the two methods to form DenNet (Deformable and Enhanced Nonlinear Convolutional Kernel Networks). Experiments verify that our improvement strategy greatly improves detection accuracy and efficiency on small devices, mitigating the limits imposed by their insufficient performance, and has important practical application value.
Existing lightweight road sign detection methods can hardly address the challenges of occlusion and scale difference. Therefore, a Faster-YOLOv8n based lightweight road sign detection method is proposed in this paper for intelligent unmanned vehicles. Firstly, we introduce Partial Convolution (PConv) into traditional YOLOv8n. The bottleneck of the C2F module is replaced by FasterBlock, which reduces the number of parameters in the model and improves the speed. To reduce the accuracy loss due to lightweighting and scale differences in the near and far views, a Multi-scale Feature Interaction Attention (MSFIA) is proposed in the feature fusion neck network. In addition, a Spatially Reinforcement Attention (SRA) module is proposed at the detection heads of the network. SRA strengthens the response to unoccluded regions and weakens the background regions to address the occlusion problem. We conducted comprehensive comparative experiments on the TT100K dataset. The experimental results show that compared to the original YOLOv8n, the proposed Faster-YOLOv8n has a substantial improvement in both detection accuracy and inference speed.
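The saving from Partial Convolution can be checked with simple arithmetic: PConv applies a regular convolution to only a fraction of the channels and passes the rest through unchanged. The sketch below counts multiply-accumulate operations for a full convolution and for PConv, assuming equal input and output channel counts and a one-quarter partial ratio; the feature-map size, channel count, and function names are illustrative values, not figures from the paper.

```python
def conv_flops(height, width, kernel, channels):
    """Multiply-accumulates of a regular k x k convolution with the same
    number of input and output channels (bias ignored)."""
    return height * width * kernel * kernel * channels * channels

def pconv_flops(height, width, kernel, channels, partial_divisor=4):
    """PConv convolves only channels // partial_divisor channels; the
    remaining channels are copied through at no arithmetic cost."""
    conv_channels = channels // partial_divisor
    return conv_flops(height, width, kernel, conv_channels)

full = conv_flops(56, 56, 3, 64)
partial = pconv_flops(56, 56, 3, 64)
reduction = full // partial
```

Because the cost is quadratic in the number of convolved channels, convolving a quarter of them costs one sixteenth of the full convolution under these assumptions.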
Cigarette detection plays a crucial role in environmental protection and public health. However, current deep learning based algorithms for small object detection often suffer from poor accuracy and are highly susceptible to environmental interference and variations in lighting conditions. To address these challenges, this paper introduces an end-to-end cigarette detection algorithm: YOLO-CD. First, a novel adaptive multi-scale feature extraction module is designed to more comprehensively capture features across different scales. Second, to mitigate the distortion and loss of small object information in deep networks, a more flexible and efficient downsampling design is employed. Additionally, a deep supervision mechanism is introduced by adding auxiliary detection branches in the intermediate layers of the network, effectively enhancing the model's ability to capture small object features without increasing computational complexity. Experimental results demonstrate that YOLO-CD outperforms existing mainstream methods on public datasets, achieving a 3.0% improvement over the baseline model while also reducing the number of parameters and computational load.
The detection of underwater objects is of great importance in fields such as oceanography and ecological monitoring. However, traditional detection methods are hindered by significant challenges, including the prevalence of noise interference, low contrast, and complex illumination variations in underwater environments. To address these challenges, a novel frequency domain attention mechanism is proposed. This mechanism combines frequency domain processing with an attention mechanism, weighting the input feature map and subsequently fusing the processed feature outputs. In terms of frequency domain processing, the module employs a range of techniques, including frequency weighting, multi-scale enhancement, phase-preserving filtering and noise suppression. This is done with the aim of optimising the detection of complex underwater targets, which significantly improves the robustness and accuracy of target detection in complex underwater environments. Concurrently, the attention mechanism generates corresponding feature patterns and specific information by dynamically weighting global and local features, which enables the effective identification of the most important features in the current context, thus providing higher accuracy in target edge and texture recognition. The experimental results demonstrate that the target detection model integrated with the frequency-domain attention mechanism exhibits improvements in various metrics, ranging from 0.2% to 1.4%, across multiple datasets. Furthermore, it outperforms the traditional attention mechanism in the majority of cases. These findings not only validate the efficacy of the frequency domain attention mechanism in underwater target detection but also offer novel insights and avenues for future research in this field.
The maintenance and inspection of power lines is key to ensuring their normal operation and maintaining an uninterrupted power supply for various human activities. Traditional methods for detecting power line assets usually cover only a small number of asset types and suffer from low inspection accuracy and high computational cost. This study proposes a power line asset detection algorithm based on YOLOv8 to address these issues. Firstly, the C2f-DC module is added to the algorithm’s backbone network, enhancing the feature extraction capability of the network. Then, the MSCA module is incorporated into the algorithm’s neck network. This module effectively captures multi-scale contextual information, further enhancing feature extraction. Finally, the DyHead module and the Inner-IoU loss function replace the original detection head and the original loss function, respectively. The DyHead module dynamically adjusts feature representations, thus boosting the detection accuracy of the algorithm. The Inner-IoU loss function addresses the poor generalization and slow convergence of the original loss function, thereby improving the performance of the algorithm. Experimental results demonstrate that the proposed YOLOv8-based algorithm achieves a mean average precision (mAP) of 90%, accurately detecting power line assets.
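The core of Inner-IoU is to compute the overlap on auxiliary boxes that share the centres of the predicted and ground-truth boxes but are scaled by a ratio. A minimal sketch of that computation follows, assuming boxes in (x1, y1, x2, y2) form and an illustrative default ratio of 0.75; how the paper combines this term with the rest of its loss is not stated in the abstract.

```python
def scaled_box(box, ratio):
    """Auxiliary box with the same centre and ratio-scaled width and height."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * ratio / 2.0, (y2 - y1) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def iou(a, b):
    """Plain intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def inner_iou(pred, target, ratio=0.75):
    """IoU measured on the scaled auxiliary boxes (ratio is an assumption)."""
    return iou(scaled_box(pred, ratio), scaled_box(target, ratio))

pred, target = (0.0, 0.0, 4.0, 4.0), (2.0, 0.0, 6.0, 4.0)
plain = iou(pred, target)
inner = inner_iou(pred, target, ratio=0.5)
```

A ratio below 1 shrinks both boxes, so partially overlapping pairs score lower than with plain IoU and the resulting loss is steeper for them; a ratio of 1 reproduces plain IoU.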
Fluidized bed granulation is a unit operation widely used in the pharmaceutical, chemical and food processing industries. It is a manufacturing technology that suspends loose powders using hot air and transforms them into granules of uniform size to improve compaction and flow characteristics. The granule size distribution and moisture content are important quality indicators that are currently characterized by sampling and offline laboratory analysis, which delays the measurement. This work reports an investigation of machine vision combined with deep learning image segmentation for on-line real-time monitoring. A non-invasive microscopic imaging probe with an integrated light source is designed and mounted on the granulator’s sight glass to monitor dynamic changes in granule morphology and size. In addition, a near-infrared spectrometer combined with chemometric modeling is used for real-time monitoring of moisture content.
To address the issues of low detection accuracy for small road damage targets, poor adaptability to target deformations, and high missed detection rates in complex backgrounds, this paper proposes an improved road surface damage detection model, YOLO-DCNet, based on YOLOv8. First, the C2f module is integrated with deformable convolution (DCNv3) to enhance the model's ability to detect irregularly shaped road damage. Second, the CBAM attention mechanism is incorporated, combining spatial and channel attention to optimize feature extraction. Finally, the Dynamic Head is introduced to improve multi-scale feature fusion and detection capabilities, effectively enhancing the model's performance in detecting road damage at various scales. Experimental results on a road damage dataset show that the YOLO-DCNet model achieves a 2.7% improvement in mean Average Precision (mAP), a 2.6% increase in Recall (R), and a 3.2% increase in Precision (P) compared to the original YOLOv8n, resulting in more accurate detection.
U-shaped structures are widely used in the design of salient object detection networks. However, this structure commonly loses spatial location details, has difficulty capturing edge details, and is usually accompanied by an excessive number of model parameters. To address these problems, this paper proposes a lightweight salient object detection network with feature fusion guided by deep semantic information. First, the skip connections in the outer layer of the network are redesigned so that they fuse feature information of different scales from this layer and all shallower layers above it, thus enhancing the network's ability to capture edge details. Second, an MCA module is incorporated into the residual U-block to process the last-layer features of the feature extraction network, enhancing their representational power and serving as a semantic guide in the decoding process, which facilitates the fusion of features between the decoding side and the encoding side. Finally, depthwise separable convolution replaces the traditional convolution in order to reduce the computation and parameter count of the network. The experimental results show that the proposed algorithm achieves excellent results in accuracy, precision, recall, mean intersection over union (mIoU), and F1 score, which indicates that the algorithm has better detection performance with more distinct boundaries.
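The parameter saving from replacing a traditional convolution with a depthwise separable one follows directly from the layer shapes. The sketch below compares the two counts for an illustrative 3 x 3 layer with 64 input and 128 output channels; these sizes and the function names are assumptions chosen for the example, not values from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a regular k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise
    convolution that mixes channels (bias ignored)."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

regular = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
saving = regular / separable
```

For this layer the separable form needs roughly one eighth of the weights, and the ratio approaches k² as the number of output channels grows.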
In the evolving landscape of global energy and information technology sectors in the early 21st century, the concept of smart grids has emerged as pivotal for enhancing the efficiency and reliability of power systems. This paper explores the integration of modern communication, computer, network, and control technologies within smart grids, highlighting their role in ensuring grid reliability, intelligence, and responsiveness. Despite varying definitions globally, smart grids universally aim to optimize grid operations through advanced technology integration. This article introduces a novel Transformer-LSTM encoder-decoder structure that integrates the LSTM's robust capability to capture long-term dependencies with the Transformer's proficiency in capturing global dependencies. The proposed model is applied to forecast fluctuations in power grid data traffic, facilitating real-time adjustments to the grid's operational and maintenance strategies. Experimental results validate that the traffic prediction accuracy of the Transformer-LSTM model exceeds that of the standalone Transformer and LSTM models.
Transformers have demonstrated excellent performance in image synthesis and object removal tasks, especially in capturing global representations. However, Transformers tend to overlook local image details, and their computational complexity increases quadratically with spatial resolution. To address these issues, we propose a hierarchical integrated Transformer for marine snow removal. Specifically, we leverage the Transformer to capture global information in the latent space and hierarchically integrate it into a CNN-based model to remove marine snow particles and restore image details. In the integration process, we introduce a global-local integration module that effectively combines global information from the Transformer and local information from the CNN across multiple levels. We conduct extensive experiments on public marine snow datasets, including MSRB and Snowy-VAROS, demonstrating the exceptional performance of our method for marine snow removal.
The enclosed environment of tunnels poses significant challenges to the effective implementation of firefighting strategies in the event of a fire. Due to the complex internal structure and narrow space of tunnels, traditional emergency response methods are often inefficient under such conditions. Therefore, it is particularly crucial to quickly and accurately determine the location of a fire, develop the optimal firefighting plan, and continuously monitor the effectiveness of firefighting. Currently, many methods rely on simple deep learning models to locate fire sources, but in complex fire scenarios these models are often neither lightweight nor flexible enough. To address this issue, we propose a deep learning-based fire localization model that achieves automated fire detection in complex and ever-changing tunnel fire environments. Compared with traditional deep learning methods, this model not only improves accuracy but also significantly shortens response time. To verify the effectiveness of the model, we compared seven existing fire detection models: backpropagation neural network (BPNN), convolutional neural network (CNN), long short-term memory network (LSTM), CNN-LSTM, bidirectional long short-term memory network (BiLSTM), CNN-BiLSTM, and CNN-BiGRU. The final results indicate that our proposed method, based on CNN-BiGRU, performs well in fire location detection tasks, with excellent detection performance, strong robustness, and a lightweight design.
This study aims to address the problem of poor stem node detection and recognition accuracy in high-resolution global sugarcane images, which is crucial for accurately identifying sugarcane stem node locations during automated sugarcane cutting. Since a sugarcane stem node occupies a very small area of the entire image, only about 0.2%, scaling the original image to fit the network input size blurs its features and often results in the loss of important information. To overcome this problem, this study investigates the loss of small target features in high-resolution images and proposes an improved algorithm, STMFF-YOLOv9 (small target multiple feature fusion YOLOv9), based on YOLOv9-c. The algorithm uses a multi-scale stitching strategy to enhance the feature acquisition of small objects during training, designs an FDS (fusion down sampling) module that improves the downsampling process of the feature fusion network to reduce the loss of small target features, and implements an SE channel attention mechanism to recalibrate the fused channel features, thereby strengthening the focus on useful features. Experimental results show that STMFF-YOLOv9 significantly improves the detection accuracy of sugarcane stem nodes in global images. Compared to YOLOv9-c, it achieves an 8.3% increase in precision, a 13.1% increase in recall rate, a 10.3% increase in mAP0.5, and a 9% increase in mAP0.5:0.95. Compared to other models, STMFF-YOLOv9 also demonstrates superior detection performance, confirming its capability to detect sugarcane stem nodes in global images.
Traditional equirectangular and cube projection methods for panoramic image saliency detection often suffer from distortion and discontinuities, which reduce detection accuracy. This study introduces an innovative image resampling technique and a GCN-ELM joint model. By evenly distributing spherical pixels onto a 2D plane, the method reduces the pixel redundancy of equirectangular projection. Experimental results show that this approach significantly enhances saliency detection performance compared to existing methods.
With the rapid development of sensor devices, video-based human action recognition has emerged as a prevalent research direction. Current approaches have achieved remarkable results by utilizing inter-frame differencing or attention mechanisms to extract short- and long-term motion information. However, as the complexity of actions escalates, some methods struggle to adequately capture rich motion for fine-grained action recognition. To address these issues, we propose a lightweight model named Multi-stage Motion Excitation Network (MMEN), which integrates multi-level motion modeling with attention mechanisms for efficient fine-grained action recognition. MMEN comprises three key components: Motion Boundary Perception (MBP) to perceive subtle intra-segment motion changes, Two-way Motion Selection (TMS) to model inter-segment action evolution, and Spatio-Temporal Global Attention (STGA) to capture video-level information. Experimental results on the HMDB51, Diving48, and Something-Something V1 datasets demonstrate that MMEN enables the model to learn richer motion information while achieving a better balance between computational cost and recognition accuracy.
In on-orbit scenarios, whether a spacecraft can capture a space container depends on accurately measuring the container's position and attitude. Because existing attitude measurement methods do not measure the attitude of the space container well enough, a space container attitude measurement system based on multi-line structured light is designed. The system projects multi-line structured light onto the surface of the space container, uses laser triangulation to obtain the point cloud of that surface, and then obtains the attitude of the space container through point cloud fitting. In addition, because the individual lines of the multi-line structured light are difficult to identify on the surface of the measured object, this paper adds an extra line of structured light to the original multi-line system to assist identification. Experimental verification shows that the maximum distance measurement error of the system is less than 0.6 mm and the maximum attitude measurement error is less than 1°, which meets the requirements of the working scenario and demonstrates the feasibility of the measurement system.
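One simple way to turn a surface point cloud into an attitude estimate is to fit a plane and read the tilt from its normal. The sketch below does this with a least-squares fit of z = a·x + b·y + c. Treating the measured surface as planar is an assumption made for illustration, as are the function names and the synthetic points; the abstract does not say which geometric model the fitting step uses.

```python
import math

def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c; returns (a, b, c)."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    normal_matrix = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(normal_matrix)
    if abs(d) < 1e-12:
        raise ValueError("points are degenerate (collinear or too few)")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in normal_matrix]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)

def tilt_degrees(a, b):
    """Angle between the plane normal (-a, -b, 1) and the sensor z axis."""
    return math.degrees(math.acos(1.0 / math.sqrt(a * a + b * b + 1.0)))

cloud = [(x, y, 0.5 * x + 2.0) for x in range(3) for y in range(3)]
a, b, c = fit_plane(cloud)
tilt = tilt_degrees(a, b)
```

On this synthetic cloud the fit recovers the slope of 0.5 along x, which corresponds to a tilt of about 26.57° from the sensor axis.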
The conventional online monitoring method of optical cable mainly uses YOLOv4 (You Only Look Once version 4) vertical detection algorithm to calculate the inclination threshold of optical cable, which is vulnerable to the effect of Brillouin scattering frequency shift, resulting in abnormal monitoring results. Therefore, an online monitoring method of optical cable based on optical signal reconstruction algorithm and wavelength conversion is proposed. In other words, optical signal reconstruction algorithm and wavelength conversion are used to build the optical cable online monitoring model, and the optical cable online monitoring center is designed to complete the optical cable online monitoring. The experimental results show that the optical cable online monitoring method designed based on optical signal reconstruction algorithm and wavelength conversion has good monitoring results, excellent monitoring performance indicators, reliability, and certain application value. It has made certain contributions to improving the safety of optical cable operation and reducing the difficulty of optical cable operation and maintenance.
The conventional method of manually marking the landing points of tennis balls is inefficient, labor-intensive and prone to subjective errors, which limits accuracy and efficiency. This study aims to develop a video-based tennis landing point detection system that allows players to analyze and refine their stroke preferences, while providing coaches and referees with more insightful data support. We have introduced a new tennis landing point dataset, T-Point, which includes different tennis usage scenarios. The PP-TSMv2 model, which uses a 2D network, was chosen as the recognition framework. To improve the performance of the model in recognizing tennis landing points, we replaced the SE module in its PP-LCNetV backbone network with the convolutional attention module CBAM. The original model achieved a top1_avg of 0.6889 on the T-Point dataset, while the refined model achieved a top1_avg of 0.7333. The experimental results confirm that the integration of spatial information significantly improves the performance of the model.
Object detection is a core problem in the field of computer vision and finds extensive applications in areas such as autonomous driving and security surveillance. Traditional YOLO-based object detection algorithms often struggle with complex backgrounds, varying object shapes, and detecting small objects. To address these issues, this paper proposes an improved YOLO model based on deformable convolutions. Deformable convolutional networks enhance the model’s ability to perceive complex object shapes by introducing spatial deformation capabilities. Experiments were conducted using the publicly available COCO128 dataset and the experimental results show that after introducing deformable convolutions in deeper layers of the model, overall detection accuracy improves, with the mAP50-95 reaching 62.5% when replacing the seventh layer, an increase of 1.2 percentage points compared to the original model. The results indicate that the YOLO model based on deformable convolutions offers certain advantages in handling complex object detection tasks.
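Deformable convolution replaces the fixed 3x3 sampling grid with per-tap offsets and reads the input at the resulting fractional positions by bilinear interpolation. The sketch below illustrates only that sampling step for a single output location. The offsets are hand-written here (in a real network they are predicted by an additional convolution), positions outside the image are assumed to contribute zero, and the function names are illustrative rather than taken from the paper.

```python
import math

def bilinear(img, y, x):
    """Bilinearly interpolate img at fractional (y, x); outside pixels count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                total += img[yy][xx] * (1 - abs(y - yy)) * (1 - abs(x - xx))
    return total

def deformable_response(img, kernel, r, c, offsets):
    """3x3 response at (r, c); offsets[i] = (dy, dx) shifts the i-th kernel tap."""
    total = 0.0
    i = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            dy, dx = offsets[i]
            total += kernel[dr + 1][dc + 1] * bilinear(img, r + dr + dy, c + dc + dx)
            i += 1
    return total

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

# Zero offsets reduce to an ordinary 3x3 convolution at the centre pixel.
regular = deformable_response(image, ones, 1, 1, [(0.0, 0.0)] * 9)
# Shifting every tap half a pixel to the right changes what the kernel sees.
shifted = deformable_response(image, ones, 1, 1, [(0.0, 0.5)] * 9)
```

With zero offsets the response equals the plain sum of the nine pixels, which is a quick sanity check that the deformable form generalises the regular one.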
Modern industrial production and daily life are inextricably linked to a wide range of chemical machinery and equipment, which serve as indispensable production apparatus. Guaranteeing their proper functioning is therefore paramount, and fault monitoring deserves due attention. Advances in computer and wireless communication technology have brought expert diagnosis systems and the Internet of Things to maturity, so fault monitoring systems based on these technologies are poised for widespread adoption. In this work, instrument operation data collected by IoT devices first undergoes preliminary cleaning and normalization to ensure its consistency and accuracy. The preprocessed data is then fed into a multi-layered deep neural network (DNN) model. In the training stage, the backpropagation algorithm calculates the gradient of each layer, and the stochastic gradient descent (SGD) optimization algorithm adjusts the weights and biases in the network to minimize the loss function. Through iterative training, the model gradually improves its ability to identify failure modes and its prediction accuracy. This approach enables the network to learn complex fault signatures automatically from large amounts of data and to detect and predict instrument failures in real-world applications. An experimental platform equipped with 10 sensors was set up in the LabView programming environment to collect experimental data and conduct fault monitoring experiments. The model shows high accuracy and robustness across different types of instrument fault detection, which supports its effectiveness in practical industrial applications.
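The pipeline described above (normalization, forward pass, backpropagation, SGD update) can be sketched in a few lines. This is a toy stand-in, not the authors' model: it assumes min-max normalization, a single hidden tanh layer, a sigmoid output with binary cross-entropy loss, and four made-up two-sensor readings labelled normal (0) or faulty (1).

```python
import math
import random

def min_max_normalize(rows):
    """Scale each column to [0, 1]; a constant column maps to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

class TinyMLP:
    """One hidden tanh layer and a sigmoid output, trained by per-sample SGD."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
             for ws, b in zip(self.w1, self.b1)]
        z = sum(w * v for w, v in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def sgd_step(self, x, y, lr):
        h, p = self.forward(x)
        d_out = p - y  # gradient of the BCE loss w.r.t. the output pre-activation
        for j in range(len(self.w2)):
            # Backpropagate through the tanh unit before w2[j] is updated.
            d_hid = d_out * self.w2[j] * (1.0 - h[j] ** 2)
            self.w2[j] -= lr * d_out * h[j]
            self.b1[j] -= lr * d_hid
            for i in range(len(x)):
                self.w1[j][i] -= lr * d_hid * x[i]
        self.b2 -= lr * d_out

    def loss(self, xs, ys):
        eps = 1e-12
        total = 0.0
        for x, y in zip(xs, ys):
            p = self.forward(x)[1]
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(xs)

# Hypothetical readings: (temperature, vibration); the last two rows are faults.
raw = [[20.0, 1.0], [22.0, 1.1], [80.0, 3.0], [85.0, 3.2]]
labels = [0, 0, 1, 1]
xs = min_max_normalize(raw)

model = TinyMLP(n_in=2, n_hidden=4)
initial_loss = model.loss(xs, labels)
order = list(range(len(xs)))
shuffler = random.Random(1)
for _ in range(300):
    shuffler.shuffle(order)  # visit samples in random order each epoch
    for idx in order:
        model.sgd_step(xs[idx], labels[idx], lr=0.5)
final_loss = model.loss(xs, labels)
```

Normalizing first keeps both sensor channels on the same scale, so a single learning rate works for all weights.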
We introduce a novel approach to edge detection named AttnEdge, which combines pixel difference convolution with advanced post-processing techniques to enhance the detection and representation of image edges. The approach uses Pixel Difference Convolution (PDC) to learn edge features directly at the pixel level, improving its handling of complex boundaries and multi-scale features within images. AttnEdge also integrates a self-attention mechanism that captures global dependencies among pixels and refines feature representations, which improves edge recognition accuracy and model robustness and helps the model distinguish real edges from noise in complex backgrounds. The post-processing stage, featuring Gaussian blur and adaptive thresholding, further refines the result by reducing noise and adjusting dynamically to local brightness variations, giving reliable edge detection across various scenarios. In addition, channel reduction in the self-attention mechanism reduces computational complexity while maintaining high performance. Experiments on the BSDS500 dataset demonstrate the superiority of AttnEdge over existing methods such as PiDiNet, particularly in terms of structural similarity and noise resilience, making it a promising solution for advanced edge detection tasks.
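A pixel difference convolution applies the kernel weights to differences between pixels rather than to raw intensities, so flat regions give zero response and only local changes survive. The sketch below assumes the central-difference variant (each neighbour minus the centre pixel); the abstract does not say which PDC variant AttnEdge uses, and the function name is illustrative.

```python
def central_pdc(image, kernel):
    """Central pixel difference convolution over the valid interior (3x3, no padding)."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            center = image[r][c]
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Weight the difference to the centre, not the raw pixel value.
                    acc += kernel[dr + 1][dc + 1] * (image[r + dr][c + dc] - center)
            row.append(acc)
        out.append(row)
    return out

ones = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
flat = [[5, 5, 5, 5] for _ in range(4)]
step = [[0, 0, 1, 1] for _ in range(4)]  # vertical edge between columns 1 and 2

flat_response = central_pdc(flat, ones)  # all zeros: no intensity change
edge_response = central_pdc(step, ones)  # non-zero on both sides of the edge
```

Because the response depends only on differences, a constant brightness offset added to the whole image leaves the output unchanged.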
In the field of object detection, deep learning has been used extensively, especially in algorithms like Yolov7, which have achieved significant accuracy improvements. However, traditional convolutional neural networks are computationally intensive and require powerful GPU support, making their deployment on embedded devices difficult. The high hardware requirements also hinder related research, so more people opt for lightweight networks such as Yolov7-tiny. When Yolov7-tiny is used for underwater garbage detection, however, it achieves good mAP (mean Average Precision) at an IOU (Intersection over Union) threshold of 0.5, but its performance is unsatisfactory for mAP over the IOU range of 0.5 to 0.95. This limitation may be attributed to the performance trade-off made when the model is made lightweight. To address these issues, an enhanced Yolov7-tiny method for the detection of underwater trash objects is proposed in this paper. First, the algorithm employs an enhanced Ghost convolutional feature extraction module, which starts with conventional convolutions using a smaller number of channels and then performs grouped convolutions to obtain partial output feature maps. The feature maps from the first convolution step are then added to the channels obtained from the grouped convolution step. This design reduces model complexity while extracting richer feature information. Second, the algorithm uses the CA (Channel Attention) mechanism to weight channels based on their positional information; by learning position weights in the feature maps, the network can concentrate on important feature regions. Lastly, the algorithm uses the Repeated Weighted Bi-directional Feature Pyramid Network (BIFPN) for feature fusion. BIFPN employs multiple down-sampling steps for short skip connections, enhancing the long-range correlations of encoded features. The results indicate that the detector performs better and that multi-level feature information is integrated more effectively. In experiments on public datasets, the proposed algorithm shows better accuracy, speed, and computational efficiency than the original Yolov7-tiny network. Together, these modules enhance feature representation, extract essential information, and boost detection accuracy.
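The gap between mAP at an IOU of 0.5 and mAP over 0.5 to 0.95 comes from how strictly a predicted box must overlap the ground truth to count as a match. The sketch below computes IOU for axis-aligned boxes given as (x1, y1, x2, y2) and counts how many of the ten COCO-style thresholds (0.50, 0.55, ..., 0.95) a single prediction would satisfy. The box coordinates are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# IoU thresholds used when averaging AP from 0.5 to 0.95 in steps of 0.05.
THRESHOLDS = [t / 100 for t in range(50, 100, 5)]

def thresholds_matched(pred, gt):
    """Number of thresholds at which pred still counts as a match for gt."""
    return sum(iou(pred, gt) >= t for t in THRESHOLDS)

ground_truth = (0, 0, 10, 10)
prediction = (1, 0, 11, 10)  # same size, shifted one unit to the right

overlap = iou(prediction, ground_truth)                 # 90 / 110
matched = thresholds_matched(prediction, ground_truth)  # 7 of 10 thresholds
```

A one-unit shift on a ten-unit box is a clear hit at 0.5 but already fails the three strictest thresholds, which is why loosely localized detections score well on one metric and poorly on the other.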
Tobacco is one of the important industries in China, and the annual tobacco tax is an important source of China's fiscal revenue. However, in the process of cigarette production, it is inevitable that cigarettes will be short, which affects the quality of cigarettes and the brand reputation of cigarette factories. Therefore, it is necessary to study detection methods for the cigarette short phenomenon. At present, there is a lack of image data of cigarette shorts, so training target defect detection methods suffers from insufficient sample data and unsatisfactory training results, which may affect the detection accuracy for short cigarettes. Therefore, a YOLOv4 cigarette short detection method based on the GAN algorithm is proposed. First, to address the lack of sample data, the GAN algorithm is used to generate sample data and expand the data set. Then, the real cigarette blank images and the blank images generated by the GAN algorithm are merged as sample data, and the YOLOv4 network is trained on them to improve the detection accuracy of the defect detection model. Finally, experiments show that expanding the sample data with the GAN algorithm improves the detection accuracy of the model to a certain extent.
In the context of power grid operation, the detection of equipment status is of crucial importance for ensuring the stability and safety of the power system. However, the detection of power grid equipment confronts several challenges, such as the similarity in the morphology of electrical equipment, significant variations in the scale of target objects, and the balance between detection speed and accuracy. These issues have always been key research topics in this field. Based on a pruning strategy and a dynamic attention mechanism, this paper proposes an efficient and lightweight object detector named YOLO-AKD. The outcome is a new strategy that can significantly enhance the inference speed of real-time object detectors while maintaining accuracy. To verify the effectiveness of our strategy, we created a dataset of power grid switch cabinets, which contains the status of 12 types of electrical equipment. We trained YOLO-AKD from scratch on this dataset without relying on weights pre-trained on any other large-scale dataset, and compared it experimentally with current state-of-the-art real-time object detectors, including YOLOv6 and RTMDet. Taking RTMDet as an example, on the switch cabinet dataset YOLO-AKD improves mAP@0.5 by 0.2%, while the computational cost decreases from 14.8 FLOPs to 7.9 FLOPs, fulfilling the objective of auxiliary teaching in power grid operation training.
With the progress of science and technology and the improvement of people's living standards, the interior decoration design industry has ushered in a critical period of digital transformation and innovation. Especially in the display of interior decoration design, this change is particularly prominent. The traditional way of interior decoration design display is mostly plane renderings, which can simply and conveniently show the expected design effect, but it is difficult to meet the actual needs of repeated modification and complete experience. In this regard, this study will focus on the application effect of virtual reality technology in interior decoration design, and put forward a development scheme of virtual display system for interior decoration design based on Unity3D technology, in order to improve design quality and shorten design cycle. Practice has proved that the virtual display system can transform the traditional two-dimensional plane renderings into three-dimensional models and form virtual simulation scenes, which not only can give consumers an immersive interactive experience, but also provide new design tools and ideas for personalized customized design, and realize the data-driven design decision.
Low-light scenes are common in fields such as autonomous driving, which tests the robustness of intelligent systems. This paper tests the performance of object detection in low-light scenes based on the YOLOv8n model. In order to solve the constraint of limited computing power of embedded devices, the YOLOv8n model is adjusted through model pruning and knowledge distillation. The complexity of the model is reduced through these lightweight operations. While ensuring the detection performance as much as possible, the multiply-accumulate operations (MACs) and parameters of the model decreased by about 40% and 27% respectively. The proposed model is evaluated on the Exclusively Dark (ExDARK) dataset and achieves a mean average precision (mAP) value of 0.672. In addition, the proposed model is migrated to the embedded platform NVIDIA Jetson Xavier NX. The deployment process is optimized by multi-process scheduling and multi-thread scheduling. Experiments are also conducted on real data, and the results show that the proposed model can provide a feasible solution for vision-based night time autonomous driving.
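Knowledge distillation trains the pruned (student) model to imitate the outputs of the larger (teacher) model. The abstract does not say which distillation loss is used; the sketch below assumes the classic soft-target formulation, where both sets of class logits are softened with a temperature and compared by KL divergence. Distillation for detectors usually adds box and feature terms that are omitted here, and all names and values are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T squared."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return temperature ** 2 * kl

teacher = [1.0, 2.0, 3.0]
same = distillation_loss([1.0, 2.0, 3.0], teacher)      # student agrees with teacher
different = distillation_loss([3.0, 2.0, 1.0], teacher)  # student ranks classes in reverse
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge; the T squared factor keeps gradient magnitudes comparable across temperatures.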
A Fire Emergency Power Supply (FEPS) takes over from the mains supply when the mains fails during a fire, powering the load and keeping firefighting equipment in normal operation. At present, the main inverter control method used in FEPS on the market is sinusoidal pulse width modulation (SPWM) control, which is relatively mature. However, to improve the output voltage quality, the carrier frequency is often set relatively high under SPWM control, which leads to high switching losses in the FEPS. This article uses Selective Harmonic Eliminated Pulse Width Modulation (SHEPWM) instead of SPWM to control the FEPS. Based on Fourier analysis, a unipolar 1/2 cycle SHEPWM FEPS inverter model is established, and a unipolar 1/2 cycle SHEPWM FEPS control system is designed to control the phase, amplitude, and model freedom of the inverter output voltage. A comprehensive simulation model of the SHEPWM FEPS was built in MATLAB SIMULINK. By changing the DC power supply and load, the simulation results were analyzed to verify the feasibility of applying SHEPWM technology in FEPS.
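The Fourier analysis behind SHEPWM expresses each output harmonic as a function of the switching angles, which are then chosen so that selected harmonics vanish. The sketch below is a simplified illustration, not the paper's model: it assumes a unipolar waveform with quarter-wave symmetry, for which the amplitude of odd harmonic n is (4 V_dc / (n pi)) times the alternating sum of cos(n alpha_k) over the switching angles alpha_k in (0, pi/2).

```python
import math

def harmonic_amplitude(n, angles, v_dc=1.0):
    """Amplitude of harmonic n for switching angles (radians) in (0, pi/2).

    Assumes a unipolar, quarter-wave-symmetric waveform: the output switches
    on at angles[0], off at angles[1], on at angles[2], and so on.
    """
    if n % 2 == 0:
        return 0.0  # even harmonics vanish under the assumed symmetry
    alternating = sum((-1) ** k * math.cos(n * a) for k, a in enumerate(angles))
    return 4.0 * v_dc / (n * math.pi) * alternating

# A single pulse that switches on at 30 degrees removes the third harmonic,
# because cos(3 * 30 degrees) = cos(90 degrees) = 0.
single_pulse = [math.pi / 6]
fundamental = harmonic_amplitude(1, single_pulse)
third = harmonic_amplitude(3, single_pulse)
fifth = harmonic_amplitude(5, single_pulse)
```

With N switching angles there are N free parameters, so one can fix the fundamental amplitude and cancel N - 1 low-order harmonics; in practice the angles are found by solving the resulting nonlinear equations numerically.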
Rolling bearings are critical components in rotating machinery, directly affecting the efficiency and reliability of the equipment. However, they are prone to various faults under complex conditions and strong noise backgrounds. Traditional fault diagnosis methods often struggle to accurately extract fault features from these complex signals, resulting in low diagnostic accuracy. This paper proposes a comprehensive algorithm combining Variational Mode Decomposition (VMD), a Convolutional Neural Network (CNN), and a Bidirectional Long Short-Term Memory (BiLSTM) network, referred to as VMD-CNN-BiLSTM, for rolling bearing fault diagnosis. VMD decomposes complex signals into multiple Intrinsic Mode Functions (IMFs), reducing signal modal aliasing and endpoint effects. The CNN layer extracts local features, and the BiLSTM layer captures temporal dependencies and bidirectional relationships within the signals, achieving accurate fault feature recognition. The innovation lies in integrating VMD with CNN and BiLSTM, optimizing signal decomposition and feature extraction to enhance fault diagnosis accuracy and robustness. The algorithm's effectiveness is validated on the Case Western Reserve University bearing dataset at a rotational speed of 1797 r/min, including ten fault conditions. The Arctic Puffin Optimization algorithm was used to optimize the VMD parameters, and the optimal IMF components were identified using minimum envelope entropy. Feature extraction resulted in 1200 samples, divided into training and test sets. Experimental results demonstrate that the VMD-CNN-BiLSTM model achieves a diagnostic accuracy of 98.6667% with a training time of 9.5667 seconds. This method accurately extracts fault features in complex and noisy environments, significantly improving fault diagnosis performance and reliability.
As a non-destructive detection method, X-rays are widely used in the field of electronic component inspection. However, the subsequent defect detection needs to be completed manually, which leads to poor efficiency and low reliability due to a large number of components. To solve the above problems, we propose a component X-ray image defect detection method based on deep learning. On the one hand, we have designed an algorithm for the segmentation and correction of X-ray images. On the other hand, in the case of fewer defect samples and variable defect forms, we only use defect-free samples for training. We propose an unsupervised learning model based on variational autoencoder and add a convolutional attention module to realize automatic reconstruction of defects. In addition, we combine gradient magnitude similarity and absolute error of the input image and the reconstructed image to detect and locate the defect region. The effectiveness of the proposed method is verified through experiments on a typical X-ray dataset, and the accuracy of defects detection reaches about 99%.
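Because the model is trained only on defect-free samples, a defect shows up as a region where the reconstruction disagrees with the input. The sketch below combines the two cues named in the abstract, gradient magnitude similarity (GMS) and absolute error, into a per-pixel anomaly map. The central-difference gradient, the stabilising constant `c`, and the equal weighting `alpha` are assumptions made for illustration; the paper may combine the terms differently.

```python
def gradient_magnitude(img):
    """Central-difference gradient magnitude; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for col in range(1, w - 1):
            gx = (img[r][col + 1] - img[r][col - 1]) / 2.0
            gy = (img[r + 1][col] - img[r - 1][col]) / 2.0
            out[r][col] = (gx * gx + gy * gy) ** 0.5
    return out

def anomaly_map(inp, rec, c=1e-4, alpha=0.5):
    """Per-pixel score: alpha * (1 - GMS) + (1 - alpha) * |input - reconstruction|."""
    g_in, g_rec = gradient_magnitude(inp), gradient_magnitude(rec)
    h, w = len(inp), len(inp[0])
    scores = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for col in range(w):
            a, b = g_in[r][col], g_rec[r][col]
            gms = (2 * a * b + c) / (a * a + b * b + c)  # 1 when gradients agree
            scores[r][col] = alpha * (1 - gms) + (1 - alpha) * abs(inp[r][col] - rec[r][col])
    return scores

# Hypothetical 5x5 example: the input has one bright defect pixel that a model
# trained on defect-free images fails to reproduce.
reconstruction = [[0.0] * 5 for _ in range(5)]
defective = [row[:] for row in reconstruction]
defective[2][2] = 1.0

clean_scores = anomaly_map(reconstruction, reconstruction)
defect_scores = anomaly_map(defective, reconstruction)
```

Thresholding the resulting map gives a defect mask; the absolute-error term responds at the defect pixel itself, while the GMS term responds to the edges around it.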
Pavement cracks are a critical aspect of road maintenance, and their timely and accurate detection is essential for road upkeep. This study aims to evaluate and optimize the application of image segmentation models in pavement crack detection to enhance the efficiency and accuracy of road maintenance. The research methods include experimental comparisons of five mainstream image segmentation models (U-Net++, MANet, FPN, LinkNet, and PAN), and further improvement of model performance through hyperparameter optimization. The main results indicate that the U-Net++ model performs best in terms of the Dice coefficient, with an average Dice score of 0.715 when the batch size is 32. After optimization of the encoder, optimizer, and learning rate, the final Dice coefficient increased to 0.734. The analysis shows that the superior performance of the U-Net++ model is attributed to its improved skip connections and cascade modules, as well as its effective feature fusion capability.
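The Dice coefficient used to rank the models measures the overlap between a predicted crack mask and the ground-truth mask as twice the intersection divided by the sum of the two mask sizes. The sketch below computes it for binary masks flattened to lists; the masks and the smoothing constant `eps` are illustrative assumptions.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for 0/1 masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)

# Hypothetical flattened masks: 1 marks a crack pixel.
predicted = [1, 1, 0, 0, 1, 0]
ground_truth = [1, 0, 0, 0, 1, 1]

score = dice_coefficient(predicted, ground_truth)    # 2 * 2 / (3 + 3)
perfect = dice_coefficient(ground_truth, ground_truth)
```

Cracks occupy very few pixels, so Dice is more informative than pixel accuracy here: a model that predicts no cracks at all can reach high accuracy while its Dice score stays near zero.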
The thermal image of an object differs completely from the visible image seen directly by the human eye: it reflects the temperature distribution of the object's surface. Infrared thermography visualises this surface temperature distribution and directly displays the temperature values converted from electrical signals. Since there is no correlation between brightness levels in different spectra, many methods for aligning interactive information are not applicable. This paper therefore presents the development of a hardware module for the dual-spectrum fusion of high-definition visible and high-resolution infrared thermal imaging. Image fusion can be carried out at different levels, and super-resolution infrared thermal images can be constructed by aligning the infrared and visible images and fusing the visible and thermal views. The construction of the system and the hardware selection are then described, and the results of the image alignment experiments are analysed in detail. The test results show that the maximum ranging error of the binocular ranging system is 0.06 m in the range of 0-2 m, the temperature measurement error of the infrared temperature measurement system is less than 0.1℃ at a close range of 0.2 m, and the multi-target temperature measurement error is less than 0.2℃ at a distance of 2 m. This indicates that the proposed multi-target long-distance temperature compensation scheme effectively improves the accuracy of infrared temperature measurement.
Power components, such as electric poles and insulators, play a critical role in ensuring the stability and functionality of UAV (unmanned aerial vehicle) electrical systems. Detecting these components allows the analysis of potential defects, malfunctions, and weaknesses that may compromise the efficiency and safety of the overall power infrastructure. Annotating real data for power component detection is difficult: manual annotation is time-consuming, error-prone, and subject to human bias and inconsistency, and the sheer volume of data to be annotated makes these problems worse. To mitigate them, we have devised a mechanism that uses virtual data to complement and supplement the real data-driven detection process. By combining real and virtual data, the system can simulate various scenarios and augment the real data pool. This integration reduces the burden of manual annotation, enhances the accuracy of analysis, and improves the overall performance of power component detection. Experimental validation shows that sampling 10% of the virtual data can boost the mAP on the real data.
Aiming at the problems of missing smooth data, limited detection range and high false alarm rate in diesel generator set load state anomaly detection, an improved method based on PANet network and IMF is proposed. Through multi-sensor data acquisition and pre-processing, the anomaly detection model is constructed by combining PANet network and IMF to achieve comprehensive coverage and high accuracy of load state detection. Experiments prove that the method has excellent performance and is close to the ideal detection effect.
This paper proposes a specialized network model, Lightweight Dynamic Inverted Residual Network (LDIRNet), for the detection of surface defects in ceramic bearings. The model integrates the iRMB module to improve feature extraction accuracy and reduce computational costs. In addition, the DySample sampling algorithm is used in the study, which significantly improves processing speed and reduces memory requirements while maintaining high efficiency. By optimizing the lighting conditions and using composite light source illumination technology, the probability of missed detections and false alarms can be reduced. Combined with the LDIRNet detection network, the detection performance of surface defects in ceramic bearings is effectively enhanced. Experimental results show that LDIRNet effectively improves the detection accuracy of surface defects in ceramic bearings, with an mAP0.5 value of 75.2% on the CBS-DET dataset, an increase of 2.4%. The model parameters are reduced, and detection speed is improved. Compared to existing detection models, this research method demonstrates significant improvements in detection accuracy, model size, and computational efficiency, especially in terms of model parameters and detection speed, proving its potential application value in practical industrial scenarios.
To address the low precision of apple maturity detection, a refined apple target detection method built upon YOLOv8 is put forward. First, the CBAM attention mechanism is integrated into the neural network architecture to make the model focus on the specific region of interest in the image, reduce background interference, and improve the feature representation ability. Then the SPPF structure is improved: a pooling layer is added so that the SPPF architecture includes a maximum pooling operation at the fourth layer. By pooling at several spatial levels, the SPPF retains a certain level of variability, enhancing the model's capacity to detect changes in target shape and orientation. The pooling operations applied to subregions transform feature vectors of variable length into fixed-length feature representations and improve the computational efficiency of the model. Experimental results on the data set show that, compared with the original model, the recognition time for a single image increases by only 3 ms, the frame rate is reduced by 5.59 fps, the mAP increases by 1.97%, and the target recognition rates for low maturity and medium maturity apples increase by 3.44% and 3.17% respectively. The target recall rates for low maturity, medium maturity, and mature apples increase by 7.59%, 7.81%, and 3.31% respectively.
Various problems afflict the complex network structures used in pedestrian detection and tracking applications, including a large number of parameters, lengthy training times, and slow running speeds. To address these problems, we propose an improved pedestrian detection and tracking model based on YOLOv5s detection and DeepSORT tracking, and use a video mosaic (stitching) algorithm based on the scale-invariant feature transform (SIFT) to expand the scope of pedestrian detection and tracking. We implemented MnasNet_P, dilated convolution, and SPP layers in our model. The Mish activation function replaced the LeakyReLU activation function to improve the generalizability of the model. A depthwise separable convolution was introduced to replace the standard convolution of residual edges in the C3_1 structure, reducing the number of network parameters. Shufflenetv2_x1.5 was introduced as the pedestrian appearance feature extraction network, further reducing the number of network parameters while maintaining high tracking accuracy. Application of our model to public datasets demonstrated an improvement of 2.71% in average accuracy and an improvement of 30.7% in the FPS rate. Experimental results obtained with the MOT16 dataset demonstrated a substantially reduced model size while maintaining high tracking accuracy, indicating that our algorithm is suitable for pedestrian tracking on mobile terminals or embedded devices. The real-time video stitching method based on SIFT expanded the tracking range of pedestrians.
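Two of the changes above are easy to make concrete: the Mish activation, defined as x * tanh(softplus(x)), and the parameter saving from replacing a standard convolution with a depthwise separable one. The sketch below implements both from their standard definitions; the layer sizes are made-up examples and bias terms are ignored in the parameter counts.

```python
import math

def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def mish(x):
    """Mish activation: smooth and non-monotonic, unlike LeakyReLU."""
    return x * math.tanh(softplus(x))

def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x

def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
reduction = standard / separable
```

For this example layer the separable form needs roughly one eighth of the weights, which is where most of the parameter reduction in such lightweight designs comes from.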
Detecting remote sensing objects on few-shot datasets remains challenging in the domain of object detection and recognition. As a feasible solution, this work puts forth RST R-CNN, a novel meta-learning-based approach that achieves high detection accuracy on few-shot remote sensing objects. In the proposed method, the feature enhancement module precisely captures the shape features of the objects, and the receptive field enhancement module enlarges the receptive field during feature extraction. Meanwhile, a feature interaction module is introduced to compute similarities between the support and query sets, thereby enhancing the quality of candidate box generation in few-shot object detection tasks. In addition, a triplet matching detector is designed, which improves the model’s performance on new classes by moving data of the same class closer together and data of different classes farther apart. Comparative experiments between our method and several state-of-the-art (SOTA) models were performed on two datasets, NWPU VHR-10 and DIOR, to verify the effectiveness and superiority of our model. Our model performs well on both datasets and outperforms the other methods. Ablation experiments show that the proposed modules contribute to the improved performance of the model. This research is expected to provide new solutions for object detection in remote sensing scenarios.
A rapid quantitative loading process using a rotating three-bucket system is proposed, and the loading process is simulated in software. The simulation is used to study the feeding and unloading of the three buckets and the buffer buckets during loading, as well as the filling of the carriages. Simulations were conducted with different vehicle models and loading parameters, and no material scattering or insufficient loading occurred during the entire train loading process. This shows that the rotating three-bucket loading process can adapt to different loading conditions and theoretically verifies the feasibility of three-bucket time-sharing feeding with continuous loading.
The smoking behavior of bus drivers poses significant risks to the operational safety and service quality of public transport systems. To achieve high precision and lightweight detection of bus driver smoking, this paper presents a novel method that integrates YOLOv7-tiny with a channel pruning algorithm. Firstly, a small target detection layer is introduced atop the original three detection layers to enhance the model's capability in detecting cigarettes in small targets. Secondly, the standard 3×3 convolution is replaced by the Depth Separable Convolution (DSConv), thereby substituting complex computations with simpler linear computations, consequently reducing the computational load of the model. Thirdly, a combination of the Complete Intersection over Union (CIoU) and the Normalized Wasserstein Distance (NWD) loss functions is employed. This not only optimizes the convergence speed of network training but also enhances the accuracy of cigarette detection in bus drivers. Additionally, the channel pruning algorithm is applied to further lighten the model while maintaining or improving detection accuracy. Finally, ablation experiments are conducted on a self-constructed dataset of bus driver smoking to validate the effectiveness of the proposed method. Experimental results demonstrate that the improved model achieves a 3.3% enhancement in detection accuracy compared to the original YOLOv7-tiny algorithm, with a reduction of 73.3% in parameters and 66.9% in computation. These outcomes underscore the feasibility of deploying the model on mobile devices.
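As a rough illustration of why a Normalized Wasserstein Distance term helps with small targets such as cigarettes, the sketch below follows the commonly published NWD formulation, in which each box (cx, cy, w, h) is modeled as a 2D Gaussian. The constant `c`, the blending weight `ratio`, and the function names are assumptions for illustration; the abstract does not state how the authors weight CIoU against NWD, and the CIoU value is taken as an input rather than computed here.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    c is a dataset-dependent normalizing constant; 12.8 is only a placeholder.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def blended_box_loss(ciou_loss, box_pred, box_gt, ratio=0.5, c=12.8):
    """Hypothetical blend: ratio * CIoU loss + (1 - ratio) * (1 - NWD)."""
    return ratio * ciou_loss + (1.0 - ratio) * (1.0 - nwd(box_pred, box_gt, c))


# Two 4 x 4 boxes that do not overlap: IoU is 0, but NWD still varies smoothly.
a = (10.0, 10.0, 4.0, 4.0)
b = (13.0, 14.0, 4.0, 4.0)
print(nwd(a, a), nwd(a, b))
```

Unlike IoU, which is zero for any pair of disjoint boxes, the NWD term keeps a usable gradient signal when a tiny prediction misses its target by a few pixels.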
Against the backdrop of accelerating digital transformation, online collaboration platforms have become indispensable tools for remote work and learning. However, with the increase in the number of users, the problem of high concurrency event processing has become increasingly prominent. Therefore, this paper proposes an event queue management algorithm (EQA-DT) based on debouncing and throttling techniques to optimize the efficiency of event processing in high concurrency scenarios. The EQA-DT algorithm uses a debounce technique to control the frequency of event capture, ensuring that only the last event triggered by the user is recorded within a short period of time. It also uses a throttling technique to periodically push events in the queue to the server, limiting the frequency of events being pushed to the server. We compared the EQA-DT algorithm with traditional event processing methods in an experimental setting. The results showed that in high concurrency scenarios, the EQA-DT algorithm is superior to traditional event processing methods in terms of reducing server pressure, reducing the length of the event queue, and improving system response speed. This algorithm provides an effective solution for improving the performance of online collaboration platforms and lays the foundation for future research in this area.
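The combination of debouncing at capture time and throttling at push time can be sketched as follows. This is a minimal illustration of the general idea, not the EQA-DT algorithm itself: the class name, the per-key bookkeeping, and the window lengths are all assumptions, and time is passed in explicitly (in milliseconds) so the behavior is deterministic.

```python
class DebouncedThrottledQueue:
    """Keep only the last event of each burst, then push batches at a limited rate."""

    def __init__(self, debounce_ms, flush_interval_ms):
        self.debounce_ms = debounce_ms
        self.flush_interval_ms = flush_interval_ms
        self._pending = {}       # key -> (time of last event, payload)
        self._queue = []         # settled events waiting to be pushed
        self._last_flush = None  # time of the most recent push

    def capture(self, key, payload, now):
        # Debounce: a newer event for the same key overwrites the older one.
        self._pending[key] = (now, payload)

    def tick(self, now, send):
        # Events whose key has been quiet for debounce_ms move into the queue.
        settled = [k for k, (t, _) in self._pending.items()
                   if now - t >= self.debounce_ms]
        for k in settled:
            _, payload = self._pending.pop(k)
            self._queue.append((k, payload))
        # Throttle: push at most one batch per flush interval.
        too_soon = (self._last_flush is not None
                    and now - self._last_flush < self.flush_interval_ms)
        if too_soon or not self._queue:
            return 0
        batch, self._queue = self._queue, []
        send(batch)
        self._last_flush = now
        return len(batch)


sent = []
q = DebouncedThrottledQueue(debounce_ms=300, flush_interval_ms=1000)
for t, value in [(0, 1), (100, 2), (200, 3)]:   # a burst of three edits
    q.capture("doc1:cursor", value, now=t)
q.tick(250, sent.append)    # burst not settled yet, nothing is pushed
q.tick(600, sent.append)    # only the last event of the burst is pushed
q.capture("doc1:cursor", 4, now=700)
q.tick(1100, sent.append)   # settled, but the throttle window is still open
q.tick(1700, sent.append)   # throttle window elapsed, event is pushed
print(sent)
```

In this run the server receives two single-event batches instead of four separate events, which is the kind of load reduction the abstract describes.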
The development of autonomous driving, unmanned aerial vehicle detection, and virtual reality is inseparable from research on 3D point clouds. Existing studies either lack local shape capture and local spatial feature description, resulting in poor classification accuracy, or pursue fine and complex local feature extractors that sacrifice extraction time and memory overhead for classification accuracy without significantly improving the results. Therefore, this paper proposes a point cloud classification method based on Local Point Dynamic Edge Graph Convolution and Global Information Fusion. The local point dynamic edge graph convolution module improves edge graph convolution, improving local feature extraction without deepening the local feature extractor and reducing time and memory overhead. The global information fusion block concatenates multi-layer local features to extract and fuse feature information, improving the classification accuracy of the network. Our network is tested on the public ModelNet40 and ScanObjectNN datasets. The results show that, compared with current mainstream point cloud classification algorithms, the proposed method improves classification accuracy and has good robustness.
Multi-Label Text Classification (MLTC) is an important and widely studied research problem in natural language processing (NLP). To address small-sample data sets and unbalanced labels, this paper proposes a multi-label text classification method based on prompt learning and the RoBERTa Te People's Republic of ChinaNN model. First, the RoBERTa model is used to obtain the label word vectors in the text and the global features containing the text context information; its dynamic masking improves efficiency. Then the TePeople's Republic of China NN model is used to extract the local features of the text, and a feature fusion method fuses the global and local features to improve the classification accuracy of the model. Prompt learning is used to handle the small-sample data, and the loss function is designed to address label imbalance and the long-tail distribution. Experiments on a real dataset show that the accuracy of multi-label text classification reaches 96%, which reflects the advantage of this model in improving multi-label text classification.
With the rapid development of 5G and IoT, traditional cloud computing faces limitations due to increasing demands. Edge computing addresses these issues by offloading tasks to nearby nodes, thereby improving efficiency and user experience for latency-sensitive applications such as live video and autonomous driving. However, balancing energy consumption and latency in heterogeneous edge devices remains a challenge, particularly as previous studies often overlook user mobility. In this work, we tackle the problem of data stream processing (DSP) task placement within dynamic user mobility scenarios, addressing the challenges posed by frequent movement. Additionally, we introduce a reinforcement learning approach to enhance the adaptability of DSP tasks, allowing the system to effectively adjust to changing user mobility patterns. Through simulation experiments, it is verified that compared with the traditional method, the algorithm proposed in this paper effectively optimizes energy consumption and delay while considering the mobility attributes, and improves the utilization rate of each edge node.
This paper studies the application of the Bayesian optimization algorithm in DHCPv6 stateful allocation, addressing the efficiency issues of traditional allocation strategies in high-load scenarios. By constructing a simulation platform, simulating different network load environments, and validating optimization effects in a real network, the experimental results show that the Bayesian optimization algorithm significantly improves network response speed and address allocation success rates, particularly excelling under high-load conditions. This research provides an important reference for further optimization of DHCPv6 networks and verifies the practical value of the algorithm.
At present, the YOLO algorithm has become a core real-time object detection technology in areas such as autonomous driving, face detection, and robotics, and its versions are constantly being updated. Herein, we analyze the evolution of the YOLO algorithm and investigate the innovations and contributions of each iteration from YOLOv1 to YOLOv5. We also discuss prospects for future development and point out the feasibility and necessity of further research on the YOLO algorithm.
In recent years, the CLIP model has achieved remarkable success in image-text retrieval tasks through contrastive learning. However, CLIP still exhibits certain limitations when handling complex backgrounds and small objects. To address these challenges, this paper proposes two key innovations. First, during inference, the YOLOv10 model is employed to detect and crop small objects and essential background information in the image, enhancing the ability of CLIP to comprehend complex scenes. Second, the Next-ViT network is utilized as the backbone for image encoding. By leveraging its more efficient multi-scale feature extraction capabilities, Next-ViT improves the retrieval accuracy of small objects while also being more adaptable for deployment in industrial scenarios. Experimental results demonstrate that these two innovations significantly enhance the performance of CLIP in image-text retrieval tasks and achieve a balance between accuracy and efficiency across various vision tasks.
Image segmentation is a crucial task in computer vision that involves dividing an image into distinct regions, with each region containing pixels that share similar attributes. Traditional methods like Otsu, Sobel, and Canny often suffer from high computational complexity and sensitivity to parameters. To address these limitations, this paper proposes a lightweight image segmentation algorithm that integrates MobileNetv3-Small with a global context block attention mechanism. MobileNetv3 leverages depthwise separable convolutions and optimized architecture to significantly reduce computational load and parameter count. To further improve segmentation accuracy, a deformable convolutional network is employed during feature extraction, while the global context block attention mechanism enhances focus on target regions. The proposed algorithm not only improves segmentation performance but also offers a lightweight solution suitable for resource-constrained environments.
The classification of microorganisms is an important branch of microbiology and plays a crucial role in understanding microbial diversity. Microorganisms include bacteria, viruses, fungi, protozoa, and algae, among others, and they exhibit significant differences in morphology, physiology, and ecological niches. Classifying microorganisms with similar characteristics into groups aids in studying both their common properties and their differences. In this study, PCA, GBDT, and XGBoost were used to generate new features, and machine learning methods (logistic regression, decision trees, and random forests) were compared with neural networks to determine the best predictive model and feature engineering approach. According to the experimental results, among the machine learning models, the random forest model with three new features generated by PCA performed best. Among the neural network models, the BP neural network with two new features generated by PCA performed best.
The diverse human motion prediction task predicts multiple future motion sequences from historical data. Existing research uses likelihood-based sampling, but the inhomogeneity of human motion often causes mode collapse and complicates training. Our paper proposes a method that uses a linear dynamical system to model spatiotemporal dependence and obtains orthogonal basis vectors via Tucker decomposition. By connecting these with encoded motion residuals and sampling the Grassmann manifold with a relaxed Bernoulli distribution, we predict future motions. Compared to existing methods on the Human3.6M and HumanEva-I datasets, our approach mitigates mode collapse, improves diversity by 3%, and reduces average error by 0.07.
With the development of deep learning and computer vision technologies, human behavior prediction has gradually become a research area of great interest. Existing work has introduced the Transformer architecture into human behavior prediction, but because of differences in data types and dimensions, one-dimensional positional encoding struggles to capture the spatial correlation between human joints. In addition, different channels are poorly differentiated and the training process is cumbersome. To solve these problems, this paper combines the denoising diffusion model with the Transformer architecture to predict human behavior. The method gradually adds noise to the skeleton sequence in the forward process and, in the reverse denoising process, encodes joint positions by projecting the 3D human skeleton coordinates onto a 2D position plane, so as to capture the relative positions and distances of the joints in space. A squeeze-and-excitation module then learns the interdependence between channels and dynamically assigns channel weights, enhancing the model's ability to distinguish between channels. Experiments on the Human3.6M and HumanEva-I datasets show that the proposed method has advantages in prediction accuracy and stability compared with existing methods, verifying its effectiveness.
In response to the computational complexity issue of the directed acyclic graph constraint in the Structure-Agnostic Model (SAM), we have proposed an innovative solution: an improved algorithm for the directed acyclic graph constraint based on continuous optimization. This algorithm, while maintaining the original acyclic constraint conditions, ingeniously introduces a hyperparameter α as an important component of the constraint. This improvement not only effectively reduces the computational complexity but also enhances the precision of the calculations to a certain extent.
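For orientation, one published family of continuous acyclicity constraints that carries a scaling coefficient is the polynomial form h(W) = tr[(I + αW∘W)^d] − d, which is zero exactly when the weighted graph W has no directed cycle (for α > 0). The sketch below evaluates that form with NumPy. It is an assumption that the abstract's α plays a comparable role; the function name and the default α = 1/d are illustrative and do not come from the paper.

```python
import numpy as np


def acyclicity_poly(W, alpha=None):
    """Polynomial acyclicity measure tr[(I + alpha * W∘W)^d] - d.

    Returns 0 for a DAG and a positive value when W contains a directed cycle.
    alpha defaults to 1/d, an illustrative choice.
    """
    d = W.shape[0]
    if alpha is None:
        alpha = 1.0 / d
    M = np.eye(d) + alpha * (W * W)   # W * W is the element-wise square
    return float(np.trace(np.linalg.matrix_power(M, d)) - d)


# A 3-node chain 0 -> 1 -> 2 is acyclic.
W_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0, 0.0]])
# Adding the edge 2 -> 0 closes a cycle.
W_cyc = W_dag.copy()
W_cyc[2, 0] = 0.5

print(acyclicity_poly(W_dag), acyclicity_poly(W_cyc))
```

The matrix power costs about O(d³ log d) and avoids the matrix exponential of the original trace-exponential constraint, which is one way such a reformulation can lower computational cost.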
To achieve timely identification and prevention of potential construction risks, this paper presents an intelligent classification approach based on deep learning. Given that such texts abound with technical terms and have a complex engineering background, traditional classification techniques struggle to effectively capture their profound features, thereby leading to limited classification accuracy. First, the ALBERT model is employed to extract the dynamic semantic representation of the text, and subsequently, the complex semantic associations between the text are deeply captured through the multi-layer Long Short-Term Memory network (LSTM) coding layer. In the decoding stage, multi-layer LSTM integrated with the attention mechanism is introduced to enhance the inter-label dependency and optimize the multi-label sequence prediction. The experiment was carried out based on a disaster text dataset of a hydropower station, and the results indicated that the F1 score was as high as 90.64%, significantly enhancing the classification efficiency and safety management efficiency, and offering robust support for the formulation of precise preventive measures.
To tackle the widespread issue of high dropout rates on MOOC platforms, this study presents a dropout prediction approach leveraging Graph Convolutional Networks (GCN). By analyzing learners' behavioral patterns at multiple temporal scales, the proposed method aims to identify potential dropout tendencies in a timely manner, enabling the adoption of targeted preventive or intervention strategies. Specifically, the proposed method automatically extracts continuous features from learners' activity logs over a specified period, utilizing these as independent variables for constructing a MOOC dropout prediction model. Initially, the behavioral data collected by the learning platform is transformed into time series data. A CNN-BiGRU network is then employed to extract features, generating feature vectors that effectively capture the temporal dynamics within the learners' activity patterns. These multi-dimensional feature vectors are subsequently used as node features in a Graph Convolutional Network (GCN), where two GCN layers establish intrinsic relationships between the features while preserving essential information through vectorization. This model allows MOOC instructors and course designers to monitor learners' progress in real time across different time intervals, enabling dynamic tracking of dropout behavior at various stages of the learning process. Experimental results on a large-scale MOOC public dataset demonstrate that the proposed method achieves superior dropout prediction accuracy compared to other state-of-the-art approaches.
The rapid growth of global trade has made ports indispensable core nodes in the global logistics network. The study of port throughput has become a hot topic for modern researchers. This study focuses on Guangzhou Port Group, extracting its throughput data from 2014 to 2023 for preliminary time series analysis. By observing the autocorrelation and partial autocorrelation graphs based on monthly throughput data, we explored how port throughput is influenced by changes in industrial structure, the economic conditions of the hinterland, transportation capacity, and unexpected factors. Furthermore, through regression analysis, we identified key factors significantly impacting port throughput, including total import and export trade volume (in billions of RMB), total retail sales of consumer goods (in ten thousand RMB), waterway freight volume (in million tons), fixed asset investment growth rate (%), and overseas pandemic death tolls. Additionally, this study developed a combined RF-SVM-XGB port throughput prediction model, improving the average error by 3 percentage points. In terms of predictive performance and model error, the combined prediction model demonstrated better fitting accuracy. Finally, to enhance the model’s stability and generalization ability, cross-validation was used for result validation. Feasibility suggestions for port development were also provided.
Water quality prediction is crucial for environmental protection and water resource management. Accurately predicting the pH of water bodies, in particular, can help identify potential water quality issues so that appropriate measures can be taken. This study uses historical water quality data provided by the United States Geological Survey to predict the next-day pH values at multiple water stations in Georgia. Various feature engineering methods, such as Principal Component Analysis and Lasso Regression, were applied along with machine learning models. We combined the strengths of different models using the Stacking ensemble learning technique, which significantly improved predictive performance. The experimental results show that combining feature engineering, specifically Polynomial Feature Generation, with the Stacking ensemble model effectively enhanced prediction accuracy. This research provides a new approach to improving water quality monitoring methods and contributes to more precise water resource management and environmental protection.
Smart contracts, as one of the important applications of blockchain technology, provide the possibility for automated execution of payment systems. This article explores an automated execution algorithm for smart contracts in payment systems based on blockchain and big data technology to address the issues of low accuracy and efficiency in contract execution. This algorithm utilizes the immutability and decentralization features of blockchain to ensure the security and transparency of payment transactions. At the same time, combined with the powerful analytical capabilities of big data, deep mining of transaction data is carried out to achieve accurate prediction and risk management of transactions. The automated execution function of smart contracts further simplifies the transaction process, reduces human intervention, and improves the efficiency and reliability of payment systems. The experimental results show that through the proposed algorithm, the payment system can provide more secure, efficient, and personalized services, promoting the digital transformation and development of the financial transaction field.
Variable-step-size matching pursuit algorithms lose reconstruction accuracy because atoms are insufficiently selected during iteration. To address this problem, this paper proposes an algorithm that pre-selects atoms by Dice coefficient matching and combines this with secondary screening and the variable-step-size principle. First, atoms are adaptively selected by Dice coefficient matching, which improves the correlation between the current residual and the selected atoms by adding the angular characteristics of the signals while retaining their length characteristics. Second, because some atoms that enter the iteration through the primary selection can still contribute to the original signal, a secondary selection of atoms is carried out using an adaptive weighting strategy, and the atoms already selected into the support set are then screened using the projection backtracking principle. A balance between reconstruction accuracy and reconstruction time is achieved by adaptively selecting the step size at different stages of the algorithm according to the noise value and the signal value estimated at each iteration. Reconstruction experiments on one-dimensional signals and two-dimensional images show that the algorithm effectively improves the reconstruction of signals and images.
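The Dice-based pre-selection step can be illustrated with the standard vector form of the Dice coefficient, 2⟨x, y⟩ / (‖x‖² + ‖y‖²), applied between the residual and each dictionary atom. The sketch below is a minimal reading of that idea: the use of the absolute inner product, the function names, and the toy dictionary are assumptions, and the paper's secondary screening and step-size rules are not reproduced.

```python
import numpy as np


def dice_scores(Phi, r):
    """Dice coefficient between residual r and every column (atom) of Phi."""
    inner = Phi.T @ r                               # correlation with each atom
    denom = np.sum(Phi ** 2, axis=0) + np.dot(r, r)  # ||atom||^2 + ||r||^2
    return 2.0 * np.abs(inner) / denom


def preselect_atoms(Phi, r, step):
    """Indices of the `step` atoms with the highest Dice score, best first."""
    scores = dice_scores(Phi, r)
    return np.argsort(scores)[::-1][:step]


# Toy dictionary with three unit-norm atoms in R^4 and a residual.
Phi = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 0.0]])
r = np.array([0.1, 2.0, -1.0, 0.0])

print(dice_scores(Phi, r), preselect_atoms(Phi, r, step=2))
```

Because the denominator contains both squared norms, the score depends on the magnitudes of the residual and atom as well as the angle between them, unlike a plain inner product or cosine similarity.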
Frequent convective weather and typhoons in China's coastal areas often cause faults and trip events in transmission lines due to strong winds. However, precise methods for measuring the electrical and geometric distances of transmission lines under wind conditions are lacking. This paper proposes a simulation-based wind protection method built on data analysis and artificial intelligence. First, LiDAR technology and arc sag calculation models are used to build a 3D scene of the transmission lines. For simple environments, a dynamic mechanical balance algorithm simulates the wind deviation calibration of insulator strings. For complex environments, a Particle Swarm Optimization-Convolutional Neural Network (PSO-CNN) model predicts wind deviation angles under various conditions, giving a more accurate wind deviation risk assessment. This approach aims to reduce manual workload, lower industry design costs, and make wind deviation risk assessment more intelligent and standardized, providing strong support for the construction of smart grids.
With the development of Vehicular Edge Computing (VEC) architectures, task offloading must account for differences in task delay sensitivity and the dynamic nature of environmental information. This paper designs a distributed deep reinforcement learning (DRL) framework based on a non-cooperative game and introduces a memory mechanism (RNN) and a shared experience mechanism (Shared Experience Actor-Critic) into the MADDPG algorithm. By capturing and sharing time-series data across time steps, these mechanisms improve the algorithm's learning efficiency and convergence speed.
With the widespread application of domestic commercial cryptographic algorithms and the advancement of commercial cryptographic application evaluation, the compliance of these algorithms has garnered significant attention. Various security agencies and research institutions in China have initiated studies on the identification of commercial block cipher algorithms and have explored their application in cryptographic evaluation work. This paper focuses on extracting features from ciphertext using the NIST randomness test method and then training and testing on these features with various machine learning and deep learning methods. The paper also consolidates relevant domestic research on this topic. In the final part of the study, data from the COCO2014 dataset encrypted with the domestic commercial cryptographic algorithm SM4 and with AES128 (CBC, ECB) are used for algorithm identification, employing MLP, CNN, LSTM, and Attention mechanisms. The experimental results demonstrate that CNN exhibits higher accuracy and stability than existing solutions, while the Attention mechanism shows advantages in subsequent AES128-ECB identification, albeit with high sensitivity to variations in the key-dimension selection.
To improve the optimization of ecological restoration design schemes, this study uses a genetic algorithm and compares it with simulated annealing and particle swarm optimization. The comparison shows that the genetic algorithm outperforms the other methods in both convergence speed and optimization effect. The experimental data show the trend of the genetic algorithm's fitness over multiple iterations, as well as the influence of different parameter settings on the optimization effect. The study verifies the effectiveness and superiority of the genetic algorithm in ecological restoration design, providing a powerful optimization tool and theoretical support for solving complex ecological problems.
To address the difficulty that existing remote sensing methods have in obtaining accurate roof boundaries when recognizing the roof area of flat-roofed houses, a remote sensing recognition method based on the Gram-Schmidt algorithm is proposed. Geometric correction is first applied to the collected remote sensing images of flat-roofed houses using polynomial modeling. DeepLab semantic segmentation is then used to extract the boundary feature parameters of the roof, and the Gram-Schmidt algorithm is applied to further identify and outline the roof contour, from which the roof area is calculated. The experimental results show that this method effectively improves the accuracy and efficiency of identifying the roof area of flat-roofed houses.
To make the scheduling of tasks executed by intelligent agents more flexible and reasonable, a method for adjusting intelligent task scheduling based on graph neural networks and adaptive weights is proposed. Using the short-term memory ability of graph neural networks, the method searches the database for temporary computing tasks received by the intelligent agents, which serve as the basic data for subsequent arrangement and adjustment. It extracts the thematic content features of the tasks with recurrent neural networks and identifies priority task targets among the scheduled tasks with graph neural network models. Task scheduling adjustment is then achieved with adaptive weights combined with an online learning listwise algorithm. The experimental results show that, with this method, the conflict rate of task scheduling adjustment stays within 5%, the utilization of computing resources after adjustment is higher, and the longest response time for an adjustment is 29 ms, indicating more reasonable, flexible, and efficient scheduling.
Because "Dual Carbon" digital intelligence monitoring data has a complex composition, analyzing carbon emission status with it produces large errors. Therefore, a multi-source heterogeneous "Dual Carbon" digital intelligence monitoring center based on time registration and federated learning is proposed. First, for accumulated "Dual Carbon" historical data, the composition of static basic data and carbon emission historical data is analyzed; for "Dual Carbon" real-time monitoring data, the composition of IoT sensing data and dynamic business data is analyzed. After timestamps are added to each data type, the data are aligned to a unified timeline, a high-order tensor of the monitoring center's multi-source heterogeneous data is built within the federated learning framework, and Tucker decomposition is iterated until it outputs the converged quadratic linear correlation function. In the tests, the method shows the smallest difference between the analyzed and actual carbon emissions in region A: the error always stays within 0.5×10^7 t, with a maximum of only 0.46×10^7 t.
Representation has long been regarded as one of the bottlenecks in machine learning, so studying machine learning representation methods is long-term, exploratory work. We therefore use category theory to study the fusion representation of data dimensionality reduction. We propose the basic concept of category representation for data dimensionality reduction and provide a fusion representation framework for it. We analyze algorithms such as PCA, KPCA, and LDA, identify the essential connection between them, and propose a data dimensionality reduction fusion representation algorithm based on this framework. Finally, we demonstrate the feasibility of the proposed method through experiments.
Computed Tomography (CT) is a preeminent medical imaging modality in computer-aided diagnosis, where accurate segmentation of lung parenchyma from CT slices can help surgeons diagnose lung diseases and improve survival rates and prognosis. This paper presents an automatic algorithm for lung parenchyma segmentation based on superpixels and graph theory. First, superpixels are obtained with the simple linear iterative clustering (SLIC) method. Then a conditional breadth-first search algorithm is proposed to classify the target superpixels into lung parenchyma and non-lung parenchyma. Finally, a contour refinement algorithm produces the final segmentation results. Experimental evaluations show that the method attains superior segmentation precision and accuracy, with an average Jaccard Index volume pixel overlap ratio of 96.53% across five distinct types of lung parenchyma image sequences.
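As a rough illustration of what a conditional breadth-first search over a superpixel adjacency graph might look like, the sketch below grows a region from a seed and visits a neighbor only if it passes an acceptance test. The abstract does not state the paper's actual condition, so the intensity threshold, the graph, and every name here are assumptions.

```python
from collections import deque


def conditional_bfs(adjacency, seed, accept):
    """Collect the nodes reachable from seed through nodes that satisfy accept()."""
    if not accept(seed):
        return set()
    visited = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            # Expand only into unvisited neighbors that meet the condition.
            if neighbor not in visited and accept(neighbor):
                visited.add(neighbor)
                queue.append(neighbor)
    return visited


# Toy superpixel graph: node -> adjacent superpixels (hypothetical data).
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
# Made-up mean intensity per superpixel; low values stand in for air-filled lung.
mean_intensity = {0: -850, 1: -820, 2: 40, 3: -800, 4: 60}

lung = conditional_bfs(adjacency, 0, lambda n: mean_intensity[n] < -500)
```

Superpixels 2 and 4 are adjacent to accepted nodes but fail the test, so the search labels only 0, 1, and 3 as the target region.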
Accurate and rapid prediction of building energy consumption is crucial for energy conservation. In this study, we introduce the M2SDF model to enhance the precision of building energy forecasts. First, we propose the MSTD module, which decomposes the data into multiple seasonal components and a trend component to capture complex patterns. Next, we introduce the MSF module, which integrates both macro and micro features. Finally, we obtain the prediction results by aggregating the output values from the predictors for each component. Experimental results demonstrate that our model surpasses other algorithms in both accuracy and stability.
Sulfur dioxide (SO2) is one of the most common atmospheric pollutants, and high concentrations harm both people and the environment, so accurately predicting changes in the SO2 content of the air is crucial. To this end, this paper first uses lasso regression and a vector autoregressive model to analyze the factors affecting SO2 concentration. It then uses the decomposition capability of the NeuralProphet model for time-series data to split the SO2 concentration data into a trend term and a cyclic term, and combines these with the other influencing factors to build a Light Gradient Boosting Machine (LightGBM) model that outputs the predicted SO2 concentration. The experiments use historical air pollution and meteorological data from Datong city and compare the model with GA-BP and ARIMA. The results show that the R² coefficient reaches 0.724, a large improvement over NeuralProphet, and that the model predicts more accurately than ARIMA and GA-BP while taking the factors affecting SO2 concentration into account.
Games, recognized as the ninth art, have significantly evolved with advancements in technology, enhancing the realism of game experiences through virtual reality and lifelike animations. This paper focuses on designing a virtual reality game using VR technology and the Unity 3D engine. It specifically researches and implements an animation generation algorithm for the gait of a quadrupedal animal (mouse) from slow to fast. The game offers players an immersive experience, improving the fluidity and realism of animal gaits through an IK-based animation algorithm. Players can control a cartoon mouse character, Pichu, using the HTC Vive to walk, run, and crawl, gaining an intuitive understanding of quadrupedal movement patterns and biological characteristics.
The use of neural networks for identifying irregular 3D point cloud data is gaining increasing attention from researchers. As in CNNs, to extract local features, point cloud models typically construct neighborhoods, extract features from each neighborhood point, and finally use symmetric aggregation functions to capture local information. However, accurately exploiting spatial correlation in 3D space remains a challenging problem. To address this issue, we propose FRPoint, which transforms neighborhoods into the frequency domain and captures long- and short-range dependencies there using a learnable weight matrix. Because the frequency domain describes spatial relationships well, our method efficiently extracts accurate spatial correlations. More importantly, it requires only a simple learnable matrix, with no complex attention operations or stacked feature modules. Experiments on the classification datasets ScanObjectNN and ModelNet40 demonstrate the effectiveness of our method.
Diffusion models have recently proven successful in stochastic human motion prediction. Despite their excellent generative performance, they are difficult to use for real-time prediction because the multi-step sampling mechanism involves tens or hundreds of repeated function evaluations. To address this issue, we introduce the Motion Consistency Model (MotionCM) to reduce the computation and time consumed by iterative inference. It applies a diffusion pipeline to the low-frequency domain obtained by the discrete cosine transform (DCT), which lowers the computational burden of each function evaluation. Predictive efficiency is further improved by one-step (or few-step) inference, which is made possible by keeping the outputs consistent along the trajectory of the PF-ODE. Extensive experiments on two benchmark datasets show that the proposed model achieves state-of-the-art performance at less than 10% of the time cost.
In this paper, we study how the hyperparameters of a Denoising Diffusion Probabilistic Model (DDPM) affect image generation. Experiments were performed on the MineRL dataset: the batch size, dimensions, learning rate, and sampling time steps of the model were adjusted, and the Fréchet Inception Distance (FID) was used to evaluate the quality of the resulting images. We introduce a performance degradation index to compare the effects of different hyperparameter settings. The experimental results show that dimension and learning rate are the key factors affecting the quality of DDPM image generation. These findings are significant for optimizing the DDPM model and improving its performance in image generation tasks.
Wind power output has strong randomness, volatility, and intermittency. To maintain the safety and stability of the large power grid under the new power system, high-precision medium-term wind power forecasting is urgently needed. This paper fully leverages the temporal dynamics of the wind power dataset and proposes a medium-term wind power ensemble forecasting model that integrates transfer entropy, improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) decomposition, dual attention mechanism, and multiple recurrent neural networks. Firstly, we compare the transfer entropy of wind power and meteorological factors to determine the direction of information flow and select the set of characteristic variables. Next, utilizing the ICEEMDAN signal decomposition algorithm, the wind power sequence is segmented into various intrinsic mode functions, and attention-based LSTM, GRU, and BiLSTM models are established. After aggregation and reconstruction, three sets of predictions are obtained. Finally, the attention mechanism is combined to dynamically weight the three models to achieve ensemble predictions. Actual examples show that compared with several benchmark models, the proposed model notably enhances the predictive accuracy of medium-term wind power forecasting.
A satisfactory furniture arrangement has a major influence on people's living comfort. At present, furniture layout is done manually and interactively, which creates a great deal of work for designers. To address the furniture layout problem, this paper proposes an automatic layout approach based on a genetic algorithm. The Galapagos genetic algorithm is used on the Grasshopper platform to design an automatic layout experiment for living room furniture. Boundary and interference constraints are then set, together with objective functions. Finally, the genetic algorithm searches for an optimum, and a layout plan that satisfies the constraints is generated automatically.
Small object detection in dense crowds, viewed from drones or surveillance video against complex backgrounds, faces numerous challenges. These include small target sizes, dense distribution, low resolution, and susceptibility to background noise, all of which make detection significantly harder. To address these problems, we made a range of enhancements to the original YOLOv8 model (hereafter the v8 model) and propose a new model named EBE-v8, which stands for ECA-BiFPN-EIoU-YOLOv8. First, we integrated the Efficient Channel Attention (ECA) mechanism into the v8 backbone. This strengthens the model's focus on low-level features and further improves its ability to detect small objects. Second, we replaced PANet (the Path Aggregation Network) with a BiFPN (Weighted Bi-directional Feature Pyramid Network) structure to integrate multi-level features effectively, increasing the model's capacity to handle targets of different sizes. Finally, we adopted a new IoU loss calculation method, EIoU, to better suit small object detection. This loss function can dynamically adjust the proportions of its different parts during the various training stages, allowing small targets to regress better to the ground truth boxes and improving small object detection performance. Experimental results demonstrate that the enhanced model significantly surpasses the original v8 model in small object detection in dense crowds. On identical datasets, our model achieves 86% accuracy, 72.8% recall, 81.6% mAP@0.5, and 53.5% mAP@0.5:0.95. Compared with the original v8 model, these results represent significant improvements, highlighting the efficacy and practicality of our approach.
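For readers unfamiliar with EIoU, the sketch below computes the commonly cited form of the loss for a single pair of axis-aligned boxes: 1 - IoU plus three penalties (centre distance, width difference, height difference), each normalized by the smallest enclosing box. The abstract gives no formula, so this is an assumed definition and not necessarily the exact variant used in EBE-v8; the function name and box format are illustrative.

```python
def eiou_loss(box, gt, eps=1e-9):
    """EIoU loss for two boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)

    # Squared distance between the two box centres.
    dx = (x1 + x2) / 2 - (gx1 + gx2) / 2
    dy = (y1 + y2) / 2 - (gy1 + gy2) / 2

    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    width_term = (w - gw) ** 2 / (cw * cw + eps)
    height_term = (h - gh) ** 2 / (ch * ch + eps)
    return 1.0 - iou + center_term + width_term + height_term


perfect = eiou_loss((0, 0, 2, 2), (0, 0, 2, 2))
disjoint = eiou_loss((0, 0, 1, 1), (2, 2, 3, 3))
```

The loss is close to zero for identical boxes and stays informative for non-overlapping boxes, where plain IoU would give no gradient signal about how far apart the boxes are.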
An accurate and fast method for calculating the engagement zone of close-range air combat plays a crucial auxiliary role in maneuver decision-making and situational assessment. The existing methods using neural network fitting have poor portability for objects with different flight characteristics, and have high requirements for parameters. Therefore, this article improves on the traditional golden section method. Firstly, a target infrared detectable zone model is established to achieve detection estimation. Secondly, a differential dynamic feedback structure is introduced to improve the golden section search algorithm, overcoming the sensitivity of the initial parameters and reducing the calculation cycle. Finally, by estimating the target's escape maneuver, fast and accurate calculation of the air combat engagement zone is achieved. The simulation results show that the infrared detection zone model can correct the air combat engagement zone, and the improved algorithm can adapt to the calculation of engagement zone with high dynamic characteristics.
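For reference, the traditional golden section search that the paper takes as its starting point can be sketched as below. This is only the textbook method for a unimodal one-dimensional function; the differential dynamic feedback structure and the infrared detection model described in the abstract are not reproduced, and the toy objective is made up.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_minimize(f, a, b, tol=1e-6):
    """Return an approximate minimizer of a unimodal function f on [a, b]."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


# Toy objective with its minimum at x = 2.
x_min = golden_section_minimize(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
```

Each iteration reuses one of the two previous function values, so only one new evaluation is needed per step, and the bracket shrinks by a constant factor of about 0.618.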
In this paper, the FP-growth and Apriori algorithms are used to mine and analyze traffic accident features. Association rule mining reveals the correlation between the causes and severity of traffic accidents. The Apriori algorithm mines frequent itemsets through candidate generation and pruning steps, which makes it accurate but computationally less efficient. In contrast, the FP-growth algorithm avoids multiple database scans by constructing an FP-tree, which saves memory and improves efficiency. For model construction and evaluation, a multi-type variable classification model predicts the accident severity level and is evaluated on a test set, where it shows excellent performance on category 1. Ultimately, this study verifies the effectiveness of the FP-growth and Apriori algorithms in traffic accident factor mining and provides a decision support tool for traffic safety management.
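The candidate generation and pruning steps attributed to Apriori can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the accident records, attribute names, and support threshold are invented, and support is counted as an absolute number of records.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data per level: keep candidates that meet min_support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    frequent = count({frozenset([i]) for i in items})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical accident records, each a set of categorical attributes.
records = [
    {"rain", "night", "severe"},
    {"rain", "severe"},
    {"rain", "night"},
    {"clear", "day", "minor"},
]
frequent = apriori(records, min_support=2)
```

Each level rescans every record to count candidates, which is the repeated database scanning that FP-growth avoids by compressing the data into an FP-tree first.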