This PDF file contains the front matter associated with SPIE Proceedings Volume 12508, including the Title Page, Copyright information, Table of Contents, and Conference Committee list.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access. Shibboleth/OpenAthens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
International Symposium on Artificial Intelligence and Robotics 2022
In recent years, the elderly population in Japan has been increasing, and expectations for the utilization of welfare equipment are rising as well. Electric wheelchairs are one such type of equipment and are widely used as a convenient means of transportation. On the other hand, accidents have also occurred, and the dangers of driving an electric wheelchair have been pointed out. We therefore believe that the development of an autonomous mobile electric wheelchair can address the causes of these accidents, and it can be expected to reduce accidents and improve the convenience of electric wheelchairs. For the development of an autonomous electric wheelchair, environment recognition, such as estimation of the current position, recognition of sidewalks and traffic lights, and prediction of the movement of objects, is indispensable. To solve these problems, we develop an algorithm to recognize sidewalks, crosswalks, and traffic lights from video images. In recent years, deep learning has been widely applied in the field of image recognition. We therefore improve WideSeg, a semantic segmentation algorithm based on CNNs (Convolutional Neural Networks), and develop an object recognition method using a new CNN model. In our approach, sidewalk correction and noise-removal processing are applied after performing semantic segmentation with the proposed model.
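The noise-removal step after semantic segmentation can be illustrated with a simple majority filter over a per-pixel label grid; this is a generic sketch, not the authors' exact WideSeg post-processing, and the grid values, window size, and number of passes are assumptions:

```python
from collections import Counter

def majority_filter(labels, passes=1):
    """Smooth a 2-D grid of class labels: each cell takes the most
    common label in its 3x3 neighbourhood (including itself)."""
    h, w = len(labels), len(labels[0])
    for _ in range(passes):
        out = [row[:] for row in labels]
        for y in range(h):
            for x in range(w):
                votes = Counter()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            votes[labels[ny][nx]] += 1
                out[y][x] = votes.most_common(1)[0][0]
        labels = out
    return labels

# A lone mislabelled pixel (class 2) inside a sidewalk region (class 1)
# is removed by the filter:
grid = [[1, 1, 1],
        [1, 2, 1],
        [1, 1, 1]]
print(majority_filter(grid))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```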
Audio watermarking is an effective scheme for copyright protection. A robust dual watermarking scheme based on a melody feature is proposed in this paper. The melody feature can be encoded for integrity verification. Two robust watermarking methods, based on the discrete cosine transform (DCT) and the histogram, are then adopted to embed the watermark into the audio so as to resist different attacks. Tampering can be detected by comparing the melody features in the audio. In the extraction process, two groups of watermarks are extracted, yielding two images. The Brenner gradient values of these two images are calculated, and the image with the smaller Brenner value is selected as the final watermark image. The experimental results show that the proposed watermarking scheme is robust to common signal processing and synchronization attacks, and its robustness is better than that of existing algorithms.
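The selection rule between the two extracted watermark images can be sketched with the Brenner gradient, a standard sharpness measure (sum of squared differences between pixels two columns apart); the toy binary images below are assumptions, not data from the paper:

```python
def brenner(img):
    """Brenner gradient: sum of squared differences between pixels two
    columns apart; larger values indicate more high-frequency content."""
    return sum((row[x + 2] - row[x]) ** 2
               for row in img for x in range(len(row) - 2))

def select_watermark(img_a, img_b):
    """Pick the extracted watermark image with the smaller Brenner
    value, i.e. the one less corrupted by high-frequency noise."""
    return img_a if brenner(img_a) <= brenner(img_b) else img_b

clean = [[0, 0, 0, 1, 1, 1],   # one clean edge
         [0, 0, 0, 1, 1, 1]]
noisy = [[0, 1, 1, 0, 0, 1],   # many spurious transitions
         [1, 0, 0, 1, 1, 0]]
print(brenner(clean), brenner(noisy))  # 4 8
print(select_watermark(clean, noisy) is clean)  # True
```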
In recent years, owing to its powerful global search ability, the artificial bee colony (ABC) algorithm has been successfully applied to many real-world problems. However, its slow convergence speed still constrains further applications. In this paper, to enhance its merits and overcome this shortcoming, we propose three improvement strategies for the ABC algorithm. First, we introduce a new array to preserve some of the elite solutions the population has ever achieved. Based on this array, new updating equations are proposed. Finally, a new updating mechanism for the scout bee is proposed that learns from the defined array to further accelerate the convergence of the population. Experimental results verify that the proposed algorithm achieves better performance than modern evolutionary algorithms, especially in the accuracy and stability of solutions.
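The abstract does not give the updating equations, but the elite-archive idea can be sketched generically: keep the k best solutions ever found and bias new candidates toward a random elite, in the spirit of ABC search equations of the form v_i = x_i + phi * (e_i - x_i). The archive size k and step factor phi below are assumed parameters:

```python
import random

def update_elites(elites, candidate, fitness, k=5):
    """Maintain an archive of the k best (solution, fitness) pairs
    ever found, sorted by fitness (lower is better)."""
    elites.append((candidate, fitness))
    elites.sort(key=lambda e: e[1])
    del elites[k:]
    return elites

def guided_candidate(x, elites, phi=0.5, rng=random):
    """Elite-guided update: move a solution toward a randomly
    chosen archived elite."""
    elite = rng.choice(elites)[0]
    return [xi + phi * (ei - xi) for xi, ei in zip(x, elite)]

elites = []
for sol, fit in [([1.0, 1.0], 2.0), ([0.5, 0.5], 0.5), ([2.0, 2.0], 8.0)]:
    update_elites(elites, sol, fit, k=2)
print([f for _, f in elites])  # [0.5, 2.0]
```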
Visual-linguistic interaction addresses the obstacles of information deficiency and weight deviation in text classification. Information deficiency usually occurs in vision-dominated or modality-balanced tasks, and many multimodal fusion approaches have been proposed (e.g., gated-based and contextualized-based methods). However, there is still no remarkable solution to the weight deviation caused by irrelevant descriptions in text-dominated tasks. To solve it, we introduce a novel Quadruplet Attention to adjust the textual-visual weight distribution, in which visual-linguistic information interacts and the dot product can be represented as a 2 × 2 matrix, referred to as a quadruplet. A multimodal architecture is further proposed to enhance text-dominated classification. Extensive experiments on Daily Mail demonstrate the effectiveness of our method, which achieves significant improvements of 1.42 and 3.4 in ROUGE-L F1 and Macro F1, respectively.
Federated learning has received extensive attention as a new distributed learning framework that enables joint modeling without data sharing. However, it is still affected by the communication bottleneck: most clients may not be able to participate in joint modeling at the same time, resulting in slow convergence. To solve these problems, we propose a federated learning aggregation algorithm based on a global perspective, which considers the data distribution of the participating clients. The server builds a feature distribution table according to the data distribution, and each time it selects a set of clients for training, the set covers as many features as possible so that the global data are learned more fully. Specifically, the selection of these clients is not random: the server constructs, within the range of visible clients, a set of clients with the largest mutual distribution difference, and places it at the end of the selection chain after each round of training until all clients have been selected. We demonstrate the effectiveness of our work through comprehensive experiments and comparisons with the two most popular algorithms. Specifically, our algorithm achieves an average speedup of 40% compared to traditional algorithms.
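The coverage-driven selection can be illustrated with a greedy sketch: each round, pick the client whose data adds the most labels not yet covered by the selected set. Representing each client's distribution as a set of labels is an assumption for illustration; the paper's feature distribution table and selection chain are richer than this:

```python
def greedy_select(clients, k):
    """Greedily pick k clients whose label sets differ most, so the
    selected set covers more of the global feature space.
    `clients` maps client id -> set of labels present in its data."""
    selected, covered = [], set()
    remaining = dict(clients)
    for _ in range(min(k, len(remaining))):
        # choose the client contributing the most labels not yet covered
        best = max(remaining, key=lambda c: len(remaining[c] - covered))
        covered |= remaining.pop(best)
        selected.append(best)
    return selected, covered

clients = {"A": {0, 1}, "B": {0, 1, 2}, "C": {3, 4}, "D": {2}}
sel, cov = greedy_select(clients, 2)
print(sel, sorted(cov))  # ['B', 'C'] [0, 1, 2, 3, 4]
```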
Pseudonyms are a basic method of privacy protection in intelligent transportation systems. Using a replaceable multi-pseudonym scheme can better protect the location privacy of vehicles. At present, pseudonym supply methods mainly include preloading and on-demand supply. In preloading schemes, the validity period is fixed, replacement is limited, the storage overhead is huge, and pseudonym waste is serious. On-demand schemes need the assistance of a roadside unit or a pseudonym certificate authority, and suffer from large delays, the hidden danger of Sybil attacks, and other problems. This paper proposes a massive pseudonym management scheme based on certificate signatures and pseudonyms self-generated by an agent in the vehicle. The scheme meets all the requirements of conditional pseudonym authentication. Through the self-agent generation of short-term pseudonyms, the system can efficiently manage the pseudonyms of millions of vehicles. The supply of pseudonyms must be authorized by the pseudonym management authority, avoiding pseudonym abuse and Sybil attacks. In this scheme, a pseudonym log server is used to provide transparency of management. For pseudonym revocation, we adopt a whitelist-like method instead of the traditional certificate revocation list. We have carried out a security analysis and performance evaluation of this scheme and found that it has good scalability and supports transparent management.
Obtaining medical pathways from a large number of medical logs has become a current research hotspot. In this article, we propose a method that combines trace clustering, process discovery, and neural networks to discover medical pathway models from complex medical logs. The source medical logs are first structured as XES event logs. Cases with similar medical behavior are aggregated by trace clustering, process mining is used to generate process models, and reasonable medical pathways are extracted from these models. A neural network is then used to determine the proportional characteristics of the medical pathways, and the above components are combined to form a usable medical pathway model. The experimental results show that the average simplicity of the generated process models is 0.695, the average accuracy of the neural network models is 93.44%, and the medical pathway model score is about 0.879.
Artificial intelligence has achieved a breakthrough with the proposal and development of deep learning. Compared with traditional models, deep learning allows machines to extract features and train neural networks by learning weight parameters. Convolutional Neural Networks (CNNs), a cornerstone of deep learning, have achieved remarkable results in 2D image recognition, classification, and segmentation. Point clouds have recently become a popular 3D data form in the field of deep learning, as they retain spatial geometric information better than other 3D representations such as meshes and depth maps. Owing to the disorder, required rotation invariance, and uneven density distribution of 3D point clouds, as well as high sensor noise and complex scenes, deep learning on 3D point clouds is still at an early stage and faces significant challenges. Deep learning tasks for point clouds are mainly classified into shape classification, instance segmentation, semantic segmentation, etc. This article outlines the development of methods for the shape classification task and the characteristics and differences of each method. In addition, a comparison of the training accuracy and efficiency of each method on the dataset is provided.
Thermal defect detection aims to identify overheated areas of electrical accessories with the help of infrared imaging technology. In this paper, we propose a thermal defect segmentation method based on a saliency constraint. Specifically, we first design a convolutional neural network for infrared image classification; the thermal images are then denoised and enhanced by image preprocessing. Next, a modified K-means clustering algorithm is used for region segmentation, partitioning infrared images into environment, normal, and thermal areas. Finally, we perform saliency detection on the infrared images to obtain the approximate region of the temperature anomaly, and the overheated area is likewise segmented with the modified K-means clustering algorithm; this result is subsequently used to revise the thermal area segmented from the enhanced images so that it satisfies the saliency constraint. Experimental results suggest that our method can improve the diagnostic efficiency of infrared images and precisely localize thermal defects, outperforming state-of-the-art methods.
To improve the control accuracy of a flexible lower-limb exoskeleton under external disturbances and parameter uncertainty, a compound position control method is designed. First, the Lagrange function is used to establish the model of the exoskeleton robot. Second, because the system is affected by both matched and unmatched disturbances, two finite-time state observers (FTSO) are used to observe the disturbances so that they can be compensated. In the controller, the super-twisting algorithm is used to guarantee that the knee joint's trajectory tracking error converges. Finally, a Lyapunov function is set up to show that the controller is stable. The experiments show that the suggested control method is better than conventional SMC, with a more precise trajectory tracking effect and greater stability.
In recent years, industrialization and economic development around the world have led to an ever-increasing demand for energy. Renewable energies are attracting attention, but mineral resources such as coal, petroleum, and natural gas are still widely used, and onshore resources are being depleted day by day. These energy and metal resources, such as copper, support Japan's industries and affluent lifestyle, and if Japan continues to rely on imports for most of them, it will become difficult to secure a stable supply. Therefore, mining of mineral resources on the seafloor is essential to solve these problems, and research on seafloor resource surveys and mining is underway. Because direct human exploration and mining of seafloor resources are naturally dangerous, underwater robots are used instead. However, due to light absorption and turbidity in water, the images captured by an underwater robot sometimes have poor visibility, making exploration unsatisfactory; higher-resolution underwater images are therefore needed. In this study, we perform super-resolution of underwater images using an improved SRCNN to support research on underwater imaging for underwater robots. The conventional SRCNN uses the ReLU function as the activation function, whereas the improved SRCNN uses the PReLU and FReLU functions, extensions of the ReLU function, to improve accuracy.
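The difference between ReLU and PReLU mentioned above can be shown element-wise: PReLU scales negative inputs by a learned slope instead of zeroing them, which keeps gradients alive for negative activations. (FReLU additionally conditions on a spatial context and is not sketched here; the slope value 0.25 is a common default, an assumption rather than the paper's learned value.)

```python
def relu(x):
    """Standard ReLU: zeroes all negative inputs."""
    return max(0.0, x)

def prelu(x, a=0.25):
    """Parametric ReLU: negative inputs are scaled by a learned
    slope `a` instead of being zeroed."""
    return x if x > 0 else a * x

print(relu(-2.0), prelu(-2.0))  # 0.0 -0.5
print(relu(3.0), prelu(3.0))    # 3.0 3.0
```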
IoT technology has made remarkable progress in recent years, and the world is full of IoT devices that continue to evolve every day. From smartphones, personal computers, and smartwatches to home appliances such as refrigerators and washing machines, and even indoor lights and house keys, IoT devices have become an inseparable part of our lives. In addition to devices used by individuals, IoT technology supports our daily lives both visibly and behind the scenes, through IoT-enabled industrial equipment and satellite positioning systems, for example. Japan has been making a national push to introduce IoT into industries to reduce the burden on workers, and has recently been promoting a plan called Smart Agriculture, Forestry, and Fisheries. Among these three industries, the agricultural sector is slightly ahead of the others: the Smart Agriculture Demonstration Project started in 2019, and 182 districts in Japan were implementing the project by FY2021. The forestry and fisheries industries, although behind agriculture, are also developing daily into next-generation industries based on the program established in December 2019. However, the examples mentioned so far have been promoted with the help of companies and the national government, even though everyone benefits from them. In this paper, we propose an IoT device that can be used as an IoT buoy: using a Raspberry Pi microcomputer, we create a camera device that can distribute underwater images in real time and track its own location.
For MRI images of the pelvis, it is helpful for doctors to be able to extract the pelvic structure quickly and accurately, so that diseases in the pelvic area can be diagnosed and analyzed in time. Manually extracting skeletal contours from pelvic MRI images is not only time-consuming but also imprecise. Therefore, this paper proposes an improved image segmentation algorithm based on MultiR2UNet. We adopt R2UNet, which is highly accurate in the segmentation field, as the backbone network. Residual connections are used in the network's skip layers, and the MultiRes Block is used in up-sampling, which helps increase the depth of the network and extract more detailed features. Because of the small number of pelvis training samples and the imbalance of the samples, we performed data augmentation in the preprocessing stage, effectively enlarging the data samples. In the training phase, we propose a mixed loss function. After several rounds of training and testing, the gap between the pelvic segmentation produced by our algorithm and the ground-truth label is fairly small, with a coincidence degree of about 91%. The average segmentation time per image was about 0.012 s. The experimental results show that the proposed algorithm guarantees segmentation accuracy and that MultiR2UNet is an effective real-time pelvis segmentation algorithm.
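The abstract does not specify the mixed loss, but segmentation losses commonly mix cross-entropy with a Dice term, and the "coincidence degree" reported above is typically a Dice-style overlap. A minimal sketch of the Dice coefficient on flat binary masks (the masks below are toy assumptions):

```python
def dice_coefficient(pred, target, eps=1e-6):
    """Dice overlap between two flat binary masks (1 = pelvis pixel).
    1.0 means perfect agreement; 1 - dice is often used as a loss term."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * inter + eps) / (total + eps)

mask      = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 0]
print(round(dice_coefficient(mask, mask), 3))       # 1.0
print(round(dice_coefficient(predicted, mask), 3))  # 0.8
```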
A good image retargeting method can retain the important information of an image while changing its size. Image retargeting has been widely used in multi-size device displays and software thumbnails. Existing image retargeting methods have defects when processing large important regions and linear elements in an image. In this paper, an improved Seam Carving method is developed by optimizing the saliency map determination and the operation flow. The saliency map is determined by Canny edge detection with an adaptive threshold, Hough transforms for detecting straight lines, a YOLO neural network and flood fill for sensitive-area detection, etc. With these methods, the expression of essential information in pictures is improved. In the algorithm's running process, an SC algorithm based on average energy and a simulated-annealing-like algorithm based on a seam neighborhood penalty are used to improve the running process of Seam Carving. Finally, experimental results on the open-source dataset RetargetMe indicate that the proposed method achieves better performance in comparison with the Seam Carving (SC), Shift-Map (SM), Scale-and-Stretch (SNS), and Warping methods.
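The dynamic-programming core that all Seam Carving variants share can be sketched in a few lines: find the cost of the cheapest 8-connected vertical seam through an energy map. The toy numeric map below is an assumption; in the paper the energy comes from the Canny/Hough/YOLO saliency pipeline described above:

```python
def min_seam_cost(energy):
    """Cost of the cheapest 8-connected vertical seam through a 2-D
    energy map, computed row by row with dynamic programming."""
    h, w = len(energy), len(energy[0])
    cost = list(energy[0])
    for y in range(1, h):
        # each cell may continue from the cell above or its diagonals
        cost = [energy[y][x] + min(cost[max(x - 1, 0):min(x + 2, w)])
                for x in range(w)]
    return min(cost)

energy = [[3, 1, 4],
          [5, 9, 2],
          [6, 5, 3]]
print(min_seam_cost(energy))  # 6  (seam 1 -> 2 -> 3)
```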
The diverse and dynamic working environment of a wind turbine (WT) frequently makes it difficult to monitor and identify abnormalities. In this study, a novel approach is proposed for abnormality recognition in WT generators, in which a convolutional neural network (CNN) is cascaded to a long short-term memory network (LSTM) based on kernel principal component analysis (KPCA). First, the quartile method is used to preprocess the SCADA data, deleting abnormal data and improving data quality. Then, with input variables selected by the Pearson correlation coefficient, KPCA eliminates the nonlinearity of the process variables and enhances the generalization ability of the algorithm. A CNN-LSTM state recognition model based on KPCA is established using the principal components extracted by KPCA. The model warns of an abnormal generator state through the prediction residual: when the residual exceeds the threshold repeatedly, the operating state is judged abnormal. Finally, to demonstrate the effectiveness of this approach, the state of a WT generator is forecasted using examples.
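The quartile preprocessing step can be sketched with the standard IQR rule: values outside [Q1 - k·IQR, Q3 + k·IQR] are treated as outliers and dropped. The fence factor k = 1.5 and the toy SCADA values are assumptions, and the linear-interpolation quantile below is one of several common conventions:

```python
def quartile_filter(values, k=1.5):
    """Remove outliers with the quartile (IQR) rule: keep values
    inside [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(values)
    def quantile(q):
        # linear interpolation between closest ranks
        pos = q * (len(s) - 1)
        lo, hi = int(pos), min(int(pos) + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])
    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return [v for v in values if q1 - k * iqr <= v <= q3 + k * iqr]

data = [10, 11, 9, 10, 12, 11, 100]   # 100 is a sensor glitch
print(quartile_filter(data))  # [10, 11, 9, 10, 12, 11]
```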
Small object detection has been a difficult task because of the small object area, low resolution, few available features, and many other problems. To improve the performance of small object detection, a classical augmentation method, which copies small objects and pastes them into the image, is usually adopted. However, in some specific scenes, small objects cannot be pasted completely at random without any area restriction. In this paper, to address small object detection in specific scenes, we build on the copy-paste augmentation method and design three strategies that restrict the paste position of the copied object to the target area of the image. In this way, the augmented image is more realistic for the scenario, which can improve the performance of small object detection. We conduct experiments on different object detection methods and validate that, in contrast to two-stage object detection methods, our copy-and-restricted-paste augmentation strategy is more suitable for one-stage object detection methods.
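The position restriction can be sketched as rejection sampling: draw a random paste location and accept it only if the copied object lies entirely inside the allowed region mask. This is a generic illustration of one possible restriction strategy, not the paper's three specific strategies; the mask, object size, and retry count are assumptions:

```python
import random

def random_paste_position(region_mask, obj_h, obj_w, rng, tries=100):
    """Pick a top-left corner so the copied object of size
    obj_h x obj_w lands entirely inside the allowed region
    (mask value 1); returns None if no valid spot is found."""
    h, w = len(region_mask), len(region_mask[0])
    for _ in range(tries):
        y = rng.randrange(h - obj_h + 1)
        x = rng.randrange(w - obj_w + 1)
        if all(region_mask[y + dy][x + dx]
               for dy in range(obj_h) for dx in range(obj_w)):
            return y, x
    return None

# Allowed region is the bottom row only (e.g. objects must sit on the road):
mask = [[0, 0, 0, 0],
        [1, 1, 1, 1]]
pos = random_paste_position(mask, 1, 2, random.Random(0))
print(pos is not None and pos[0] == 1)  # True
```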
Inspection of the electric transmission system is of great significance for power line maintenance, in which insulator defects need to be found in time to preserve the safety of the whole system. To improve the accuracy and efficiency of insulator defect detection, computer vision techniques are employed. However, since insulator defects on insulator strings are small objects and the strings usually operate in complex environments, it is challenging to obtain satisfactory detection results. To solve this issue, we propose an insulator defect detection method based on YOLOv7, one of the state-of-the-art object detection methods. By introducing a coordinate attention mechanism into the backbone network and redesigning the feature pyramid network (FPN) to have a bi-directional FPN-like structure, we successfully adapt the original model to the insulator defect detection task. We use an open-source dataset called CPLID to train our model. Experiments demonstrate that our method achieves good performance for insulator defect detection and better average precision than other methods. An ablation study was also designed to verify the effectiveness of the improved components.
In the sphere of financial investment, predicting future trends of stock market indexes from historical transaction data is a critical topic. Because of the complexity and extreme volatility of the stock market, precisely predicting the trajectory of the indexes is challenging. To address the volatility of short-term index prediction tasks, a long-term prediction method termed One-Covn is proposed. Specifically, this method takes the mean scale of rise and fall over the following few days, rather than the following single day, as the prediction label. First, a data normalization method is proposed in which the historical transaction data are transformed to the scale of rise and fall. Then, a one-day-step sliding window is applied to split the sequence data into prediction samples, and the corresponding labels are obtained at the same time. Finally, a one-dimensional convolutional network is used to extract deep features from the samples and map the features to the prediction label. To evaluate the algorithm's performance, 42 Chinese stock market indices were chosen as experimental data, and the mean absolute error (MAE) and mean squared error (MSE) were used as training loss functions. Classic approaches including ANN, LSTM, and CNN-LSTM were chosen as comparison benchmarks. The results show that the method can effectively reduce the average prediction error.
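The normalization and windowing steps described above can be sketched as follows: prices become daily rise/fall ratios, and a one-step sliding window yields samples whose label is the mean rise/fall over the next few days. The window length, horizon, and toy price series are assumptions for illustration:

```python
def pct_change(prices):
    """Normalise raw prices to daily rise/fall ratios."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def make_samples(series, window, horizon):
    """Slide a one-day-step window over the rise/fall series; the label
    is the mean rise/fall over the next `horizon` days rather than a
    single next-day value, smoothing short-term volatility."""
    samples, labels = [], []
    for i in range(len(series) - window - horizon + 1):
        samples.append(series[i:i + window])
        future = series[i + window:i + window + horizon]
        labels.append(sum(future) / horizon)
    return samples, labels

changes = pct_change([100, 102, 101, 103, 105, 104])
x, y = make_samples(changes, window=2, horizon=2)
print(len(x), len(y))  # 2 2
```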
The robot market in Japan is gradually expanding due to increasing demand, and industrial robots are being actively introduced in the manufacturing industry. The introduction of robots has three advantages: 1) securing labor, 2) increasing productivity, and 3) improving quality. A robot can run for a long time with constant work efficiency, achieving stable production; by replacing human labor, robots can also reduce labor costs and human error. The downside of introducing a robot is that it has to be taught where to grab, which takes time and requires a technician with specialized knowledge. Furthermore, this approach cannot grasp an object that is not in the specified position. The introduction of robot vision may solve these problems. In this study, using depth images, processed images, and deep learning models, we aim to achieve object-color-independent, high-accuracy grasp position estimation for thin objects. We sharpen the depth image mainly by applying a grayscale transformation and modify the deep learning model. The experimental results show that our design achieves good results.
Stock price prediction is a hot topic and has attracted considerable attention from both regulatory authorities and financial institutions. Because stock price fluctuations result from many different factors, stock price prediction is not easy. Traditional prediction solutions mainly use simple linear models based on statistical and econometric models, which struggle with non-stationary time-series data. With the development of deep learning, newer models can not only handle non-linear data but also retain useful information for better forecasting of stock prices. This paper constructs a CNN-GRU-Attention based model for price prediction in Chinese stock markets. First, the convolutional and pooling layers of the CNN are used to extract features of factor correlation information from the input data; then, the output feature matrix is used as input to the GRU model to forecast correlation; finally, the attention mechanism is used to focus on the important characteristics of stock prices and optimize the model structure. We collect multi-dimensional stock data of the China SSE 50 index from 2011 to 2021 as our dataset and conduct a set of experiments comparing performance in terms of Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and the R squared (R²) score. The proposed model is superior to the other models: MAPE decreased by 11.23%, RMSE decreased by 5.71%, and the R² score improved by 0.41%, which shows that the CNN-GRU-Attention model outperforms state-of-the-art approaches in forecasting stock prices.
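The two error metrics used above are standard and easy to state precisely; the toy actual/predicted values below are assumptions for illustration:

```python
import math

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

actual    = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(round(mape(actual, predicted), 2))  # 5.0
print(round(rmse(actual, predicted), 2))  # 8.16
```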
At present, industrial robots are required to support high-mix, low-volume production, which calls for automation and flexibility in production lines. To meet these requirements, bin picking must be automated. In this work, we propose a three-dimensional object recognition method using deep learning to automate coordination-less bin picking. Deep learning-based methods require training data, but annotating such data is costly. Therefore, we construct a coordination-less recognition model using an automatic training-data acquisition method in a simulation environment. For evaluation, we conducted experiments in both simulated and real environments to verify the accuracy of the proposed method.
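One reason simulation makes training-data acquisition automatic is that ground-truth object poses are known, so image annotations can be generated by projecting them into the virtual camera. A hedged sketch of that single step (a pinhole projection; the function name and parameters are illustrative, not from the paper):

```python
def project_point(point_3d, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3-D point (x, y, z), z > 0,
    to pixel coordinates using focal lengths fx, fy and principal point (cx, cy)."""
    x, y, z = point_3d
    return (fx * x / z + cx, fy * y / z + cy)
```

Projecting every simulated object centre (or its 3-D bounding-box corners) this way yields labelled 2-D positions with no manual annotation.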
With the advancement of robotics, intelligent robots are widely used in substation inspections. Given that deep learning models have too many parameters for the limited performance of embedded devices, this paper proposes a meter detection and recognition method based on a lightweight deep learning model, which supports deploying the model on a substation intelligent inspection robot. First, target detection is performed on the input image to locate the bounding box of the dashboard; then the target area is extracted and semantic segmentation is performed within it to segment the masks of the pointer and scale; the two-dimensional mask is converted into a one-dimensional array by scanning, the positions of the pointer and scale are predicted through peak detection, and finally the reading is calculated from the scale and range. In the target detection stage, the method applies pruning and knowledge distillation to the YOLOv7-tiny model, greatly compressing the model while maintaining prediction accuracy; in the semantic segmentation stage, a lightweight U2NetP model based on depth-wise separable convolution replaces the U2Net model, greatly reducing the number of model parameters. Experimental results show that the lightweight method used in this paper can compress the original YOLOv7-tiny model by 95.7% with an average accuracy of 90.5%, and compress the original U2NetP model by 76.8% with an average IoU of 88.7% and an average pixel accuracy of 99.4%.
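The final readout step described above (1-D profile → peak → dial value) can be sketched as follows, assuming the pointer mask has already been unwrapped along the dial arc into a 1-D intensity profile and that the scale is linear between its minimum and maximum marks (both assumptions are ours, not the paper's):

```python
def peak_index(profile):
    """Index of the strongest local maximum in a 1-D intensity profile."""
    best_i, best_v = 0, float("-inf")
    for i in range(1, len(profile) - 1):
        if profile[i] >= profile[i - 1] and profile[i] >= profile[i + 1] and profile[i] > best_v:
            best_i, best_v = i, profile[i]
    return best_i

def read_meter(pointer_profile, min_val, max_val):
    """Map the pointer peak position to a dial reading by linear interpolation."""
    frac = peak_index(pointer_profile) / (len(pointer_profile) - 1)
    return min_val + frac * (max_val - min_val)
```

In the paper's pipeline the scale-mark mask would be peak-detected the same way to calibrate the endpoints instead of assuming them.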
Video stabilization is a technique for removing unwanted camera shake to achieve higher video quality. Traditional video stabilization adopts 2D optical flow estimation, but this produces large errors in complex spatial scenes. In recent years, as deep learning has penetrated the field of video processing, video stabilization has achieved unprecedented results, and some works handle 3D spatial structure quite elegantly. However, they either fail to account for dynamic parts of the scene or cause frame cropping. Against this background, this article reviews video stabilization methods of recent years, proposes directions for optimizing existing problems, and finally gives an outlook.
Meal assistance robots have been developed because people with upper limb disabilities have difficulty eating by themselves. We are developing a robot that automatically selects and serves food using machine learning, so that it can be operated more easily. This machine learning requires high-quality datasets for each type of food. In this paper, we propose an automatic improvement method that repeatedly applies Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to remove noisy images from the dataset. Experimental results show that the percentage of noise images in the dataset was reduced by 20%. In this way, we expect the accuracy of the automatic selection implemented in the meal assistance robot to improve.
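The core idea of repeated DBSCAN cleaning can be illustrated with a deliberately simplified density filter (not full DBSCAN: we only drop low-density points, the role DBSCAN's noise label plays; feature vectors, `eps`, and `min_samples` are illustrative):

```python
def density_filter(points, eps, min_samples):
    """One pass: drop points with fewer than min_samples neighbours within eps."""
    kept = []
    for i, p in enumerate(points):
        neighbours = sum(
            1 for j, q in enumerate(points)
            if i != j and sum((a - b) ** 2 for a, b in zip(p, q)) <= eps ** 2
        )
        if neighbours >= min_samples:
            kept.append(p)
    return kept

def iterative_clean(points, eps, min_samples, max_iter=10):
    """Apply the filter repeatedly, mirroring the paper's repetitive clustering idea."""
    for _ in range(max_iter):
        cleaned = density_filter(points, eps, min_samples)
        if len(cleaned) == len(points):  # converged: nothing removed this pass
            return cleaned
        points = cleaned
    return points
```

Repetition matters because removing one layer of outliers can expose further points that were only "dense" thanks to the outliers just removed.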
Smart education is a product of information-based education development, which enhances the intelligence of traditional education and realizes innovative education. Based on students' personalized learning characteristics, with a student-centered approach and relying on online-to-offline hybrid courses, we have designed a new architecture for recommending educational resources to improve learning effectiveness and promote effective teaching by tapping into students' potential. Using learning-to-rank techniques for teaching resource recommendation is a popular research direction in intelligent education, and its core is a personalized resource recommendation algorithm. To address the insufficient use of position information and the low accuracy of ranking-based recommendation lists, a point-of-interest recommendation algorithm based on ListMLE is proposed. First, the ListMLE algorithm is applied to point-of-interest recommendation, exploiting the differing attention paid to positions in the recommendation list. Second, the influence of users' social relations is incorporated into the ListMLE scoring function. Finally, a cost-sensitive method is introduced into the computation of the recommendation list. This paper thus proposes an online education resource recommendation method for personalized learning. Experimental results show that the algorithm outperforms baseline learning-to-rank algorithms in terms of accuracy and recall. The method can be used to study students' learning behaviors and provides a theoretical basis for designing personalized learning programs based on students' learning status.
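ListMLE itself is well defined in the learning-to-rank literature: it minimizes the negative Plackett-Luce log-likelihood of the ground-truth ordering under the model's scores. A minimal sketch of that loss (plain Python; the paper's social-relation and cost-sensitive extensions are not shown):

```python
import math

def listmle_loss(scores, true_order):
    """ListMLE: negative Plackett-Luce log-likelihood of the ground-truth ordering.

    scores: model score per item; true_order: item indices from best to worst.
    At each rank position i, the probability of picking item true_order[i]
    is softmax over the items not yet placed.
    """
    loss = 0.0
    for i in range(len(true_order)):
        tail = [scores[j] for j in true_order[i:]]
        m = max(tail)  # log-sum-exp stabilisation
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        loss -= scores[true_order[i]] - log_z
    return loss
```

Scores that agree with the true ordering yield a lower loss than scores that invert it, which is what gradient descent on this loss exploits.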
In the three-dimensional reconstruction of unmanned aerial vehicle (UAV) oblique photography, variations in illumination and viewing angle lead to instability in interest point extraction, and neighborhood cross-correlation methods fail on wide-baseline images. Based on an analysis of continuous closed-loop image data, this paper introduces a feature tracking and matching algorithm for closed-loop sequences of wide-baseline images. First, interest points in each image are extracted by the SuperPoint algorithm, and continuous pairwise matching is carried out by the SuperGlue algorithm; then, the matching results are used for feature tracking in both forward and backward directions, and the tracking results of the two directions are fused; finally, DEGENSAC is used to filter outliers to obtain the optimal matching result. Experimental results show that, for wide-baseline image data, the matching points obtained by this algorithm are more uniformly distributed than those of the ASIFT+FLANN algorithm, more feature points are matched than with the learning-based SuperPoint+SuperGlue algorithm alone, and the algorithm is more robust in wide-baseline feature matching.
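The bidirectional fusion step can be sketched at the level of one image pair, under an assumed (hypothetical, not the paper's) representation where a match set is a dict from a point id in image A to a point id in image B. Backward tracking produces matches in the B→A direction, which are inverted and merged:

```python
def fuse_matches(fwd, bwd):
    """Fuse forward matches (point-in-A -> point-in-B) with backward matches
    (point-in-B -> point-in-A) for one image pair.

    Forward matches win on conflict; backward-only matches fill the gaps,
    which is how two-direction tracking recovers matches one pass missed."""
    fused = dict(fwd)
    for pb, pa in bwd.items():       # invert backward match into A -> B direction
        fused.setdefault(pa, pb)     # keep only if the forward pass had no entry
    return fused
```

In the full pipeline these fused pairwise matches are then chained into tracks across the closed-loop sequence and pruned by DEGENSAC.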
With the widespread use of LiDAR sensors, 3D object detection through 3D point cloud processing has become a research focus in robotics and autonomous driving. However, the disorder and sparsity of point cloud data pose problems for traditional point cloud processing, and detecting objects in large amounts of point cloud data is challenging. Conventional 3D object detectors are mainly grid-based or point-based. PV-RCNN proposed a framework combining voxel-based and point-based techniques, in which object features are extracted using 3D voxel CNNs. However, the resolution reduction caused by the CNN degrades object localization. This study aims to improve detection accuracy for smaller objects by feeding not just a single output of the voxel CNN but multiple outputs, including high-resolution ones, to the RPN. We propose a new network that introduces a Multi-Scale Region Proposal Network to reduce the effect of resolution degradation. Our network achieves better recognition accuracy for small objects such as bicycles than the original PV-RCNN. In extensive experiments on the KITTI dataset, we demonstrate that our model yields a 5% improvement for small objects such as cyclists.
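The basic mechanics of giving a proposal head both resolutions can be illustrated with a toy 2-D feature map (nested lists standing in for tensors; this is our simplification, not the paper's architecture): the coarse CNN output is upsampled to the fine grid and stacked with it cell by cell.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D feature map (list of lists)."""
    return [[v for v in row for _ in range(factor)]
            for row in fmap for _ in range(factor)]

def fuse_scales(fine, coarse, factor):
    """Pair each fine-grid cell with its upsampled coarse value, so a proposal
    head sees both the high-resolution detail and the coarse context."""
    up = upsample_nearest(coarse, factor)
    return [[(f, c) for f, c in zip(fr, cr)] for fr, cr in zip(fine, up)]
```

Small objects like cyclists occupy only a few cells at coarse resolution, which is why keeping the high-resolution path in the fused input helps localize them.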