A video can be regarded as a sequence of related images. To predict video, a model must not only capture the characteristics of a single image but also exploit the temporal relationships between images. The Predictive Recurrent Neural Network (PredRNN) is a video frame prediction network that uses a spatiotemporal memory flow structure within a spatiotemporal LSTM network. This paper introduces an improved PredRNN based on feature fusion. As depth increases, the spatiotemporal memory flow structure of PredRNN suffers from vanishing gradients. We propose to fuse the spatiotemporal memory features and strengthen the gradients in the deep layers of the network, thereby improving its long-term video prediction. Finally, the Moving MNIST and KTH datasets are used to validate our network. The experimental results show that our method achieves a measurable improvement over PredRNN.
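The memory-fusion idea above can be illustrated with a minimal sketch. This assumes the per-layer spatiotemporal memories are combined by an element-wise weighted sum; the function name `fuse_memories` and the uniform default weights are our assumptions, not the paper's exact design:

```python
# Hedged sketch: fusing per-layer spatiotemporal memory features so deeper
# layers see a shortcut of shallower memories (an illustrative reading of the
# paper's feature fusion, not its exact formulation).

def fuse_memories(memories, weights=None):
    """Element-wise weighted sum of per-layer memory maps (flat lists)."""
    n = len(memories)
    if weights is None:
        weights = [1.0 / n] * n  # uniform fusion by default
    fused = [0.0] * len(memories[0])
    for w, mem in zip(weights, memories):
        for i, v in enumerate(mem):
            fused[i] += w * v
    return fused

# Example: three layers' memory vectors fused uniformly -> roughly [3.0, 4.0].
m1, m2, m3 = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
print(fuse_memories([m1, m2, m3]))
```

Because the fused memory is a sum, gradients flow back to every contributing layer directly, which is the mechanism the paper relies on to counter vanishing gradients in deep stacks.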
Flower classification is a fundamental task in botany. Since flower images are fine-grained images with large intra-class differences and high inter-class similarity, their classification is highly challenging. With the rapid development of artificial intelligence, machine learning algorithms based on convolutional neural networks have gradually begun to replace manual methods for image recognition and classification. When a traditional convolutional neural network is applied to flower image classification, deep convolutional layers progressively weaken spatial detail information, so the feature maps that guide the classification cannot fully express the fine-grained features of flowers, and the classification results are unsatisfactory. To improve the accuracy of flower image classification, a classification method based on an improved Inception V4 network is proposed. In the basic feature extraction stage, shallow features are fused to obtain basic fusion features with richer spatial detail. The basic fusion features are then weighted by channel attention computed from multi-scale features. Finally, the weighted basic fusion features are added element-wise to the original multi-scale features to form advanced fusion features for the classification task. The experimental results show that the proposed improved Inception V4 network achieves a better classification effect on flower images.
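The attention-weight-then-add pipeline can be sketched as follows. This is a simplified illustration assuming a softmax-style channel gate; the function names and the gate itself are our assumptions, not the exact Inception V4 modification:

```python
import math

def channel_attention(channel_stats):
    """Softmax gate over per-channel pooled activations (a sketch of
    channel attention, not the paper's exact excitation block)."""
    exps = [math.exp(c) for c in channel_stats]
    s = sum(exps)
    return [e / s for e in exps]

def apply_attention(feature_maps, weights):
    """Scale each channel's feature map by its attention weight."""
    return [[w * v for v in fm] for fm, w in zip(feature_maps, weights)]

def residual_fuse(weighted, original):
    """Element-wise addition of weighted fusion features and the
    corresponding original multi-scale features."""
    return [[a + b for a, b in zip(wc, oc)] for wc, oc in zip(weighted, original)]
```

The element-wise addition at the end acts as a residual connection, so the advanced fusion features keep the original multi-scale signal even when the attention weights suppress a channel.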
Sign language is the main way for hearing-impaired people, a large special-needs group, to communicate with others in society. Applying new information technology to sign language recognition and translation helps smooth communication between hearing-impaired and hearing people. With the development of the Transformer network and the attention mechanism in machine translation, this research has entered a new stage. To address long-term dependency, we propose a continuous sign language translation model based on the Transformer that incorporates relative sequence positions into the attention mechanism, replacing the original absolute position encoding. Combined with movement characteristics, we use image differencing to dynamically compute a difference threshold and image blur detection to adaptively extract key frames. Experimental results on the RWTH-PHOENIX-Weather 2014T dataset verify the effectiveness of the proposed model.
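Relative-position attention of the kind described can be sketched as adding a per-offset bias to the raw attention scores before the softmax. The bias table and clipping distance below are illustrative assumptions, not the paper's learned parameters:

```python
import math

def attention_with_relative_bias(scores, rel_bias, max_dist=2):
    """Add a per-relative-offset bias rel_bias[clip(j - i)] to each raw
    attention score, then softmax every row. `rel_bias` maps clipped
    offsets in [-max_dist, max_dist] to scalars (learned in practice)."""
    out = []
    for i, row in enumerate(scores):
        biased = [s + rel_bias[max(-max_dist, min(max_dist, j - i))]
                  for j, s in enumerate(row)]
        m = max(biased)                      # subtract max for stability
        exps = [math.exp(b - m) for b in biased]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out
```

Because the bias depends only on the offset j - i, the same table is reused at every position, which is what lets relative encoding generalize to sequence lengths beyond those seen in training.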
With the development of computer graphics and three-dimensional (3D) modeling technology, 3D model retrieval has been widely used in applications such as industrial design, virtual reality, and medical diagnosis. Massive data brings new opportunities and challenges to the development of 3D model retrieval technology. However, with the emergence of complex models, traditional retrieval algorithms are no longer fully applicable. One important reason is that traditional content-based retrieval methods do not take the spatial information of 3D models into account during feature extraction. How to use the spatial information of a 3D model to obtain a more expressive feature has therefore become a significant issue. In our proposed algorithm, we first normalize and voxelize the model and extract features from different views of the voxelized model. Second, deep features are extracted with our proposed feature learning network. Then a new feature weighting algorithm is applied to the 3D-view-based features, emphasizing the more important views of a 3D model and thereby improving retrieval performance. Experimental results on the standard Princeton ModelNet10 dataset show that our model achieves promising performance.
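The view-weighting step can be sketched as a weighted pooling of per-view feature vectors, where each view's weight comes from a normalized importance score. The score source and function name are our assumptions; the paper's actual weighting algorithm may differ:

```python
def weight_views(view_features, view_scores):
    """Pool per-view feature vectors into one model descriptor, weighting
    each view by its normalized importance score (a sketch of view
    weighting, not the paper's exact algorithm)."""
    total = sum(view_scores)
    weights = [s / total for s in view_scores]
    dim = len(view_features[0])
    pooled = [0.0] * dim
    for w, feat in zip(weights, view_features):
        for i, v in enumerate(feat):
            pooled[i] += w * v
    return pooled

# Two views with equal scores contribute equally to the descriptor.
print(weight_views([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]))  # [0.5, 0.5]
```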
With the development of intelligent vehicle technology, environment perception for intelligent vehicles is becoming increasingly important, and pavement roughness detection is an important component of it. In this paper, binocular stereo vision is used to detect the roughness of the pavement on which vehicles drive, and a square segmentation method is used to calculate the roughness value. A binocular vision sensor acquires the images, which are processed to obtain a depth image; the square segmentation method then computes the pavement roughness from the depth image. Compared with scanning every point individually and computing per-point differences, square segmentation greatly reduces computation time. Verification on actual pavement detection shows the effectiveness of the method.
Edge computing is an extension of the cloud computing paradigm that shifts part of the computing data, applications, and services from the cloud server to the network edge, providing low-latency, mobility, and location-aware support for delay-sensitive applications. Elevators in high-rise buildings are geographically distributed and mobile. The safety and reliability of elevators have attracted public attention, and in-elevator security is a key issue, especially in emergencies requiring fast response and low latency. In this paper, an elevator abnormal-behavior video surveillance system is designed and developed using the edge computing paradigm. It recognizes abnormal image sequences and evaluates abnormal behavior. Video images are collected, processed, and analyzed at the network edge in real time. Edge computing nodes are distributed and deployed according to the geographic locations of the elevators. The edge nodes are built on mobile embedded devices and use the devices' computing resources to perform edge computing at the network edge. Over the edge network, clusters of edge nodes are built to perform distributed computation tasks.
The encoder-decoder framework has attracted great interest in image captioning. It focuses on extracting low-level features and achieves good results, and performance can be further improved if high-level semantics are considered. In this work, we propose a new image captioning model that incorporates high-level semantic features through a revised Convolutional Neural Network (CNN). Both the low-level image features and the high-level semantic features are fed into Long Short-Term Memory networks (LSTMs) to produce natural-sentence descriptions. Experiments on the Flickr8K and Flickr30K datasets show that our method outperforms most standard network baselines for image captioning.
KEYWORDS: Data modeling, Medical imaging, Breast cancer, Visual process modeling, Convolutional neural networks, Biomedical optics, Performance modeling, Image classification, Data acquisition
The development of convolutional neural networks has brought great advances to image classification in recent years. However, classification performance is good mainly on natural images rather than medical images, largely because the medical image databases available for training are usually small. How to use such limited data to learn more expressive features has therefore become a research focus. In this paper, we first update the order and number of the training data at each round of active, incremental fine-tuning. We then assign different contribution rates to the selected data, based on the information quantity of each sample during training, which makes the model converge steadily. Finally, a pre-trained model and our preprocessed datasets are employed to further fine-tune the model. Experiments on two different biomedical datasets show that our model achieves promising results.
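One natural reading of "contribution rate based on information quantity" is to weight each selected sample by the entropy of the model's prediction on it. The sketch below illustrates that reading; the entropy-proportional formula is our assumption, not necessarily the paper's exact rule:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution, i.e. the
    information quantity of a sample under the current model."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def contribution_rates(predictions):
    """Assign each selected sample a contribution rate proportional to
    its prediction entropy: uncertain samples contribute more to the
    update (our illustrative reading, not the paper's exact formula)."""
    ents = [entropy(p) for p in predictions]
    total = sum(ents) or 1.0
    return [e / total for e in ents]
```

Under this scheme a confidently classified sample (for example a [0.99, 0.01] prediction) receives a much smaller rate than an uncertain one ([0.5, 0.5]), which matches the intuition of active fine-tuning: spend gradient budget where the model is still unsure.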
Convolutional neural networks have dominated recent image recognition work, but their limited ability to maintain spatial invariance makes micronucleus cell identification, a classic task in digital pathology, still challenging. In this paper, a novel convolutional neural network for spatial transformation of feature maps (FSTCNN) is proposed, which incorporates a Spatial Transformer Network (STN). Our model allows spatial manipulation of data within the network, providing active spatial transformation for the neural network without any extra supervision. We compared the results of inserting the STN into different convolutional layers and found that the network can transform the input image more stably, align it to a canonical position, and scale it to fill the frame, creating a better environment for image recognition. The results show a distinct advantage over other convolutional neural networks for medical image recognition.
Owing to their large storage capacity and small code area, QR (quick response) codes are widely used for automatic identification in many commercial applications, such as parcel packaging and business cards. Existing methods mainly focus on locating unambiguous QR codes against simple backgrounds, relying on fully automatic processing. However, QR code images of low quality with complex backgrounds degrade the accuracy and efficiency of location in automatic identification, especially when the finder patterns are destroyed. With human assistance, interactive learning approaches can overcome such cognitive obstacles in computer processing. This paper focuses on locating blurred QR codes against complex backgrounds with an efficient interactive two-stage framework. The first stage is rough location, which includes interactive feature-template setting and a clustering process using our improved mean shift algorithm. Accurate location is then performed by optimizing the finder pattern detection. Experiments on damaged, contaminated, and scratched images with complex backgrounds show quite promising results for QR code location.
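The clustering core of the rough-location stage, mean shift, can be shown in one dimension: starting from a seed (here, the interactively set template position), the estimate repeatedly moves to the mean of nearby points until it settles on a density mode. This toy version is standard mean shift with a flat kernel, not the paper's improved variant:

```python
def mean_shift_1d(points, start, bandwidth=2.0, iters=50, tol=1e-6):
    """Toy 1-D mean shift: repeatedly move to the mean of the points
    within `bandwidth` of the current estimate, converging on a local
    density mode (a cluster centre)."""
    x = start
    for _ in range(iters):
        window = [p for p in points if abs(p - x) <= bandwidth]
        if not window:
            break
        new_x = sum(window) / len(window)
        if abs(new_x - x) < tol:
            break
        x = new_x
    return x
```

Seeds placed near different clusters converge to different modes, which is how a rough interactive hint can be refined into a cluster centre for the QR code region.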
Automatic image annotation remains a difficult task in computer vision; its main purpose is to manage the massive number of images on the Internet and to assist intelligent retrieval. This paper designs a new image annotation model based on a visual bag of words, using low-level features such as color and texture as well as mid-level features such as SIFT, and combining pic2pic, label2pic, and label2label correlations to measure the degree of correlation between labels and images. We aim to prune the specific features for each individual label and formalize annotation as a learning process based on positive-negative instance learning. Experiments on the Corel5K dataset show quite promising results compared with other existing methods.
With the growing commercial use of barcodes, people's demand for detecting barcodes with smartphones has become increasingly pressing. The low quality of barcode images captured by mobile phones often degrades decoding and recognition rates. This paper focuses on locating and decoding EAN-13 barcodes in blurred images. We present a more accurate location algorithm based on segment length, and a highly fault-tolerant decoding algorithm. Unlike existing approaches, our location algorithm is based on the edge segment lengths of EAN-13 barcodes, while our decoding algorithm tolerates blurred regions in the barcode image. Experiments on damaged, contaminated, and scratched digital images show quite promising results for EAN-13 barcode location and decoding.
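The segment-length idea can be made concrete with the standard EAN-13 left-odd (L-code) table: each digit is four alternating bar/space runs whose widths sum to 7 modules. The table below is the real EAN-13 specification; the normalize-and-round matching, which is what gives tolerance to blurred measurements, is our illustrative sketch rather than the paper's exact algorithm:

```python
# Standard EAN-13 left-odd (L-code) run-length patterns: four runs per
# digit, widths in modules, always summing to 7.
L_CODES = {
    (3, 2, 1, 1): 0, (2, 2, 2, 1): 1, (2, 1, 2, 2): 2, (1, 4, 1, 1): 3,
    (1, 1, 3, 2): 4, (1, 2, 3, 1): 5, (1, 1, 1, 4): 6, (1, 3, 1, 2): 7,
    (1, 2, 1, 3): 8, (3, 1, 1, 2): 9,
}

def decode_digit(segment_lengths):
    """Decode one digit from four measured segment lengths (in pixels).
    The lengths are normalized so the four runs total 7 modules, then
    rounded to integers; rounding absorbs small length errors caused by
    blur (a sketch of fault-tolerant decoding, not the paper's method)."""
    total = sum(segment_lengths)
    widths = tuple(max(1, round(7 * s / total)) for s in segment_lengths)
    return L_CODES.get(widths)  # None if the pattern is unreadable

print(decode_digit([30, 20, 10, 10]))  # clean measurement of digit 0
print(decode_digit([29, 21, 11, 9]))   # blurred measurement, still digit 0
```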