Cross-age face generation refers to generating face images of other age groups from images of known ages, and it is widely used in public safety, entertainment, and other fields. To address the problem that existing GAN-based methods use age information only as a generation condition and ignore the ordering of age groups, we present a cross-age face generation method based on CGAN and LSTM. The method consists of four modules. The first module is a generator, which generates face images of different age groups. The second module is a discriminator, whose main task is to determine whether a generated image is real or forged. The third module is a pre-trained ResNet, which extracts the features of real images. Finally, an LSTM exploits the sequence of age information to provide age-group classification constraints for the generator.
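As an illustration of the conditioning in the first module, the sketch below shows one common way an age-group label can be attached to a CGAN generator's input; the latent dimension, number of age groups, and one-hot encoding are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

NOISE_DIM = 100      # assumed latent dimension
NUM_AGE_GROUPS = 5   # assumed number of age groups

def conditional_generator_input(noise, age_group):
    """Concatenate a latent noise vector with a one-hot age-group label,
    forming the conditional input that a CGAN generator maps to an image."""
    one_hot = np.zeros(NUM_AGE_GROUPS)
    one_hot[age_group] = 1.0
    return np.concatenate([noise, one_hot])

z = np.random.randn(NOISE_DIM)
g_in = conditional_generator_input(z, age_group=2)
print(g_in.shape)  # (105,)
```

The discriminator would receive the same label alongside the image, so that "real vs. forged" is judged per age group rather than globally.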
With the continuous development of artificial intelligence, face recognition is widely used in identity authentication, facial-recognition payment, intelligent security, and other fields. However, adversarial samples pose serious security risks to face recognition. In a face recognition system, an attacker can cause the system to misidentify a person by adding tiny perturbations to a face image, leading to a series of security threats such as system intrusion, unauthorized access, theft of property, and evasion of legal responsibility. In this paper, we first introduce the basic concepts of adversarial attacks and briefly analyze typical adversarial sample generation methods of recent years. Then, the adversarial-attack security problems of face recognition are discussed. Finally, the current state of research on adversarial attacks against face recognition is analyzed.
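One of the best-known adversarial sample generation methods in this line of work is the fast gradient sign method (FGSM), offered here as a minimal sketch; it assumes the loss gradient with respect to the image is already available and that pixels lie in [0, 1].

```python
import numpy as np

def fgsm_perturb(image, grad, eps=0.03):
    """Fast Gradient Sign Method: shift every pixel by eps in the
    direction of the loss gradient, then clip back to the valid range.
    The perturbation is bounded by eps in the L-infinity norm."""
    adv = image + eps * np.sign(grad)
    return np.clip(adv, 0.0, 1.0)
```

Because each pixel moves by at most eps, the adversarial image can remain visually indistinguishable from the original while changing the recognizer's decision.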
A robust algorithm is proposed for real-time abnormal behavior recognition under dynamic challenges including illumination changes and pose variations. To cope with these factors, we present a segmented-sampling method: the original video is evenly divided into several clips, and from each clip, according to a fixed sampling density, the description of one I-frame and the description information of several P-frames are obtained. Using the Res2Net18 network, two behavior classifiers are built, one based on the cumulative motion vectors of the P-frames and one based on their cumulative residuals. Each type of frame information extracted from a video clip is fed into the corresponding network, and each network outputs a classification score. For each input type, the per-clip classification scores are summed and averaged, yielding a video-level classification score for that type. These scores are then ensembled by weighted summation, and the total classification score is taken as the output of the abnormal behavior recognition network. Experimental results on benchmark datasets demonstrate that the proposed method performs robustly and favorably.
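The score aggregation described above (per-clip averaging for each input type, then weighted summation across types) can be sketched as follows; the dictionary layout, type names, and weights are illustrative assumptions.

```python
import numpy as np

def fuse_scores(scores_per_type, weights):
    """scores_per_type: dict mapping input type -> array of shape
    (num_clips, num_classes) holding that type's per-clip scores.
    Each type's clip scores are averaged to a video-level score,
    then the types are combined by weighted summation."""
    total = None
    for name, clip_scores in scores_per_type.items():
        video_score = np.mean(clip_scores, axis=0)  # video-level score for this type
        term = weights[name] * video_score
        total = term if total is None else total + term
    return total
```

With two clips, two classes, and equal weights, two input types scoring ([1, 0], [0, 1]) and ([1, 1], [1, 1]) fuse to [0.75, 0.75], matching the sum-average-then-weight order of the text.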
Violence detection from surveillance video is a challenging and attractive task. This paper introduces a new violence detection method based on binocular stereo vision. We use a sparse stereo matching method to extract feature points from both rectified images and obtain the disparity of each point. The 3D coordinates of the points are then calculated through standard 3D measurement theory. To describe the spatio-temporal properties, we extract features aligned with the trajectories that characterize depth information (three-dimensional motion vectors), appearance (histograms of oriented gradients), and motion (histograms of optical flow). To obtain discriminative features, this paper adopts a sparse coding scheme and a support vector machine (SVM) to classify each feature vector as normal or abnormal.
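The step from disparity to 3D coordinates follows the standard rectified-stereo pinhole model; a minimal sketch is given below, where the focal length, baseline, and principal point are hypothetical calibration values rather than figures from the paper.

```python
def disparity_to_3d(u, v, d, f, baseline, cx, cy):
    """Standard rectified-stereo triangulation: recover depth from
    disparity (Z = f * B / d), then back-project the pixel (u, v)
    to camera coordinates using the pinhole model."""
    Z = f * baseline / d
    X = (u - cx) * Z / f
    Y = (v - cy) * Z / f
    return X, Y, Z
```

For example, with a 500-pixel focal length, a 0.1 m baseline, and a 10-pixel disparity, the point lies 5 m from the camera; larger disparities map to nearer points.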
KEYWORDS: Cameras, Video, RGB color model, 3D acquisition, 3D modeling, 3D metrology, Detection and tracking algorithms, Visual process modeling, Motion models, Spherical lenses
Violence detection in videos is a challenging task which has received much attention in the research community. In this paper, we propose a three-stream network framework for violence detection with binocular stereo vision. To capture complementary information from the video, we adopt appearance, motion, and depth information. For the spatial stream, we use RGB frames as the individual-frame appearance. We then use a sparse stereo matching method to extract feature points and obtain the disparity of each point, and the 3D coordinates of the points are calculated through standard 3D measurement theory. The resulting 3D motion vectors convey the movement of the camera and the objects and serve as the motion information. Finally, the depth information flow is the third input of the network, which further improves the recognition rate.
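The 3D motion vectors feeding the motion stream can be sketched as the frame-to-frame displacement of matched 3D feature points; the sketch assumes point matching across frames has already been done and is not the paper's exact formulation.

```python
import numpy as np

def motion_vectors_3d(points_t, points_t1):
    """3D motion vectors: displacement of matched feature points between
    two consecutive frames. Both inputs have shape (N, 3), rows (X, Y, Z)."""
    return np.asarray(points_t1, dtype=float) - np.asarray(points_t, dtype=float)

def motion_magnitudes(vectors):
    """Per-point speed: the Euclidean length of each 3D motion vector."""
    return np.linalg.norm(vectors, axis=1)
```

Unlike 2D optical flow, these vectors separate motion toward the camera (a change in Z) from motion across the image plane, which is what the depth cue contributes.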
Deep learning has strong abilities in finding and expressing the characteristics of images. In recent years, with the arrival of the big-data era and advances in computing hardware, deep learning has made great breakthroughs and become the focus of the field of computer vision. First, the history and classification of deep learning are presented. This thesis also introduces the basic theory of typical deep learning models for computer vision, including the convolutional neural network, the recurrent neural network, and the generative adversarial network. It then summarizes the research status and progress of deep learning in image classification, image detection, image segmentation, and video recognition and prediction. Finally, the development and trends of deep learning in the field of computer vision are analyzed. The combination of convolutional and recurrent neural networks is a promising choice for video recognition and prediction, although a large gap remains between machine and human cognition. The generative adversarial network, with its strong ability to generate new samples from an underlying distribution, will also play an important role in computer vision.
A robust algorithm is proposed for tracking objects in stationary scenes under dynamic challenges including illumination change, pose variation, and occlusion. To cope with these factors, Multi-feature Spatio-Temporal Context learning (MSTC) is integrated within a fusion framework. Unlike the original Spatio-Temporal Context learning (STC) algorithm, which exploits only low-level features (i.e., image intensity and position) from the target and its surrounding regions, our approach utilizes both high-level features, such as the Histogram of Oriented Gradients (HOG), and low-level features, with tracker interaction and selection at the decision level for robust tracking performance. Experimental results on benchmark datasets demonstrate that the proposed algorithm performs robustly and favorably against the original algorithm.
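Decision-level selection among feature-specific trackers can be sketched as picking the confidence map with the strongest peak response; this is a simplified stand-in for the paper's fusion rule, not its exact formulation.

```python
import numpy as np

def select_tracker(conf_maps):
    """Decision-level fusion: each feature channel (e.g. intensity, HOG)
    produces a confidence map over the search region; choose the channel
    with the highest peak and return its estimated target position."""
    best_name, best_peak, best_pos = None, -np.inf, None
    for name, cmap in conf_maps.items():
        peak = cmap.max()
        if peak > best_peak:
            best_name, best_peak = name, peak
            best_pos = np.unravel_index(np.argmax(cmap), cmap.shape)
    return best_name, best_pos
```

The design intuition is that when illumination change flattens the intensity response, the gradient-based HOG channel usually still peaks sharply, so selecting at the decision level keeps whichever cue is currently reliable.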
Violence detection from video is a hot topic with wide applications. The aim of this paper is to design a novel feature descriptor called the motion gradient location and orientation histogram (MoGLOH), which encodes not only the local appearance but also explicitly models local motion. The proposed MoGLOH is composed of two parts. The first part is the gradient location and orientation histogram (GLOH), describing the spatial appearance; the second part is an aggregated histogram of optical flow over a log-polar location grid, named the Optical Flow Orientation Histogram (OFOH), which indicates the movement of each feature point. To suppress feature noise, non-parametric Kernel Density Estimation (KDE) is applied to the MoGLOH descriptor. Theoretical analysis demonstrates that the proposed algorithm performs robustly and favorably.
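The log-polar location grid underlying OFOH can be sketched as assigning each offset from the feature-point centre to a (radius, angle) bin; the bin counts, maximum radius, and logarithmic spacing below are illustrative assumptions.

```python
import numpy as np

def log_polar_bin(dx, dy, num_radial=3, num_angular=8, r_max=16.0):
    """Map an offset (dx, dy) from the feature-point centre to a
    log-polar location bin: radial rings spaced logarithmically
    out to r_max, angular sectors spaced uniformly over 2*pi."""
    r = np.hypot(dx, dy)
    theta = np.arctan2(dy, dx) % (2 * np.pi)
    # logarithmic radial bin, clamped to the outermost ring
    r_bin = min(int(np.log1p(r) / np.log1p(r_max) * num_radial), num_radial - 1)
    a_bin = int(theta / (2 * np.pi) * num_angular) % num_angular
    return r_bin, a_bin
```

Optical-flow orientations accumulated per bin then form the OFOH histogram; the logarithmic radii make the grid finer near the feature point, where localisation matters most.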
In this work, photoresponse and a photo-induced memory effect were demonstrated in an organic field-effect transistor (OFET) with pentacene as the semiconducting active layer and SiO2 as the gate dielectric layer. By inserting AlOX nanoparticles (NPs) at the pentacene/SiO2 interface, a markedly enhanced photoresponse was obtained in the OFET, with a maximum responsivity and photosensitivity of about 15 A/W and 100, respectively. Moreover, a stable photo-induced memory effect was achieved in the OFET, attributed to photogenerated electrons captured by the interface traps of the AlOX NPs/SiO2.
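The two figures of merit quoted above follow the usual photodetector definitions; a minimal sketch is given below, where the current and power values are hypothetical and photosensitivity is taken as the light-to-dark current ratio, one common convention (definitions vary between papers).

```python
def responsivity(photocurrent_a, optical_power_w):
    """Responsivity R = I_ph / P_opt, in amperes per watt."""
    return photocurrent_a / optical_power_w

def photosensitivity(i_light, i_dark):
    """Photosensitivity taken here as the light/dark current ratio
    (an assumed convention, not stated in the abstract)."""
    return i_light / i_dark
```

With a hypothetical 1.5 uA photocurrent under 0.1 uW illumination, the responsivity evaluates to 15 A/W, consistent in scale with the reported maximum.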