Computer vision (CV) algorithms have improved tremendously with the application of neural network-based approaches. For instance, Convolutional Neural Networks (CNNs) achieve state-of-the-art performance on infrared (IR) detection and identification (e.g., classification) problems. Training such algorithms, however, requires a tremendous quantity of labeled data, which is scarcer in the IR domain than in the “natural imagery” domain, and scarcer still for CV-related tasks. Recent work has demonstrated that synthetic data generation techniques provide a cheap and attractive alternative to collecting real data, despite a “realism gap” between synthetic and real IR data.
In this work, we train deep models on a combination of real and synthetic IR data, and we evaluate model performance on real IR data. We focus on the tasks of vehicle and person detection, object identification, and vehicle parts segmentation. We find that for both detection and object identification, training on a combination of real and synthetic data outperforms training on real data alone. This improvement demonstrates a concrete advantage of using synthetic data for computer vision. Furthermore, we believe that the utility of synthetic data – when combined with real data – will only increase as the realism gap closes.
Achieving state-of-the-art classification and detection performance with modern deep learning approaches requires large amounts of labeled data. In the infrared (IR) domain, the required quantity of data can be prohibitively expensive and time-consuming to acquire, which makes the generation of synthetic data an attractive alternative. The well-known Unreal Engine (UE), extended with multispectral simulation add-on packages, can attain a degree of physical realism and thus provides a possible avenue for generating such data. However, significant technical challenges remain in designing a synthetic IR dataset: varying class, position, object size, and many other factors is critical to producing a training dataset useful for object detection and classification. In this work we explore these critical axes of variation using standard CNN architectures, evaluating a large UE training set on a real IR validation set, and provide guidelines for variation along many of these critical dimensions for multiple machine learning problems.
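The axes of variation described above (class, position, object size, and so on) can be sketched as a randomized scene-configuration sampler feeding a renderer. The class list, parameter names, and numeric ranges below are illustrative assumptions for this sketch, not values taken from the abstract or from Unreal Engine.

```python
import random

# Illustrative object classes; not taken from the paper.
CLASSES = ["person", "car", "truck"]

def sample_scene_config(rng: random.Random) -> dict:
    """Sample one randomized scene configuration for synthetic rendering."""
    return {
        "object_class": rng.choice(CLASSES),
        # Object position in normalized image coordinates.
        "position": (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)),
        # Target object size in pixels: one of the critical axes of
        # variation the abstract highlights.
        "object_size_px": rng.randint(8, 256),
        # Other nuisance factors worth varying (hypothetical ranges).
        "time_of_day_h": rng.uniform(0.0, 24.0),
        "sensor_noise_sigma": rng.uniform(0.0, 0.05),
    }

# Generate a reproducible batch of scene configurations.
configs = [sample_scene_config(random.Random(i)) for i in range(1000)]
```

Sampling each factor independently over a wide range is the simplest way to cover the variation space; a real pipeline would pass each configuration to the rendering engine and record the resulting ground-truth labels.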
Achieving state-of-the-art performance with Convolutional Neural Networks (CNNs) on infrared (IR) detection and classification problems requires significant quantities of labeled training data. Real data in this domain can be both expensive and time-consuming to acquire. Synthetic data generation techniques have made significant gains in efficiency and realism in recent work, and provide an attractive and much cheaper alternative to collecting real data. However, the salient differences between synthetic and real IR data still constitute a “realism gap”, meaning that synthetic data is not as effective as real data for training CNNs. In this work we explore the use of image compositing techniques to combine real and synthetic IR data, improving realism while retaining many of the efficiency benefits of the synthetic data approach. In addition, we demonstrate the importance of controlling the object size distribution (in pixels) of synthetic IR training sets. By evaluating synthetically-trained models on real IR data, we show notable improvement over previous synthetic IR data approaches and suggest guidelines for enhanced performance with future training dataset generation.
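A common form of the image compositing described above is alpha-blending a rendered synthetic object chip onto a crop of a real background image. The following NumPy sketch illustrates that operation; the function name and single-channel layout are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

def composite(background: np.ndarray, chip: np.ndarray,
              mask: np.ndarray, top: int, left: int) -> np.ndarray:
    """Alpha-blend a synthetic object chip onto a real IR background.

    background: (H, W) float array, real IR image.
    chip:       (h, w) float array, rendered synthetic object.
    mask:       (h, w) float array in [0, 1], object alpha matte.
    """
    out = background.copy()
    h, w = chip.shape
    region = out[top:top + h, left:left + w]
    # Inside the mask the synthetic object dominates; outside it,
    # the real background shows through.
    out[top:top + h, left:left + w] = mask * chip + (1.0 - mask) * region
    return out

# Toy example: paste an 8x8 "hot" object onto a cold background.
bg = np.zeros((64, 64))
chip = np.ones((8, 8))
mask = np.ones((8, 8))
img = composite(bg, chip, mask, top=10, left=20)
```

Choosing `top`, `left`, and the chip dimensions per sample is also where the object size distribution (in pixels), which the abstract emphasizes, would be controlled.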
Recently, progress has been made in the supervised training of Convolutional Object Detectors (e.g., Faster R-CNN) for threat recognition in carry-on luggage using X-ray images. This is part of the Transportation Security Administration's (TSA's) mission to ensure safety for air travelers in the United States. Collecting more data reliably improves performance for this class of deep algorithms, but requires time and money to produce training data with threats staged in realistic contexts. In contrast to these hand-collected data containing threats, data from the real world, known as the Stream-of-Commerce (SOC), can be collected quickly with minimal cost; while technically unlabeled, in this work we make a practical assumption that these images contain no threat objects. Because of these data constraints, we use both labeled and unlabeled sources of data for the automatic threat recognition problem. In this paper, we present a semi-supervised approach to this problem which we call Background Adaptive Faster R-CNN. This approach is a training method for two-stage object detectors which uses Domain Adaptation methods from the field of deep learning. The data sources described earlier are considered two “domains”: one a hand-collected data domain of images with threats, and the other a real-world domain of images assumed to be without threats. Two domain discriminators, one for discriminating object proposals and one for image features, are adversarially trained to prevent encoding domain-specific information. Penalizing this encoding is important because otherwise the Convolutional Neural Network (CNN) can learn to distinguish images from the two sources based on superficial characteristics, and minimize a purely supervised loss function without improving its ability to recognize objects. For the hand-collected data, only object proposals and image features completely outside of areas corresponding to ground-truth object bounding boxes (background) are used.
The losses for these domain-adaptive discriminators are added to the Faster R-CNN losses for images from both domains. This technique enables threat recognition based on examples from the labeled data, and can reduce false alarm rates by matching the statistics of features extracted from the hand-collected backgrounds to those of the real-world data. Performance improvements are demonstrated on two independently collected datasets of labeled threats.
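Adversarial training of domain discriminators of this kind is commonly implemented with a gradient reversal layer (as in Ganin and Lempitsky's domain-adversarial training): the layer is the identity on the forward pass, but negates the gradient flowing back into the feature extractor, so the discriminator descends the domain-classification loss while the features that feed it ascend it. The NumPy sketch below illustrates only this mechanism on a single logistic discriminator step; the variable names and toy values are assumptions, not the paper's implementation.

```python
import numpy as np

def grl_forward(x: np.ndarray) -> np.ndarray:
    """Gradient Reversal Layer, forward pass: identity."""
    return x

def grl_backward(grad_output: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Backward pass: flip the sign of the incoming gradient (scaled by
    lam), so upstream layers ascend the domain loss the discriminator
    is descending, discouraging domain-specific features."""
    return -lam * grad_output

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

# Toy single step of the domain discriminator.
f = np.array([0.5, -1.0])   # features produced by the extractor
v = np.array([1.0, 2.0])    # logistic domain-discriminator weights
y = 1.0                     # domain label (1 = real-world/SOC domain)

p = sigmoid(f @ v)          # predicted probability of domain 1
# Gradient of the binary cross-entropy loss w.r.t. the features:
g_f = (p - y) * v
# What the feature extractor actually receives after the GRL:
g_f_reversed = grl_backward(g_f, lam=1.0)
```

The discriminator's own weights `v` would be updated with the unreversed gradient; only the gradient passed back to the feature extractor is negated.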
The Transportation Security Administration safeguards all United States air travel. To do so, it employs human inspectors to screen x-ray images of carry-on baggage for threats and other prohibited items, a challenging task. Recent research applying deep learning techniques to computer-aided security screening to assist operators has, however, yielded encouraging results. Deep learning is a subfield of machine learning based on learning abstractions from data, as opposed to engineering features by hand. These techniques have proven quite effective in many domains, including computer vision, natural language processing, speech recognition, self-driving cars, and geographical mapping technology. In this paper, we present initial results of a collaboration between Smiths Detection and Duke University funded by the Transportation Security Administration. Using convolutional object detection algorithms trained on annotated x-ray images, we demonstrate real-time detection of prohibited items in carry-on luggage. Results so far indicate that this approach can detect selected prohibited items with high accuracy and minimal impact on operational false alarm rates.