Training state-of-the-art image classifiers and object detectors remains an extremely data-intensive process to this day. This is because inherently data-hungry, deep supervised networks are the traditional framework of choice. The significant data needs in turn impose strict requirements on the data acquisition, curation, and labelling stages that typically precede the learning process. This poses a particularly significant challenge for military and defense applications where the availability of high-quality labeled data is often limited. What is needed are methods that can effectively learn from sparse amounts of labeled, real-world data. In this paper, we propose a novel framework that incorporates a synthetic data generator into a supervised learning pipeline in order to enable end-to-end co-optimization of the discriminability and realism of the synthetic data, as well as the performance of the supervised engine. We demonstrate, via extensive empirical validation on image classification and object detection tasks, that the proposed framework is capable of learning from a small fraction of the real-world data required to train traditional, standalone supervised engines, while matching or even outperforming its off-the-shelf counterparts.
In an era of immense data generation, unlocking the full potential of Machine Learning (ML) hinges on overcoming the limitations posed by the scarcity of labeled data. In Computer Vision (CV) research, algorithm design must consider this shift and focus instead on the abundance of unlabeled imagery. In recent years, there has been a notable trend within the community toward Self-Supervised Learning (SSL) methods that can leverage this untapped data pool. ML practice promotes self-supervised pre-training for generalized feature extraction on a diverse unlabeled dataset followed by supervised transfer learning on a smaller set of labeled, application-specific images. This shift in learning methods elicits conversation about the importance of pre-training data composition for optimizing downstream performance. We evaluate models with varying measures of similarity between pre-training and transfer learning data compositions. Our findings indicate that front-end embeddings sufficiently generalize learned image features independent of data composition, leaving transfer learning to inject the majority of application-specific understanding into the model. Composition may be irrelevant in self-supervised pre-training, suggesting target data is a primary driver of application specificity. Thus, pre-training deep learning models with application-specific data, which is often difficult to acquire, is not necessary for reaching competitive downstream performance. The capability to pre-train on more accessible datasets invites more flexibility in practical deep learning.
Registration of image collections and video sequences is a critical component in algorithms designed to extract actionable intelligence from remotely sensed data. While methodologies for registration continue to evolve, the accuracy of alignment remains dependent on how well the approach tolerates changes in capture geometry, sensor characteristics, and scene content. Differences in imaging modality and field-of-view present additional challenges. Registration techniques have progressed from simple, global correlation-based algorithms, to higher-order model fitting using salient image features, to two-stage approaches leveraging high-fidelity sensor geometry, to new methods that exploit high-performance computing and convolutional neural networks (ConvNets). The latter offers important advantages by removing model assumptions and learning feature extraction directly through the minimization of a registration cost function. Deep learning approaches to image registration are still relatively unexplored for overhead imaging, and their ability to accommodate a large problem domain offers potential for several new developments.
This work presents a new network architecture that improves accuracy and generalization capabilities over our modality-agnostic deep learning approach to registration that recently advanced the state of the art. A thoroughly tested ConvNet pyramid remains the core of our network approach, and has been optimized for registration and generalized to begin addressing derivative applications such as mosaic generation. Further modifications, such as objective function masking and reduced interpolation, have also been implemented to improve the overall registration process. As before, the trained network ingests image frames, applies a vector field, and returns a version of the input image that has been warped to the reference. Qualitative and quantitative performance of the new architecture is evaluated using several overhead still and full-motion video (FMV) data sets.
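The objective function masking mentioned above can be illustrated with a small sketch. The abstract does not state which cost function is used, so the mean squared error here is an assumption, and the names `masked_mse`, `warped`, `reference`, and `mask` are hypothetical. The idea shown is only that pixels with no valid counterpart (for example, pixels warped in from outside the frame) are excluded from the registration cost.

```python
import numpy as np

def masked_mse(warped, reference, mask):
    """Mean squared error over the pixels where mask is nonzero.

    warped, reference: 2-D arrays of the same shape.
    mask: array of the same shape; nonzero marks pixels that should
    contribute to the cost (assumed convention, not from the paper).
    """
    weights = (np.asarray(mask) != 0).astype(float)
    n_valid = weights.sum()
    if n_valid == 0:
        # No valid pixels: return zero cost rather than dividing by zero.
        return 0.0
    squared_error = (np.asarray(warped, float) - np.asarray(reference, float)) ** 2
    return float((weights * squared_error).sum() / n_valid)
```

Normalizing by the number of valid pixels, rather than by the image size, keeps the cost comparable between frames with different amounts of overlap.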
Stabilization and registration are common techniques applied to overhead imagery and full-motion video (FMV) during production to facilitate further exploitation by the end user. Algorithms designed to accomplish these tasks must accommodate changes in capture geometry, atmospheric effects, and sensor characteristics. Moreover, algorithms that rely on a controlled image base (CIB) reference typically require some degree of robustness with respect to differences in imaging modality. While many factors contributing to gross misalignment can be mitigated using available sensor telemetry and rigorous photogrammetric modeling, the subsequent image-based registration task often relies on loose model assumptions and poor generalizations.
This work presents a modality-agnostic deep learning approach to automatically stabilize and register overhead FMV data to a reference image such as a CIB. The field of deep learning has received significant attention in recent years with advances in high-performance computing and the availability of widely adopted open source tools for numerical computation using data flow graphs. We leverage recent developments in the use of fully differentiable spatial transformer networks to simultaneously remove coarse geometric differences and fine local misalignments in the registration process. Most importantly, no model is required. A convolutional neural network (ConvNet), complete with a spatial transformer, is trained using pairs of frames of FMV data as the input and corresponding label. Once the deformable warp has been learned, the trained network ingests new data and returns a version of the input image sequence that has been warped to a user-specified reference. The performance of our approach is evaluated using several real FMV data sets.
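The warping step at the end of such a pipeline can be sketched as follows. This is not the authors' network: it shows only the resampling of an image through a dense displacement field with bilinear interpolation, the kind of sampler typically used in spatial transformer networks because it is differentiable. The function name `warp_bilinear` and the layout of `flow` (row displacement in channel 0, column displacement in channel 1) are assumptions made for illustration. In a trained system the field would be predicted by the ConvNet; here it is supplied by the caller.

```python
import numpy as np

def warp_bilinear(image, flow):
    """Resample a 2-D image through a dense displacement field.

    output[r, c] = image[r + flow[r, c, 0], c + flow[r, c, 1]],
    with bilinear interpolation and sample positions clamped to the image.
    """
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Sample positions in the input image, clamped to the valid range.
    r = np.clip(rows + flow[..., 0], 0, h - 1)
    c = np.clip(cols + flow[..., 1], 0, w - 1)

    # Integer corners surrounding each sample position.
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)

    # Fractional offsets act as interpolation weights.
    fr = r - r0
    fc = c - c0

    top = (1 - fc) * image[r0, c0] + fc * image[r0, c1]
    bottom = (1 - fc) * image[r1, c0] + fc * image[r1, c1]
    return (1 - fr) * top + fr * bottom
```

A zero field returns the input unchanged, and a constant field shifts the whole frame, which makes the function easy to sanity-check before plugging in a learned field.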
KEYWORDS: Cameras, Atomic force microscopy, Clouds, Error analysis, Image processing, Sensors, Global Positioning System, 3D modeling, 3D image processing, Imaging systems
Automatically extracted and accurate scene structure generated from airborne platforms is a goal of many applications in the photogrammetry, remote sensing, and computer vision fields. This structure has traditionally been extracted automatically through structure-from-motion (SfM) workflows. Although this process is very powerful, the analysis of error in accuracy can prove difficult. Our work presents a method of analyzing the georegistration error from SfM-derived point clouds that have been transformed to a fixed Earth-based coordinate system. The error analysis is performed using synthetic airborne imagery, which provides absolute truth for the ray-surface intersection of every pixel in every image. Three methods of georegistration are assessed: (1) using global positioning system (GPS) camera centers, (2) using pose information directly from on-board navigational instrumentation, and (3) using a recently developed method that utilizes the forward projection function and SfM-derived camera pose estimates. It was found that georegistration derived from GPS camera centers and georegistration using pose information directly from on-board navigational instruments are both very sensitive to noise from the SfM process and from the instrumentation. The georegistration transform computed using the forward projection function and the derived pose estimates proves to be far more robust to these errors.
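For the first of the three methods, one common way to map an SfM reconstruction into an Earth-based frame is to fit a similarity transform (scale, rotation, translation) between the SfM camera centers and the GPS camera centers. The abstract does not say how the authors estimate this transform, so the closed-form least-squares solution below (the Umeyama method) is an assumption, and the name `fit_similarity` is hypothetical.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform with dst ~ s * R @ src + t.

    src: (N, 3) camera centers in the arbitrary SfM frame.
    dst: (N, 3) corresponding camera centers in the Earth-based frame.
    Returns scale s, rotation matrix R (3x3), and translation t (3,).
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    x_src = src - mu_src
    x_dst = dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = x_dst.T @ x_src / len(src)
    U, S, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation.
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0

    R = U @ D @ Vt
    var_src = (x_src ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

The fitted transform is applied to every reconstructed point as `s * points @ R.T + t`. Because the fit uses only the camera centers, noise in either the GPS positions or the SfM camera estimates propagates directly into the transform, which is consistent with the sensitivity reported above.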
Recent technological advances in computing capabilities and persistent surveillance systems have led to increased focus on new methods of exploiting geospatial data, bridging traditional photogrammetric techniques and state-of-the-art multiple view geometry methodology. The structure from motion (SfM) problem in Computer Vision addresses scene reconstruction from uncalibrated cameras, and several methods exist to remove the inherent projective ambiguity. However, the reconstruction remains in an arbitrary world coordinate frame without knowledge of its relationship to a fixed earth-based coordinate system. This work presents a novel approach for obtaining geoaccurate image-based 3-dimensional reconstructions in the absence of ground control points by using a SfM framework and the full physical sensor model of the collection system. Absolute position and orientation information provided by the imaging platform can be used to reconstruct the scene in a fixed world coordinate system. Rather than triangulating pixels from multiple image-to-ground functions, each with its own random error, the relative reconstruction is computed via image-based geometry, i.e., geometry derived from image feature correspondences. In other words, the geolocation accuracy is improved using the relative distances provided by the SfM reconstruction. Results from the Exelis Wide-Area Motion Imagery (WAMI) system are provided to discuss conclusions and areas for future work.
The Archimedes palimpsest is one of the most significant early texts in the history of science that has survived to the present day. It includes the oldest known copies of text from seven treatises by Archimedes, along with pages from other important historical writings. In the 13th century, the original texts were erased and overwritten by a Christian prayer book, which was used in religious services probably into the 19th century. Since 2001, much of the text from treatises of Archimedes has been transcribed from images taken in reflected visible light and visible fluorescence generated by exposure of the parchment to ultraviolet light. However, these techniques do not work well on all pages of the manuscript, including the badly stained colophon, four pages of the manuscript obscured by icons painted during the first half of the 20th century, and some pages of non-Archimedes texts. Much of the text on the colophon and overpainted pages has been recovered from X-ray fluorescence (XRF) imagery. In this work, the XRF images of one of the other pages were combined with the bands of optical images to create hyperspectral image cubes and processed using standard statistical classification techniques developed for environmental remote sensing to test if this improved the recovery of the original text.
The objective of the character recognition effort for the Archimedes Palimpsest is to provide a tool that allows scholars of ancient Greek mathematics to retrieve as much information as possible from the remaining degraded text. With this in mind, the current pattern recognition system does not output a single classification decision, as in typical target detection problems, but has been designed to provide intermediate results that allow the user to apply his or her own decisions (or evidence) to arrive at a conclusion. To achieve this result, a probabilistic network has been incorporated into our previous recognition system, which was based primarily on spatial correlation techniques. This paper reports on the revised tool and its recent success in the transcription process.
A handwritten codex often included an inscription that listed facts about its publication, such as the names of the scribe and patron, date of publication, the city where the book was copied, etc. These facts obviously provide essential information to a historian studying the provenance of the codex. Unfortunately, this page was sometimes erased after the sale of the book to a new owner, often by scraping off the original ink. The importance of recovering this information would be difficult to overstate. This paper reports on the methods of imaging, image enhancement, and character recognition that were applied to this page in a Hebrew prayer book copied in Florence in the 15th century.
A variation of the matched spatial filter (MSF) has been developed and tested. It is derived by a power-series expansion of the ideal MSF that, in the absence of background noise, produces a Dirac delta function peak at the detection location on the correlation plane. The motivation for the approximation is to design an MSF that produces a narrow correlation peak with reduced susceptibility to noise amplification in the filtering process. This new filter, which we will call the complement MSF, includes the intrinsic phase information of the reference signal and a magnitude term that can be truncated at any order. Experimental results show that the complement MSF produces correlation peaks that can be controlled by varying the order of approximation, and that these peaks may improve discrimination over classical matched filtering methods such as the familiar cross correlation. In addition, we have exploited the familiar Wiener-Helstrom method for inverse filtering to blend both classical MSFs and complement MSFs of different order for imaging scenarios where the noise power spectrum is known or can be estimated. Preliminary outputs using this technique have shown sharper correlation peaks and better noise floor suppression than yielded by implementing the blended components individually.
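The two endpoints that the complement MSF sits between can be sketched in a few lines. The abstract does not give the power-series form, so it is not reproduced here. The sketch shows only the classical matched filter (cross correlation) and a regularized inverse filter, which is read here as the "ideal MSF" that yields a delta-like peak when there is no noise; that reading is an assumption. The function names and the regularization constant `eps` are hypothetical.

```python
import numpy as np

def correlate_with_filter(scene, reference, kind="matched", eps=1e-3):
    """Correlate a scene with a frequency-domain filter built from a reference.

    kind="matched": H = conj(F), the classical cross correlation.
    kind="inverse": H = conj(F) / (|F|^2 + eps), a regularized inverse
                    filter that keeps the phase of the reference but
                    flattens its magnitude (eps is an assumed constant).
    """
    F = np.fft.fft2(reference, s=scene.shape)  # zero-padded reference spectrum
    G = np.fft.fft2(scene)
    if kind == "matched":
        H = np.conj(F)
    elif kind == "inverse":
        H = np.conj(F) / (np.abs(F) ** 2 + eps)
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    # Circular correlation plane; the peak marks the detected offset.
    return np.real(np.fft.ifft2(G * H))

def peak_location(plane):
    """Row and column index of the largest value on the correlation plane."""
    return np.unravel_index(np.argmax(plane), plane.shape)
```

Both filters share the phase term `conj(F) / |F|` and differ only in how the magnitude is weighted. That is why a magnitude term that can be truncated at any order is a natural way to trade peak sharpness against noise amplification.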