With the vast amount of unstructured data accrued through surveillance, body cameras, mobile phones, and other sources, there is a need for automated methods that synthesize these data into natural language. Recent advances in machine learning have enabled compression of data sequences into short, compact, informal summaries such as keyframes and video thumbnails. Additionally, the capability to generate text that describes an overall image or full-motion video has rapidly increased. However, generating text in more formal structures, such as reports, remains a relatively unsolved area of Natural Language Generation (NLG). This work is an initial attempt to understand the gap between data summarization and document generation, specifically for the generation of situation reports, and to study the data annotations necessary to implement an end-to-end pipeline that ingests data, summarizes it, and generates a situation report for easy consumption by a user.
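As a rough illustration of the kind of end-to-end pipeline described above, the sketch below chains an off-the-shelf abstractive summarizer into a fixed report template. The model name, template fields, and the `generate_sitrep` helper are illustrative assumptions, not components evaluated in the paper.

```python
from transformers import pipeline  # assumed dependency; any abstractive summarizer would do

# Hypothetical report fields; a real situation report would follow an agreed template.
REPORT_TEMPLATE = """SITUATION REPORT
Time window : {window}
Sources     : {sources}
Summary     : {summary}
"""

def generate_sitrep(captions, window, sources,
                    model_name="facebook/bart-large-cnn"):
    """Summarize per-clip or per-frame captions into a single report body."""
    summarizer = pipeline("summarization", model=model_name)
    joined = " ".join(captions)
    summary = summarizer(joined, max_length=120, min_length=30,
                         do_sample=False)[0]["summary_text"]
    return REPORT_TEMPLATE.format(window=window,
                                  sources=", ".join(sources),
                                  summary=summary)
```

In practice the upstream captions would come from the image- and video-description models mentioned above, and the annotation study would determine which structured fields the template must carry.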
Generating imagery using gaming engines has become a popular way to augment, or even completely replace, real data. This is largely because gaming engines such as Unity3D and Unreal can produce novel scenes and ground-truth labels quickly and at low cost. However, there is a disparity between imagery rendered in the synthetic domain and imagery captured in the real domain when the two are used for a deep learning task. This gap, commonly known as domain mismatch or domain shift, renders synthetic imagery impractical and ineffective for deep learning tasks unless it is addressed. Recently, Generative Adversarial Networks (GANs) have shown success at generating novel imagery and at bridging the gap between two different distributions by performing cross-domain transfer. In this research, we explore the use of state-of-the-art GANs to perform a domain transfer from a rendered synthetic domain to a real domain. We evaluate the data generated by an image-to-image translation GAN on a classification task as well as through qualitative analysis.
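For concreteness, a minimal PyTorch sketch of the adversarial objective behind such synthetic-to-real transfer is given below. The shallow generator and discriminator and the single training step are placeholders, not the networks used in this work; published image-to-image models (e.g., CycleGAN) add a reverse mapping and a cycle-consistency term on top of this core loss.

```python
import torch
import torch.nn as nn

# Toy generator G (synthetic -> "real-looking" image) and discriminator D on the real domain.
G = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
D = nn.Sequential(nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(32, 1, 4, stride=2, padding=1))

adv_loss = nn.BCEWithLogitsLoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(synthetic_batch, real_batch):
    """One adversarial update: D learns to separate real imagery from translated
    synthetic imagery, then G is updated to fool D."""
    fake = G(synthetic_batch)

    # Discriminator update.
    opt_D.zero_grad()
    d_real = D(real_batch)
    d_fake = D(fake.detach())
    loss_D = adv_loss(d_real, torch.ones_like(d_real)) + \
             adv_loss(d_fake, torch.zeros_like(d_fake))
    loss_D.backward()
    opt_D.step()

    # Generator update.
    opt_G.zero_grad()
    d_fake = D(fake)
    loss_G = adv_loss(d_fake, torch.ones_like(d_fake))
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```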
Capsule networks have shown promise in their ability to perform classification tasks with viewpoint invariance, in some cases outperforming other models in accuracy. This capability is relevant to maritime classification tasks, where labeled data are scarce and it is infeasible to collect imagery of objects from every viewpoint needed to train machine learning algorithms. The unique architecture of capsule networks lends itself well to the maritime vessel BCCT dataset, which exhibits characteristics aligned with their theorized strengths. Comparing capsule networks against traditional CNN architectures and data augmentation techniques provides a potential roadmap for their incorporation into future classification tasks involving imagery in data-starved domains that rely heavily on viewpoint invariance. We present our results on the classification of ships using capsule networks and explore their usefulness for this task given their current state of development.
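The two operations that distinguish capsule networks from standard CNNs are the squash non-linearity and routing-by-agreement. The sketch below follows the dynamic routing of Sabour et al.; the tensor shapes and iteration count are illustrative assumptions rather than the configuration evaluated here.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Keeps a capsule's orientation but maps its length into [0, 1),
    # so length can be read as the probability that the entity is present.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / torch.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iterations=3):
    # u_hat: prediction vectors from lower-level capsules,
    # shape [batch, out_caps, in_caps, out_dim].
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # routing logits
    for _ in range(num_iterations):
        c = F.softmax(b, dim=1).unsqueeze(-1)           # coupling coefficients
        s = (c * u_hat).sum(dim=2)                      # weighted sum over input capsules
        v = squash(s)                                   # output capsules [batch, out_caps, out_dim]
        b = b + (u_hat * v.unsqueeze(2)).sum(dim=-1)    # increase routing to agreeing parents
    return v
```

Because pose information is carried in the capsule vectors themselves rather than discarded by pooling, this routing step is the mechanism behind the viewpoint-invariance claims discussed above.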
Rendering synthetic imagery from gaming-engine environments allows us to create data featuring any number of object orientations, conditions, and lighting variations. This capability is particularly useful in classification tasks, where there is an overwhelming lack of the labeled data needed to train state-of-the-art machine learning algorithms. However, the use of synthetic data is not without limits: in the case of imagery, training a deep learning model on purely synthetic data typically yields poor results when the model is applied to real-world imagery. Previous work shows that "domain adaptation," mixing real-world and synthetic data, improves performance on a target dataset. In this paper, we train a deep neural network with synthetic imagery, including ordnance and overhead ship imagery, and investigate a variety of methods to adapt our model to a dataset of real images.
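One simple way to realize the "mix real-world and synthetic data" idea is to build a combined training set and control the synthetic-to-real ratio with a weighted sampler, as in the PyTorch sketch below. The dataset objects, mixing fraction, and `mixed_loader` helper are assumptions for illustration, not the exact configuration used in this paper.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def mixed_loader(synthetic_ds, real_ds, real_fraction=0.25, batch_size=64):
    """Draw each batch from a pool of synthetic and real examples, with weights
    chosen so that roughly `real_fraction` of the sampled images are real."""
    combined = ConcatDataset([synthetic_ds, real_ds])  # synthetic indices come first
    n_syn, n_real = len(synthetic_ds), len(real_ds)
    w_syn = (1.0 - real_fraction) / n_syn
    w_real = real_fraction / n_real
    weights = torch.cat([torch.full((n_syn,), w_syn),
                         torch.full((n_real,), w_real)])
    sampler = WeightedRandomSampler(weights, num_samples=len(combined), replacement=True)
    return DataLoader(combined, batch_size=batch_size, sampler=sampler)

# Usage sketch (datasets and classifier are assumed to exist):
# loader = mixed_loader(synthetic_dataset, real_dataset, real_fraction=0.25)
# for images, labels in loader:
#     loss = criterion(model(images), labels)
#     ...
```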