Poster + Paper
7 June 2024
Replicant framework for synthetic data generation
Emily Kenul, Margaret Black, Drew Massey, Zachary Havelka, Mawia Henkai, Kyle Gavin, Luke Shellhorn
Conference Poster
Abstract
Acquiring representative data samples is pivotal to creating machine learning models. However, gathering real-world imagery is often hindered by privacy concerns, regulatory constraints, limited financial resources, and restricted access. Synthetic imagery offers an opportunity to augment real-world computer vision datasets while bypassing these obstacles. Yet a fundamental challenge in working with synthetic imagery is ensuring that the generated data closely resembles its real-world counterpart. Further, it can be difficult to generate synthetic imagery with the features and quality required to train computer vision models that generalize well. This paper introduces and evaluates Replicant, our custom-built synthetic data generation framework integrated into Booz Allen’s Vision AI Stack. In developing this service, we created a framework that produces synthetic imagery closely resembling a real-world maritime dataset and that can be used to develop synthetic data for any specific domain. We use this data to train object detection models and demonstrate how synthetic data benefits model performance. Additionally, we employ similarity metrics, including perceptual hashing (pHash), Optimal Transport Dataset Distance (OTDD), and Fréchet Inception Distance (FID), to assess the likeness of the real and synthetic datasets. Finally, we explore the applicability and effectiveness of explainable AI (XAI) techniques, such as Eigen Class Activation Mapping (Eigen CAM) and SHapley Additive exPlanations (SHAP), to gain insight into the performance of our deep learning models and the utility of our synthetic data. Our findings underscore the potential of synthetic data to improve deep learning model performance while overcoming the challenges of real-world data acquisition.
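The abstract names perceptual hashing (pHash) as one of the similarity metrics but does not describe how it is computed. As an illustration only, the following is a minimal sketch of the widely used DCT-based variant, assuming the image has already been converted to grayscale and resized to a 32 x 32 grid of numbers. The function names (dct_2d_low, phash, hamming), the 8 x 8 low-frequency block, and the median threshold are assumptions made for this sketch, not details taken from the paper.

```python
import math
import statistics


def dct_2d_low(pixels, keep=8):
    """Top-left keep x keep block of the unnormalized 2D DCT-II of a square grid."""
    n = len(pixels)
    # cos_table[u][x] = cos((2x + 1) * u * pi / (2n)), the DCT-II basis
    cos_table = [
        [math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
        for u in range(keep)
    ]
    # Transform each row first (n x keep), then each column (keep x keep)
    rows = [
        [sum(row[x] * cos_table[u][x] for x in range(n)) for u in range(keep)]
        for row in pixels
    ]
    return [
        [sum(rows[y][u] * cos_table[v][y] for y in range(n)) for u in range(keep)]
        for v in range(keep)
    ]


def phash(pixels, keep=8):
    """Hash of keep*keep bits: 1 where a low-frequency coefficient exceeds the median."""
    flat = [c for row in dct_2d_low(pixels, keep) for c in row]
    med = statistics.median(flat)  # assumption: median taken over all kept coefficients
    bits = 0
    for c in flat:
        bits = (bits << 1) | (1 if c > med else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes (smaller means more similar)."""
    return bin(a ^ b).count("1")


# Hypothetical stand-ins for a real image and a contrast-scaled copy of it
img = [[float((7 * x + 3 * y) % 256) for x in range(32)] for y in range(32)]
scaled = [[2.0 * value for value in row] for row in img]

print(hamming(phash(img), phash(scaled)))
```

In a dataset comparison of the kind the abstract describes, hashes like these would be computed for real and synthetic images and compared by Hamming distance; how the authors aggregate such distances is not stated here.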
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Emily Kenul, Margaret Black, Drew Massey, Zachary Havelka, Mawia Henkai, Kyle Gavin, and Luke Shellhorn "Replicant framework for synthetic data generation", Proc. SPIE 13035, Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130351E (7 June 2024); https://doi.org/10.1117/12.3013826
KEYWORDS
Video
Computer vision technology
Object detection
Artificial intelligence
Deep learning
Image analysis
Machine learning