Chaos Engineering (CE), which Netflix introduced in 2008, is used by researchers to assess and find weaknesses in system resiliency. Such weaknesses can arise, when subsystems are individually robust, but that robustness disappears when multiple subsystems are paired together in a System of Systems (SoS). CE researchers develops methods and metrics for finding such fragilities. In this paper, we expand previous examinations of CE experimentation for SoS and introduce Security Chaos Engineering (SCE) for SoS. These SCE experiments include terminating message service, flooding multi queues/message, and injecting corrupted Service. SCE assumes compromise by adding a malicious actor to the tests that can induce adversarial failures into a SoS. For our SoS testbed, we instantiated a virtual Unmanned Aerial Vehicle (VUAV). We use the open-source Chaos Toolkit to run consistent CE and SCE experiments on the VUAV. Chaos Toolkit with SCE exposes the VUAV attack surfaces to evaluate performance and system security. This research allows us to establish an understanding of baseline system performance and gaps in procedures, techniques, and tools from the state of the art as applied to DoD-relevant systems like SoS. We use the load placed on the Central Processing Unit (CPU) and Random-Access Memory (RAM) by the VUAV as metrics for baseline performance. The results showed that these two metrics did not provide enough fidelity in where CE/SCE creates failures. Feeding these results into the CE methodology allows for additional metrics to better pinpoint failures with CE/SCE testing.
|