The amount of network traffic and host log data that commercial and government organizations collect to maintain their security is growing every day due to the proliferation of cyber sensors and security appliances. However, these huge amounts of data can no longer be stored efficiently or processed in real time. Security analysts need to decide which data is most effective for detecting current and novel attacks, which data may be relevant to forensic analysis, and which data can be aggregated and discarded without significantly degrading enterprise security. Making these decisions manually, while reasoning about both the utility of attack detection and the cost of data management, is infeasible. In this paper, we present a model for developing a data collection and archiving plan that adapts the sensor and storage state configuration to the analytical systems available in the organization, the threats detected by those systems over time, and the capacity and cost of collection and storage resources. Our planning model computes the sensor state and data archiving actions via approximate variational inference, decomposing the planning problem into perception, learning, and control, which enables tractable plan construction and incremental updates.