Defense in Depth Implemented on Hadoop
This is a multi-part blog entry based on my Hadoop Summit presentation that I delivered with Chester Parrot on 6/9/2015.
Defense in Depth is a concept in which multiple layers of security controls (defense) are placed throughout a system or a network. Its intent is to provide redundancy in the event a security control fails or a vulnerability is exploited. The key behind Defense in Depth is to have multiple dedicated controls (I will refer to them as point tools) in specific places looking at very specific events in very specific ways. The hope is that the specificity of these controls as well as the breadth (variety) of them will make it impossible for the bad guys to avoid every single one of them and remain undetected.
This is a great idea, but the problem with using conventional approaches to Defense in Depth in today’s world is that the point tools can’t handle the volume of data thrown at them by modern enterprises (they primarily scale vertically, but not horizontally), and that they don’t benefit from the collective intelligence of security events collected by other point tools. In other words, they turn into spam bots. A large bank, for example, may have 100s of these point tools, each of which emits 100s of alerts per hour. So how does one make sense of it all?
A common solution is to introduce a Security Information and Event Management (SIEM) system such as Splunk or ArcSight. A SIEM streams all of the security events coming off of the point tools into the same database and tries to organize them, combine them, and make them easily searchable. But this approach still requires a human in the loop to determine which alerts are worth looking at, and it still does not address the fact that the point tools (which in this scenario become nothing more than sensors emitting signal) can't scale and are unable to benefit from the collective intelligence of other sensors. So the SIEM approach really addresses a symptom of the problem (being overwhelmed with alerts), but does not address it fundamentally (how do I stop uninformative alerts from happening in the first place, and how do I get fewer, but better, alerts?).
Both questions, scalability and contextual awareness, are more appropriately addressed with massively scalable and massively parallel systems like Hadoop. Over the years, as we developed our analytics capability on top of the Hadoop class of technologies, specifically Apache Storm (all things real-time) and Hadoop MapReduce (all things batch), our vision for Defense in Depth evolved.
Initially we wanted to create one-for-one feature parity with the legacy point tools most people were using in their data centers (in other words, port every significant legacy point tool, or its equivalent capability, into Hadoop). Then we wanted to develop analytics to derive collective intelligence and context out of the data we had collected, in a real-time fashion (in other words, figure out what the heck is going on before we fire 100s of alerts at people). At some point we realized that we couldn't just port these tools over. We needed to fundamentally re-write and re-do many of the capabilities. The common shortcomings of the legacy point tools were: they couldn't easily handle unstructured data, they relied on a lot of shared state (a big no-no in horizontally scalable systems), some analytics were slow and required multiple passes over the data, some were limited to in-memory processing, and others produced alerts that were too generic to be useful or to be correlated with other intelligence.
So we decided to borrow where appropriate, approximate where appropriate, and create our own analytics where appropriate. In the end we developed a services catalog that looked like this:
The Misuse Detection capability consists of simple signature- and rules-based matching and is something we strongly wanted to eliminate from our system. We wanted to create something that uses advanced analytics, not static if/else rules. Unfortunately, in the present landscape, you can't offer a service where the underlying platform doesn't offer this capability. We can say the same about the SIEM. No one really gets a lot of value out of them, but people hold on to them for "just in case" moments. So we started with these capabilities. We teamed up with Hortonworks to create OpenSOC-Streaming, an open source framework that can take in large volumes of telemetry, enrich it in-line, alert on it via a static rules engine, and stream both the telemetry and the alerts into Elasticsearch, HBase, or HDFS for long-term storage. This was a challenging project and took us a few iterations to get working, but eventually we got past it and moved on to more interesting analytics.
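To make the enrich-then-alert flow concrete, here is a minimal sketch in Python. OpenSOC-Streaming itself implements this in Java on Apache Storm; the field names, the asset table, and the two rules below are all invented for illustration.

```python
# Hypothetical sketch of in-line enrichment followed by a static rules
# engine. Everything here (fields, rules, asset data) is illustrative.

ASSET_DB = {"10.0.0.5": {"owner": "finance", "criticality": "high"}}

RULES = [
    # (rule name, predicate over an enriched event)
    ("blacklisted_port", lambda e: e.get("dst_port") in {4444, 31337}),
    ("critical_asset_touch", lambda e: e.get("asset", {}).get("criticality") == "high"),
]

def enrich(event):
    """In-line enrichment: join the raw event with asset context."""
    enriched = dict(event)
    enriched["asset"] = ASSET_DB.get(event.get("dst_ip"), {})
    return enriched

def apply_rules(event):
    """Static rules engine: return the names of all matching rules."""
    e = enrich(event)
    return [name for name, pred in RULES if pred(e)]

alerts = apply_rules({"src_ip": "192.0.2.9", "dst_ip": "10.0.0.5", "dst_port": 4444})
# alerts == ["blacklisted_port", "critical_asset_touch"]
```

In the real pipeline, both the enriched telemetry and the rule hits would then be streamed on to Elasticsearch, HBase, or HDFS.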
The second analytics type we tackled was Intrusion Detection. Intrusion Detection is an umbrella term for all things that deviate from normal expected behavior (aka anomaly detection). If you don't know specifically what you are looking for, Intrusion Detection alerts are a good place to start. There are different types of anomaly detection we were interested in doing. The first, and most fundamental, type was anomaly detection on networking data and machine exhaust data. This is what our presentation at Hadoop Summit was about. But in addition to looking for anomalies in the networking data, we can also mine verbose logs and look for behavioral anomalies, such as the behavior of specific assets, of users, and of the interactions between users and assets. We will give a more detailed explanation of these methodologies in one of our next conference talks. Stay tuned to our Twitter accounts for more information.
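The simplest flavor of anomaly detection on networking data can be sketched as a deviation test against a baseline. The example below flags flows whose byte counts sit far from the mean of a per-host baseline; the feature (flow bytes), the threshold, and the sample numbers are all made up, and production detectors would use far more robust statistics than a z-score.

```python
# Toy z-score anomaly detector over flow byte counts. Threshold and
# data are invented for illustration only.
import statistics

def anomalies(byte_counts, threshold=2.0):
    """Return indices of observations more than `threshold` population
    standard deviations from the mean of the series."""
    mean = statistics.mean(byte_counts)
    stdev = statistics.pstdev(byte_counts)
    if stdev == 0:
        return []  # perfectly flat baseline: nothing deviates
    return [i for i, b in enumerate(byte_counts)
            if abs(b - mean) / stdev > threshold]

baseline = [1200, 1300, 1250, 1280, 1190, 1220, 1310, 95000]
suspicious = anomalies(baseline)  # flags the 95000-byte spike at index 7
```

The same shape of computation, mean and deviation per key, parallelizes naturally as a MapReduce job keyed by host, which is part of why this class of analytic fits Hadoop well.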
When we finished Intrusion Detection, it became clear to us that the "umbrella" analytics needed to be complemented by more targeted, narrower-scope specialized analytics to detect more advanced types of APTs and malware. So we created supervised classifiers. These classifiers use supervised machine learning techniques to look for specific and sophisticated malware behaviors: characteristics of APTs, malicious scripts, malware attempts to hide itself, open C&C channels, and attempts to exfiltrate information. The training sets were built up from Cisco's internal malware and log databases, external data sets, and data captured from our customer pilots. The classifier alerts are then correlated with Intrusion Detection alerts to first identify a sophisticated APT or piece of malware, and then associate it with all relevant anomalies to get a broader picture of its impact on the customer's environment.
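To illustrate the supervised-classification idea at its smallest, here is a toy perceptron trained on invented behavioral features (beacon regularity, payload entropy, count of new external hosts contacted). The real classifiers were trained on the malware corpora mentioned above, not on four hand-made vectors, and would use far richer models than this.

```python
# Minimal supervised-learning sketch: a perceptron over invented
# malware-behavior features. Features, data, and labels are illustrative.

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Learn weights w and bias b so sign(w.x + b) predicts the label."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # y is +1 (malicious) or -1 (benign)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:                  # update weights only on mistakes
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y

    def predict(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
    return predict

# Features: (beacon_regularity, payload_entropy, new_external_hosts)
X = [(0.9, 7.8, 40), (0.8, 7.5, 35), (0.1, 4.2, 2), (0.2, 3.9, 1)]
y = [1, 1, -1, -1]  # first two rows mimic C&C beaconing, last two are benign
classify = train_perceptron(X, y)
```

The output of a classifier like this is exactly the kind of narrow, specific signal that can then be joined against the broader Intrusion Detection anomalies.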
The final type of analytic we wanted to bring to market was Look-Ahead analytics. This type of analytic looks outside the customer network for external threats and then attempts to link them to specific customer assets. Examples include social media analytics, message board scraping, and setting up honeypots to see who is interested in customer assets. A particularly helpful type of honeypot that I have used in the past was to set up a variety of fake job ads for various customers and look for trends as to which external APTs look at them. Based on these trends we can make assumptions about which customer assets are more likely to be attacked than others. This is just one example. We are in the early stages of experimenting with these techniques, trying to distill them into actionable intelligence we can use in protecting our customers.
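For the honeypot variant of Look-Ahead analytics, the core mechanic is just a listener that accepts connections, records who knocked, and serves nothing real. The sketch below is hypothetical: the fake FTP banner, the ephemeral-port setup, and the in-memory hit list are all invented for illustration, not how any production honeypot of ours works.

```python
# Hypothetical minimal TCP honeypot: record connecting hosts, present a
# fake banner, serve no real content. All details here are illustrative.
import socket
import threading

def start_honeypot(hits, stop):
    """Listen on an ephemeral localhost port; append each connecting
    peer's address to `hits` until the `stop` event is set. Returns the
    port number so callers know where the honeypot is listening."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    srv.listen(5)
    srv.settimeout(0.2)                 # poll so the loop can check `stop`
    port = srv.getsockname()[1]

    def loop():
        while not stop.is_set():
            try:
                conn, addr = srv.accept()
            except socket.timeout:
                continue
            hits.append(addr[0])        # record who came knocking
            conn.sendall(b"220 ftp service ready\r\n")  # fake banner
            conn.close()
        srv.close()

    threading.Thread(target=loop, daemon=True).start()
    return port
```

The interesting analytics then happen downstream, trending which external sources probe which lures over time, exactly as with the fake job ads.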