Bringing Clarity to Really Really Big Data: A Case for AI and Machine Learning to Help Crunch and Protect Our Data

By Paul Ferrillo & George Platsis

Originally published on Tripwire, March 20, 2017

It’s funny how our kids have an affinity for the same toys we enjoyed as kids, like Legos. They will spend hours creating the biggest “thing,” often prompting a parent’s near-universal response: “Johnny! That is the biggest tower I have ever seen! Great job!” Children (and we) love Legos because they foster imagination, offering a limitless way to create something “gigantic!” And in a more practical sense, Legos sometimes give us a great perspective on the important concept of “scale.”

As counselors and consultants, we find it challenging to convey this “scale” issue as it relates to data, information, and network security problems. Unfortunately, “layperson” directors and officers of public companies, along with executives in government, tend to view “scale” (as it pertains to data protection) as a bad thing, even a scary thing. Part of the challenge is that there are few practical ways to explain to those holding these positions that an organization’s security operations center may receive upwards of one million “incidents” every day and, at the same time, adequately deal with and investigate the potential peril inherent in those incidents while reasonably assuring that not even one of them slips through the cracks.

“Big data” analytics as a business tool is fantastic because we can translate those figures into, say, dollars. But “big data” is also a cybersecurity requirement (i.e., using network traffic, data, sensors, and other feeds to help us determine what is “normal” in our network and what is not), and cybersecurity data is not as simple to translate into something we can easily conceptualize, like, say, dollars. Therefore, until we understand the “scale” of what we are dealing with, it will be very hard to address the security issues associated with cyberspace.

So how much “big data” do we produce? And how do we respond to it? These are important basic questions that need to be better understood so that the much tougher question — how do we protect our data? — can be addressed.

How Much Data Do We Produce?

Let’s start with this basic concept: today, “data” is everything. Both personally and professionally, much of our lives has been converted into zeroes and ones. Our reliance on data has never been greater and is only certain to grow, especially with the explosion of the Internet of Things (IoT). And the amount of data we produce, good, bad, and junk, continues to grow at breakneck speed, taking up space on global networks (meaning that if you were able to control even a fraction of this data flow, you could unleash a wicked DDoS attack).

So how much data exactly is traveling — nearly at the speed of light — through the networks? According to a June 2016 Cisco white paper, we are in the “zettabyte era” in terms of global IP traffic. Great! What is a zettabyte?

Back to Basics

To unpack that question, we need to start with a few basics, the first being that humans have cognitive limitations. Those limitations become evident when we try to understand very large (or very small) numbers. We can use scientific notation to represent large numbers, such as 1 ZB equaling 1 x 10^21 bytes. But does that notation mean anything to you?
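
To make the notation concrete, here is a minimal Python sketch that writes out the standard decimal byte prefixes in full, from kilobyte up to the zettabyte mentioned above. The prefix list follows the standard SI decimal scale; the formatting choices are our own.

```python
# Decimal (SI) byte-scale prefixes and their powers of ten.
PREFIXES = [
    ("kilobyte (KB)", 3),
    ("megabyte (MB)", 6),
    ("gigabyte (GB)", 9),
    ("terabyte (TB)", 12),
    ("petabyte (PB)", 15),
    ("exabyte (EB)", 18),
    ("zettabyte (ZB)", 21),
]

for name, exp in PREFIXES:
    # Print each prefix with its full decimal expansion, e.g.
    # 1 zettabyte (ZB) = 10^21 bytes = 1,000,000,000,000,000,000,000 bytes
    print(f"1 {name} = 10^{exp} bytes = {10**exp:,} bytes")
```

Seeing a zettabyte written out as a 1 followed by 21 zeroes makes the point of the next section: the number is simply too big to grasp without a reference point.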

Denote one million as 1 x 10^6, and it may mean something to you, but only because we have a better understanding of what “one million” means in practical terms. Let us conceptualize “one million” using dollars to create a reference point: if your salary is $50,000 a year, you work for 20 years, and you spend nothing, you will accumulate one million dollars. Now, using the table below, we will “scale up” your salary:
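
The salary arithmetic above can be sketched in a few lines of Python. The article’s own scale-up table is not reproduced in this excerpt, so the powers of ten chosen below are illustrative assumptions, not the authors’ table:

```python
# The "one million dollars" reference point from the text.
salary = 50_000            # dollars per year (from the article)
years = 20
total = salary * years     # 20 years of saving every dollar
assert total == 1_000_000  # one million dollars

# Scaling the same idea up by powers of ten (illustrative values only):
# how many years of that salary does each amount represent?
for exp in (6, 9, 12):
    print(f"$1 x 10^{exp} = {10**exp // salary:,} years at ${salary:,}/year")
```

Even one billion dollars (10^9) is 20,000 working years at that salary, which hints at how quickly these scales run past human intuition.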

Read the rest on Tripwire.