Colors and Consequences: Simplifying Data Classification to Strengthen Security

Photo by Patrick Fore on Unsplash

The need for data classification has been around for decades. Broadly put, it’s the process used to help organizations know what data they have, where it lives, how important it is, and who can access it. It’s a fundamental part of information security; without proper classification, there can’t be proper protection.

Data classification came about as a practical way to prioritize and categorize the data that needs the most extensive security controls. The heavier and more restrictive the controls you apply to data, the harder it is to get value out of that data. Every organization has limited resources, people, and hours in the day, so there are tangible limits on pursuing “perfect security.” Besides, there is no such thing. Security is an ongoing process with no finish line; there is only the continual effort to become more proactive in security and defense.

Security and accessibility are in a constant tug-of-war. Security teams are tasked with applying the appropriate level of control without hurting business operations. For example, say I was asked to secure a retailer’s customer identity database containing its customers’ names, addresses, Social Security numbers, credit card numbers, and spending histories, and I was told to make it as close to 100 percent secure as possible. I would download the dataset, burn it to a DVD, steal a silver Toyota Camry, drive to a remote desert, and bury it in an unmarked location. Secure? Yes. Accessible? Not so much.

Not all data is the same

There isn’t a set industry standard for classifying data, and not all data needs maximum protection. Every organization should create classification categories that make sense for its needs. There are, however, plenty of general frameworks that organizations can adopt and tailor so that data classification protects what is most important with minimal impact on the business. The simplest and most common is a green, yellow, and red color model. Another common hierarchy divides data into protected, sensitive, confidential, and public. Whatever the scheme, each organization has to decide for itself which data goes into which category.

Let’s use the green, yellow, and red color model as an example. Security controls on data are scaled to the value and criticality of that data, which is denoted by the classification color it is given. One way to determine a dataset’s color is what I call the “Prediction of Pain” scale: if this data were leaked and splashed across the front page of The New York Times, how much pain would the company feel? It’s an easy way to think about which data is actually critical to protect. A color classification guide could be as simple as the following:

  • Green data: A quick reputation hit (likely related to publicly available data or confidential company records) that would probably last no more than a 24-hour news cycle and then be forgotten, with no real impact on the company’s stock price.
  • Yellow data: The company must issue a breach notification because sensitive customer data has been exposed, which may mean a notification to the SEC, possibly a fine, and the loss of some business. Minor to moderate losses.
  • Red data: A major news cycle, extreme fines, lost customer confidence and trust, potential loss of more than 50 percent of revenue, all the way up to an extinction-level event for the company.
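To make the scale concrete, here is a minimal Python sketch of the guide as a decision rule. The `Color`, `LeakImpact`, and `prediction_of_pain` names are hypothetical, and the rule itself (more than 50 percent of revenue at risk or an extinction-level threat means red, a required breach notification means yellow, everything else is green) is an illustrative assumption drawn from the bullets above, not a standard. Ordering the colors as an `IntEnum` lets later checks compare classifications directly.

```python
from dataclasses import dataclass
from enum import IntEnum


class Color(IntEnum):
    """Classification colors; a higher value means stricter controls."""
    GREEN = 1
    YELLOW = 2
    RED = 3


@dataclass(frozen=True)
class LeakImpact:
    """Hypothetical answers to: what happens if this data hits the front page?"""
    breach_notification_required: bool  # e.g. sensitive customer data exposed
    potential_revenue_loss_pct: float   # rough estimate, 0 to 100
    extinction_level: bool = False      # could the leak end the company?


def prediction_of_pain(impact: LeakImpact) -> Color:
    """Map the consequences of a public leak to a classification color.

    Thresholds are illustrative assumptions, not an industry standard.
    """
    if impact.extinction_level or impact.potential_revenue_loss_pct > 50:
        return Color.RED
    if impact.breach_notification_required:
        return Color.YELLOW
    return Color.GREEN


# Example: two hypothetical datasets scored with the rule above.
customer_db = LeakImpact(breach_notification_required=True, potential_revenue_loss_pct=60)
press_kit = LeakImpact(breach_notification_required=False, potential_revenue_loss_pct=0)
print(prediction_of_pain(customer_db).name)  # RED
print(prediction_of_pain(press_kit).name)    # GREEN
```

In practice, the inputs would likely come from conversations with legal, finance, and business owners rather than from a function call; the value of writing the rule down is that everyone applies the same thresholds.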

By starting with the consequences, you can work backward to apply security controls aimed directly at minimizing the probability of those consequences. Starting with consequences also makes it easier to ask the right questions to classify the data and uncover security gaps.

For example:

  • What systems are processing red data?
  • Can employees access systems that store red data without having permission to do so?
  • Are there automation processes that inappropriately move red data over to systems designed to only handle yellow or green data?
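The three questions can also be turned into automated checks against an inventory of systems, access grants, and data flows. The sketch below assumes such an inventory exists; the `System`, `Grant`, and `DataFlow` records and the sample entries are hypothetical stand-ins for whatever your own data catalog and access-management tools export. Each function answers one question by comparing colors.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Color(IntEnum):
    GREEN = 1
    YELLOW = 2
    RED = 3


# Hypothetical inventory records; real data would come from your own tools.
@dataclass(frozen=True)
class System:
    name: str
    max_allowed: Color          # highest classification the system is designed for
    stores: frozenset[Color]    # classifications actually found on the system


@dataclass(frozen=True)
class Grant:
    user: str
    clearance: Color            # highest classification the user is approved for
    system: str


@dataclass(frozen=True)
class DataFlow:
    job: str                    # e.g. an automated export or sync job
    data: Color
    destination: str            # name of the receiving System


def systems_with_red(systems):
    """Which systems are processing or storing red data?"""
    return sorted(s.name for s in systems if Color.RED in s.stores)


def over_privileged(grants, systems):
    """Who can reach a system holding data above their clearance?"""
    by_name = {s.name: s for s in systems}
    return sorted(
        (g.user, g.system)
        for g in grants
        if by_name[g.system].stores and max(by_name[g.system].stores) > g.clearance
    )


def misrouted_flows(flows, systems):
    """Which automated jobs move data into a system not designed to handle it?"""
    by_name = {s.name: s for s in systems}
    return sorted(f.job for f in flows if f.data > by_name[f.destination].max_allowed)


# Example inventory (hypothetical).
systems = [
    System("crm", Color.RED, frozenset({Color.RED, Color.YELLOW})),
    System("analytics", Color.YELLOW, frozenset({Color.YELLOW})),
    System("wiki", Color.GREEN, frozenset({Color.GREEN})),
]
grants = [
    Grant("alice", Color.RED, "crm"),
    Grant("bob", Color.YELLOW, "crm"),
    Grant("bob", Color.YELLOW, "analytics"),
]
flows = [
    DataFlow("nightly_export", Color.RED, "analytics"),
    DataFlow("kb_sync", Color.GREEN, "wiki"),
]

print(systems_with_red(systems))         # ['crm']
print(over_privileged(grants, systems))  # [('bob', 'crm')]
print(misrouted_flows(flows, systems))   # ['nightly_export']
```

The hard part is keeping the inventory accurate; once data and systems carry a color, checks like these are simple comparisons that can run on a schedule.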

Organizations can use a wider spectrum of colors to drive more granular controls, depending on their needs and the different levels of data they handle. There is, however, a fine line between having too few and too many classifications. With too few, you get broad categories that can over-secure or under-protect. With too many, users become confused and overwhelmed by choice, and they make mistakes when classifying data.

Getting a simple process in place that everyone follows is better than having a detailed but complicated one that few people understand and that invites mistakes when applied. The type of classification system you use is not as important as having one that is properly defined. There are many ways to go about this; pick the one that will actually be put to good use.