Unlocking Hidden Potential with Dark Data: From Integration to Actionable Insights (Part 1 of 2)

Tia Duncan
7 min read · Sep 5, 2024

Photo by Markus Spiske on Unsplash

Introduction to Dark Data

What is Dark Data?

Dark data is the large amount of data that organizations collect, process, and store but don't use in any meaningful way. Typical examples include customer data from support interactions, old emails, purchase histories, server log files, and IoT data from sensors or connected devices.

By some estimates, dark data makes up around 50% of all company data. That's a lot, and it isn't cheap to store either.

This data has potential value, but because it sits unused it also poses risks: it increases storage costs and can create security vulnerabilities. On top of that, there are the missed opportunities.

So from here, there are two ways to approach this: either we identify and delete the dark data so we don't have to store it, or we figure out how to use it.

I’m going to focus on the second one because the first one would make for a fairly underwhelming article.

Also, if you want to maximize business insights and maintain good data hygiene, looking into the data you're not using right now is a good idea. And if you don't find anything useful, you can still consider deleting it afterwards.

The Potential of Dark Data

Utilizing dark data can give you a competitive advantage in many different ways. By looking at data that usually gets overlooked, you can find unique insights that help you outpace competitors. But insights are not the only thing we're interested in: finding inefficiencies or new opportunities can also cut operational and other costs. This hidden data might even reveal trends or needs that lead to new products or services.

While dark data has significant potential, tapping into it efficiently is a big challenge, especially when it comes to integrating and analyzing it at scale.

I like to use examples in my guides, so in this one we'll look at how an online bakery offering customizable cakes can use its dark data to boost business outcomes.

Now that you know the potential of dark data, the next step is to figure out where it lives in your organization, with our online bakery as a hands-on example.

Identifying Dark Data Within Your Organization

This section will guide you through finding these hidden data assets within your enterprise, using a made-up online bakery as an example.

So how do we find where the dark data is?

There is no easy way to do this. You have to really look into what you have and what would be useful in your specific industry, but here are a few ideas.

A good way to start is with an audit. If you already have something regular in place, great; if not, that's okay too. Just start one and review all of your existing data sources to spot any data that has been sitting unused or underutilized.

Then you can take inventory, spending however long it takes (probably a long time) to catalog all of your data assets.

This gives you an organized, meaningful inventory of what you have, which you can then review and think through.
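To make the audit and inventory steps more concrete, here is a minimal Python sketch that walks a folder and catalogs each file with its size and age, flagging anything not modified in a year as a dark data candidate. It assumes your data lives as files on a filesystem you can read; the `AssetRecord` type, the `build_inventory` function, and the 365-day threshold are hypothetical choices for illustration. Modification time is only a rough proxy for "unused", since it says nothing about whether anyone reads the file.

```python
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AssetRecord:
    """One row in a hypothetical data inventory."""
    path: str
    size_bytes: int
    days_since_modified: float
    stale: bool


def build_inventory(root: str, stale_after_days: int = 365) -> list[AssetRecord]:
    """Catalog every file under `root`, flagging ones untouched for `stale_after_days`.

    Uses modification time as a rough proxy for "unused"; access times are often
    unreliable because many filesystems are mounted with noatime or relatime.
    """
    now = time.time()
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories and other non-file entries
        stat = path.stat()
        age_days = (now - stat.st_mtime) / 86400  # seconds -> days
        records.append(AssetRecord(
            path=str(path.relative_to(root)),
            size_bytes=stat.st_size,
            days_since_modified=round(age_days, 1),
            stale=age_days >= stale_after_days,
        ))
    return records
```

Sorting the stale records by `size_bytes` is one quick way to see where the biggest storage costs are hiding.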

Also be sure to schedule meetings with people who might have good ideas. You may also be able to use AI-based tools that scan your systems and identify potential sources.

For our online bakery example, the methods above could reveal customer preferences hidden in order histories, chat logs, and overlooked website interaction data.

Classifying and Prioritizing Dark Data

So once you find your dark data, it's a good idea to categorize it. One way is by relevance: consider how well the data aligns with your current business goals and prioritize accordingly. Then there's risk to think about, such as potential security or compliance issues that come with the data, which also helps with prioritization. In the end, it's up to you and what you find to decide what matters most: relevance, risk, or the potential value the data can add, like actionable insights or new revenue streams.
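One way to turn these criteria into a ranking is a simple weighted score. The sketch below is illustrative only: the 1 to 5 scales, the `DarkDataSource` type, and the weights in `WEIGHTS` are assumptions you would replace with your own judgment about what matters most.

```python
from dataclasses import dataclass


@dataclass
class DarkDataSource:
    name: str
    relevance: int  # 1-5: how well it aligns with current business goals
    value: int      # 1-5: potential value (insights, new revenue streams)
    risk: int       # 1-5: security / compliance risk; higher is riskier


# Hypothetical weights; tune these to your organization's priorities.
WEIGHTS = {"relevance": 0.4, "value": 0.4, "risk": 0.2}


def priority_score(src: DarkDataSource, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted score: relevance and value raise priority, risk lowers it."""
    return (
        weights["relevance"] * src.relevance
        + weights["value"] * src.value
        - weights["risk"] * src.risk
    )


def prioritize(sources: list[DarkDataSource]) -> list[DarkDataSource]:
    """Return sources sorted from highest to lowest priority score."""
    return sorted(sources, key=priority_score, reverse=True)


if __name__ == "__main__":
    # Made-up bakery data sources, for illustration only.
    sources = [
        DarkDataSource("support_chat_logs", relevance=4, value=4, risk=4),
        DarkDataSource("old_server_logs", relevance=1, value=2, risk=1),
        DarkDataSource("order_customizations", relevance=5, value=5, risk=2),
    ]
    for src in prioritize(sources):
        print(f"{src.name}: {priority_score(src):.1f}")
```

With these made-up inputs, the order customization data ranks first and the old server logs last, which matches the intuition in the bakery example.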

For our example online bakery, it could be a good idea to prioritize analyzing data on frequently customized cake options and common customer requests. With this knowledge, we could adjust our baking process to always keep an optimal number of, let's say, frozen red velvet sheets in storage. Not too few, so we never find ourselves out of frozen sheets and forced to make five of them in a hurry, but also not so many that we use the freezer space inefficiently.
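The "not too few, not too many" reasoning can be expressed as a par level: stock enough frozen sheets to cover demand on most past days, capped by freezer space. Below is a minimal sketch assuming you have a list of how many red velvet sheets were used each day; the 90% service level, the 12-sheet freezer capacity, and the function names are hypothetical.

```python
import math


def nearest_rank_percentile(values: list[int], pct: float) -> int:
    """Nearest-rank percentile: smallest value with at least pct% of observations <= it."""
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def frozen_sheet_target(daily_usage: list[int], service_level: float = 90.0,
                        freezer_capacity: int = 12) -> int:
    """Hypothetical par level for frozen red velvet sheets.

    Stock enough to cover demand on `service_level`% of past days,
    but never more than the freezer can hold.
    """
    return min(nearest_rank_percentile(daily_usage, service_level), freezer_capacity)
```

With made-up usage of `[2, 3, 1, 4, 2, 6, 3, 2, 5, 3]` sheets per day, the 90th percentile is 5, so the target would be 5 sheets. A real bakery would likely also account for thaw time and seasonal demand.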

Once we have identified and prioritized our dark data, the next step is to integrate it into our existing data architecture for analysis. This is similar to how our bakers would begin to bake extra red velvet sheets for the freezer.

Integrating Dark Data into Existing Data Architectures

Once all the interesting dark data is identified, the next step is making sure it can be seamlessly integrated into your existing data systems. Just like the bakery would integrate customer data, your organization needs to make sure that all systems can work together well.

Assessing Compatibility and Evaluating Current Data Architecture

Before we do anything else, we need to look at compatibility. One thing to check is format: we have to make sure our data can be processed by our existing systems, which might mean going through data cleaning or transformation steps. Another common source of problems is older, legacy systems that might not easily support new data integration and could require either system upgrades or middleware solutions.
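A quick way to gauge format compatibility before integrating is to check incoming records against the fields your existing system expects. This is a sketch under assumptions: the field names in `EXPECTED_SCHEMA` and the YYYY-MM-DD date requirement are hypothetical stand-ins for whatever your target system actually requires.

```python
from datetime import datetime


def is_iso_date(value) -> bool:
    """True if value is a string in YYYY-MM-DD format."""
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def is_non_empty_str(value) -> bool:
    """True if value is a string containing non-whitespace text."""
    return isinstance(value, str) and value.strip() != ""


# Hypothetical schema of the existing order system: field name -> validator.
EXPECTED_SCHEMA = {
    "order_id": is_non_empty_str,
    "flavor": is_non_empty_str,
    "order_date": is_iso_date,
}


def compatibility_report(records: list[dict]) -> dict[str, int]:
    """Per field, count records where the field is missing or fails validation."""
    problems = {field: 0 for field in EXPECTED_SCHEMA}
    for record in records:
        for field, is_valid in EXPECTED_SCHEMA.items():
            if field not in record or not is_valid(record[field]):
                problems[field] += 1
    return problems
```

A report with many failures in one field suggests that field needs a transformation step (or middleware) before the data can be loaded.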

Then there's infrastructure: we need to assess whether it can handle the extra load or whether additional storage and processing power will be needed. That leads us to the next step, preparing the infrastructure for integration.
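For the infrastructure question, even a back-of-the-envelope calculation helps. This sketch assumes, hypothetically, that total storage usage grows by a constant percentage each month after you add the dark data; real growth is rarely that tidy, and the function and parameter names are illustrative.

```python
def months_until_full(used_tb: float, dark_data_tb: float, capacity_tb: float,
                      monthly_growth: float) -> int:
    """Months until storage usage exceeds capacity after integrating dark data.

    Assumes usage compounds at `monthly_growth` (e.g. 0.05 for 5% a month).
    Returns 0 if adding the dark data alone already exceeds capacity.
    """
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive")
    usage = used_tb + dark_data_tb
    months = 0
    while usage <= capacity_tb:
        usage *= 1 + monthly_growth  # apply one month of compound growth
        months += 1
    return months
```

For example, 40 TB already in use plus 10 TB of newly integrated dark data, on 100 TB of capacity growing 10% a month, would exceed capacity in month 8, since 1.1 to the 8th power is the first power above 2.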

Preparing Infrastructure for Integration

Often you'll already have something in place, or there are existing standards or specific software that you need to use. If that's the case, great, but if not, there are a few options.

One option is a data lake: a centralized repository where raw data is stored in its native format until it's needed for analysis. Another is a data warehouse, which is optimized for querying and reporting. I would also look into scalable cloud solutions, but that depends on the company and the use case.
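To illustrate the data lake idea of storing raw data as-is until it's needed, here is a minimal sketch that lands a batch of records unchanged as JSON Lines in a folder per source and date. The `source=.../date=...` layout follows a common Hive-style partitioning convention, but the function name and paths are hypothetical, it assumes the records are already JSON-serializable dicts, and a real lake would more likely sit on cloud object storage than a local disk.

```python
import json
from datetime import date
from pathlib import Path


def land_raw_batch(lake_root: str, source: str, batch_date: date,
                   records: list[dict]) -> Path:
    """Write records unchanged, one JSON object per line, to
    <lake_root>/raw/source=<source>/date=<YYYY-MM-DD>/part-0000.jsonl.

    Overwrites an existing part file for the same source and date.
    """
    partition = (Path(lake_root) / "raw" / f"source={source}"
                 / f"date={batch_date.isoformat()}")
    partition.mkdir(parents=True, exist_ok=True)
    out = partition / "part-0000.jsonl"
    with out.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")  # no cleaning: keep the raw shape
    return out
```

Partitioning by source and date keeps raw data cheap to store and lets later jobs read only the slices they need.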

For the bakery example, this could mean integrating customer preference data into its CRM and order management systems to make it easier to understand customer behavior trends.

Designing a Scalable Integration Process

Once we know what we're working with and have everything ready, we can start designing the process.

It's a good idea to start with data cleaning: removing duplicates, correcting errors, and standardizing formats. Then comes data transformation, converting the data into formats that are easy to analyze, like structured databases or tables.
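The cleaning and transformation steps might look like this for the bakery's order records. This is a sketch, not a production pipeline: the field names (`order_id`, `order_date`, `flavor`), the accepted date formats, and the decision to treat `DD/MM/YYYY` as day-first are all assumptions.

```python
import csv
import io
from datetime import datetime
from typing import Optional

# Hypothetical date formats seen across sources; DD/MM/YYYY is assumed day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")
FIELDS = ["order_id", "order_date", "flavor"]


def standardize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_orders(raw_rows: list[dict]) -> list[dict]:
    """Standardize formats, drop rows we can't repair, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in raw_rows:
        order_id = (row.get("order_id") or "").strip()
        order_date = standardize_date(row.get("order_date") or "")
        flavor = (row.get("flavor") or "").strip().lower()
        if not order_id or order_date is None or not flavor:
            continue  # can't be corrected automatically; review these in practice
        key = (order_id, order_date, flavor)
        if key in seen:
            continue  # duplicate of a row we already kept
        seen.add(key)
        cleaned.append(dict(zip(FIELDS, key)))
    return cleaned


def to_csv(rows: list[dict]) -> str:
    """Transform cleaned rows into a tabular CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Standardizing before deduplicating matters here: `2024-09-05` and `05/09/2024` only look like duplicates once both are in the same format.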

Again, this depends on what you're working with, but one thing is universally true: it's very important to design your processes with future data growth in mind, so they can handle increasing amounts of data without a hitch.
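One common way to design for growth is to process records in fixed-size batches pulled from a stream, so memory use stays flat no matter how much data arrives. The sketch below assumes records arrive as any Python iterable (a generator reading from a file or database, for instance); `in_batches`, `run_pipeline`, and the batch size of 1,000 are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def in_batches(records: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `batch_size` records, pulling lazily from `records`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    # islice takes the next batch_size items; an empty list means the stream is done.
    while batch := list(islice(it, batch_size)):
        yield batch


def run_pipeline(records: Iterable[dict],
                 process_batch: Callable[[list[dict]], None],
                 batch_size: int = 1000) -> int:
    """Feed records to `process_batch` one batch at a time; return the total processed."""
    processed = 0
    for batch in in_batches(records, batch_size):
        process_batch(batch)  # e.g. clean, transform, and load this batch
        processed += len(batch)
    return processed
```

Because only one batch is in memory at a time, the same code handles a hundred orders or a hundred million; scaling further usually means running batches in parallel or moving to a distributed engine.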

For the hypothetical bakery, this means creating a scalable system to consistently collect and integrate customer data as the business grows.

Preparing for Data Analysis

Once the dark data is integrated, the next step should be processing and analyzing it to extract valuable insights.

Processing and Analyzing Dark Data for Insights

Next Steps in Data Utilization

With dark data successfully integrated, the next objective is to process and analyze it to extract actionable insights. This part is about transforming raw data into valuable business intelligence.

Choosing the Right Analytical Tools and Techniques

You're probably going to be limited by what your organization has, allows, or has budget for, but there are a few common types of tools. Machine learning models can spot patterns or predict outcomes by learning from historical data. Data mining techniques can help you discover hidden patterns and trends in large datasets. Then there are big data analytics platforms like Hadoop or Spark.

There are many options: some free, some open source, some very expensive. But you're most likely going to be limited to what your company allows anyway. The point is to make the most of what you have, whether it's Weka or Databricks, and to use the right tool for your data types.

Matching Tools to Dark Data Types

A few examples of finding the right method for your use case:
If you have unstructured text like emails, support tickets, or social media posts, it’s best to look into text mining.
If you have historical data and want to predict future trends, like customer buying behavior, try predictive analytics.
If you are interested in customers' opinions, try sentiment analysis: analyzing reviews, feedback, or social media discussions.
In the bakery example, data mining tools could help uncover customer preference trends, and predictive analytics could be used to anticipate demand for specific cake designs or flavors.
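As a rough illustration of two of these techniques, the sketch below counts flavor mentions across free-text messages (a very simple form of text mining) and fits a straight-line trend to weekly order counts to forecast the next period (a basic form of predictive analytics). The flavor list and function names are hypothetical, and a real analysis would likely use richer models; a linear trend ignores seasonality entirely.

```python
import re
from collections import Counter

# Hypothetical flavor vocabulary; in practice you'd derive it from your menu.
FLAVORS = ("red velvet", "chocolate", "lemon", "carrot")


def flavor_mentions(texts: list[str], flavors=FLAVORS) -> Counter:
    """Count how many texts mention each flavor as a whole word (case-insensitive)."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for flavor in flavors:
            if re.search(rf"\b{re.escape(flavor)}\b", lowered):
                counts[flavor] += 1
    return counts


def linear_trend_forecast(history: list[float], steps_ahead: int = 1) -> float:
    """Fit y = a + b*t by ordinary least squares (t = 0, 1, ...) and extrapolate."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)
```

For made-up weekly red velvet orders of `[10, 12, 14, 16]`, the fitted slope is 2 per week, so the forecast for the following week is 18.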

Transforming Dark Data into Actionable Insights

Methods for Preparing Data for Analysis

Once you have the tool and method you'd like to use, you can start preparing the data. This generally means cleaning, structuring, and enriching it. First, remove irrelevant or incorrect data; then organize it into structured formats like database tables; then add context by connecting related data sets in ways that make business sense, for example combining sales figures with customer feedback data.
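The enrichment step, such as combining sales figures with customer feedback, can be as simple as a join on a shared key. This minimal sketch assumes both datasets have a hypothetical `product` field and that feedback carries a numeric `rating`; products without feedback are kept with an empty rating, like a left join.

```python
from collections import defaultdict


def enrich_sales_with_feedback(sales: list[dict], feedback: list[dict]) -> list[dict]:
    """Attach review count and average rating to each product's sales row (left join)."""
    ratings = defaultdict(list)
    for item in feedback:
        ratings[item["product"]].append(item["rating"])
    enriched = []
    for row in sales:
        product_ratings = ratings.get(row["product"], [])  # .get avoids adding keys
        enriched.append({
            **row,
            "review_count": len(product_ratings),
            "avg_rating": (sum(product_ratings) / len(product_ratings)
                           if product_ratings else None),
        })
    return enriched
```

A product that sells well but has a low average rating, or one with many sales and no reviews at all, is exactly the kind of signal this combined view can surface.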

In Part 2, we'll explore how to ensure governance, compliance, and security, how to leverage dark data for decision-making, and how to stay ahead with the latest trends and future strategies in dark data, so stay tuned!


Tia Duncan

Experienced tech strategist decoding digital transformation, data strategy and emerging tech trends.