What Are Data Silos & How Can Data Science Solve Them

Brandon Cosley

Published in

Thinking Fast

3 min read · Jul 25, 2022

A Case for Data Science in Small & Medium Sized Businesses

In our modern digital world, data are everywhere.

But just because data are everywhere does not mean that data are accessible everywhere. Quite the contrary, in fact. Most of the data generated by digital products are not readily useful for analysis, because they were never intended to be learned from. They exist to support the digital product’s performance.

As a result, the data are stored within the confines of the architecture used to support the digital application. Consequently, the data are encoded in ways that are useful to the application and stored within the “walls” of that application. In other words, the data are siloed.

Let’s look at a very simple example from a small business perspective. For most small businesses, the most important data they collect are financial records. And for many of those same businesses, the application of choice for managing those financial records is QuickBooks.

Now let’s say the business has two products, product A and product B. Product A is a small, lower-cost product that satisfies a more common need among its customers. Product B, on the other hand, is a bigger product that is more expensive (and more profitable) but serves a more niche market than product A.

Now let’s say the business wants to know what percent of customers who bought product A also bought product B.

The data in QuickBooks would be the best place to start answering this question. However, the data are saved on QuickBooks’ servers and so are not directly accessible to an analytics engine. Moreover, if the business also wanted to know the characteristics of the customers who bought both product A and product B, it would likely need a separate source of data, such as a customer relationship management (CRM) application.

This simple scenario brings up multiple problems associated with data silos. First, because the data are siloed within each application, there is no obvious way to bring them together. Second, because the data are coded to serve the application, they are not in a format that can answer questions like these.

Enter data science.

Data science can help solve each of these problems through feature engineering. At a minimum, the data can be downloaded from each application. Once the data are downloaded, the data scientist can recode and reformat them so that they can be joined. Once joined and re-engineered, the data can be used to answer simple questions, such as what percent of customers who bought product A also bought product B. This exercise also opens up new possibilities, such as understanding how customers who bought both products differ from customers who only bought product A.
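To make the recode, join, and answer steps concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the two small CSV snippets stand in for files downloaded from the accounting application and the CRM, and the column names (Customer, Item, customer_name, segment) are hypothetical rather than the actual export formats of either product.

```python
import csv
import io

# Hypothetical sales export from the accounting application.
# Note the inconsistent customer spelling, which is typical of siloed data.
SALES_CSV = """Customer,Item
Acme Co. ,Product A
acme co.,Product B
Bolt LLC,Product A
Cedar Inc,Product A
Cedar Inc,Product B
Dune Ltd,Product B
"""

# Hypothetical customer export from the CRM application.
CRM_CSV = """customer_name,segment
ACME CO.,enterprise
Bolt LLC,small business
Cedar Inc,enterprise
Dune Ltd,small business
"""


def normalize(name):
    """Recode a customer name into a join key: lowercase, trimmed, single spaces."""
    return " ".join(name.lower().split())


def load_purchases(text):
    """Map each normalized customer key to the set of products they bought."""
    purchases = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = normalize(row["Customer"])
        purchases.setdefault(key, set()).add(row["Item"].strip())
    return purchases


def load_segments(text):
    """Map each normalized customer key to its CRM segment."""
    return {
        normalize(row["customer_name"]): row["segment"]
        for row in csv.DictReader(io.StringIO(text))
    }


def percent_a_also_b(purchases):
    """Percent of product A buyers who also bought product B."""
    bought_a = [c for c, items in purchases.items() if "Product A" in items]
    if not bought_a:
        return 0.0
    both = [c for c in bought_a if "Product B" in purchases[c]]
    return 100.0 * len(both) / len(bought_a)


purchases = load_purchases(SALES_CSV)
segments = load_segments(CRM_CSV)

# Join the two sources on the recoded customer key.
joined = [
    {
        "customer": customer,
        "segment": segments.get(customer),
        "bought_a": "Product A" in items,
        "bought_b": "Product B" in items,
    }
    for customer, items in sorted(purchases.items())
]

pct = percent_a_also_b(purchases)
print(f"{pct:.1f}% of product A buyers also bought product B")  # prints 66.7%
```

The normalize function is the recoding step: without it, "Acme Co. " and "acme co." would be treated as different customers and the join to the CRM would miss them. In practice a library such as pandas is a common choice for this kind of join, and a stable customer ID is a more reliable key than a name.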

Ultimately, a data scientist may be able to use the data to build a predictive model that estimates the likelihood that someone will buy product B when we only have data on their purchases of product A.
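As a rough illustration of what such a model could look like, the sketch below fits a small logistic regression by gradient descent using only the Python standard library. The training rows, the two features (number of product A orders and an "enterprise" flag from the CRM), and the learning settings are all invented for the example. A real project would more likely use an established library and far more data.

```python
import math

# Hypothetical rows for customers who bought product A:
# (product A orders, 1 if "enterprise" in the CRM else 0, 1 if they bought product B else 0)
ROWS = [
    (1, 0, 0),
    (2, 0, 0),
    (1, 1, 1),
    (4, 1, 1),
    (3, 0, 1),
    (5, 1, 1),
    (2, 1, 0),
    (1, 0, 0),
]


def sigmoid(z):
    """Squash a score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, lr=0.1, epochs=5000):
    """Fit logistic regression weights with batch gradient descent."""
    w_orders = w_enterprise = bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        g_orders = g_enterprise = g_bias = 0.0
        for orders, enterprise, bought_b in rows:
            # Error between the predicted probability and the observed outcome.
            err = sigmoid(w_orders * orders + w_enterprise * enterprise + bias) - bought_b
            g_orders += err * orders
            g_enterprise += err * enterprise
            g_bias += err
        w_orders -= lr * g_orders / n
        w_enterprise -= lr * g_enterprise / n
        bias -= lr * g_bias / n
    return w_orders, w_enterprise, bias


def predict(weights, orders, enterprise):
    """Estimated probability that a product A customer also buys product B."""
    w_orders, w_enterprise, bias = weights
    return sigmoid(w_orders * orders + w_enterprise * enterprise + bias)


weights = train(ROWS)
p_frequent_enterprise = predict(weights, orders=5, enterprise=1)
p_one_time_small = predict(weights, orders=1, enterprise=0)
print(f"Frequent enterprise buyer: {p_frequent_enterprise:.2f}")
print(f"One-time small-business buyer: {p_one_time_small:.2f}")
```

With these made-up rows, the model assigns a higher likelihood of buying product B to the frequent enterprise customer than to the one-time small-business customer, which is the kind of ranking a business could use to decide whom to contact first.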

