Why Invest In A Data Warehouse

We recently had a medium sized company ask us why they might want to build a data warehouse. They had an operational relational database and a few ideas for products that we found brilliant. We saw clear ways that customers would benefit. However, we did have to take a moment to explain the value of a data warehouse.

Considering their data was small (under 500 GB). A data warehouse seemed like an appropriate fit for their uses cases they explained. As stated above, they had several products they wanted to develop to help better target customers and gain a competitive advantage.

Although mosts of our posts and projects are data science focused. Our projects often require us to be more than just analysts and data scientists. We also have to be data engineers and designers of data systems.

So here are some great reasons to invest in a data warehouse or other form of centralized data system that analysts and products can utilize.

Unburdening the IT department

One of the issues analyst run into is getting access to data. Typically this requires putting in a ticket to the IT department…waiting 4 weeks and then maybe…getting an excel data pull that does not match the request of the analyst at all!

Sometimes, a simple data request can take months to get correct. A data warehouse allows analyst the ability to pull data for themselves. This still needs to be managed, and must not be the production database. However, if implemented properly, then analysts will be able to get the data for themselves.

Enhances Business Intelligence

The goal of a data warehouse is to help answer business questions. This provides managers and directors the ability to make clearer and more accurate decisions. For instance, our team once built a data warehouse for an insurance plan that helped them figure out that their targeted behavioral health plan was actually reducing their per patient per month costs. They were unable to figure out whether their specialized plan was saving money from their operational database. We built them a reporting database to help manage their complex business logic as well as develop a dashboard that helped show them that their targeted mediation was working. They used this to show their directors the value of their project which allowed them to continue to have funding for their project.

Improves Data Quality and Consistency

Data warehouses and ETLs(Extract, Transform, Load) reduce having multiple sources of truth, copy-paste errors and the risk of human error. This is because when creating the ETLs to port data from the data from the operational database the data warehouse, it is easy to develop QA suites that check and make sure the data stays accurate.

Thus, a data warehouse is better suited for keeping data clean in order to generate reports and create applications.

Recording changes to build history

One of the largest advantages that data people enjoy about data warehouses is the ability to track changes. This might be a status change of a patient, a canceled order from a customer, a sudden discount, etc, etc. All of these small business logic changes might now show up clearer in the operations operational database. However, when the data is ported over into the data warehouse via ETL it contains updates and changes to past events. These have to be logged. This is why data warehouses are great. They track data changes using an ideology called slowly changing dimensions.

Makes Data Easy To Access

Querying data from an operational system is not practical and can be problematic, especially when dealing with systems like Salesforce and Workday. A company cannot practically develop a web application to sell or allow multiple users access to the exact same data set ,without a centralized data warehouse. Indeed, querying data from an operational system leads to multiple Excel files with the suffix V1, V1.1, V1.2, V1.3…

In addition, besides being easily accessible from the standpoint of getting to the data. The actual shape of a data warehouses makes it much easier to traverse. A typical relational database looks like the diagram below.

This model requires more joins from table to table to get each data point required. This is signified by each of the arrows.

Compared to what a data warehouse might look like:

A data warehouse typically takes on a much more single process focused design. Where each value does not require several joins across different tables to get to the data required.

This makes it much easier for non-technical people to write their own queries. Thus, furthering who has access to the database. This does not mean everyone should have access!

Increases Query Performance

Data warehouse structure is developed to increase the performance of analytical tasks and metric development, unlike an operational database. Additionally, data warehouses partitions analytical and operational tasks, limiting the risk of locking out users as data updates occur.

Faster and more flexible reporting

Creating a robust data warehouse allows a team to think about more than just the current product or reports. If designed well, a data warehouse offers the opportunity to mature to support further products and reports. However, this requires creating a data system that is not too rigid and more importantly, not held together by pieces of scotch tape.

Summary

Data warehouses provide a lot of advantages. They allow companies to analyze their data effectively as well as trust it! It creates accuracy, dependably and democratizing data so that more than just the purely technical employees can access it.

This avoids bottlenecks and allows your management to get their information much faster!

If you would like to learn more about data science or data in general, please feel free to read some of the posts below!

How Do Machine Learning Algorithms Learn Bias?

Personalization With Contextual Bandits

How To Survive Corporate Politics As A Data Scientist

What Is A Decision Tree