
Comprehensive Guide to the Data Warehouse

Data science can’t start until the data cleaning process is complete. Learn about the role of the data warehouse as a repository of analysis-ready datasets.

Nicole Janeway Bills
Towards Data Science
7 min read · Aug 2, 2020


Hunting for clean data in the enterprise setting. Photo by Hu Chen on Unsplash.

As a data scientist, you benefit from understanding fundamental data warehouse concepts. Most of our work involves adding enterprise value on top of datasets that need to be clean and readily comprehensible. For a dataset to reach that stage of its lifecycle, it has already passed through many components of the data architecture and, ideally, many data quality filters. This upstream work is how we avoid the unfortunate situation in which the data scientist ends up spending 80% of their time on data wrangling.

Let’s take a moment to deepen our appreciation of the data architecture process by learning about various considerations relevant to setting up a data warehouse.

The data warehouse is a specific infrastructure element that provides down-the-line users, including data analysts and data scientists, access to data that has been shaped to conform to business rules and is stored in an easy-to-query format.
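To make "shaped to conform to business rules" and "easy-to-query format" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Everything in it is an assumption for illustration: the source tables `crm_customers` and `billing_invoices`, the warehouse table `customer_revenue`, and the two business rules (upper-case country codes, count only paid invoices) are hypothetical and not taken from any particular warehouse product.

```python
import sqlite3


def build_warehouse_table(conn):
    """Combine two hypothetical source tables into one analysis-ready table."""
    cur = conn.cursor()

    # Hypothetical transactional sources, standing in for separate operational databases.
    cur.execute("CREATE TABLE crm_customers (customer_id INTEGER, name TEXT, country TEXT)")
    cur.execute("CREATE TABLE billing_invoices (customer_id INTEGER, amount REAL, status TEXT)")
    cur.executemany(
        "INSERT INTO crm_customers VALUES (?, ?, ?)",
        [(1, "Acme", "us"), (2, "Globex", "DE")],
    )
    cur.executemany(
        "INSERT INTO billing_invoices VALUES (?, ?, ?)",
        [(1, 100.0, "paid"), (1, 50.0, "cancelled"), (2, 75.0, "paid")],
    )

    # Assumed business rules: country codes are upper-cased,
    # and only paid invoices count toward revenue.
    cur.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT c.customer_id,
               c.name,
               UPPER(c.country) AS country,
               COALESCE(SUM(i.amount), 0.0) AS revenue
        FROM crm_customers AS c
        LEFT JOIN billing_invoices AS i
          ON i.customer_id = c.customer_id AND i.status = 'paid'
        GROUP BY c.customer_id, c.name, c.country
        """
    )
    conn.commit()


def query_revenue(conn):
    """The kind of simple query a downstream analyst would run."""
    return conn.execute(
        "SELECT name, country, revenue FROM customer_revenue ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_warehouse_table(connection)
    for row in query_revenue(connection):
        print(row)
```

The analyst querying `customer_revenue` never has to know that cancelled invoices exist or that country codes were inconsistent at the source; those decisions were applied once, upstream.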

The data warehouse typically connects information from multiple “source-of-truth” transactional databases, which may exist within individual…




Written by Nicole Janeway Bills

Founder of datastrategypros.com where we help busy professionals ace the Certified Data Management Professional (CDMP) exam and other data-related exams.
