TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

How Data Scientists Can Reduce Data Wrangling Time with a Data Mart

What’s a data mart and why data scientists should use one

5 min read · May 21, 2022

Photo by Dima Valkov from Pexels

As a data scientist, you can spend up to 80% of your time cleaning and transforming data before you can generate actionable insights or build machine learning models that create business impact. Now imagine spending that time on analysis and model development instead. A data mart can make this a reality: it is a subset of the data in a data warehouse, built for a specific group of users or business unit.

Introduction

When I started as a data scientist, the data warehouse held only raw data, and no ETL pipelines existed to produce a single centralized table I could query for customer information. Every time I needed customer data, I had to join multiple tables and reapply the proper business logic, which was tedious to repeat for every analysis. Eventually, I moved these frequent queries into ETL pipelines and created an analytics data mart, which cut my data cleaning and preparation time by more than 50%. Now that you know the benefits of having a data mart, let me review the process I used to build one and how you can apply it at your company.
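To make the idea concrete, here is a minimal sketch of turning a repeated join into a reusable data mart table. It is an illustration under assumptions, not the pipeline described in this article: SQLite stands in for the data warehouse, and the table names (`raw_customers`, `raw_orders`, `mart_customers`), the columns, and the "completed orders only" rule are all hypothetical.

```python
import sqlite3


def build_customer_mart(conn: sqlite3.Connection) -> None:
    """Rebuild a hypothetical customer mart table from raw tables.

    The join and the business rule live in one place, so each analysis
    can query mart_customers instead of repeating them.
    """
    conn.executescript(
        """
        DROP TABLE IF EXISTS mart_customers;
        CREATE TABLE mart_customers AS
        SELECT c.customer_id,
               c.signup_date,
               COUNT(o.order_id)          AS order_count,
               COALESCE(SUM(o.amount), 0) AS total_spend
        FROM raw_customers AS c
        LEFT JOIN raw_orders AS o
               ON o.customer_id = c.customer_id
              AND o.status = 'completed'  -- assumed business rule
        GROUP BY c.customer_id, c.signup_date;
        """
    )


# Small in-memory dataset standing in for raw warehouse tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_customers (customer_id INTEGER PRIMARY KEY, signup_date TEXT);
    CREATE TABLE raw_orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL,
        status TEXT
    );
    INSERT INTO raw_customers VALUES (1, '2022-01-05'), (2, '2022-02-10');
    INSERT INTO raw_orders VALUES
        (10, 1, 25.0, 'completed'),
        (11, 1, 40.0, 'completed'),
        (12, 2, 15.0, 'cancelled');
    """
)

build_customer_mart(conn)

# Analyses now read one table instead of re-joining the raw ones.
rows = conn.execute(
    "SELECT customer_id, order_count, total_spend "
    "FROM mart_customers ORDER BY customer_id"
).fetchall()
print(rows)
```

In a real setup, a scheduled ETL job would run something like `build_customer_mart` so the mart stays current, and the business logic would be changed in one place rather than in every analysis query.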

1. Determine the business


Written by Vicky Yu

Musings of a data scientist turned data analyst. Sharing my data experiences one story at a time.
