How Data Scientists Can Reduce Data Wrangling Time with a Data Mart
What’s a data mart and why data scientists should use one
As a data scientist, you can spend up to 80% of your time cleaning and transforming data before you can generate actionable insights or build machine learning models that create business impact. Now imagine a world where you spend more of that time on analysis and model development instead. A data mart can make this a reality. A data mart is a subset of the data in a data warehouse, built for a specific group of users or business unit.
Introduction
When I started as a data scientist, the data warehouse held only raw data, and there were no ETL (extract, transform, load) pipelines to build a single centralized table I could query for customer information. Every time I needed customer data, I had to join multiple tables together and apply the proper business logic, which was tedious to repeat for every analysis. Eventually, I moved these frequent queries into ETL pipelines and created an analytics data mart that reduced my data cleaning and preparation time by more than 50%. Now that you know the benefits of having a data mart, let me review the process I used to build one and how you can apply it in your company.
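To make the idea concrete before going through the process, here is a minimal sketch of turning a repeated join into a reusable mart table. It is an illustration, not the pipeline I actually built: the table names (`raw_customers`, `raw_orders`, `customer_mart`), the columns, and the "completed orders only" business rule are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3


def build_customer_mart(conn: sqlite3.Connection) -> None:
    """Rebuild a one-row-per-customer table from the raw tables.

    The schema and the business rule below are hypothetical examples.
    """
    conn.executescript(
        """
        DROP TABLE IF EXISTS customer_mart;
        CREATE TABLE customer_mart AS
        SELECT
            c.customer_id,
            c.signup_date,
            COUNT(o.order_id)          AS completed_orders,
            COALESCE(SUM(o.amount), 0) AS total_revenue
        FROM raw_customers AS c
        LEFT JOIN raw_orders AS o
            ON o.customer_id = c.customer_id
           AND o.status = 'completed'  -- example business rule
        GROUP BY c.customer_id, c.signup_date;
        """
    )


# Stand-in for the warehouse: two small raw tables in memory.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_customers (customer_id INTEGER PRIMARY KEY, signup_date TEXT);
    CREATE TABLE raw_orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL,
        status TEXT
    );
    INSERT INTO raw_customers VALUES (1, '2021-01-05'), (2, '2021-02-10');
    INSERT INTO raw_orders VALUES
        (10, 1, 20.0, 'completed'),
        (11, 1, 5.0, 'cancelled'),
        (12, 1, 30.0, 'completed');
    """
)

build_customer_mart(conn)

# Every later analysis queries the mart instead of repeating the join.
rows = conn.execute(
    "SELECT customer_id, completed_orders, total_revenue "
    "FROM customer_mart ORDER BY customer_id"
).fetchall()
print(rows)
```

The join and the business rule now live in one place. If the rule changes, only `build_customer_mart` needs updating, and every analysis that reads from `customer_mart` picks up the change the next time the table is rebuilt.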

