Data Engineering at Scale
How to speed up building your Big Data ETL pipelines and getting them into production
We have been hearing the slogan “Data is the new Gold” for a couple of years now, and many companies are investing heavily to follow this route. Initially, most companies believed that it was enough to hire a bunch of expensive data scientists to become a leader among data-driven companies. For many reasons, it turned out that becoming a data-centric organization is much more difficult than simply hiring top-notch data scientists. This article focuses on one important aspect required for taking off as a data company.
Surprisingly, it wasn’t well understood until a few years ago that getting and preparing the data for analytical questions is actually much harder than expected. This first step requires a lot of technical knowledge and skill around file formats, database systems and APIs. These are all important topics, but they are mostly outside the focus of a data scientist, whose main expertise lies in applying statistical or machine learning methods to data. At this point, the new role of “data engineer” was invented. Actually, a similar role had existed long before in the world of Business Intelligence (BI) and Data Warehouses (DWH), but under a different name.