ON the evolution of Data Engineering

Julien Kervizic
Hacking Analytics
Published in
5 min readOct 8, 2018

A few years ago being a data engineer meant managing data in and out of a database, creating pipelines in SQL or Procedural SQL and doing some form of ETL to load data in a data-warehouse, creating data-structures to unify, standardize and (de)normalize datasets for analytical purpose in a non-realtime manner. Some companies were adding to that a more front facing business components that involved building analytic cubes and dashboard for business users.

In 2018 and beyond the role and scope of data engineers has changed quite drastically. The emergence of data products has created a gap to fill which required a mix of skills not traditionally embedded within typical development teams, the more Software Development Oriented data engineers and the more data oriented Backend Engineers were in a prime role to fill this gap.

This evolution was facilitated by a growing number of technologies that helped to bridge the gap both for those of Data Engineering and those of a more Backend Engineering background.

Big Data: The emergence of Big Data and the associated technologies that can with it drastically changed the data landscape with Hadoop open-sourced in 2006, it became easier and cheaper to store large amount of data, Hadoop unlike traditional RDBMS databases did not require a lot of structuring in order to be able to process the data. The complexity to develop on Hadoop was initially quite high, requiring the development of Map Reduce jobs in Java. The challenges of processing big data forced the emergence of Backend Engineers working on analytical data workflow. It was not until Hive was open sourced in 2010 that the more traditional data engineers could get an easy bridge to get on boarded in this era of Big data.

“people watching orchestra” by Manuel Nägeli on Unsplash

Data Orchestration Engines: With the development of Big data, large internet companies were faced with a challenge to operate complex data flow without any tools such as SSIS used for more traditional RDBMS working in this ecosystem. Spotify opened sourced Luigi in 2012 and Airbnb Airflow (inspired by a similar system at facebook) in 2015. Coming from a heavy engineering driven background, these orchestration…

Julien Kervizic
Hacking Analytics

Living at the interstice of business, data and technology | Head of Data at iptiQ by SwissRe | previously at Facebook, Amazon | julienkervizic@gmail.com

Recommended from Medium


See more recommendations