ON DATA ENGINEERING
Data Engineer Archetypes
An overview of the different profiles of Data Engineers
With increasing digitalization and the data use cases stemming from it, data engineering has become a field in high demand. Yet more often than not, hiring managers and companies don't fully grasp its nuances. There are many different data engineer archetypes, and while true generalists exist, a Data Engineer will typically have particular expertise in, and affinity towards, one area of Data Engineering.
Data Warehouse Archetype
Data Engineers of the data warehouse archetype primarily deal with databases; their focus tends to be on data integration and data modeling.
They primarily work with RDBMSs such as MSSQL, Oracle, or Postgres; know the ins and outs of ACID properties and transactions; apply data modeling methodologies such as Kimball or Inmon; and optimize queries by reading explain plans, applying indices, partitions, etc. This archetype sometimes takes on database administration tasks, including user provisioning, backup, recovery, and migration.
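The query-optimization work described above can be sketched with SQLite standing in for the RDBMSs named here (a minimal sketch; the in-memory database, table, and index names are all illustrative, and the exact plan wording varies by engine and version):

```python
import sqlite3

# Illustrative only: SQLite stands in for MSSQL/Oracle/Postgres; the
# index-vs-scan pattern shown by the plan is analogous across engines.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# Without an index, filtering on customer_id forces a full table scan;
# the plan's detail column reports a SCAN of the table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(plan_before)

# After adding an index, the planner can seek into it instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(plan_after)
```

Reading plans like these before and after applying an index is the bread and butter of this archetype; on Postgres the equivalent would be `EXPLAIN` / `EXPLAIN ANALYZE`.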
These Data Engineers typically work with tools such as SQL Server Integration Services (SSIS), procedural SQL, … although some companies are progressively migrating towards a modern data stack, including tools such as dbt for data modeling. With that move towards the modern data stack, this archetype is being pushed in the direction of analytics engineering.
Data Integration Archetype
The Data Integration archetype typically works on bringing data onto a data platform (ETL/ELT) or from a data platform (reverse ETL).
Engineers fitting that archetype typically use frameworks and technologies such as Singer taps and targets, orchestrators such as Airflow, Azure Data Factory, or Logic Apps, CDC tooling such as Debezium or AWS Database Migration Service, and reverse ETL tooling such as RudderStack.
Some of the work typically done by these engineers involves calling APIs to source or push data, creating FTP feeds, or setting up data crawlers. They might also be quite…
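Sourcing data from an API, as described above, often boils down to paginating through an endpoint until it is exhausted. A minimal sketch, where the hypothetical `fetch_page` stands in for a real HTTP call and serves canned data so the logic is self-contained:

```python
# Illustrative canned responses: page -> list of records; page 2 is empty.
FAKE_API = {0: [{"id": 1}, {"id": 2}], 1: [{"id": 3}], 2: []}

def fetch_page(page):
    # A real implementation would GET something like /items?page={page}
    # (e.g. with urllib or requests) and handle auth, retries, rate limits.
    return FAKE_API.get(page, [])

def extract_all(fetch, page=0):
    """Pull pages until the API returns an empty one."""
    records = []
    while True:
        batch = fetch(page)
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records

print(extract_all(fetch_page))  # -> [{'id': 1}, {'id': 2}, {'id': 3}]
```

In practice this loop would live inside an orchestrated task (an Airflow operator, for instance), with the last-seen page or record id persisted as state between runs.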