Would you hire a Python/Airflow expert OR a Database/SQL expert for a Data Engineer position?
In my team (and on the market), the job is shifting more and more. Let’s look at that more closely!
1 — The Data Engineer and its ancestor role
2 — Software craftsmanship DataOps
3 — Data after all
1 — The Data Engineer and its ancestor role
It is 1990, the sun is shining, and somewhere someone realizes: “We have no Data Management, John!” And BOOM, Business Intelligence was born.
Business Intelligence engineer
Business Intelligence practices are a way to organize Data and make it understandable for reporting purposes. More precisely, a BI engineer manages the ETL processes (Extracting the data from somewhere, Transforming this data and Loading it elsewhere). The Data is rebuilt so that it can be used for analytics purposes (KPI, reporting, Data Mining). Sometimes BI engineers even build the reporting themselves.
The code is simple: there is (in most cases) no code! BI engineers use point-and-click tools that do the job for them: DataStage, Informatica, Talend, etc. At the same time, they write SQL all day long and manage their databases and reports. Data is a “business” thing.
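Even when a graphical tool hides them, the three ETL steps are easy to picture in code. Here is a minimal sketch using only the Python standard library. The sample data, the `orders` table and the function names are all made up for illustration, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read a file, an API
# or an operational database instead.
RAW_CSV = """order_id,country,amount
1,fr,10.50
2,de,20.00
3,fr,5.25
"""


def extract(raw: str) -> list[dict]:
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: cast types and normalise the country code."""
    return [
        (int(r["order_id"]), r["country"].upper(), float(r["amount"]))
        for r in rows
    ]


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def revenue_by_country(conn: sqlite3.Connection) -> dict:
    """A typical reporting query (KPI) on top of the loaded data."""
    rows = conn.execute(
        "SELECT country, SUM(amount) FROM orders GROUP BY country"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    print(revenue_by_country(conn))
```

Note that the last step is still plain SQL: the reporting side of the job is where the BI engineer spends most of the day.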
The new dawn of Data is here
It’s 2010, Big Data is comiiiing! This is HUGE. It’s not Data, it’s DATA. It comes as a yellow elephant (Hadoop) in a well-organized ecosystem, and it challenges everything. Within a couple of years, you realize that:
- You need to code to get that Big Data: Java, Scala, Python
- ETL tools are very expensive and are being challenged by Airflow (and other tools), a “workflow as code” tool that lets you define, schedule and monitor workflows in Python, for free (it is open source).
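To make the “workflow in code” idea concrete, here is a minimal sketch of a pipeline declared as tasks plus dependencies. It deliberately uses only the Python standard library (`graphlib`, Python 3.9+) and is not Airflow's API: the `run_workflow` helper and the task names are hypothetical, and a real orchestrator adds scheduling, retries, logging and a UI on top of this.

```python
from graphlib import TopologicalSorter

# Record of what ran, so the execution order is visible.
log = []

# Hypothetical tasks: each one is just a Python callable.
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "transform": {"extract"},
    "load": {"transform"},
}


def run_workflow(tasks: dict, dependencies: dict) -> list:
    """Run every task once, after all the tasks it depends on."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


if __name__ == "__main__":
    print(run_workflow(tasks, dependencies))
```

Because the pipeline is plain code, it can be versioned, reviewed and tested like any other software, which is a big part of why developers took over ETL building.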
Within five years, Airflow becomes one of the go-to tools for ETL. Depending on the Data use case (with or without Airflow), you will probably code in Scala or Python… BOOM, Data Engineering is born!
ETL building went from SQL-focused people with business acumen to developers with infrastructure and SQL knowledge.
2 — Software Craftsmanship DataOps
I hope everyone wants to be a craftsman, whether they are a BI engineer or a Software engineer. Let’s hear the manifesto one more time because we love it:
As aspiring Software Craftsmen we are raising the bar of professional software development by practicing it and helping others learn the craft. Through this work we have come to value:
Not only working software, but also well-crafted software
Not only responding to change, but also steadily adding value
Not only individuals and interactions, but also a community of professionals
Not only customer collaboration, but also productive partnerships
That is, in pursuit of the items on the left we have found the items on the right to be indispensable.
Scale with craftsmanship
As a Data Engineer, you build the ETL with Airflow, SQL and a database, and maybe handle some event Data with streaming.
Wait! Did I say you do SQL (just like a BI engineer, so my demonstration may be wrong)? Well, yes and no… You do it for now, but not for long! Indeed, at the beginning, you need to write SQL to set up the basics of your database. But your long-term goal is to build a Data Architecture, around tools, that will be used by the whole company. More and more, your role shifts from SQL-doer to Data facilitator. You cannot be a bottleneck every time someone wants new Data, so you have to help people manage their own Data needs.
As a true Software Craftsman, you build a clear and maintainable architecture, and you help communities learn how to use your infrastructure easily. In the end, tech teams will add the Data they want to analyze to your Data Lake, and Analysts and Data Scientists will modify your ETL to create the Data they want.
So your main focus is now to build and maintain an architecture. You use Terraform, an “infrastructure as code” tool, to deploy your whole architecture through code. You also need to monitor your infrastructure in real time, as you are accountable for it. Because you love it, you may also manage access to this platform. Oh wait… Am I becoming a DevOps?
3 — Data after all… But differently
Let me rephrase what I’ve said so far: Data Engineers now do less and less (sometimes no) database or SQL-oriented work, so they are more and more Tech Engineers. They also tend to become DevOps, because they manage infrastructure. Breaking news: I must recognize that you are still a Data person. Indeed, if you weren’t doing Data stuff, you would probably have used something other than Python or Scala. You wouldn’t use Airflow either. And finally, you are not DevOps, you are DataOps, playing regularly, but not always, with DevOps toys. So yes, you are still Data! We can even say that BI engineers evolved into a more efficient form by becoming Data Engineers.
This move away from old-school Data is understandable. Databases are becoming serverless, on-demand and auto-scaling, so they seem easier and less critical to manage. However, be careful with this “natural” drift away from Databases and SQL. It can lead to an architecture that is more expensive, both in money and in resource usage. So don’t be lured too much by the importance of code and infrastructure, or by the promises of those auto-managed thingies. Still, the shortcut is understandable, as you may have other priorities than architecture optimization.
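One way to picture that hidden cost: the same aggregation can be done in application code or pushed down to the database. The sketch below is only an illustration under simple assumptions. An in-memory SQLite database stands in for a real warehouse, the `events` table and function names are made up, and the number of rows returned to the application is used as a rough proxy for cost.

```python
import sqlite3


def build_db(n_rows: int = 1000) -> sqlite3.Connection:
    """Create a hypothetical events table: 10 users, 1.0 second per event."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, duration REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        ((i % 10, 1.0) for i in range(n_rows)),
    )
    return conn


def totals_in_python(conn: sqlite3.Connection) -> tuple:
    """Pull every row into the application, then aggregate in Python."""
    rows = conn.execute("SELECT user_id, duration FROM events").fetchall()
    totals = {}
    for user_id, duration in rows:
        totals[user_id] = totals.get(user_id, 0.0) + duration
    return totals, len(rows)


def totals_in_sql(conn: sqlite3.Connection) -> tuple:
    """Let the database aggregate; only one row per user comes back."""
    rows = conn.execute(
        "SELECT user_id, SUM(duration) FROM events GROUP BY user_id"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    conn = build_db()
    python_totals, python_rows = totals_in_python(conn)
    sql_totals, sql_rows = totals_in_sql(conn)
    print(python_totals == sql_totals, python_rows, sql_rows)
```

Both functions return the same totals, but the first one moves every row out of the database to get them. On a toy table nobody cares; on a large dataset, that kind of habit is what a bit of SQL and data modeling knowledge helps you avoid.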
In the end, it’s funny that in 10 years, the “Data” in Data Engineering has come to refer less to Data stuff itself (Data modeling, Databases and SQL) and more to the tools that manage it. That’s why I wanted to share this blog post with you.
To those of you who read this article till the end, thanks for reading!