What is a Data Platform Engineer

How I got interested in Data Platforms

Karan Gupta
Jun 17, 2020

I had heard about big data technologies like Hadoop and Spark during my college days but didn’t pay much attention to them. Then I started a new project at work involving Spark and, coincidentally, discovered Andreas Kretz’s YouTube channel. His description of the field really stuck with me: “Plumbers of Data Science”. I also got to see first-hand the importance of solid engineering in building the data backbone for any sort of analytics, business intelligence or machine learning endeavour.

Path of a Data Platform Engineer

I have been working full time as a Software Engineer for about a year and, surprisingly, have gained exposure to a multitude of fields. But I found that what I was learning was very haphazard, and I could not build a complementary skill set out of the technologies I was working with. Since I am now mainly focused on a data lake implementation, I have settled on a niche to pursue: data (platform) engineering.

Some of the key foundational concepts, not just in modern data engineering but in computer science generally, are data structures, algorithms, database design, system design and data warehousing. To put these concepts into practice we need a programming language and a relational database. Together, these form the base of a data platform.

We also have to acknowledge that big data and the cloud are now used across companies of every size, from big enterprises to startups. I have also noticed in multiple data engineering case studies that some part of the data platform always involves Web APIs, mostly for integration purposes such as ingesting data from source systems or transferring data from the data lake to data warehouses or data marts.
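To make the API ingestion pattern a little more concrete, here is a minimal Python sketch using only the standard library: one function pulls a JSON array of records from a source system’s Web API, and another loads them into a staging table. The endpoint URL, the `staging_orders` table and its columns are hypothetical, and SQLite simply stands in for whatever data lake or warehouse your platform actually uses.

```python
import json
import sqlite3
import urllib.request

# Hypothetical source-system endpoint, for illustration only.
SOURCE_URL = "https://example.com/api/orders"


def fetch_records(url: str) -> list[dict]:
    """Pull a JSON array of records from a source system's Web API."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def load_records(conn: sqlite3.Connection, records: list[dict]) -> int:
    """Upsert records into a staging table and return how many were written.

    The hypothetical staging_orders table stands in for a landing zone
    in a data lake or warehouse.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keyed on the primary key makes re-runs idempotent:
    # loading the same batch twice does not duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO staging_orders (id, customer, amount) "
        "VALUES (:id, :customer, :amount)",
        records,
    )
    conn.commit()
    return len(records)
```

In a real pipeline you would call something like `load_records(conn, fetch_records(SOURCE_URL))` on a schedule. Keeping fetching and loading separate makes each step easier to test and to swap out, for example replacing SQLite with a warehouse client.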

Machine learning is probably the next frontier for most data platforms, so I believe ML engineering can be a natural progression for a data engineer. When most people think of ML, they picture data scientists munging data, applying cutting-edge algorithms and evaluating metrics to create a sophisticated model, and I can assure you the reality is quite different. No model, however fancy, has any value in industry until it makes it into production. So it falls to the data engineer, now upgraded to an ML engineer or a data platform engineer, to take the model and build production-ready pipelines for serving predictions and automating retraining. This requires a good foundational understanding of ML, not the most advanced know-how.

Where to go from here

Some additional skills I think are a bonus for any data engineer are experience with a Lucene-based search engine such as Elasticsearch or Solr, CI/CD tools such as TeamCity or Jenkins, and Git for source control.

I hope this article gave you some clarity about the roles of a data engineer, database engineer, big data engineer, data platform engineer and so on. I will be posting more content on big data and data engineering, so stay tuned.
