How would I become a pro in data engineering in 2024?

Subhayan Ghosh
8 min read · May 19, 2024


Do you want to fly like this guy in your data engineering career?

Image by DALL-E

Then hop on! Trust me, it’ll be worth your time. 🤗

In my last blog (here), I described how you can get started with data engineering and land an entry-level data engineering job. (Have you read it? No? Please do. Otherwise it’ll be like watching the second part of a two-part movie. 😒)

Once you have finished the prerequisites from my previous blog, I’ll tell you exactly what you need to learn to grow in your career and become a pro-level data engineer.

Let’s first discuss the different pathways you can take for the next level of your data engineering career.

In recent times, data engineering roles have started asking for more than just data engineering, and if you analyze the job listings you can identify 3 broad patterns (Do you love this? I mean finding patterns in data? Then I know exactly what you should be learning next. 😎).

But for now keep reading, and I’ll walk you through some scenarios that might help you figure out which path to take.

IPL 2024 is currently going on, as many of you already know (in India we have two religions: 1. whatever your actual religion is, and 2. cricket 😜).
Now, did you know that RCB has won more matches in total than RR despite never winning the IPL? Or are you someone who loves to calculate the net run rate of your favourite team to see if they are going to qualify for the play-offs?
Or maybe you’re somehow not a cricket fan, but you’re a big fan of the most beautiful sport there is, football (Hala Madrid 🤩). You have a team in the Champions League to cheer for, you love calculating the xG of a player, you love to see how many tackles your favourite defender has completed, or you love analyzing the heatmaps of different goalkeepers and have concluded that Manuel Neuer could easily play as a defender.
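Back to the cricket fans for a second: if you have never actually computed net run rate yourself, here is a minimal Python sketch. It uses the standard simplified formula (runs scored per over faced minus runs conceded per over bowled). The function names `overs_to_decimal` and `net_run_rate` are my own, made up for illustration, and the sketch ignores the special cases (a team that is bowled out is charged its full quota of overs, and rain-affected matches are handled differently).

```python
def overs_to_decimal(overs: str) -> float:
    """Convert cricket overs notation to decimal overs.

    In cricket notation "19.3" means 19 overs and 3 balls, i.e. 19.5 overs.
    """
    whole, _, balls = overs.partition(".")
    balls_int = int(balls) if balls else 0
    if not 0 <= balls_int <= 5:
        raise ValueError(f"invalid number of balls in overs value: {overs!r}")
    return int(whole) + balls_int / 6


def net_run_rate(runs_scored: int, overs_faced: str,
                 runs_conceded: int, overs_bowled: str) -> float:
    """Simplified NRR: run rate for minus run rate against.

    Assumption: totals are season aggregates and no all-out or
    rain-rule adjustments are needed.
    """
    rate_for = runs_scored / overs_to_decimal(overs_faced)
    rate_against = runs_conceded / overs_to_decimal(overs_bowled)
    return rate_for - rate_against


# A team that scored 180 in 20 overs and conceded 160 in 20 overs
print(round(net_run_rate(180, "20", 160, "20"), 3))  # 1.0
```

If writing something like this feels like fun rather than homework, that is a good sign the analytics path is for you.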

And on the off chance that you don’t like either of those games (seriously??? 😣), I’ll ask whether you love analyzing the financial statements of stocks to identify potential buys 🤯, or maybe you’re the one who creates an Excel sheet to budget an upcoming trip with your friends?

If these things resonate with you, then your next step should be to learn a data visualization tool like Power BI or Tableau (honestly speaking, you can’t go wrong with either, though Power BI might be a bit easier to learn).

Now the question arises, how do I learn Power BI (yeah I am biased 😎)?

  1. If you absolutely need to follow a course and you have access to Udemy (I mean free, from your employer), then the following course will do:
    https://www.udemy.com/course/70-778-analyzing-and-visualizing-data-with-power-bi/?couponCode=LEADERSALE24A
    The reason I have added this course is that it also prepares you for the Microsoft PL-300 (Power BI Data Analyst) certification. It is not mandatory, but in case your employer can reimburse it, why not?
    I would say, book the certification exam for a date about a month away and then start learning. It’ll push you more than thinking “oh, I have a lot of time, I’ll do it later.”
    I don’t know about you, but I am more focused whenever I have a deadline in mind.
  2. And for those who do not have Udemy Business access, fear not, you can learn it for free as well. Here you might need to design a structure for yourself. But wait! Microsoft themselves provide a learning path for PL-300, so you do have a structure. Just follow the playlist below to pick up the basics and, as I explained earlier, book a date for the exam (maybe 45 days out) and start learning.
    https://youtube.com/playlist?list=PLUaB-1hjhk8HqnmK0gQhfmIdCbxwoAoys&si=sR5WzXwQxfbojLrJ

Once you’ve mastered the fundamentals, build a few full-fledged dashboards on any topic you love. I don’t think I need to elaborate more on this; you have to figure it out. Just don’t follow someone else’s project blindly. Do something of your own by taking a dataset from Kaggle (Yeah, at some point you have to be there and explore, it’s a gold mine 😍).

Once you’re somewhat familiar and can work on your own, you can move on to learning Data Science. However, explaining how to learn Data Science would make this blog too long so I’ll keep it parked for a later date.

Now, if you were not really convinced by the story above, I mean, you did not feel that this is what you would do for the rest of your career, then what about doing something on the backend side? If you have never created an API and are wondering what backend has to do with data engineering, then, my friend, it’s not what you want to learn, and you can skip to the next part if you’d like to do something cool that requires rigorous learning for a year or two.

If you don’t like either of these options, then go back to the previous option: learn Power BI and improve your communication skills, especially storytelling (This is a bonus point for those who read the entire blog 😜). You might start with the following video.
Now, if you’re still reading, you are about to take one of two hard pills to swallow.

Which pill you take will depend on the answer that pops into your mind when I ask you to read the following line: “I love probability and multi-variable calculus, and the equation below does not feel complex at all.”

I just took it from the internet and don’t know what it is!!!

If your answer is “Hell no!!! I can’t start with those again!!!”, then you, my friend, should swallow the pill of learning backend (assuming you have not learnt it already).

When you start searching for how to learn backend, the sheer number of tools can blind you with options.
There is the Python community, which swears by Django and Flask and may suggest you go Fast (An API 😂).
Some may suggest Node.js, some would suggest Spring Boot, and you might get asked to learn Go as well. But I’ll tell you a few things that might help you choose the right one for you.
Do you want to learn the framework that is most used nowadays? If yes, skip the rest, put your foot on the pedal and learn Spring Boot.

But, but, there is an if.

Do you know Core Java, Hibernate, JDBC, Struts, Lambda expressions etc.?

If yes, then go ahead. Otherwise, I would suggest watching the following video to learn about the different paths to learning Java. It’s a bit old but absolutely the best guide for a newbie (java learning path).

Now, learning Java is a daunting task, and it might take a year to learn everything Kaushik explains. I’ll probably make a separate post on how I learned it (Yes, I took this path 😎). Just make up your mind and keep showing up daily, and it’ll get easier.
One more reason that might help you choose Java is microservices and Kafka. Both are well supported in the Java ecosystem, so all that trouble will give you double the return.
If you find it too big a task, then I’d probably choose Django or FastAPI instead of Node, simply because as a data engineer you already have experience with Python. Why learn a new language for that?

Learning JS might be a worthwhile investment if you want to become a full stack developer. However, going full stack from data engineering might not be for everyone.

Most people prefer DevOps once they are comfortable with backend and data engineering. Some choose to learn about designing highly scalable systems, distributed systems, or systems that require very, very low latency (All this is highly advanced stuff and I am not the right person to explain it; I am just sharing some insights after talking with gurus who have taken that path.)

Finally, the only path that remains is to learn everything 😅, I mean this:

The specialty here is to become a Machine Learning engineer or an AI engineer. For this, as I said earlier, you need Deep (😉) knowledge of math and statistics, and you need to learn everything about the algorithms that power today’s AI models.
This path often needs a Master’s or a PhD (There are exceptions, but it’s a requirement in most cases). And for these roles, you need to spend dedicated time for a couple of years at the very least.
People sometimes confuse AI with Data Science, but those are completely different things. And since you have read this entire blog, I’ll give you another tip: you need to learn some, if not all, software engineering skills as well to become a great AI/ML engineer.

So, the following part becomes essential for everyone, irrespective of the path you have chosen.

Learning how to design systems (System Design).

This is a topic in itself, so I’ll skip it for now, but I’ll leave you with the most important thing, one that is a must. Maybe you are quite surprised that I have not mentioned it until now.

Learn a cloud platform (AWS or Azure, pick one) and the services that are relevant for your path. Many services will not be relevant for everyone.
So, once you have started on any of the paths I explained above, just search for the services that are relevant to it.

Okay, this was a loooong write-up. I hope you liked it, and feel free to comment on what you liked and what you didn’t like so much 🙂.

Until next time, take care!!!
