My journey to GCP Professional Data Engineer (2020)!

Amit Sharma
Mar 13, 2020

Hi All,

I recently passed Google’s Professional Data Engineer exam. The certification has many facets, so it deserves a write-up. I hope this helps you.

Journey

Google offers official training on Coursera, “Data Engineering, Big Data, and Machine Learning on GCP”, to prepare for this exam, and keeps it aligned with the data engineer curriculum.

The Coursera chapters and study material are long and tiring to absorb in one go, so don’t feel disappointed. Make sure you take some time out every day to progress incrementally and revise; otherwise it is very likely that most of it will be forgotten by the time all the courses are finished.

Unlike the Associate Cloud Engineer exam, you are not expected to know the commands for creating a machine or whether a service is global or zonal. The exam is focused on data and on how to implement an end-to-end pipeline, knowing a fair amount of detail about each component. Of course, it’s impossible to know everything about every component involved, so where do you draw the line?

Well, I will try to share my bits. First, the worst one:

Machine Learning (ML) is perhaps the only area where I found that the Coursera videos lack depth, probably because the area is so vast, so don’t assume that just 2–3 Coursera videos will be enough to answer the ML questions. Pretty harsh, but that’s reality!

You can expect about 10% of the exam questions to be related to ML, and in my case they were at a different level from what I had seen in any of the practice exams. I have given some links to help you keep up with the basic terms and with scenarios for when to use which model.

Now for the manageable and predictable stuff:

In Dataflow, apart from design principles related to windowing and its advantages over other products, you can also expect questions on sample code, such as how to split a collection, push to a sink, or read from a side input.
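To make the windowing and collection-splitting ideas concrete, here is a minimal sketch in plain Python. It is not the Dataflow or Apache Beam API; the function names `fixed_window`, `partition`, and `sum_per_window` and the sample events are assumptions made up for illustration. It only mimics the behaviour of assigning timestamped elements to fixed-size windows and splitting one collection into several.

```python
from collections import defaultdict


def fixed_window(timestamp, size):
    """Return the [start, end) fixed window that contains the timestamp (seconds)."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def partition(elements, num_partitions, fn):
    """Split one collection into num_partitions lists; fn(element) gives the target index."""
    parts = [[] for _ in range(num_partitions)]
    for element in elements:
        parts[fn(element)].append(element)
    return parts


def sum_per_window(events, size):
    """Sum the values of (timestamp, value) events per fixed window."""
    totals = defaultdict(int)
    for timestamp, value in events:
        totals[fixed_window(timestamp, size)] += value
    return dict(totals)


# Hypothetical (timestamp_seconds, value) events.
events = [(5, 1), (59, 2), (60, 3), (125, 4)]

# Aggregate in 60-second windows.
per_window = sum_per_window(events, 60)

# Split the collection in two: small values (index 0) and large values (index 1).
small, large = partition(events, 2, lambda e: 0 if e[1] < 3 else 1)
```

In a real pipeline the same ideas are expressed with the SDK's own windowing and partitioning transforms, so treat this only as a memory aid for what those transforms do.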

One expectation is that you should know which product to choose when multiple options are available; some data point in the question will be your hint! E.g., at what data size for transactional data would you go from Cloud SQL to Spanner, if everything else is the same?
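This kind of "which product?" question can be pictured as a tiny decision function. The sketch below is an assumption-laden illustration, not official guidance: the function name `choose_transactional_store` is hypothetical, and the size limit is deliberately a parameter because the actual Cloud SQL limit should be taken from the current GCP documentation.

```python
def choose_transactional_store(data_size_tb, cloud_sql_limit_tb, needs_horizontal_scale=False):
    """Pick a transactional store from the data point given in the question.

    cloud_sql_limit_tb is an assumption supplied by the caller; look up the
    real limit in the GCP documentation rather than trusting a value here.
    """
    if needs_horizontal_scale or data_size_tb > cloud_sql_limit_tb:
        return "Cloud Spanner"
    return "Cloud SQL"


# Placeholder limit of 10 TB, used purely for illustration.
small_db = choose_transactional_store(2, cloud_sql_limit_tb=10)
large_db = choose_transactional_store(50, cloud_sql_limit_tb=10)
```

Here `small_db` resolves to Cloud SQL and `large_db` to Cloud Spanner; the data size in the question is the hint that flips the choice.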

Tip: You will see a number of flowcharts during the courses, and the more you practice them, the easier they are to remember. They come in very handy; don’t forget that no paper is allowed!

Likewise, there are specific use cases for Bigtable and BigQuery, and there are best practices to remember. Bigtable has at least 7 best practices to follow, and you can expect a question on them one way or another to tease you.
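As one example of a commonly cited Bigtable schema-design practice (chosen here for illustration, not necessarily one of the practices the courses list), row keys should not start with a timestamp, because sequential writes then land on the same node. A minimal sketch, assuming a hypothetical `device_id#timestamp` key layout and a made-up helper `make_row_key`:

```python
MAX_TS = 9_999_999_999  # assumed upper bound for 10-digit epoch seconds


def make_row_key(device_id, epoch_seconds, newest_first=False):
    """Build a row key that leads with the device id, not the timestamp.

    With newest_first=True the timestamp is reversed so that the most
    recent reading for a device sorts first.
    """
    ts = MAX_TS - epoch_seconds if newest_first else epoch_seconds
    return f"{device_id}#{ts:010d}"


# Hypothetical sensor readings: (device_id, epoch_seconds).
readings = [
    ("sensor-b", 1584057600),
    ("sensor-a", 1584057660),
    ("sensor-a", 1584057600),
]

# Rows are stored in lexicographic key order, so sorting shows the layout.
keys = sorted(make_row_key(d, t) for d, t in readings)
newest_keys = sorted(make_row_key(d, t, newest_first=True) for d, t in readings)
```

Sorting the keys shows that all rows for one device sit next to each other, which keeps range scans per device cheap while spreading writes across devices.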

The security model is explained in considerable detail in the course videos for all the products involved, such as Pub/Sub, Storage, Bigtable, BigQuery, and others, so that coverage is sufficient. But do know when to use which role, as there can easily be a few questions on it, and Google follows the principle of least privilege.
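The principle of least privilege can be sketched as "pick the smallest role that still covers what the user needs". The role names and permission strings below are invented for illustration and are not real IAM roles; the helper `least_privileged_role` is likewise hypothetical.

```python
# Hypothetical roles and permissions, NOT real GCP IAM identifiers.
ROLE_PERMISSIONS = {
    "example.viewer": {"read"},
    "example.editor": {"read", "write"},
    "example.admin": {"read", "write", "delete", "set_policy"},
}


def least_privileged_role(needed, roles=ROLE_PERMISSIONS):
    """Return the role with the fewest permissions that covers everything needed."""
    candidates = [
        (len(perms), name) for name, perms in roles.items() if needed <= perms
    ]
    if not candidates:
        raise ValueError(f"no role grants {sorted(needed)}")
    return min(candidates)[1]


reader_role = least_privileged_role({"read"})
writer_role = least_privileged_role({"read", "write"})
```

In exam questions the same reasoning applies: among the roles that would work, the narrowest one is usually the expected answer.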

Dataproc has two aspects. First, it supports legacy workloads, so know how to size it correctly and how to connect or migrate to GCP. Second, it has a decent ecosystem, so try to understand the role of each component and where each one maps in GCP.

There are many more areas, like Composer, Stackdriver, and Dataprep, which are fairly well covered in the Coursera courses. However, you may explore them further as you like.

Useful links and Dumps

Coursera’s Data Engineering Specialization: It’s a 5-course specialization by Google for this exam. Do all the courses, the associated labs, and the challenge labs, if any. These will help build the foundation.

GCP Documentation for the relevant products, covering at least the following:

  • Design Best Practices
  • IAM Security
  • Failover and High Availability
  • Migration from on prem to cloud
  • Data Transfer from another cloud
  • Decision Flowchart
  • Use cases

Linux Academy: Both exams and a course are available. I didn’t opt for the course, so I can’t comment on it. However, the “Data Engineer - Practice Exam” there is on par with the GCP exam, so it is highly recommended. It will help you learn which areas to focus on more! The only shortcoming is that it’s just 1 exam; I wish there were more.

Whizlabs Professional Data Engineer Exam: It compensates for Linux Academy’s single exam by providing 4 different exams! Each option is explained in detail with corresponding GCP links. My only advice is to keep a gap of a few days between attempts so that you don’t remember the options; this maximizes the benefit, because your brain is forced to think rather than just fetch from memory.

Data-Engineering-on-GCP-Cheatsheet

Telegram App: There are many groups where people discuss questions and dissect each option quite passionately. This makes the material easy to relate to, so you don’t even need to memorize it.

Exam Day

In the exam center, they don’t allow you to keep anything with you: mobile, wallet, watch, water bottle. Nothing! Of course, refreshments may be available, depending on your exam center. I took two exams at different centers, and they were complete opposites. One was a soundproof cabin, and the other was a hut in the middle of nowhere with wind gusts touching 80 km/h, which you could feel.

Once you are at the terminal, the surroundings don’t matter much. Here you are, alone in front of a screen with no paper, pen, or mobile!

So when my exam started, within the first 30 minutes, around Q15, it was clear as day to me that it was gone (given the complexity of the questions and my confidence). So there were pretty much two options: either continue the horror, or quit and try another day, better prepared (the usual excuse!).

I took a washroom break (remember, the clock doesn’t stop!) and decided to continue with the ordeal. No harm done even if you fail, as you will be a bit wiser and know what to expect next time.

Tip: Keep marking the questions you are not sure about, as this helps during revision!

By the time I reached Q40, almost 1 hour and 15 minutes had gone, with 10 more questions to go. There was no respite in the attack, and I was trying my best to hold my position.

Tip: Don’t lose heart. Some questions will come along where you somehow know that your answer is correct, and that boosts your confidence!

Once all 50 were done, I had about 20 questions marked for review.

Here is an interesting observation: for some questions, you will be in doubt. However, later questions may clarify your doubts, either through their options or their descriptions. So you can use this knowledge to verify earlier answers and correct them, if needed.

After the revision, I had changed fewer than 10% of my answers, so here is the learning.

Tip: Trust your instinct the first time; it is very likely the correct choice.

I had about 15 minutes left, so I thought: have I given it my all, or is there some more energy or knowledge left that I can use to change my answers? Well, this thought helped me go through all the answers once more!

At this point I had changed fewer than 10% of the total answers, about 5 questions, so I had pretty much reached a dead end and pressed SUBMIT.

Google doesn’t leave you there!

They will ask you for feedback, and then show another screen asking for some more :)

And then the moment of reckoning, just 1 line:

Result : Pass

In the end, it was quite an exhausting and fulfilling experience, both the exam and writing this small memoir. If this blog helps even one person, to some extent, in clearing this exam, I would consider that it has achieved its purpose.

Do let me know if you need any specific information.

Good Luck!

-Amit
