The Ultimate Hack to passing Google Cloud Professional Data Engineer Certification Exam (2019 Oct)

Kelly Sun
9 min read · Oct 27, 2019

So, why the Google Cloud Professional Data Engineer Certified exam?

The cloud is the destination for your big data and machine learning projects. In the shift to the “AI-first” world, the cloud makes it easy for individuals and businesses alike to experiment with artificial intelligence solutions. Whether you want to grow as a cloud/machine learning professional or just want to learn more about how artificial intelligence works, the Google Cloud certifications are a great tool/qualification to help you advance on this journey.

Google’s one-liner sums it up.

Demonstrate your proficiency to design and build data processing systems and create machine learning models on Google Cloud Platform.

I passed my exam in October 2019 with two months of studying — I’m neither a coder by day, nor do I have the 3+ years experience recommended by Google to pass this exam. That’s why I studied extra hard for this exam, and to some extent I think harder than necessary. You really just need to study right for this exam. That’s why I decided to collate some of my studying experiences to help you achieve targeted studying. Don’t get me wrong, this is not a shortcut — it doesn’t make this exam easier, just helps you increase the probability of success. If you’re already familiar with the basics, scroll down for ultimate hacks to passing the Data Engineering exam.

What do you need to get ready for the exam?

If you’ve been thinking about getting certified in Google Cloud but aren’t sure if you’re qualified, you must know a couple of things. First, you absolutely don’t need to be a software engineer, but you need some background in coding: you should at least understand how coding works, what object-oriented programming is, how computer systems work, and the basics of the system development lifecycle, from code design and staging to deployment. Second, this is not one of those pick-and-go exams you can easily ace. There is a ton of content and theoretical knowledge that you need to take in, digest, and memorize. On top of that, you need to blend it with practical knowledge, because all the questions on the exam are in the form of a case study. Lastly, the Professional Data Engineer exam has a recommended 3+ years of industry experience and 1+ years designing and managing solutions using GCP, and as mentioned, I have neither. Now don’t get me wrong, the exam is definitely not easy, and the questions are extremely tricky. But at the end of the day, it really boils down to your level of understanding of each GCP product, and your ability to break down the exam questions to find the actual problem they’re asking you to solve.

What’s the format of the exam?

The exam consists of 50 multiple choice questions, and you will need a passing score of 80%. Google will not tell you your score, only whether you “passed” or “failed”, without any additional comments or feedback. The questions are not easy: each question is itself a case study or use case that you will need to solve. The answer usually entails designing a solution that includes a combination of GCP products. Hence it is very important to understand how each product is used, when products are used with one another, and how data is transferred between them.

How much does the exam cost?

The exam costs $200 USD, and the certification is only valid for two years. After that, you will need to retake the exam to keep your qualification. But if you build a portfolio for yourself within this time, your experience will qualify you more than any certification.

If you fail the exam the first time, you will need to wait two weeks before retaking it, and you will have to pay again. If you fail the second time, the wait time increases to 60 days and the third time you have to wait 365 days before attempting again.
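
To make the retake rules concrete, here is a minimal Python sketch that works out the earliest retake date and the cumulative fees. The waiting periods and the $200 fee are taken from the figures above; the function names are hypothetical, and what happens after a fourth failed attempt is not covered here, so the sketch simply refuses to guess.

```python
from datetime import date, timedelta

# Waiting period in days after the 1st, 2nd, and 3rd failed attempt,
# using the figures quoted in this article (two weeks = 14 days).
RETAKE_WAIT_DAYS = {1: 14, 2: 60, 3: 365}
EXAM_FEE_USD = 200  # fee per attempt, as quoted in this article


def earliest_retake(failed_on: date, failed_attempts: int) -> date:
    """Return the first date you could sit the exam again."""
    if failed_attempts not in RETAKE_WAIT_DAYS:
        # The article only describes the first three failures.
        raise ValueError("waiting period not described for this attempt count")
    return failed_on + timedelta(days=RETAKE_WAIT_DAYS[failed_attempts])


def total_fees(attempts: int) -> int:
    """Total cost in USD, since every attempt is paid for separately."""
    return attempts * EXAM_FEE_USD


# Example: a first failure on Oct 1, 2019 means waiting until Oct 15, 2019,
# and two attempts in total cost $400.
print(earliest_retake(date(2019, 10, 1), 1), total_fees(2))
```

In other words, a second failure already pushes the next attempt two months out, which is one more reason to do the practice exams before booking.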

How did I prepare for the exam?

In the remainder of this article, I’ll be sharing how I prepared for my GCP Professional Data Engineer certification exam over the course of two months, all the resources I consulted, and the preparations I went through. I’d like to note that cloud is not always about big data and machine learning; most companies nowadays use cloud for hosting, data warehousing, or managing IT systems. However, especially in the recent data engineer exams, Google is huge on machine learning. Easily 40–50% of the questions are on ML (from basic theory, such as what L1/L2 regularization and cross-entropy are, to company use cases in specific scenarios, e.g. whether you would custom train an AI model or use one of GCP’s pre-built models on AI Platform).

1. I started with Coursera: Data Engineering, Big Data, and Machine Learning on GCP Specialization:

Time needed: about 4+ weeks at 20 hours of study per week
Readiness for exam upon completion: 40%
Price: $49 USD/month with 7 days trial

This course is a good introduction to the Google Cloud Platform, what products are out there and what they do. Through a combination of presentations, demos, and hands-on labs, participants learn how to design data processing systems, build end-to-end data pipelines, analyze data, and carry out machine learning. The course covers structured, unstructured, and streaming data. Coursera alone is definitely not enough to pass the exam: do not rely only on the training materials, but build your own experience through the hands-on Qwiklabs.

And mind you, there is a lot of info in this course, especially for someone new to GCP. I actually stopped mid-way just to take a breath from the sea of information, and found another course that gives you a comprehensive overview on all the products in a much shorter time span. See below.

2. I then found and took Sam Lee’s Google Cloud Professional Data Engineer Express Course:

Time needed: 4.5 hours
Readiness for exam upon completion: 20%
Price: CAD $24.99/lifetime access with community support

This course is legit right under 5 hours, and walks you through all the major products you need to know for the exam. It’s a short and sweet resource to give you a big picture of what you need to know for each product on the exam. I’m a big-picture kind of person, so going through this course really helped me gain a perspective of what exactly I’m going into, before I dig down to the details.

3. I then finished the Data Engineer Course on Linux Academy by Matthew Ulasien

Time needed: 20+ hours
Readiness for exam upon completion: 80%
Price: $49 USD/month with 7 days trial

I must say that this is a phenomenal resource. The course combines 73 videos, 6 hands-on labs, and 7 practice quizzes/exams, and note that some of the practice questions in this course actually appear on the exam. It also comes with the Data Dossier eBook (essentially the collated course materials), which is basically an in-depth cheat sheet for all the systems.

I would say these three resources are enough to provide you with most of the content you need to know for the exam. However, do not just rely on the videos — especially if you don’t have much experience with Google Cloud — the only way you can cover up for that is really getting more hands-on, practical experience through the Qwiklabs. In fact, Google Cloud Platform gives $300 free credit for anyone who signs up to use its products. Leverage the playgrounds as much as possible to really get a feel of what it’s like to work within the environment because you do get a lot of operational questions.

And now to the ultimate hack that helped me ace the exam. If you do exactly the actions I outlined in the “how to solve” section below, you shouldn’t have a problem passing the exam. After going through all the materials and sitting for the exam, I feel there are really just three types of questions, and hence three types of approach you need to take in order to tackle these questions:

  1. Product Functionality Questions (about 20–30%)
    Example exam questions: your company is streaming data from IoT sensors, which products would you use to design a data pipeline? You need a cloud storage system that supports JSON and ANSI SQL, which product should you use?
    How to solve: Going through all the courses on either Coursera or Linux Academy will be more than enough to help you get to this level.
  2. Product Operational Questions (about 30–40%)
    Example exam questions: if visualization data is not showing in Data Studio, what should you do? Which proxy do you use to access the YARN web interface? This query returns an error, what should you do to fix it?
    How to solve: the core of this type of question is “best practice”. There are two parts to solving this. The first is playing with the platform, interacting with it, and getting to know the settings and toggle options. Don’t worry, you don’t have to spend days doing Qwiklabs. Just watch all the hands-on videos on Linux Academy in detail, identify the systems that you are less familiar with or find more complicated, and do the Qwiklabs for these. If you’re time-bound, you really don’t need to go through every lab. I didn’t touch the Coursera labs at all. The second is doing practice exams while going over the GCP documentation to understand what GCP’s recommended best practices are. Here are some examples of documentation:
    BigQuery best practices
    Stackdriver summary
    Bigtable, to understand how to design a good schema
    IAM, to get familiar with the basic concepts and roles
    A great resource dump by Ivam Luz
  3. General Machine Learning Questions (about 20–30%)
    Example exam questions: what parameters are adjusted by a neural network during training? Your company wants to predict stock prices, what ML model would you train? Shown a graph of data points, what equation do you need to cluster them (e.g. cos(X) or X²+Y²)?
    How to solve: training material ain’t gonna cut it. Some of these questions are not covered on Linux Academy or Coursera. You need to do your own research. Don’t fret about this too much: you’ll likely be able to answer most of these questions by learning how TensorFlow, GCP’s AI products, and pre-trained models work. Here are some focus areas that appeared on my October 2019 exam which might be helpful:
    How to fix over/underfitting
    Know what dropout methods are
    Understand L1/L2 regularization
    Understand synthetic features
    Google Machine Learning (ML) APIs
    Google Cloud Machine Learning Engine
    Google Cloud TPUs
    Google Glossary of ML terms
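
Two of the focus areas above, L1/L2 regularization and synthetic features, are easier to remember with a tiny example. The following is a minimal sketch in plain Python, not taken from any of the courses: the function names, the toy weights, the two rings of points, and the threshold of 4.0 are all made up for illustration. It also shows why X²+Y² is the kind of feature that separates points arranged in concentric circles, as in the example question above.

```python
import math


def l1_penalty(weights, lam):
    # L1 regularization term: lam * sum(|w|). Tends to push some weights to exactly zero.
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    # L2 regularization term: lam * sum(w^2). Shrinks large weights without zeroing them.
    return lam * sum(w * w for w in weights)


def radial_feature(x, y):
    # Synthetic feature built from two raw inputs: squared distance from the origin.
    return x * x + y * y


def ring(radius, n=8):
    # n evenly spaced toy points on a circle of the given radius.
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


# Toy model weights (hypothetical) and the penalty each scheme adds to the loss.
weights = [0.5, -2.0, 0.0]
l1 = l1_penalty(weights, lam=0.1)  # 0.1 * (0.5 + 2.0 + 0.0)
l2 = l2_penalty(weights, lam=0.1)  # 0.1 * (0.25 + 4.0 + 0.0)

# Two concentric rings: both contain points with x near 0, so no single cut on
# raw x (or raw y) separates them, but one threshold on x^2 + y^2 does.
inner, outer = ring(1.0), ring(3.0)
THRESHOLD = 4.0
inner_is_inside = [radial_feature(x, y) < THRESHOLD for x, y in inner]
outer_is_inside = [radial_feature(x, y) < THRESHOLD for x, y in outer]

print(all(inner_is_inside), any(outer_is_inside))  # prints: True False
```

The same idea carries over to the exam questions: when the raw inputs don’t separate the data, ask which combination of them would.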

How do you know when you’re ready for the exam?

Comfort. Imagine you’re sitting the exam and you get a question on BigQuery. Then you get a question on Pub/Sub. Then one on machine learning. How nervous do you feel? If you feel okay because you have a good grasp of these, you’re most likely ready for the exam.

Write the practice exams. I can’t stress this enough: the practice exams are very, very important. I personally went through the practice questions on Linux Academy and Google again and again to make sure I intimately understood the answers, and rewrote the exams until I scored 95%+ every time. Here’s a package of practice exam questions I found on Udemy which was also helpful (some questions did appear on the exam in a similar fashion).

Side note, when the question gives you two options, GCP product vs. non-GCP product, the answer is usually to select the GCP native solution. Always choose the lowest-cost option unless stated otherwise.

If you want, here are also some other resources that I know exist but personally haven’t explored:

The actual exam is two hours long, and it took me just about that much time. I would say if you’ve done all the targeted studying I outlined above and gone over the practice exams, the actual exam shouldn’t be so difficult. You receive your results almost immediately after the exam (you’ll need to check Webassessor if you don’t receive the certification email, because the certification sometimes takes up to a couple of days to process).

If you have questions about the exam, feel free to shoot me a message on Twitter (@sodiumsun) or LinkedIn. If you’d like a copy of the Udemy practice questions (which I paid for on Udemy but am happy to share for free), you can also message me or download it directly from here.

Kelly Sun

Strategy Consultant at Omnia AI, Deloitte Canada’s Artificial Intelligence practice. Living at the conjunction of business and tech.