How to Crack the Google Cloud Professional Data Engineer Exam in 1 Month (October 2020)

Revannth V
Oct 26, 2020 · 13 min read
Image by Author

*This article includes the test-taking experience for the new at-home proctored mode. Except for the test-taking experience, the guide applies equally to both modes. You can still refer to this article if you are reading it in the post-COVID world (oh, how wonderful that must be!).

So you finally decide to do something with all the time you have! With more and more people taking the Google Certification exams, you can’t help but notice the hype and become a part of it. All you need now is a quick hack to get through the test. Well, I got you covered there!

This article is specifically targeted at people with little to no experience with GCP. Though that wasn’t the case with me since I have been working on GCP products for the past year, I shall do my best to make this guide as rudimentary as possible. (Scroll to the last section if you only want to read the experience)

The Whats and Whys of the Certification

For any exam you take, a fundamental understanding of the Certification and what it is for is essential to increase your chances of acing it. In Google’s own words, the Data Engineering Certification is meant to:

Demonstrate your proficiency to design and build data processing systems and create machine learning models on Google Cloud Platform.

The Certification is not a requirement. You don’t ‘have’ to be a Professional Data Engineer to get a job nor is it a pre-requisite. But I would be lying if I told you that GCP Data Engineers aren’t recognized across the globe. You do get to be part of an extremely small inner circle (approximately 4k Data Engineers across the world).

Copped this from a reddit thread sourced here

Anyone working with data (data analysts, data scientists, etc.) should ideally breeze through the exam as it tests you on the fundamentals. If you are new to the cloud space, you will need to spend some additional time to grasp these fundamentals. Either way, preparing for the certification will open up avenues for up-skilling yourself in different dimensions of Data Engineering and a possible career transition (no promises though!).

Some quick details about the examination are below -

Cost of the Exam: $200

Number of attempts: NA

Pre-requisites: NA (Though Google suggests an experience of 3+ years, it’s not a hard and fast rule)

Validity: 2 years

Resources to prepare for the Certification

Now that you are equipped with the basics of the certification, let’s jump right into the courses and the ideal resources you will require for the examination. The resources are provided in the order of importance and I have also provided a metric below to gauge the effort vs duration of each of the resources.

1. Data Engineering on Google Cloud Platform Specialization on Coursera

Cost: $49 USD per month

Recommended by Google itself, this set of 6 courses will give you a thorough understanding of GCP. It provides hands-on sessions that are quite helpful in drilling in the much-needed concepts, and the course-end evaluation is decent.

Though the Coursera membership will set you back about $50 a month, you can use the 7-day trial to finish the course for free (this approach is not recommended but it’s still an option). After that, you can refer to the GCP Documentation to reaffirm the concepts.

You will need to spend upwards of 10 hours a week to finish this course within a month, though I have seen people finish it in under 2 weeks by speeding up the videos. Overall, if you do have the time, this resource could be all that you need to ace the test (minus the practice tests). Be aware that the course covers a lot of concepts and services that are generally not tested in the examination, and the majority of the hands-on sessions are ‘good to know’ but not required for the main test.

Image by Author

2. A Cloud Guru (previously called Linux Academy)

Cost: $49/month

This resource is the fastest way to learn the fundamentals required for the certification. This course does include hands-on sessions but they aren’t as comprehensive as the first course. The course is focused on getting you through the examination and the narratives are designed accordingly.

For every service, you are given a sense of its depth and breadth along with the key concepts tested on the exam. This helps you stay abreast of the services GCP has to offer without losing sight of why you are taking the test. The trade-off is that this resource cannot be used alone: you will need to combine it with parts of the GCP Documentation (links to which are provided in the course itself).

Most of the lectures go in tandem with the Google Cloud Documentation and also highlight the best practices. I would recommend this course to anyone comfortable with going into the examination with a few grey areas, since some GCP tools are left out of this course (for instance, GKE). On the flip side, these tools are rarely tested individually.

Image by Author

3. Google Cloud Platform Documentation

Cost: Free

This resource is by far the most extensive and the most time-consuming. It covers every tool/service in the way Google has designed it to be used and will expose you to a wide array of concepts that are rarely tested on.

If you could, however, complete this resource, you would not require anything else for the examination. But the sheer depth of the information can overwhelm and intimidate anyone. Hence, it’s recommended not to take on this resource alone but to use the links provided by the previous course and refer only to those pages of the documentation.

Image by Author

Other noteworthy resources include: Get Google Cloud Certified by Sam Lee, Google Cloud 1-minute videos, Google Cloud Professional Data Engineer Course [2019 Update].

I haven’t personally reviewed these courses, so I can’t vouch for them, but the three resources provided above should be enough to get you through the test.

How to prepare for the Certification?*

*For the one month guide, refer to the next section.

This exam is no different from any of the competitive examinations out there (e.g., GRE, GMAT, SAT, ACT). Hence, it is a given that without spending a good amount of time preparing for the exam, you would stand little chance of clearing it. With that said, following a proper plan should increase your chances of acing it. The exam contains 50 questions. It’s rumored that you need to get at least 35 questions right (70%) to see the ‘Pass’ prompt flash on your screen. Breaking down the exam into sub-segments, you are generally tested on the following:

1. Ideal Storage Solution (BigQuery vs Bigtable vs Cloud Storage vs Cloud SQL vs Cloud Spanner vs Memorystore vs Firestore)
Credits: From the A Cloud Guru Google Certified Professional Data Engineer course

This is heavily tested on the exam. Understanding each of the storage solutions and when they should be used will not only help you bag a few questions but will also help you eliminate a few options in subsequent questions. Hence, while you prepare for the examination, ensure you pay special attention to the differences between the storage options.
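To make the comparison concrete, here is a rough decision helper written as a Python sketch. The `Workload` fields and the order of the checks are my own simplification for revision purposes, not an official Google decision tree, and real questions will add constraints (cost, latency, data volume) that this ignores.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload (all field names are illustrative)."""
    structured: bool = False       # rows and columns rather than files or blobs
    analytical: bool = False       # large scans and aggregations (warehouse style)
    relational: bool = False       # needs SQL transactions and joins
    global_scale: bool = False     # must scale horizontally across regions
    in_memory_cache: bool = False  # a cache sitting in front of another store
    document_model: bool = False   # hierarchical, JSON-like documents


def suggest_storage(w: Workload) -> str:
    """Return a first-guess GCP storage service for the workload."""
    if w.in_memory_cache:
        return "Memorystore"
    if not w.structured:
        return "Cloud Storage"      # objects, files, unstructured data
    if w.analytical:
        return "BigQuery"           # data warehouse
    if w.relational:
        return "Cloud Spanner" if w.global_scale else "Cloud SQL"
    if w.document_model:
        return "Firestore"
    return "Bigtable"               # wide-column NoSQL, high-throughput reads and writes


print(suggest_storage(Workload(structured=True, relational=True, global_scale=True)))
```

Treat the output as a starting point for eliminating options, then check the remaining choices against the details in the question.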

2. Big Data Equivalent Services

You are not expected to understand the inner workings of Hadoop, Sqoop, or any other Big Data tool for that matter. What you do need is to understand the equivalent services that GCP provides. For instance, Hadoop HBase tables can be replaced by Cloud Bigtable. The Cloud Guru course covers this in depth and also introduces you to the Big Data concepts. Hence, you don’t need to refer to any other resources to strengthen your concepts.

In most cases, a GCP solution is better than an On-prem solution.

Read and re-read the above statement. Almost always a GCP based service is better than a Hadoop based service. Don’t waste your time on the exceptions since they are rarely tested.
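One way to revise these pairings is to keep them as a small lookup table. The sketch below is a personal cheat sheet in Python. Only the HBase, Kafka, and Dataproc pairings come from this article; the HDFS and Hive rows are commonly cited equivalents that I am adding as assumptions, so verify them against the documentation.

```python
# Cheat sheet: open-source Big Data tool -> commonly suggested GCP service.
# Rows marked "assumption" are not from the article itself.
HADOOP_TO_GCP = {
    "HBase": "Cloud Bigtable",
    "Kafka": "Pub/Sub",
    "Spark": "Dataproc",
    "MapReduce": "Dataproc",
    "HDFS": "Cloud Storage",   # assumption
    "Hive": "BigQuery",        # assumption
}


def gcp_equivalent(tool: str) -> str:
    """Look up a tool name, ignoring case and surrounding whitespace."""
    key = tool.strip().lower()
    for name, service in HADOOP_TO_GCP.items():
        if name.lower() == key:
            return service
    return "no mapping in this cheat sheet"


for tool in ("HBase", "kafka", "Sqoop"):
    print(tool, "->", gcp_equivalent(tool))
```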

3. Machine Learning

Regression is for numbers and Classification is for categories

This and a high-level understanding of a few Machine Learning fundamentals is all you need to ace questions from this segment. The exam won’t test you on TensorFlow architectures or the mathematics behind cost-function calculations for complex neural networks. It will, however, test you on concepts such as:

  1. Regularisation (L1, L2)
  2. Generalization
  3. Classification vs Regression
  4. Supervised vs Unsupervised
  5. Pre-trained APIs (AutoML, etc.)
  6. Deployment and Containerisation of Models (GKE, Compute Engine, etc.)

The Cloud Guru course does a good job highlighting all these concepts and you should be able to answer the questions from this segment with ease.
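If L1 and L2 regularisation feel abstract, the plain-Python sketch below shows the only arithmetic you need to remember: L1 adds the sum of absolute weights to the loss (which tends to push some weights to exactly zero), while L2 adds the sum of squared weights (which shrinks large weights). The weights and the `lam` strength are made-up numbers for illustration.

```python
def l1_penalty(weights, lam):
    """L1 (lasso-style) penalty: lam * sum of |w|."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (ridge-style) penalty: lam * sum of w squared."""
    return lam * sum(w * w for w in weights)


weights = [0.5, -2.0, 0.0, 3.0]  # hypothetical model weights
lam = 0.1                        # hypothetical regularisation strength

print("L1 penalty:", l1_penalty(weights, lam))
print("L2 penalty:", l2_penalty(weights, lam))
```

Notice how the single large weight (3.0) dominates the L2 penalty far more than the L1 penalty, which is why L2 is the usual answer when a question talks about discouraging large weights.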

4. Security and IAM Roles

This is by far the most challenging segment of the examination. There are so many combinations and best practices that it can sometimes make you want to ignore this segment altogether. But trust me when I say this, this segment has got nothing on you! All you need to do is follow the steps below:

  1. Learn the predefined roles for all the services. For instance, what the Developer role does for Dataflow, etc. For most services, there are only a handful of roles worth memorizing.
  2. Restrict a developer/user to the least level of access required for them to perform their functions.
  3. Finally, learn the granularity of security offered. For instance, Bigtable allows you to set access control over tables.
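The least-privilege rule in step 2 can be practised as a tiny exercise: given what a user needs to do, pick the narrowest role that covers it. In the Python sketch below the role names follow BigQuery's predefined data roles, but the capability labels ("read", "write", "share") are my own simplification and not real IAM permission strings.

```python
# Simplified role table: (role name, capabilities it grants).
# Capability labels are illustrative, not actual IAM permissions.
ROLES = [
    ("roles/bigquery.dataViewer", {"read"}),
    ("roles/bigquery.dataEditor", {"read", "write"}),
    ("roles/bigquery.dataOwner", {"read", "write", "share"}),
]


def least_privileged_role(needed):
    """Return the role with the fewest capabilities that still covers `needed`."""
    needed = set(needed)
    candidates = [(len(caps), name) for name, caps in ROLES if needed <= caps]
    if not candidates:
        return None  # nothing in this table is broad enough
    return min(candidates)[1]


print(least_privileged_role({"read"}))
print(least_privileged_role({"read", "write"}))
```

When an exam option grants a broader role than the scenario requires, that option is usually the one to eliminate.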

Attack this segment at the very end of your preparation cycle since the information from this segment has a shorter retention life. By the end of the 2nd segment, you should be ready to take the exam in a month. Book your date once you finish the first two segments and go through the sample exam questions provided here to understand the type of questions asked.

How to prepare for the Examination in one month?

Now let’s say you have one month to prepare for the test. You have never worked with GCP but you want to take the test and ace it. This is a foolproof guide to do just that. Before I go ahead with the approach, a word of caution: follow this only if you have no other option! The GCP Professional Certification is quite flexible and you can always reschedule it for a small fee.

With that said, this approach is not for the faint-hearted! It involves dealing with a lot of statistical cop-outs which could have dire consequences. But it’s still the best chance you have to ace it in one month so read on!

The exam contains 50 questions, and reportedly at least 35 correct answers are required to pass. You will have to complete the following:

(Note: The amount of time required per day varies from person to person. The time given is the minimum required by an average person to completely grasp that section.)

1. BigQuery (3 days - 5 Hours/day)

Yes, you heard me right. Start with BigQuery. Do either of the courses or the documentation but cover this tool extensively. BigQuery also has a Machine Learning service called BigQuery ML. This should also introduce you to the concepts of ML and give you enough to answer a couple of questions in the test. BigQuery is heavily tested since it’s one of GCP’s flagship services. Read and learn about how BigQuery handles replication, availability, and resource allocation (flex slots vs. flat-rate commitments).

2. Storage options (1 day - 5 Hours/day)

Spend an entire day understanding the differences between the storage options and which solution is ideal for a given scenario. You will have to refer to this throughout the month to ensure maximum retention.

3. Dataflow and Pub/Sub (3 days - 3 Hours/day)

These two services are also tested often, and most of the time in tandem. Hence, read about how these services work together and why Pub/Sub is a better alternative to Kafka.

4. Hadoop Alternatives (7 days - 2 Hours/day)

Spend a good week on this with special emphasis on Bigtable and Dataproc. You should have a fundamental understanding of the different Hadoop tools and their equivalent services by the end of the week.

5. Machine Learning APIs (2 days - 3 Hours/day)

Use this time to understand the various APIs GCP has to offer and the use cases. In most GCP solutions, using a pre-trained API is beneficial since it saves time and doesn’t require a Data Science team.

6. Security and IAM (5 days - 3 Hours/day)

With less than 2 weeks left, go through this section and understand the different predefined roles. Use the documentation and outline the best practices or just refer to the resources provided above.

7. Refer to ExamTopics and VCEGuide (a day or two before the test)

Both these websites have actual GCP examination dumps. You can refer to these sources and understand what kind of questions you should expect. Remember, there are only a finite number of ways you can be tested so more often than not the questions might be rephrased and tested. Who knows, if you are lucky on the D-day, you might end up seeing questions from these dumps.
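As a sanity check that the plan above actually fits into a month, here is the arithmetic as a short Python sketch. The day and hour figures are the ones listed in the steps above; the 30-day month is an assumption, and the final practice-question step (a day or two) is left out of the totals.

```python
# (topic, days, hours per day) taken from steps 1 to 6 above
PLAN = [
    ("BigQuery", 3, 5),
    ("Storage options", 1, 5),
    ("Dataflow and Pub/Sub", 3, 3),
    ("Hadoop alternatives", 7, 2),
    ("Machine Learning APIs", 2, 3),
    ("Security and IAM", 5, 3),
]

DAYS_IN_MONTH = 30             # assumption: a 30-day month
QUESTIONS, PASS_MARK = 50, 35  # the pass mark is the rumored figure, not official

total_days = sum(days for _, days, _ in PLAN)
total_hours = sum(days * hours for _, days, hours in PLAN)
buffer_days = DAYS_IN_MONTH - total_days
pass_ratio = PASS_MARK / QUESTIONS

print(f"Study days: {total_days}, study hours: {total_hours}")
print(f"Days left for practice questions and revision: {buffer_days}")
print(f"Share of questions you need right: {pass_ratio:.0%}")
```

The six topics add up to 21 days and 64 hours, which leaves roughly nine days for the practice questions, revisiting the storage options, and any slippage.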

My Test Taking Experience(At-Home Proctor Mode)

This section purely accounts for my experience with the At Home Proctor mode of the exam and isn’t an article comparing both the modes. So in case you are looking for that, I would suggest you click away.

In a nutshell, the test-taking experience was quite exhausting (mentally). I took 5 hours to complete a 2-hour test. Yes, you heard me right! And this was the collective fault of both the exam vendor and my ISP.

Before I tell you what caused the exam to go beyond time, you need to understand how the exam works. A couple of days before the exam, you are required to register using an app called ‘Sentinel’, which takes pictures of you and remembers you as the test taker.

On the day of the test, you are required to use the same app to validate your identity, along with a government-issued ID validating your personal information. The app did not work with my internal webcam (I own a MacBook Pro 2018) and I had to use an external webcam for the process. Once I got through this, I was able to use the internal webcam for the entirety of the test. Before you get to the test though, the proctor asks you to scan the entire room and checks your surroundings thoroughly. All this takes a good 30 minutes, so make sure you log in at least 20 minutes before your scheduled time.

What caused the exam to go beyond the stipulated time?

The first hour of the exam was quite smooth. I had completed answering all the questions I knew and marked the questions I wished to review. The proctor did occasionally halt my exam since the ‘squeaking’ of my chair was being flagged as vocal noise. It was a little overwhelming at the start but I had gotten used to the occasional prompts and halts. Once halted, you are required to chat with the proctor, explain yourself, and then go back to giving your exam with the wasted time added to your overall time.

For a brief minute, I lost power and everything derailed from that point on. The organization that conducts your test is quite understanding about these issues and gives you a 3-minute window within which you are expected to log back in to the portal. I was, fortunately, able to do that and was shown my exam (with all my answers intact). But the proctor halted my test and told me that my video was no longer visible. He asked me to quit the test and rescheduled it for the next window. As alarming as this sounded, it essentially meant my exam (with all my saved answers and time) was being moved by 3 minutes.

I had to complete all the validations that were conducted at the beginning of the test again, and this time within 3 minutes. I couldn’t finish them and was locked out of my test. I tried contacting the support helpline provided but to no avail. I tried reaching out to the organization using their online chat, which took forever. After almost an hour of frantically trying everything, I was finally greeted by someone on chat. After that, they were able to get me back onto the test and I was able to take it from where I left off.

What should you do?

I did have a couple of hiccups after that but without going into much detail, let me provide you with a few things you should do in case you are taking the test :

  1. While configuring the Sentinel app, use the same webcam you wish to use throughout the test. Any change in webcam will officially lock you out of the test.
  2. Try using a Windows laptop because once you configure the Sentinel app with a particular system, you are bound to use that system throughout the test.
  3. In case of a power outage or sudden disconnection from the test, do not panic. Just visit the support website provided to you and wait in the chat queue. It does take time but someone will come through!
  4. Try to reduce ambient noise as much as possible, and do not move your eyes away from the screen.
Image by Author

If you follow everything I wrote down to a tee, you should be able to flaunt this badge around and also become a part of a coveted network of Data Engineers!

Hope this read was fruitful. Do let me know if you want me to cover a particular concept in detail down below.

The Startup
