Passing the (new) Google Professional Data Engineer exam within 7 weeks
My experience with training materials, obstacles, and what I would have done differently in hindsight
Having worked in IT for a couple of years now, mostly on infrastructure projects, I miss playing my other card (PhD in physics) — and would like to do something more data-related.
To prepare for this, I started following the really instructive machine-learning course called fast.ai. However, at least in Germany, it is always nice to have some document that proves that you have done something, so I aimed for the Google certification as “Professional Data Engineer”.
As for the investment: I started the courses at the beginning of March 2019 and took the exam on 2nd of May, spending about 7 weeks on the courses and roughly 350€ on course and exam fees.
Getting the certification means passing a two-hour multiple-choice exam, and since they are not giving these certifications away for free, you have to prepare for the exam.
On the site presenting the certification, Google offers several types of training: there are instructor-led classes as well as online courses offered on Coursera. I did not want to wait two months for a block course to start, so at the beginning of March I subscribed to Coursera in order to take the exam earlier. Starting in March led to a surprise later on, but I will cover that further below.
The Coursera curriculum (I will call it the main course from now on) is divided into five weeks. All courses are well structured and consist of
- video lectures,
- practical parts,
- reading parts,
- and occasional quiz questions in between.
The instructors are curriculum developers at Google, Lak Lakshmanan and Tom Stern. I found both easy to understand and especially liked the subtle humor that Lak uses to pep up his lectures.
The practical parts, called “labs”, are exercises that you can do in Qwiklabs. Each lab provides you with a cloud project equipped with the infrastructure necessary for doing the exercise. That project only lives for a limited amount of time (usually one or two hours), but you can repeat the labs as often as you want.
After completing the lab tasks, I usually had some time left that I sometimes spent on exploring functions and features of the cloud console, or on trying to break things — I’m an experimental physicist. That’s our way of understanding the world…
While watching the videos and going through the labs was entertaining and the results were quite impressive, I increasingly wondered what the practical application of several of the things presented to me was. Probably I was not paying attention to the stuff that mattered, so I needed some way to check on this.
Therefore, after having done three of the five Coursera weeks, I booked the supplementary course “Preparing for the GCP Data Engineer Exam” (I will call that one the repetition course from here on). This turned out to be a good idea because I stumbled on two things that really got me upset:
- When starting the repetition course, I was informed that the exam outline had changed after I had started with the main course. I was really worried that the videos I had watched before were now useless. Luckily, they basically just re-arranged the sequence of subjects and removed the case studies. However, from then on, I was quite suspicious about the exam: what if they changed it again at the end of April while I was aiming to take the exam on 2nd of May?!?
- When I watched the chapters covering the parts that I had already done in the main course, hardly any of the stuff seemed familiar. However, when I wanted to add all the new and important things to the notes that I had taken while watching the main course, I noticed that I had already put them on paper before. I just needed to repeat what I had learned in order to make it stick.
To do this, I used an “algorithm” that I discovered when learning Dutch some years ago, also on an online platform:
- Before going to sleep, learn new things.
- In the morning, repeat what you have learnt the day before.
After spending one week on repeating the things covered in course weeks 1 to 3, I resumed and finished the main course, which led me to preparing for the actual exam.
I have altered the deal. Pray I do not alter it any further.
Remember my worries about the exam changing? After finishing week 5 of the main course, I wanted to register for the exam that was supposed to take place on Thursday 2nd of May in Munich.
I planned to be absent from my project on the days prior to the exam and use the time for studying. My employer, MaibornWolff, actually backs me in doing so and, in addition to paying the exam fees, also (at least partially) pays for the time that I spend on learning new stuff. One of the reasons why they regularly win the Great Place to Work competition.
Saturday to Wednesday for intensive preparation, repeating labs and practice exams. What could possibly go wrong?
Turned out, they had cancelled that exam and offered one on 29th of April instead. Duh.
Luckily, there was another institution rather close by in Nuremberg that still offered an exam on 2nd of May. I decided to use Saturday to find out whether I was confident enough to take the exam on Monday or should invest more time and travel to Nuremberg on Thursday.
Google is not very verbose about the contents of the exam. One of their recommendations is that you should feel confident with the touchstone features they present in the repetition course. If you want something more solid, it boils down to “50 multiple-choice questions, 2 hours” and the exam guide, which gives you a rough outline of the topics but no information about how many questions you need to get right.
However, searching for information about the exam, I discovered some practice exams at different places:
- one practice exam inside the repetition course (it shows up twice, un-graded and graded but when I was doing the course, both versions had the same questions)
- the official one from Google
- another one, offered by Whizlabs, a third-party company
In this situation, I was a bit worried about training too much on the exam materials and doing some kind of over-fitting myself. Thus, after finishing the second half of the repetition course, I picked the Whizlabs exam to see where I stood and then decide on my further steps. I got 65% of the questions correct. There are rumours that you need 70%, but nothing is confirmed. So: no exam on Monday, but four additional days for preparation.
My program consisted of these steps:
- This blog post by Daniel Bourke, which he published just when I started looking intensively for information about the exam. Although he took the pre-April version, his post was very helpful.
- Learn SQL: This is a hands-on tutorial for SQL. If you are not that familiar with SQL, I really recommend it. Up to then, my contact with SQL had mostly been about creating and deleting databases and tables when maintaining my personal Nextcloud instance.
- The notes that I took when watching the courses. Do write notes and use multiple colors for condensing information.
- Re-watching the repetition course once again, paying special attention to phrases like “This might be interesting for the exam”. I bookmarked those passages for a last re-watch. When I discovered something that was still “new” to me, I immediately did one of the two following steps:
- If “new” stuff was covered in the main course, re-watching that part or watching the videos where they discuss the labs.
- If “new” stuff was not part of the course, looking it up in the “quickstart” part of the Google documentation (e.g. here). I did not actually do the tasks, just skimmed through the instructions to get an idea of what this component can do.
- The “Quest” offered in Qwiklabs. The part about BigQueryML was cool and not covered in the main course.
- There are several PDF-based material collections like this one here. In the end, I wrote my own crib sheet, but on paper.
Somewhere in between these steps, I took the Google practice exam, scored 92%, repeated it and got 100%. At the end, I took the practice exam that was included in Coursera. Although I immediately scored 100%, I was a bit suspicious, because it was nearly identical to the Google practice exam. Not really a test that gives you additional information, at worst over-confidence.
I remembered having heard several times that the material presented in the courses would not entirely cover the exam and that, in addition, you needed practical experience with the Google Cloud Platform (which I did not have before starting the exam).
However, at that point, I did not get anything new from the courses and my other materials, and I did not see how to prepare further, at least not with reasonable effort. Therefore, on Thursday I took the train to Nuremberg, determined to take the exam and, in case of failure, to blame it on the exam having changed and the course not being adapted yet ;-)
After about 75 minutes, I had gone through the 50 questions for the first time and took a second look at the 30 questions that I had marked for review. Most questions covered topics that were now familiar to me, but some addressed things that I am sure I never heard about in the courses. I had to take an educated guess on these.
10 minutes before the time was over, I was through with the second iteration and started doing random checks on the answers.
After hitting the submit button, I got a text form asking me about my experience with the exam, whether I would recommend it and what they could improve. Seriously? Has anyone ever answered that and not just hit the “Skip” button thinking “Don’t bug me, just show me the result NOW!”?
Being in Nuremberg, I celebrated passing the exam with some traditional Nuremberg dish :-)
What would I have done differently?
Start validating yourself earlier, by taking the repetition course already after the second week. I would expect there to be less material to catch up on then.
Use the same account for signing up in Coursera and in Qwiklabs. When signing up for Coursera, I used my email address, and in Qwiklabs, I signed in using my Google account. Currently, Coursera offers some Qwiklabs credits for passing courses but sends them to a Qwiklabs account that, in my case, did not exist, so all the credits are lost now.
Thanks to MaibornWolff GmbH for offering me an environment where learning new things is encouraged.
The Google cloud certified logo is, of course, from Google.