Google Cloud Professional Data Engineer exam preparation

There has been a lot of interest in the Google Cloud Platform (GCP) certification tracks for data scientists. These materials are very useful for data scientists who want to take advantage of cloud platforms to run machine learning models.

To help practitioners take immediate advantage of Google Cloud Platform, Google offers the Google Cloud Professional Data Engineer certification.

A Professional Data Engineer enables data-driven decision making by collecting, transforming, and visualizing data. The Data Engineer designs, builds, maintains, and troubleshoots data processing systems with a particular emphasis on the security, reliability, fault-tolerance, scalability, fidelity, and efficiency of such systems. The recommended course track for the certification is:

Google Cloud Platform Big Data and Machine Learning Fundamentals

This course introduces participants to the big data and machine learning capabilities of Google Cloud Platform. It provides a quick overview of the platform and a deeper look at its data processing and machine learning capabilities, showcasing big data solutions on Google Cloud and how easy they are to use. The course is intended for data analysts, data scientists, and business analysts, and is also suitable for IT decision makers evaluating Google Cloud Platform for use by data scientists.

Data Engineering on Google Cloud Platform — in person and virtual

This course gives participants a hands-on introduction to designing and building data processing systems on Google Cloud Platform. It is intended for experienced developers who are responsible for managing big data transformations, including:

  • extracting, loading, transforming, cleaning, and validating data
  • designing pipelines and architectures for data processing
  • creating and maintaining machine learning and statistical models
  • querying datasets, visualizing query results, and creating reports

Steps to complete the courses and get the Google Cloud Professional Data Engineer certification:

Step 1: Google Cloud Platform Fundamentals: Big Data & Machine Learning — https://cloud.google.com/training/courses/data-ml-fundamentals

Step 2: Data Engineering on Google Cloud Platform — https://cloud.google.com/training/courses/data-engineering

Data Engineer certification registration: https://cloud.google.com/certification/data-engineer

Keep in mind that the Coursera material is NOT sufficient to cover everything on the test. The exam is conceptual and high level, and you need to understand how the different products work behind the scenes. A number of terms (usually from the Hadoop ecosystem or general database terminology) appear that are not covered in the online material.
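As a study aid for those Hadoop-ecosystem terms, the open-source components most often mapped to managed GCP products can be kept in a small lookup table. The Python sketch below is a rough, non-exhaustive mapping compiled for illustration (the dictionary and helper names are hypothetical); it is not an official Google equivalence list.

```python
# Rough mapping from open-source / Hadoop-ecosystem components to the managed
# GCP product usually cited as the closest equivalent. Illustrative only.
OSS_TO_GCP = {
    "HDFS": "Cloud Storage",          # via the Cloud Storage connector
    "Hadoop MapReduce": "Cloud Dataproc",
    "Spark": "Cloud Dataproc",
    "Hive": "BigQuery",               # as a SQL data warehouse; Hive jobs can also run on Dataproc
    "HBase": "Cloud Bigtable",        # Bigtable offers an HBase-compatible client
    "Cassandra": "Cloud Bigtable",    # both are wide-column stores
    "Kafka": "Cloud Pub/Sub",
    "Apache Beam": "Cloud Dataflow",  # Dataflow runs Beam pipelines
}


def gcp_equivalent(component: str) -> str:
    """Return the usual managed GCP counterpart, or a hint if none is listed."""
    return OSS_TO_GCP.get(component, "no direct managed equivalent listed")


print(gcp_equivalent("Kafka"))    # Cloud Pub/Sub
print(gcp_equivalent("Jenkins"))  # no direct managed equivalent listed
```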

Every exam includes 2 case studies: one is the sample case study available on the GCP website, and the other is a new one (different candidates may get different cases). Each case study has 3 or 4 related questions.

The Google documentation and the best-practices/FAQ pages for individual products such as BigQuery and Dataflow were useful, both for mapping those products to their equivalents in the general Hadoop ecosystem and for identifying the most cost-effective solutions (cost-effectiveness is a big theme throughout the test).
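Many cost questions come down to how BigQuery on-demand queries are billed: by bytes processed, and because storage is columnar, only the columns a query references are read. The sketch below estimates that cost under stated assumptions: the per-TiB price is an assumed parameter (check current BigQuery pricing), and the table and its column sizes are made up.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def on_demand_query_cost(column_bytes, selected, usd_per_tib=6.25):
    """Estimate the on-demand cost of a query that reads the `selected` columns.

    column_bytes: dict of column name -> stored size in bytes (hypothetical data).
    usd_per_tib:  ASSUMED on-demand price; check current BigQuery pricing.
    """
    scanned = sum(column_bytes[c] for c in selected)
    return scanned / TIB * usd_per_tib


# Hypothetical 4 TiB table where one wide text column dominates the size.
columns = {
    "event_id": 0.25 * TIB,
    "event_time": 0.25 * TIB,
    "user_id": 0.5 * TIB,
    "payload": 3 * TIB,
}

full_scan = on_demand_query_cost(columns, columns)                  # like SELECT *
narrow = on_demand_query_cost(columns, ["user_id", "event_time"])   # two columns
print(full_scan, narrow)  # 25.0 4.6875
```

The point for the exam is the ratio, not the dollar figure: selecting only the needed columns cuts the bytes billed, whatever the current price is.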

Knowing the differences between views, tables, and datasets in BigQuery, and when to use each, will come in handy: I think there were more than 5 questions about which one was the best option for a given situation.
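Roughly: a view is a saved query that runs each time it is queried, a table physically stores rows, and a dataset is the container that groups tables and views (and a common level at which access is granted). The sketch below uses Python's built-in sqlite3 as a local stand-in, not BigQuery, to show the view-versus-table difference: after new rows arrive, the view reflects them while a table created from the same query does not. SQLite has no notion of datasets, so that part is not shown; the table and view names are hypothetical.

```python
import sqlite3

# Local analogue using SQLite, NOT BigQuery: it only illustrates that a view
# is evaluated at read time, while a table built from a query stores a snapshot.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("EU", 100.0), ("US", 250.0)])

# View: stores no rows; its SELECT runs every time the view is queried.
conn.execute("CREATE VIEW big_sales_view AS SELECT * FROM sales WHERE amount > 50")
# Table: the SELECT runs once and its result is materialized.
conn.execute("CREATE TABLE big_sales_table AS SELECT * FROM sales WHERE amount > 50")

# New data arrives after both objects were created.
conn.execute("INSERT INTO sales VALUES ('APAC', 75.0)")

view_count = conn.execute("SELECT COUNT(*) FROM big_sales_view").fetchone()[0]
table_count = conn.execute("SELECT COUNT(*) FROM big_sales_table").fetchone()[0]
print(view_count, table_count)  # 3 2
```

The same trade-off applies in BigQuery: querying a logical view re-runs its underlying query (and reads that query's bytes each time), whereas a saved table is a snapshot that must be refreshed.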

There were quite a few questions on access management, e.g. what to create if only a subset of people should see some information, or how to show aggregates without exposing the underlying dataset. It is probably useful to know the different account types, such as service accounts, and what flexibility they offer in granting group or individual access to data.
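For the "show aggregates without exposing the underlying data" pattern, the usual BigQuery answer is an authorized view: the view is placed in a separate dataset that the restricted group can read, and the view itself is authorized on the source dataset, so users can query the view without being granted access to the underlying tables. SQLite has no users or permissions, so the local sketch below (table and view names are hypothetical) only illustrates the shape of such an aggregate-only view, not the access control itself.

```python
import sqlite3

# SQLite stand-in for the *shape* of an aggregate-only view. The access control
# part (separate dataset + authorizing the view) exists only in BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (employee TEXT, dept TEXT, salary REAL)")
conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", [
    ("alice", "eng", 120.0),
    ("bob", "eng", 100.0),
    ("carol", "sales", 90.0),
])

# The view exposes per-department aggregates only; no employee names or
# individual salaries appear in its output.
conn.execute("""
    CREATE VIEW dept_salary_summary AS
    SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM salaries
    GROUP BY dept
""")

rows = conn.execute(
    "SELECT dept, headcount, avg_salary FROM dept_salary_summary ORDER BY dept"
).fetchall()
print(rows)  # [('eng', 2, 110.0), ('sales', 1, 90.0)]
```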

Definitely know which storage type fits each situation, e.g. transactional data goes in Cloud SQL/Datastore, while analytical data goes in Bigtable/BigQuery. There were at least 5 questions on the most appropriate storage type for different kinds of data.
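That rule of thumb can be written as a tiny decision function. This is a study-aid sketch, not official guidance: the workload labels and flags are hypothetical, and real exam questions usually add constraints (scale, latency, cost) that it ignores.

```python
def suggest_storage(workload: str, relational: bool = False,
                    low_latency_key_lookups: bool = False) -> str:
    """Rule-of-thumb GCP storage choice (study aid only).

    workload: "transactional" or "analytical" (hypothetical labels).
    """
    if workload == "transactional":
        # OLTP-style reads/writes: relational -> Cloud SQL, NoSQL entities -> Datastore.
        return "Cloud SQL" if relational else "Cloud Datastore"
    if workload == "analytical":
        # Very large, low-latency key/range lookups (e.g. time series) -> Bigtable;
        # ad hoc SQL analysis over large datasets -> BigQuery.
        return "Cloud Bigtable" if low_latency_key_lookups else "BigQuery"
    raise ValueError(f"unknown workload: {workload!r}")


print(suggest_storage("transactional", relational=True))  # Cloud SQL
print(suggest_storage("analytical"))                      # BigQuery
```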

Exam Format

  • 2 hours, 50 multiple-choice questions (select the best answer, or select the 2 or 3 applicable answers)
  • Scenario-based questions, for example:
      ◦ What’s the best product, or the best option, if the company expects its data volume to grow without an increase in cost or budget?
      ◦ Given a task or requirement, what is the best product/model/technique to use, or the best overall solution?
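For pacing, 2 hours across 50 questions works out to about 2.4 minutes per question, and reserving time to revisit questions flagged with “Mark this question” shrinks that budget. A trivial helper (hypothetical, for illustration only) makes the arithmetic explicit:

```python
def minutes_per_question(total_minutes=120, questions=50, review_reserve=0):
    """Average minutes per question after setting aside time to review marked ones."""
    return (total_minutes - review_reserve) / questions


print(minutes_per_question())                   # 2.4
print(minutes_per_question(review_reserve=20))  # 2.0
```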

What to review before the exam?

  • The Hadoop ecosystem and on-premises open-source systems/servers (e.g. Kafka, Cassandra, Jenkins)
  • Google Cloud products, especially BigQuery, Bigtable (including row key design), Dataproc, Dataflow, and streaming
  • Data science/engineering common sense
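Bigtable indexes only the row key, so row key design determines both scan efficiency and how writes spread across nodes. A commonly recommended pattern, sketched below for a hypothetical sensor time-series table, is to put an identifier such as a device id first (field promotion) rather than starting the key with a timestamp, which concentrates all new writes on a single node (a hotspot), and optionally to append a reverse timestamp so the newest rows for a device come first in a scan. The function name, the `#` delimiter, and the zero-padding width are assumptions for illustration.

```python
# Largest signed 64-bit integer, used to build a "reverse timestamp".
MAX_INT64 = 2**63 - 1


def sensor_row_key(device_id: str, event_time_ms: int) -> bytes:
    """Build a Bigtable-style row key for a hypothetical time-series table.

    - The device id comes first, so writes from many devices spread across the
      key space instead of all landing next to the newest timestamp.
    - The reverse timestamp is zero-padded to 19 digits so lexicographic byte
      order matches numeric order; newer events get smaller values and sort first.
    """
    reverse_ts = MAX_INT64 - event_time_ms
    return f"{device_id}#{reverse_ts:019d}".encode("utf-8")


older = sensor_row_key("dev42", 1_000)
newer = sensor_row_key("dev42", 2_000)
print(newer < older)  # True: the newer event sorts first within the device's rows
```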

What is not heavily tested, or not very useful when preparing for the exam?

  • Syntax/Coding/Labs
  • Official Certification Exam Guide

Experience & Strategy

  • Schedule ahead; some test centers have open slots on weekends.
  • Read carefully and read fast.
  • Use the “Mark this question” function to flag questions, then come back later to review them and your answers.
  • Be familiar with the sample case study (the “Flowlogistic Case Study”) before the exam, but during the exam hold off on reading the story until you reach the related questions.

Appendix

Codelabs

https://codelabs.developers.google.com/?cat=Cloud

gcloud Overview

https://cloud.google.com/sdk/gcloud/

gsutil Tool

https://cloud.google.com/storage/docs/gsutil

Cloud Dataproc Initialization Actions

https://github.com/GoogleCloudPlatform/dataproc-initialization-actions

BigQuery: Quickstart Using the Web UI

https://cloud.google.com/bigquery/quickstart-web-ui

BigQuery: Quickstart Using the bq Command-Line Tool

https://cloud.google.com/bigquery/quickstart-command-line

BigQuery SQL Reference

https://cloud.google.com/bigquery/docs/reference/standard-sql/

BigQuery Client Libraries (Python, Java, etc.)

https://cloud.google.com/bigquery/docs/reference/libraries

TensorFlow Tutorials

https://www.tensorflow.org/versions/r1.2/tutorials/

TensorFlow Learn API

https://www.tensorflow.org/api_guides/python/contrib.learn

TensorFlow for absolute beginners

https://github.com/kazunori279/TensorFlow-for-absolute-beginners

TensorFlow Examples

https://github.com/aymericdamien/TensorFlow-Examples

Cloud Machine Learning Engine Documentation

https://cloud.google.com/ml-engine/docs/

Apache Spark

http://spark.apache.org

Google Cloud Certified Data Engineer — Beta Exam Report

http://priocept.com/2017/01/23/google-cloud-certified-data-engineer-beta-exam-report/

Google Cloud Platform Data Engineer Certification Review

https://youtu.be/STyOXlAfg1M

How to Prepare for the Google Cloud Architect Certification Exam

http://blog.brainlounge.de/memoryleaks/how-to-prepare-for-the-google-cloud-architect-exam/