Google Cloud Professional exam preparation
There has been a lot of interest in the Google Cloud Platform (GCP) certification tracks for data scientists. The material is very useful for data scientists who want to take advantage of cloud platforms and run machine learning models.
To help people take immediate advantage of Google Cloud Platform, Google offers the Google Cloud Professional certification for Data Engineers.
GCP enables data-driven decision making by collecting, transforming, and visualizing data. The Data Engineer designs, builds, maintains, and troubleshoots data processing systems with a particular emphasis on the security, reliability, fault-tolerance, scalability, fidelity, and efficiency of such systems.
Course track recommended for certification:
Google Cloud Platform Big Data and Machine Learning Fundamentals
This course introduces participants to the big data and machine learning capabilities of Google Cloud Platform. It provides a quick overview of the Google Cloud platform and a deeper overview of the data processing and machine learning capabilities. The class showcases big data solutions on Google Cloud, shows how easy they are to use, and aims to get participants excited about what they can do with them. This course is intended for data analysts, data scientists and business analysts. It is also suitable for IT decision makers evaluating Google Cloud Platform for use by data scientists.
Data Engineering on Google Cloud Platform — in person and virtual
This course gives participants a hands-on introduction to designing and building data processing systems on Google Cloud Platform. It is intended for experienced developers who are responsible for managing big data transformations, including: extracting, loading, transforming, cleaning, and validating data; designing pipelines and architectures for data processing; creating and maintaining machine learning and statistical models; and querying datasets, visualizing query results, and creating reports.
Steps to complete the courses and get the certification for the Google Cloud Professional Data Engineer track:
Step 1: Google Cloud Platform Fundamentals: Big Data & Machine Learning — https://cloud.google.com/training/courses/data-ml-fundamentals
Step 2: Data Engineering on Google Cloud Platform — https://cloud.google.com/training/courses/data-engineering
Data Engineer certification registration: https://cloud.google.com/certification/data-engineer
Keep in mind that the Coursera material is NOT sufficient to cover everything on the test. The exam is very conceptual and high level, and you need to understand how the different products work behind the scenes. There will be a number of terms (usually around the Hadoop ecosystem or general database terminology) that are not covered in the online material.
There will be 2 case studies in every exam: one will be the sample case study available on the GCP website, and the other will be a new one (different people might get a different case). There will be 3 or 4 questions relating to each case study.
The Google documentation and the best practices/FAQ pages for particular products such as BigQuery and Dataflow were useful, both for mapping products to their equivalents in the general Hadoop ecosystem and for discussing the most cost-effective solutions (choosing the most cost-effective solution is a big theme throughout the test).
Knowing the differences between views, tables, and datasets in BigQuery, and when to use each, might come in handy: I feel there were more than 5 questions asking which one was the best option for a given situation.
There were quite a few questions on access management, i.e. what to create if only a subset of people should see some information, how to show aggregates without exposing the underlying dataset, etc. It is probably useful to know the different account types, such as service accounts, and what flexibility they offer in providing group/individual access to data.
Definitely know which storage type fits each situation, i.e. transactional data goes in SQL/Datastore, analytic data goes in BigTable/BigQuery, etc. There were at least 5 questions on the most appropriate storage type for different types of data.
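The idea of showing aggregates without exposing the underlying rows can be illustrated with an ordinary SQL view. The sketch below uses Python's built-in sqlite3 module purely as a stand-in: the table, the view, and the column names are hypothetical, and SQLite has no access control, so it only shows the shape of the view. In BigQuery the analogous pattern is an authorized view kept in a separate dataset that users are allowed to query, while the source dataset stays restricted.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a detail table and an aggregate view."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical detail table: one row per order, including the customer name.
    conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("alice", "EU", 10.0), ("bob", "EU", 30.0), ("carol", "US", 5.0)],
    )
    # The view exposes only per-region aggregates, never individual customers.
    conn.execute(
        "CREATE VIEW region_totals AS "
        "SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total "
        "FROM orders GROUP BY region"
    )
    return conn


conn = build_demo()
rows = conn.execute(
    "SELECT region, n_orders, total FROM region_totals ORDER BY region"
).fetchall()
print(rows)
```

A consumer who can only read region_totals sees counts and sums per region, which is the kind of answer the access questions tend to look for.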
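As a study aid, the rule of thumb above can be written down as a tiny lookup function. This is only a sketch of the heuristic described here, not an official decision tree: the function name and its parameters are made up for illustration, and real questions usually add constraints such as cost, data volume, or latency.

```python
def suggest_storage(workload: str, relational: bool = True, low_latency: bool = False) -> str:
    """Return a rough first guess at a storage product for a workload.

    workload    -- "transactional" or "analytical" (hypothetical labels)
    relational  -- for transactional data: does it need a relational schema?
    low_latency -- for analytic data: are fast key-based reads/writes needed?
    """
    if workload == "transactional":
        # Transactional data goes in SQL/Datastore.
        return "Cloud SQL" if relational else "Datastore"
    if workload == "analytical":
        # Analytic data goes in BigTable/BigQuery.
        return "BigTable" if low_latency else "BigQuery"
    raise ValueError(f"unknown workload type: {workload!r}")


print(suggest_storage("transactional"))
print(suggest_storage("analytical", low_latency=True))
```

Working through the sample questions and checking each answer against a table like this is a quick way to find the cases where the simple rule breaks down.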
- 2 hours, 50 multiple-choice questions (select the best answer, or select the 2 or 3 answers that apply)
- Scenario based questions, for example:
What’s the best product to use/best option to proceed if the company is expecting increasing data volume without increasing cost/budget?
Given the task/requirement, what is the best product/model/technique to use or what is the best solution?
What to review before the exam?
- Hadoop ecosystem and on-premises Apache open source systems/servers (e.g. Kafka, Cassandra, Jenkins)
- Google Cloud products, especially BigQuery, BigTable, Dataproc, Dataflow, streaming, and BigTable row key design
- Data science/engineering common sense
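For the BigTable row key item in the list above, one commonly cited pattern is to lead the key with a high-cardinality identifier rather than a timestamp (to avoid write hotspots) and to append a reversed timestamp so the newest rows sort first. The sketch below is a minimal illustration of that idea in plain Python; the `#` separator, the `row_key` helper, and the 13-digit bound are assumptions chosen for the example, not part of any Google API.

```python
# Assumption: timestamps are milliseconds since the epoch and fit in 13 digits.
MAX_TS_MS = 10**13 - 1


def row_key(device_id: str, ts_ms: int) -> str:
    """Build a key of the form '<device_id>#<reversed timestamp>'.

    Subtracting from a fixed maximum makes newer events produce smaller
    numbers; zero-padding keeps string order equal to numeric order.
    """
    reversed_ts = MAX_TS_MS - ts_ms
    return f"{device_id}#{reversed_ts:013d}"


older = row_key("sensor-1", 1_500_000_000_000)
newer = row_key("sensor-1", 1_500_000_000_001)
# Rows are stored in lexicographic key order, so the newer event comes first.
print(sorted([older, newer]))
```

Because all rows for one device share a prefix, a prefix scan on `sensor-1#` would return that device's events with the most recent first.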
What will not be heavily tested in the exam or not very useful in preparing for the exam?
- Official Certification Exam Guide
Experience & Strategy
- Schedule ahead. Some test centers have open slots over the weekend.
- Read carefully and read fast.
- Use the “Mark this question” function so you can come back and review the question and your answer later.
- Be familiar with the Sample Case Study (“Flowlogistic Case Study”) before the exam, but hold off reading the story until you see related questions.
Cloud Dataproc Initialization Actions
BigQuery: Quickstart Using the Web UI
BigQuery: Quickstart Using the bq Command-Line Tool
BigQuery SQL Reference
BigQuery Client Libraries (Python, Java, etc)
TensorFlow Learn API
TensorFlow for absolute beginners
Cloud Machine Learning Engine Documentation
Google Cloud Certified Data Engineer — Beta Exam Report
Google Cloud Platform Data Engineer Certification Review
How to Prepare for the Google Cloud Architect Certification Exam