Getting your SnowPro Advanced Data Engineer Certification

The SnowPro Advanced Data Engineer certification is now available. This exciting new certification is designed to assess the advanced knowledge and skills needed to apply comprehensive data engineering principles in Snowflake. I recently achieved this certification, and I'd like to share some information to help you prepare for the certification exam.

The SnowPro Advanced Data Engineer exam is the second exam in the Snowflake Advanced Certification series. It targets data engineers with 2+ years of data engineering experience, including hands-on experience with Snowflake, and tests the ability to:

  • Source data from Data Lakes, APIs, and on-premises systems
  • Transform, replicate, and share data across cloud platforms
  • Design end-to-end near-real-time streams
  • Design scalable compute solutions for DE workloads
  • Evaluate performance metrics

The exam consists of 65 multiple-choice/multiple-select questions, and you have 120 minutes to complete it. You can find the registration information for the exam here.

Below are the main topics covered in this exam:

DATA MOVEMENT (35–40%)

1. Given a data set, load data into Snowflake.

  • Outline considerations for data loading
  • Define data loading features and potential impact

2. Ingest data of various formats through the mechanics of Snowflake.

  • Required data formats
  • Outline Stages

3. Troubleshoot data ingestion.

4. Design, build, and troubleshoot continuous data pipelines.

  • Design a data pipeline that forces uniqueness but is not unique.
  • Stages
  • Tasks
  • Streams
  • Snowpipe
  • Auto ingest vs. Rest API
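
To make these concrete, here is a minimal sketch of a continuous pipeline wiring Snowpipe, a stream, and a task together. All object names (my_ext_stage, raw_orders, orders_stream, orders_tgt, etl_wh) are hypothetical:

    -- Auto-ingest pipe: loads newly staged files into a raw table
    CREATE OR REPLACE PIPE orders_pipe AUTO_INGEST = TRUE AS
      COPY INTO raw_orders
      FROM @my_ext_stage/orders/
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- Stream: tracks changes (CDC) on the raw table
    CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders;

    -- Task: runs only when the stream has data, and merges it downstream
    CREATE OR REPLACE TASK merge_orders
      WAREHOUSE = etl_wh
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
    AS
      MERGE INTO orders_tgt t
      USING orders_stream s ON t.order_id = s.order_id
      WHEN MATCHED THEN UPDATE SET t.amount = s.amount
      WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (s.order_id, s.amount);

    ALTER TASK merge_orders RESUME;  -- tasks are created suspended

Note that the MERGE pattern is also how you force uniqueness downstream, since Snowflake does not enforce unique constraints.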

5. Analyze and differentiate types of data pipelines.

6. Install, configure, and use connectors to connect to Snowflake.

7. Design and build data sharing solutions.

  • Implement a data share
  • Create a secure view
  • Implement row level filtering
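
A minimal sketch of a share that applies row-level filtering through a secure view; the entitlements-table pattern and all object names are hypothetical:

    -- Secure view: consumers only see rows mapped to their account
    CREATE OR REPLACE SECURE VIEW sales_db.public.shared_sales AS
      SELECT s.*
      FROM sales_db.public.sales s
      WHERE s.region IN (SELECT region
                         FROM sales_db.public.entitlements
                         WHERE consumer_account = CURRENT_ACCOUNT());

    -- Share: grant access to the secure view and add the consumer account
    CREATE SHARE sales_share;
    GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
    GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
    GRANT SELECT ON VIEW sales_db.public.shared_sales TO SHARE sales_share;
    ALTER SHARE sales_share ADD ACCOUNTS = xy12345;  -- consumer account locator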

8. Outline when to use External Tables and define how they work.

  • Partitioning external tables
  • Materialized views
  • Partitioned data unloading
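
For reference, a partitioned external table might look like the sketch below. The stage, the path layout (orders/<yyyy-mm-dd>/...), and all names are hypothetical:

    CREATE OR REPLACE EXTERNAL TABLE ext_orders (
        order_date DATE AS TO_DATE(SPLIT_PART(metadata$filename, '/', 2)),
        payload VARIANT AS (value)
      )
      PARTITION BY (order_date)
      LOCATION = @my_ext_stage/orders/
      FILE_FORMAT = (TYPE = PARQUET)
      AUTO_REFRESH = TRUE;

    -- Materialized view over the external table to speed up frequent queries
    CREATE MATERIALIZED VIEW mv_ext_orders AS
      SELECT order_date, payload:customer_id::NUMBER AS customer_id
      FROM ext_orders;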

PERFORMANCE OPTIMIZATION (20–25%)

1. Troubleshoot underperforming queries.

  • Identify underperforming queries
  • Outline telemetry around the operation
  • Increase efficiency
  • Identify the root cause
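
A quick way to surface candidates for tuning is the QUERY_HISTORY table function (the threshold here is arbitrary):

    -- Ten slowest recent queries
    SELECT query_id, query_text, total_elapsed_time
    FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
    WHERE total_elapsed_time > 60000  -- longer than one minute, in ms
    ORDER BY total_elapsed_time DESC
    LIMIT 10;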

2. Given a scenario, configure a solution for the best performance.

  • Scale out vs. scale in
  • Cluster vs. increase warehouse size
  • Query complexity
  • Micro partitions and the impact of clustering
  • Materialized views
  • Search optimization
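
As a quick illustration of scale out (more clusters, for concurrency) versus scale up (a bigger size, for complexity), assuming a hypothetical warehouse:

    -- Multi-cluster warehouse: scales OUT to absorb concurrent queries
    CREATE OR REPLACE WAREHOUSE etl_wh
      WAREHOUSE_SIZE = 'MEDIUM'
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 3
      SCALING_POLICY = 'STANDARD'
      AUTO_SUSPEND = 60
      AUTO_RESUME = TRUE;

    -- Scale UP: a larger size for complex, long-running queries
    ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';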

3. Outline and use caching features.

4. Monitor continuous data pipelines.

  • Snowpipe
  • Stages
  • Tasks
  • Streams
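
A few monitoring queries worth knowing cold, with hypothetical object names:

    -- Snowpipe: current status and any backlog
    SELECT SYSTEM$PIPE_STATUS('my_db.public.orders_pipe');

    -- Load history for a table over the last 24 hours
    SELECT *
    FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
      TABLE_NAME => 'RAW_ORDERS',
      START_TIME => DATEADD(HOUR, -24, CURRENT_TIMESTAMP())));

    -- Recent task runs and their outcomes
    SELECT name, state, scheduled_time, error_message
    FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY())
    ORDER BY scheduled_time DESC;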

STORAGE & DATA PROTECTION (10–15%)

1. Implement data recovery features in Snowflake.

  • Time Travel
  • Fail-safe
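
Two Time Travel patterns to have at your fingertips, with a hypothetical table (Fail-safe, by contrast, is a fixed 7-day period accessible only by Snowflake):

    -- Query a table as it was 10 minutes ago
    SELECT * FROM orders AT (OFFSET => -600);

    -- Adjust the Time Travel retention window (up to 90 days on Enterprise)
    ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 30;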

2. Outline the impact of Streams on Time Travel.

3. Use System Functions to analyze Micro-partitions.

  • Clustering depth
  • Cluster keys
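
The system functions in question, run against a hypothetical table and key:

    -- Overall clustering health of a table on a candidate key
    SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(order_date)');

    -- Average clustering depth (lower is better)
    SELECT SYSTEM$CLUSTERING_DEPTH('sales', '(order_date)');

    -- Define or change the clustering key itself
    ALTER TABLE sales CLUSTER BY (order_date);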

4. Use Time Travel and Cloning to create new development environments.

  • Backup databases
  • Test changes before deployment
  • Rollback
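
A minimal sketch of these patterns, with hypothetical names and a placeholder query ID:

    -- Zero-copy clone of production as of one hour ago, for development
    CREATE DATABASE dev_db CLONE prod_db AT (OFFSET => -3600);

    -- Rollback: restore a table from just before a bad statement ran
    CREATE TABLE orders_restored CLONE orders BEFORE (STATEMENT => '<query_id>');

    -- Recover a dropped table within the Time Travel retention window
    UNDROP TABLE orders;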

SECURITY (10–15%)

1. Outline Snowflake security principles.

  • Authentication methods (Single Sign On, Key Authentication, Username/Password, MFA)
  • Role Based Access Control (RBAC)

2. Outline the system defined roles and when they should be applied.

  • The purpose of each of the system defined roles including best practices used in each case.
  • The primary differences between SECURITYADMIN and USERADMIN roles
  • The difference between the purpose and usage of the USERADMIN/SECURITYADMIN roles and SYSADMIN.

3. Outline Column Level Security.

  • Explain the options available to support column level security including dynamic data masking and external tokenization
  • DDL required to manage dynamic data masking
  • Methods and best practices for creating and applying masking policies on data
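
The core DDL pattern for dynamic data masking, with a hypothetical policy, role, and table:

    -- Masking policy: full value for privileged roles, masked otherwise
    CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() IN ('PII_FULL_ACCESS') THEN val
        ELSE '*** MASKED ***'
      END;

    -- Apply the policy to a column
    ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;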

DATA TRANSFORMATION (15–20%)

1. Define User-Defined Functions (UDFs) and outline how to use them.

  • Secure UDFs
  • SQL UDFs
  • JavaScript UDFs
  • Returning table value vs. scalar value
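
A scalar SQL UDF next to a table-valued one, sketched with hypothetical names:

    -- Scalar SQL UDF (SECURE hides the definition from non-owners)
    CREATE OR REPLACE SECURE FUNCTION area_of_circle(radius FLOAT)
      RETURNS FLOAT
      AS 'PI() * radius * radius';

    -- Table-valued SQL UDF: returns rows instead of a single value
    CREATE OR REPLACE FUNCTION orders_for_customer(cust_id NUMBER)
      RETURNS TABLE (order_id NUMBER, amount NUMBER)
      AS 'SELECT order_id, amount FROM orders WHERE customer_id = cust_id';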

2. Define and create External Functions.

  • Secure external functions

3. Design, build, and leverage Stored Procedures.

  • Transaction management
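
A sketch of explicit transaction management inside a JavaScript stored procedure; the tables are hypothetical:

    CREATE OR REPLACE PROCEDURE archive_orders()
      RETURNS STRING
      LANGUAGE JAVASCRIPT
      AS
      $$
      try {
        // All-or-nothing move of old rows into an archive table
        snowflake.execute({sqlText: "BEGIN TRANSACTION"});
        snowflake.execute({sqlText: "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < '2020-01-01'"});
        snowflake.execute({sqlText: "DELETE FROM orders WHERE order_date < '2020-01-01'"});
        snowflake.execute({sqlText: "COMMIT"});
        return "archived";
      } catch (err) {
        snowflake.execute({sqlText: "ROLLBACK"});
        return "failed: " + err.message;
      }
      $$;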

4. Handle and transform semi-structured data.

  • Traverse and transform semi-structured data to structured data
  • Transform structured to semi-structured data
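
Both directions in one sketch, assuming hypothetical tables (json_orders with a VARIANT column src holding an items array, and a structured order_lines table):

    -- Semi-structured to structured: flatten a JSON array into rows
    SELECT
      src:id::NUMBER          AS order_id,
      item.value:sku::STRING  AS sku,
      item.value:qty::NUMBER  AS qty
    FROM json_orders,
         LATERAL FLATTEN(INPUT => src:items) item;

    -- Structured to semi-structured: rebuild JSON documents from columns
    SELECT OBJECT_CONSTRUCT('order_id', order_id, 'sku', sku, 'qty', qty) AS doc
    FROM order_lines;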

5. Outline different data schemas.

  • Star
  • Data lake
  • Data vault

Snowflake offers an excellent instructor-led training course to prepare for this exam, but you can certainly prepare on your own as well.

The exam questions are truly designed to assess real-life, hands-on experience with Snowflake. Based on my personal experience taking this exam, here are a few tips and tricks to get you started:

  1. Review the self-study guide posted here. Go through each link. The Snowflake documentation links are particularly helpful for the concepts you will need for this exam.
  2. As expected, ingesting and transforming data natively in Snowflake is definitely a big topic in this exam. Make sure you know the COPY INTO syntax and best practices (see the sample statement after this list), as well as external tables, streams and tasks, Snowpipe, the Kafka connector, stored procedures, and internal and external stages really well. Expect the real-life scenarios you would face when building data pipelines with these. These questions are the best!
  3. Yes, some questions (~30% in my own exam experience) test syntax knowledge (mainly SQL), but if you have 2+ years of hands-on experience with Snowflake, these will not be super hard for you.
  4. The exam questions are definitely tailored toward data engineers when it comes to advanced concepts like Performance Optimization and Security. These topics were also covered in the Advanced Architect exam, but the questions in the Architect exam assessed deeper knowledge and were definitely much harder. For Security-related questions, make sure you understand the role hierarchy in Snowflake and the role-based access control concepts you will need as a data engineer. For performance optimization questions, make sure you understand how to interpret a given query profile and the metrics and statistics it displays.
  5. I have taken a lot of other certification exams and I usually have enough time to go through each question twice. In this exam, I didn't have time to do that: I completed all 65 questions in about 115 minutes. Even though you have roughly 2 minutes per question on average, some questions will definitely require more time than that, while others are short enough that you will need only a few seconds to come up with the right answer. Use your time cautiously and don't dwell on any one question for more than 3–4 minutes.
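
For tip 2, here is the COPY INTO shape you should be able to write from memory, with a hypothetical table, stage, and file layout:

    COPY INTO raw_orders
      FROM @my_ext_stage/orders/
      FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
      PATTERN = '.*[.]csv'
      ON_ERROR = CONTINUE;

    -- Dry run: validate the staged files without loading them
    COPY INTO raw_orders
      FROM @my_ext_stage/orders/
      VALIDATION_MODE = RETURN_ERRORS;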

I personally enjoyed taking this certification exam and I felt like it was a great learning experience overall.

Good luck!

I am currently a Pre-Sales Architect at Snowflake. Opinions expressed in this post are solely my own and do not represent the views or opinions of my employer or the Snowflake training department.
