How to Prepare for Snowflake’s Architecture Certification Exam

I have received a lot of requests on LinkedIn from aspiring professionals asking for guidance on the advanced certification. I usually refer them to the excellent Snowflake documentation; however, the documentation is understandably exhaustive, and it is practically impossible to master all of it in the available time. I went through the same struggle myself, and many people have expressed the same in my 1:1 chats. My idea here is to put together a quick cheat sheet of areas to focus on for the exam. Do not consider this the only means of preparing for or passing the exam; use it at your discretion. I will not go into specific technical details. I will only cover pointers that help you drill down further into each area.

Overview and Architecture [25–30%]

  • Focus on Snowflake’s differentiating features. Understand the various caching techniques, the concept of micro-partitions, the Data Cloud, etc.
  • Think of scenarios like: the Sales department ran a query using Warehouse1 at 9 AM, and the Marketing department ran the same query at 9:05 AM using Warehouse2. Will the Marketing department’s query leverage a cache? If so, which cache does it use? (Note: the warehouses are set to auto-suspend after one hour.)
  • Understand the cloud services layer in detail and how billing for cloud services works.
  • An example scenario: a customer complains that they have been charged cloud services credits. What could be the possible reasons?
  • Understand Snowflake’s metadata (e.g., clustering, stages) and the differences between the ACCOUNT_USAGE views and the INFORMATION_SCHEMA.
  • An example scenario: a customer wants to retain the query history of the last two years. What is the best way to accomplish this?
  • Understand which types of queries consume credits and which do not, and how to recognize them (e.g., does COUNT(column) consume credits or not?).
  • Understand Time Travel. What happens when you increase the retention period versus reducing it, or when you change a data type? Understand the impact of Fail-safe.
  • Understand the account object hierarchy. Do sequences fall directly under the account?
  • How do you create a table with a mixed-case name? Can you create one? (See the sketch after this list.)
  • Warehouses: read every option offered at warehouse creation carefully. The exam will expect you to answer scenario-based questions, e.g., what can you do to reduce queuing? What happens if a query runs longer than the auto-suspend time?
  • Data Types, Identifiers, Timestamp options, etc.
  • When do you use SHOW versus LIST?
  • How do you store the result of SHOW USERS? (See the sketch after this list.)
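
As a quick illustration of the mixed-case and SHOW USERS pointers, here is a minimal sketch (the table, column, and snapshot names are placeholders I made up): quoted identifiers preserve case, and RESULT_SCAN lets you treat the output of a SHOW command as a regular result set.

    -- Mixed-case names: unquoted identifiers are stored in upper case;
    -- double quotes preserve the exact case you type.
    CREATE TABLE "MixedCaseTable" ("OrderId" INT, "OrderDate" DATE);
    SELECT "OrderId" FROM "MixedCaseTable";  -- must be quoted the same way to reference it

    -- Capturing SHOW output: SHOW is a metadata command, so wrap the result
    -- of the last statement with RESULT_SCAN to query or persist it.
    SHOW USERS;
    CREATE TABLE user_snapshot AS
      SELECT "name", "disabled", "last_success_login"
      FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));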

Performance and Tuning [15–20%]

  • Understand the different techniques to improve performance (e.g., materialized views, Automatic Clustering, Search Optimization).
  • Understand the difference between natural clustering and Automatic Clustering. What actions trigger reclustering? What costs are associated with clustering (storage? compute?).
  • Understand the use cases for clustering, materialized views, etc.
  • Understand Query Profile (how to read it, how to spot what is wrong, and how you could fix it).
  • Understand common problems related to pruning, join explosion, etc.
  • How can you query clustering depth and clustering overlap? (See the sketch after this list.)
  • Storage costs versus compute costs.
  • How do you (and can you) improve the performance of internal stages, external stages, semi-structured data, etc.?
  • Get familiar with questions like: a customer has a large table with two sets of users that have different access patterns. One set of users filters queries on column1 and column2, and the other set filters on different columns. What is the best way to handle this?
  • A customer complains about heavy queuing on queries run against a table on Warehouse3. They increased the warehouse size with no improvement. What could they have done differently?
  • A customer is concerned that querying the data in S3 has suddenly become slower. Based on initial analysis you discover that the data volume in the files has increased. What can you do to improve performance without creating another physical copy of the data?
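
For the clustering-depth pointer above, these are the system functions to know; the table and column names below are placeholders.

    -- Average clustering depth of the table's micro-partitions on specific columns
    SELECT SYSTEM$CLUSTERING_DEPTH('sales.public.orders', '(order_date, region)');

    -- Fuller picture: average depth, average overlaps, and a partition-depth
    -- histogram, returned as a JSON document
    SELECT SYSTEM$CLUSTERING_INFORMATION('sales.public.orders', '(order_date, region)');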

Data Movement: [20–25%]

  • Practice the syntax of the different data loading and unloading techniques (e.g., COPY INTO, streams, tasks, Snowpipe) and know what privileges are required.
  • Read all the optional parameters for loading/unloading carefully. Ask yourself: when do I use which stage? Can I transform the data while loading? Which operations are permitted and which are not?
  • Think of practical scenarios you encounter while loading data. How do you handle errors? What options do you have? Try loading and unloading some data yourself; this is a very important topic.
  • File formats: when and how do you use them? Where can they be defined? What is the order of precedence?
  • Stages: what are the best practices for maintaining staged data? Do you persist it? If not, how and when do you purge it?
  • Understand the differences between data loading and data unloading (e.g., can you unload the data to a single file, or to a VARIANT file?).
  • Understand data loading best practices (loading one large file versus many smaller files, which warehouse size suits which situation, etc.).
  • Understand the serverless options such as Snowpipe (auto-ingest and the REST API). What are the various APIs to load data, and what are their capabilities and limitations?
  • Understand the differences in authentication methods between a regular COPY and Snowpipe.
  • How do you handle errors with Snowpipe?
  • Understand how streams and tasks can be used together, and when warehouse credits are consumed once you have created a task on a stream (only when the stream has data, or whenever the task runs?). See the second sketch after this list.
  • Stream metadata columns, and how do you recognize true updates?
  • What happens when you clone any of the above objects?
  • A few scenario-based questions:
  • Which option do you choose to return all errors across all files specified in a COPY statement, including files with errors that were partially loaded during an earlier load? (See the first sketch after this list.)
  • A customer has clickstream data that automatically lands in external cloud storage every 30 minutes. They would like to automatically load certain important files into a Snowflake table and want to leverage Snowflake’s serverless features. What is the best way to do this?
  • A data pipeline has several steps that read, transform, and convert the data into the desired output. In the last step, the data is unloaded to multiple files totaling about 4 GB. The customer’s consuming process expects to receive one file, as it cannot process multiple files, and they are not worried about the end-to-end runtime. Is this possible, and which options would you choose? (Also covered in the first sketch after this list.)
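
Two of the scenarios above (returning all errors, and unloading to a single file) map directly to COPY options. A minimal sketch, with placeholder stage and table names:

    -- Validate a load and return all errors across all files, including files
    -- that were partially loaded earlier with ON_ERROR = CONTINUE
    COPY INTO raw_orders
      FROM @landing_stage/orders/
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
      VALIDATION_MODE = RETURN_ALL_ERRORS;

    -- Unload to a single file: SINGLE = TRUE, and raise MAX_FILE_SIZE
    -- (the unload default is small) to cover the ~4 GB output
    COPY INTO @export_stage/final/orders_extract.csv.gz
      FROM final_orders
      FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
      SINGLE = TRUE
      MAX_FILE_SIZE = 5368709120;  -- 5 GB upper bound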
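
And for the streams-and-tasks pointer, a small sketch (object names are placeholders): the WHEN clause means the warehouse only resumes, and only consumes credits, when the stream actually has unconsumed data.

    -- Capture changes on the raw table
    CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders;

    -- Task that wakes up every 5 minutes but only runs (and bills warehouse
    -- credits) when the stream has unconsumed rows
    CREATE OR REPLACE TASK load_curated_orders
      WAREHOUSE = load_wh
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
    AS
      INSERT INTO curated_orders (order_id, amount, change_type, is_update)
      SELECT order_id, amount, METADATA$ACTION, METADATA$ISUPDATE  -- stream metadata columns
      FROM orders_stream;

    ALTER TASK load_curated_orders RESUME;  -- tasks are created in a suspended state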

Storage and Protection: [5–10%]

  • A customer has a very large data pipeline that runs once a week and updates hundreds of tables. One week the load fails: data is partially loaded into some tables and not loaded at all into others. They decide to roll back the failed load. How could they accomplish this? (See the first sketch after this list.)
  • A support team manages a data lake built on Snowflake with 100 databases and a total size of 500 TB. One of the project teams wants a copy of all 500 TB of databases to perform some testing. The project manager wants to know the additional storage cost for this new environment. How would you determine it?
  • A very security-sensitive customer wants to ensure data is accessible 24/7/365 and wants to build a DR setup on another cloud provider. Is this possible? Which Snowflake feature would you use?
  • A customer’s data replication jobs are failing. What are the possible reasons (excluding networking issues)?
  • The support team has received a request to move 5 of a database’s 10 schemas to another region using data replication. The team lead has reached out to you for help. How do you explain Snowflake’s replication features to them?
  • A customer wants to create a copy of a table’s structure. What is the best way to do this? (Understand the differences between CREATE TABLE ... LIKE, CTAS, and CLONE; see the first sketch after this list.)
  • A DBA has cloned an entire database, including its streams, but end users cannot query the unconsumed data from the original streams in the clone. Is this expected behavior?
  • Understand which objects can and cannot be cloned, what happens when you clone, and what happens to object privileges when you clone.
  • What options do you choose (and can you?) to mirror the security of the primary database when using replication?
  • Semi-structured data:
  • Get familiar with querying semi-structured data and using the FLATTEN function (see the second sketch after this list).
  • Understand the data conversions: treatment of NULLs, VARIANT to JSON and JSON to VARIANT, array handling, case sensitivity, etc.
  • How do you improve performance on semi-structured data? Is a materialized view or clustering an option?
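
For the copy-the-structure and roll-back scenarios above, a hedged sketch of the differences (the table names and the query ID are placeholders):

    -- Structure only: columns, defaults, and constraints, but no data
    CREATE TABLE orders_shell LIKE orders;

    -- Structure plus data, written as brand-new micro-partitions (new storage)
    CREATE TABLE orders_ctas AS SELECT * FROM orders;

    -- Zero-copy clone: shares the source's existing micro-partitions, so it
    -- adds storage cost only as the clone and the source diverge over time
    CREATE TABLE orders_clone CLONE orders;

    -- Rolling back a failed load with Time Travel: recreate the table as it
    -- looked just before the failing statement ran
    CREATE OR REPLACE TABLE orders_restored CLONE orders
      BEFORE (STATEMENT => '<query_id_of_failed_load>');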
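
And for the semi-structured pointers, a small FLATTEN example (the JSON shape and field names are made up):

    -- Assume raw_events.payload is a VARIANT column shaped like:
    --   { "user": {"id": 7, "Name": "Ann"}, "items": [{"sku": "A1"}, {"sku": "B2"}] }
    SELECT
      e.payload:user.id::NUMBER      AS user_id,    -- path notation plus a cast
      e.payload:user."Name"::STRING  AS user_name,  -- element names are case-sensitive
      i.value:sku::STRING            AS sku         -- one output row per array element
    FROM raw_events e,
         LATERAL FLATTEN(INPUT => e.payload:items) i;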

Data Sharing: [8–10%]

  • This is an important feature and groundbreaking compared to traditional technologies. I would expect the percentage of questions on this topic to increase in the future.
  • Understand the privileges required to create a share, import a share, and create a database from a share, as well as how and by whom those privileges can be revoked (see the sketch after this list).
  • Shared databases are read-only, do not carry any Time Travel, and a share cannot extend outside the region and cloud provider you are in.
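
A minimal provider/consumer sketch for the pointers above (the account, database, and role names are placeholders):

    -- Provider side (requires the CREATE SHARE privilege on the account)
    CREATE SHARE sales_share;
    GRANT USAGE ON DATABASE sales_db             TO SHARE sales_share;
    GRANT USAGE ON SCHEMA sales_db.public        TO SHARE sales_share;
    GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
    ALTER SHARE sales_share ADD ACCOUNTS = consumer_acct;

    -- Consumer side (requires IMPORT SHARE): the resulting database is read-only
    CREATE DATABASE sales_shared FROM SHARE provider_acct.sales_share;
    GRANT IMPORTED PRIVILEGES ON DATABASE sales_shared TO ROLE analyst;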

Security: [5–10%]

  • Understand the security model completely. Snowflake’s security model is innovative and different from traditional technologies.
  • Understand the RBAC structure fully and how the role hierarchy works. What is the best way to design access roles and functional roles (see the sketch after this list)? How do you (and can you) provide row-level security? Column-level security? How can you enable HIPAA controls for sensitive data?
  • Expect the exam to give you a scenario and ask you to pick the best option. Focusing on the above will help you answer these.
  • In practice, you may have to give user X permission to create a task or a share, or grant MONITOR privileges. Focus on what access is needed for the corresponding user/role (e.g., can you create multiple roles in a reader account?).
  • Snowflake permissions are role-based, not user-based. Understand the difference.
  • Think of scenarios such as: when sharing data with a consumer, what kind of security controls need to be applied to views, and what is their impact on performance?
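
A hedged sketch of the access-role/functional-role pattern referenced above (the role, database, view, and user names are placeholders):

    -- Access role: owns the fine-grained object privileges
    CREATE ROLE sales_read;
    GRANT USAGE ON DATABASE sales_db      TO ROLE sales_read;
    GRANT USAGE ON SCHEMA sales_db.public TO ROLE sales_read;

    -- One common row-level-security answer: expose a filtered SECURE VIEW
    -- and grant the view, not the base table, to the access role
    CREATE SECURE VIEW sales_db.public.orders_us AS
      SELECT * FROM sales_db.public.orders WHERE region = 'US';
    GRANT SELECT ON VIEW sales_db.public.orders_us TO ROLE sales_read;

    -- Functional role: what a person or job function is actually granted
    CREATE ROLE analyst;
    GRANT ROLE sales_read TO ROLE analyst;   -- functional role inherits the access role
    GRANT ROLE analyst    TO ROLE SYSADMIN;  -- keep the hierarchy rooted under SYSADMIN
    GRANT ROLE analyst    TO USER jane_doe;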

Account Parameters and Settings:

  • Understand the important parameters that can be set at the account, database, and warehouse levels in day-to-day work (e.g., which parameters help you examine query performance, how many queries can run concurrently, what the default timeout parameters are, etc.).
  • If the same parameter is set at both the account level and the session level, which one takes precedence? (See the sketch below.)
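
A small precedence sketch (the parameter shown is just one example): the most specific level wins, so a session-level setting overrides the account default.

    -- Account-wide default
    ALTER ACCOUNT SET STATEMENT_TIMEOUT_IN_SECONDS = 7200;

    -- Session-level override: this is the value queries in this session actually get
    ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 600;

    -- Inspect the effective value and the level at which it was set
    SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN SESSION;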

Connectors: [3–5%]

  • Understand the various ways you can connect to Snowflake programmatically.
  • Focus on the Kafka connector. Understand the end-to-end flow (i.e., data producer to final table) and which objects get created along the way.
  • Understand how billing works and what kind of metadata gets created in existing and new target tables (see the sketch after this list).
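
For the Kafka connector pointer, it helps to remember the shape of the tables it populates: two VARIANT columns, RECORD_METADATA and RECORD_CONTENT. A hedged query sketch (the table name and the payload field are placeholders):

    -- RECORD_METADATA carries the Kafka topic, partition, offset, timestamp, and key;
    -- RECORD_CONTENT carries the message payload itself
    SELECT
      record_metadata:topic::STRING        AS topic,
      record_metadata:"partition"::NUMBER  AS kafka_partition,
      record_metadata:"offset"::NUMBER     AS kafka_offset,
      record_content:click_url::STRING     AS click_url   -- placeholder payload field
    FROM clickstream_events
    LIMIT 10;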

Bonus Tips🙂

  • Read the usage notes for all the topics.
  • Re-review the privileges required to create, view, and monitor all object types.
  • Understand the use cases for the different performance tuning techniques.
  • Pay attention to all aspects of the study guide.
  • Understand how billing works for storage, compute, and especially the serverless features, and how to recognize them (e.g., what kind of warehouse is used, etc.).
  • Check the compatibility of your laptop; some Mac users have difficulty installing the required software.

FAQs

  • What is the passing score? 70%
  • Can we refer to documentation? No
  • Will the exam provide feedback? You will only see your score and whether it is a pass or fail.
  • Can you read the questions out loud during the exam? No; the proctor may assume you are recording the questions or seeking help from someone and pause your exam.
  • Can you keep white papers? No. Pens, books, white papers, cell phones, and any other materials are not allowed.
  • Cooling-off period? In case of a failure, you need to wait 15 days before retaking the exam. Refer to the Snowflake documentation for exact details.
  • Pre-qualification:
  • a) Completion of the SnowPro Core certification; b) at least two years of working experience. Snowflake does not validate the years of experience, but the idea is that within two years you will have covered many practical aspects of the different areas of Snowflake, since the exam primarily focuses on scenario-based questions and Snowflake’s various capabilities.

Note: I’m currently a Senior Resident Solutions Architect at Snowflake. Opinions expressed are solely my own and do not represent the views or opinions of my employer or training department.
