Preparing for the dbt 'Analytics Engineering' Certification
Hints, tips and advice in preparing for the exam
This post offers advice and insights for anyone working towards the dbt Analytics Engineering certification. It's the advice I wish I'd had!
Agenda
- Exam Overview
- Exam Candidate
- Study Approach and Resources Used
- Certification Review
Exam Overview
The exam was first introduced in June 2022, and the high-level exam details are as follows:
- Duration: 2 hours
- Number of questions: 65
- Passing score: 65%
- Price: $200 USD
The exam consists of questions on the following topics:
- Developing dbt models
- Debugging data modelling errors
- Monitoring data pipelines
- Implementing dbt tests
- Deploying dbt jobs
- Creating and maintaining dbt documentation
- Promoting code through version control
- Establishing environments in a data warehouse for dbt
dbt Labs also provides a certification study guide.
Note: The Exam Doesn't Exclusively Cover dbt Content
When preparing for the exam, remember that the certification is for dbt 'Analytics Engineering' rather than a certification exclusively for dbt. As a result (and as indicated by the last two bullet points above), the exam doesn't solely cover dbt. Key examples of non-dbt topics include:
- Jinja templating
- Git workflows
- SQL, e.g., recommended use of CTEs
Exam Candidate
dbt's certification webpage recommends that exam candidates have:
- 6+ months of experience building on dbt Core or Cloud
- SQL proficiency
However, I recommend candidates have at least one year of hands-on experience. I've been using dbt for two years, but importantly have over a year's experience designing and deploying dbt in production. This is key to understanding implementation patterns and everyday use case considerations, e.g.:
- Options available on how to orchestrate and run dbt jobs in production
- What agreed dbt-Git workflow should you use as a team
- To implement CDC, what dbt snapshot strategy to use (timestamp vs check)
- Understanding how to manage 'hard deletes' with dbt, e.g., being aware of the invalidate_hard_deletes option
- Understanding the range of dbt tests (and dbt test packages) available and how to store the results using the store_failures option
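To make the snapshot items above more concrete, here is a minimal sketch of how those options might be set project-wide in dbt_project.yml. The project name my_project, the schema name, and the column names are placeholders rather than anything from a real project. invalidate_hard_deletes is the option named above; newer dbt versions also offer a hard_deletes config, so check the snapshot docs for the version you run.

```yaml
# dbt_project.yml (fragment); my_project and all column names are hypothetical
snapshots:
  my_project:
    +target_schema: snapshots        # schema where snapshot tables are built
    +unique_key: id                  # primary key of the source records
    +strategy: timestamp             # detect changes via a last-updated column
    +updated_at: updated_at          # required by the timestamp strategy
    +invalidate_hard_deletes: true   # close out rows that disappear from the source

    # Alternative when no reliable updated_at column exists:
    # +strategy: check
    # +check_cols: [status, amount]  # or "all" to compare every column
```

As a rule of thumb, the timestamp strategy is the simpler choice when a trustworthy updated_at column exists, while check compares the listed column values directly.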
Hands-on Experience Isn't Enough
A more significant point, commonly raised in the dbt Slack community, is that hands-on experience alone is not enough to pass the exam. Candidates also need a firm understanding of the dbt reference documentation and the other resources described below.
Study Approach and Resources Used
Naturally, my first port of call was dbt's certification study guide.
Free Online dbt Training Courses
The study guide says to go through dbt's (free) online courses. I found these courses all useful, though I recommend the 'dbt fundamentals course' for onboarding developers new to dbt.
First Key Recommendation: Study the dbt Docs, Reference and Guides Documentation
I recommend that anyone considering sitting the exam first study the official documentation on the dbt website in detail. A common theme in the #dbt-certification channel of the dbt Slack community is that experience alone isn't enough, because many developers have not yet been exposed to much of dbt's functionality. I'd recommend cloning dbt's jaffle_shop project and working through the documentation to replicate the features described.
The dbt website documentation I found particularly useful was:
- dbt Docs | getdbt.com
- dbt Reference | getdbt.com — I found this section especially useful. In particular, I used this to replicate dbt's recommended dbt projects structure and create working examples of features described in dbt's project checklist.
- dbt Guides | getdbt.com — one particularly useful callout is the 'legacy' section. This section deceptively covers two exam topics: 'debugging data modelling errors' and 'establishing environments in a data warehouse for dbt'. As a result, I'd strongly recommend going through the dbt Guides documentation.
Second Key Recommendation: Become Familiar with dbt Resource Property Configs & Jinja Functions
I recommend gaining a detailed understanding of the config options available for each type of dbt resource. A good way of putting it: are you confident of the various config options available for dbt sources? Would you be able to write them from scratch?
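As a rough self-check, a source definition using several of its optional properties might look like the sketch below. The source, table, and column names are hypothetical (loosely modelled on jaffle_shop), and property names can shift between dbt versions (for example, newer versions prefer data_tests over tests), so treat this as an illustration rather than a reference.

```yaml
# models/staging/sources.yml (fragment); all names below are hypothetical
version: 2

sources:
  - name: jaffle_shop
    description: Raw tables loaded from the application database.
    database: raw                    # where the raw data lives
    schema: jaffle_shop
    loader: fivetran                 # free-text label, used for documentation
    loaded_at_field: _loaded_at      # timestamp column used by freshness checks
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
        identifier: raw_orders       # actual table name, if it differs from name
        description: One record per order.
        columns:
          - name: id
            tests:
              - unique
              - not_null
      - name: customers
        freshness: null              # opt this table out of freshness checks
```

If you can explain what each property does without looking it up, and which ones are inherited from the source level by the tables beneath it, you are in good shape for this part of the exam.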
dbt Jinja Functions
A range of Jinja functions is available in dbt to help keep your code DRY. I recommend writing a simple macro of your own to learn which common dbt Jinja functions and variables are available.
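Jinja functions and variables are not limited to macros and models; they can also appear in YAML properties. The sketch below assumes a hypothetical environment variable DBT_RAW_DATABASE and hypothetical schema names, and uses two built-ins, env_var and target, to vary a source definition by environment.

```yaml
# models/staging/sources.yml (fragment); variable and schema names are hypothetical
version: 2

sources:
  - name: jaffle_shop
    # env_var() reads an environment variable, falling back to the default given
    database: "{{ env_var('DBT_RAW_DATABASE', 'raw') }}"
    # target.name is the name of the active target defined in profiles.yml
    schema: "{{ 'jaffle_shop' if target.name == 'prod' else 'jaffle_shop_dev' }}"
    tables:
      - name: orders
```

Running the project against different targets and inspecting the compiled SQL is a quick way to see how these values resolve.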
dbt Blog Posts
Aside from the training courses, the study guide links to several dbt blog posts, and themes from these posts came up in the exam! The blog posts I found particularly useful and pertinent to the exam are as follows:
Posts Relating to dbt Project Structure
- How we configure Snowflake | getdbt.com
- How we structure our dbt projects | getdbt.com — note: this was originally a blog post before dbt ported the content to their website. I found it helpful to build the target structure described to understand some decision-making and benefits.
- Your Essential dbt Project Checklist | getdbt.com — a bit more advanced, but one of the most beneficial, especially when you thoroughly go through it.
- Five principles that will keep your data warehouse organized | getdbt.com — some of the things mentioned closely relate to those mentioned in how we structure our dbt projects. Highlights the emphasis given to these topics.
Posts Relating to Git Workflows for dbt
- How to review an analytics PR | getdbt.com — quite a detailed post, but I found going through this in detail very useful.
- The Exact GitHub Pull Request Template We Use at dbt Labs | getdbt.com: the opposite of the previous post, in that it's a lot quicker to go through.
dbt Community on Slack
As well as the above, I found the dbt Slack community very handy. There is a Slack channel dedicated to certification chat called #dbt-certification, which is very useful for understanding common themes and questions from others.
Note: Knowledge of dbt Cloud isn't Required
One of the common questions asked on the dbt slack community is whether knowledge of dbt Cloud is required for the exam — i.e., will any questions come up relating to dbt Cloud? Looking at the exam curriculum, it's easy to see why — with section 5 talking about dbt jobs, this sounds like dbt jobs in dbt Cloud, right?
Well, the answer is no! Knowledge of dbt Cloud isn't required. According to dbt Labs staff in the dbt Slack community, there are no dbt Cloud-specific questions on the exam, and "any questions related to jobs should be accessible to anyone who has defined a job, regardless of if it's in dbt Cloud or a third party orchestration tool."
Summary of Recommendations
In summary, I recommend that exam candidates:
- Have at least six months of experience, and ideally closer to one year
- Use dbt's certification study guide
- Work through dbt's (free) online courses
- Study dbt's online documentation, particularly the docs, reference and guides sections — writing and executing code examples.
- Have a thorough understanding of dbt resource property options and config and dbt Jinja functions.
- Read the blog posts outlined in the study guide, particularly those listed above.
Certification Review
The objective of dbt Labs in producing the certification is to educate and to establish specific standards and patterns for how they would like people to use dbt, and I think it has definitely achieved that. Having gone through the above study materials, I've regularly revised template scripts I use to follow some of the best practices and naming conventions described. In addition, I found that going through the breadth of documentation highlights lots of really beneficial but potentially not obvious functionality, e.g., the dbt test store_failures option.
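For anyone who hasn't come across it, here is a minimal sketch of enabling store_failures on a single test. The model and column names are hypothetical, and newer dbt versions use data_tests in place of tests.

```yaml
# models/schema.yml (fragment); model and column names are hypothetical
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique:
              config:
                store_failures: true   # save failing rows to a table in the warehouse
          - not_null                   # unaffected; uses the project default
```

With this in place, a failing run of the unique test leaves the offending rows in the warehouse, where they can be queried directly while debugging.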
Anyways, I hope some of this information is of use to others. Reach out to me if you have any questions!