Becoming An Analytics Engineer in 2023: A Data Analyst Guide

Skills required to transition into Analytics Engineering

Balu Rama Chandra
11 min read · Nov 22, 2022

2023 is just around the corner, and it's that time of year again when we reflect on what has happened, decide to make some changes and start drafting our new year's resolutions.

A bit of reflection
Before heading straight into the resolutions, let's reflect a bit on 2022, especially on the tech job market. 2022 has been difficult for some and easier for others. We have read some terrible news lately, with layoffs happening in every part of the world, at tech companies large and small. I have seen many good, talented people who are the backbone of their families left unemployed and distressed.

Moving forward
Economists and business leaders predict 2023 will be even tougher than 2022. The job market is expected to be oversupplied, meaning there will be more job seekers than open positions. However, this should not discourage us from moving forward. It is survival time! Survival instinct is the first instinct we have as human beings. We can anticipate and navigate this storm by upgrading our skills, or learning new ones, to remain competitive in the job market. More skills, more opportunities!

Objectives
With that in mind, in this post I would like to share with my fellow Data Analysts (impacted by layoffs or not) the new skills and the role they can transition into to gain an edge in this tumultuous time: Analytics Engineer.

This post will discuss the skills and tools required for the transition, along with courses and materials to help you become a better Analytics Engineer. It may also be a good reference for those coming from other data backgrounds. Hopefully this role can be one of your new year's resolutions.

What is Analytics Engineering?

Before we jump straight to the definition, let's first understand what gave rise to Analytics Engineering.

The advent and rapid development of Modern Data Stack (MDS) tools such as Airbyte and Fivetran (ingestion) and cloud data warehouses (BigQuery, Redshift and Snowflake) have shifted companies from an Extract, Transform and Load (ETL) paradigm to Extract, Load and Transform (ELT).

The shift to an ELT approach means data now lands in the data warehouse before it has been transformed.

This creates an opportunity for technical analysts who both understand the business well and have the skills to model raw data into clean, standardized and well-defined datasets. Under the ETL paradigm, this work demanded a hybrid of data engineer and data analyst, a profile that is hard to find these days. Under ELT, hiring a technical analyst for it is far more feasible. Enter the Analytics Engineer.

Analytics Engineers act as the bridge between data engineers and data analysts/business users. Some of their responsibilities are to:

  • Build well-tested, up-to-date and documented datasets that the rest of the company can use to answer its own questions.
  • Apply software engineering best practices, such as version control and CI/CD, to their data models.
  • Facilitate between business and technical teams, translating strategic business needs into data models and data visualizations.

With many tech companies now adopting the MDS, the ELT paradigm and cloud data warehouses, I believe the Analytics Engineer role will be in demand for the foreseeable future.

What are the required skills?

SQL

Knowledge of SQL is a must for every data professional. Strong fluency in aggregations, joins, CASE statements, CTEs and window functions is a necessity to succeed as an Analytics Engineer. I trust that as a Data Analyst you are already familiar with SQL, but if you are not, there are plenty of free resources for learning it.

I believe nobody gets better at a skill without practice. So practice, practice and practice, and while you are at it, learn better SQL practices.
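
To make this concrete, here is a small, self-contained sketch of the kind of query an Analytics Engineer writes daily: a CTE feeding a window function. It runs on Python's built-in sqlite3 module, and the table and data are made up for illustration.

    import sqlite3

    # In-memory database with a tiny, invented orders table.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("alice", "2022-11-01", 120.0),
         ("alice", "2022-11-15", 80.0),
         ("bob",   "2022-11-03", 200.0)],
    )

    # CTE + window function: each order alongside a running total per customer.
    query = """
    WITH daily AS (
        SELECT customer, order_date, SUM(amount) AS amount
        FROM orders
        GROUP BY customer, order_date
    )
    SELECT
        customer,
        order_date,
        amount,
        SUM(amount) OVER (
            PARTITION BY customer ORDER BY order_date
        ) AS running_total
    FROM daily
    """
    for row in con.execute(query):
        print(row)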

Python

Knowledge of Python will open doors in your data career. As an Analytics Engineer, a decent grasp of Python data types, data structures, conditionals, loops and functions will be useful for daily tasks such as orchestrating your data workflows.
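
As a rough illustration (the data is invented), this is about the level of Python the role typically demands: core data structures, a loop, a conditional and a small function.

    # Made-up daily row counts per source table.
    row_counts = {"orders": 10_250, "customers": 1_400, "refunds": 0}

    def empty_tables(counts):
        """Return the names of tables that loaded zero rows."""
        empty = []                             # list to collect table names
        for table, count in counts.items():    # loop over dict items
            if count == 0:                     # conditional check
                empty.append(table)
        return empty

    print(empty_tables(row_counts))    # ['refunds']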

Data Modeling

Data modeling is the process of structuring your raw data into an analytics-ready format that lets decision makers make better data-informed decisions. Ideally, a data model should be built around a business process. Some of the most popular data modeling techniques are the Star Schema (Kimball's), One Big Table (OBT) and Data Vault. I recommend starting with Kimball's and exploring the rest as you go; there are plenty of good resources on data modeling to dig into.
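
For intuition, here is a toy star schema sketch (all table and column names are invented) using Python's sqlite3: a fact table of sales keyed to a customer dimension, queried with a join.

    import sqlite3

    con = sqlite3.connect(":memory:")

    # Dimension table: one row per customer (descriptive attributes).
    con.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
    # Fact table: one row per sale (measures plus foreign keys to dimensions).
    con.execute("CREATE TABLE fct_sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")

    con.execute("INSERT INTO dim_customer VALUES (1, 'alice', 'NL'), (2, 'bob', 'DE')")
    con.execute("INSERT INTO fct_sales VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 200.0)")

    # Analytics query: join facts to the dimension, aggregate per country.
    query = """
    SELECT d.country, SUM(f.amount) AS revenue
    FROM fct_sales f
    JOIN dim_customer d USING (customer_id)
    GROUP BY d.country
    ORDER BY d.country
    """
    print(con.execute(query).fetchall())   # [('DE', 200.0), ('NL', 200.0)]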

Cloud Data Warehouse

As I have mentioned before, one of the drivers behind the rise of the Analytics Engineer role is the rapid development of cloud data warehouses, hence knowledge of them is a must. There are several cloud data warehouses on the market right now, such as Google BigQuery, Amazon Redshift and Snowflake. If you have no experience with any of them, pick one and start learning.
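
To give you a feel for what querying one looks like, here is a hypothetical BigQuery sketch using the google-cloud-bigquery Python client. It assumes you already have a GCP project and credentials configured, and the dataset and table names are invented.

    from google.cloud import bigquery   # pip install google-cloud-bigquery

    # The client picks up your project and credentials from the environment.
    client = bigquery.Client()

    # Hypothetical table; BigQuery is queried with standard SQL.
    query = """
    SELECT country, COUNT(*) AS customers
    FROM `my_project.analytics.dim_customer`
    GROUP BY country
    ORDER BY customers DESC
    """
    for row in client.query(query).result():
        print(row["country"], row["customers"])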

Version Control

Version control allows you to track different versions of your code and collaborate with other developers. With Git, a version control system, you can keep track of what changes were made, by whom and when. There are several Git hosting services, such as GitHub, GitLab and Bitbucket; I recommend starting with GitHub.

For more advanced tutorials, you can check out freeCodeCamp's YouTube videos on learning Git with GitHub.
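
As a small illustration of the track-what-changed idea, here is a sketch using the third-party GitPython package (my choice here, purely so the example stays in Python; plain git commands work just as well) to create a repo, commit a file and read the history.

    from pathlib import Path
    from git import Repo   # pip install GitPython

    # Initialize a fresh repository in a local folder.
    repo = Repo.init("demo_repo")

    # Create a file, stage it and commit it.
    Path("demo_repo/model.sql").write_text("SELECT 1")
    repo.index.add(["model.sql"])
    repo.index.commit("Add first model")

    # Git records what changed, by whom and when.
    for commit in repo.iter_commits():
        print(commit.hexsha[:7], commit.author.name, commit.message)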

Data Transformation

Data transformation is the process of structuring and reformatting raw data into data models from which we can derive insights and business intelligence. This often involves integrating transactional data (sales, costs, etc.) with operational data (names, places, etc.) so that the data becomes useful for business analytics.

dbt (data build tool) is a data transformation tool that lets us do all of the above. It transforms, cleans, aggregates and tests data within our data warehouse. The transformation logic is written as plain SQL SELECT statements, without the need to define DDL upfront. dbt's own documentation and free courses are an excellent place to start learning it.
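
To demystify what dbt does under the hood, here is a toy sketch of its core idea, built with the jinja2 package and sqlite3 rather than dbt itself: a transformation written as a plain SQL SELECT, templated with Jinja, then materialized as a table in the warehouse.

    import sqlite3
    from jinja2 import Template   # pip install jinja2

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE raw_orders (customer TEXT, amount REAL)")
    con.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                    [("alice", 120.0), ("bob", 200.0), ("alice", 80.0)])

    # A "model": just a SELECT statement with a Jinja placeholder.
    model_sql = Template("""
    SELECT customer, SUM(amount) AS total_amount
    FROM {{ source_table }}
    GROUP BY customer
    """).render(source_table="raw_orders")

    # dbt-style materialization: the tool wraps your SELECT in DDL for you.
    con.execute(f"CREATE TABLE customer_totals AS {model_sql}")
    print(con.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall())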

To take your skills to the next level with dbt:

  • Learn macros and Jinja to keep your code 'DRY' (don't repeat yourself), since macros can be reused across multiple dbt models.
  • Learn and work on projects with dbt Core. dbt Cloud provides a fast, reliable and pleasant UI for developing your data models, but it is only free for one developer. Understanding dbt Core (free and open source) will benefit your career, as you can offer cost-saving alternatives to prospective employers; see the sketch after this list.
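
If you do go the dbt Core route, recent versions (1.5+) can also be invoked programmatically from Python, which is handy for scripting. A minimal sketch, assuming a dbt project and profile are already configured in the working directory and a model named my_model exists:

    # pip install dbt-core plus an adapter, e.g. dbt-postgres
    from dbt.cli.main import dbtRunner

    # Equivalent to running `dbt run --select my_model` from the CLI.
    result = dbtRunner().invoke(["run", "--select", "my_model"])
    if not result.success:
        raise SystemExit("dbt run failed")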

Data Quality Testing

One of the tasks of an Analytics Engineer is to catch data quality failures before downstream data consumers do. They need to set up proper data quality tests and ensure the data has been tested thoroughly before presenting it to business users. Some basic checks at this stage cover data types, unique IDs, not-null columns, accepted values in a column, and so on.

dbt provides built-in data quality tests, and if you feel they are not enough there are excellent packages for data quality testing, such as dbt-expectations and re_data. You can even write your own tests with macros for more complex cases, and there are good articles on data quality testing worth a read.
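
To show what such tests boil down to, here is a plain-Python sketch of the two most common checks (the rows are invented; in a real project you would declare these as dbt's built-in not_null and unique tests rather than hand-rolling them).

    # Made-up rows as they might come out of a staging model.
    rows = [
        {"id": 1, "status": "shipped"},
        {"id": 2, "status": None},
        {"id": 2, "status": "pending"},
    ]

    def test_not_null(rows, column):
        """Return rows that have a NULL in the given column."""
        return [r for r in rows if r[column] is None]

    def test_unique(rows, column):
        """Return rows whose value in the column appears more than once."""
        seen, dupes = set(), []
        for r in rows:
            if r[column] in seen:
                dupes.append(r)
            seen.add(r[column])
        return dupes

    print("not_null failures:", test_not_null(rows, "status"))  # one row
    print("unique failures:", test_unique(rows, "id"))          # duplicate id 2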

Data Documentation & Lineage

In its simplest form, data documentation is information about your data, ranging from raw schema details to user-supplied descriptions. Data documentation should be part of your data modeling process. dbt has extensive documentation features, including auto-generated documentation and the ability to view docs right next to your code, and it lets us share metrics and datasets from our data warehouse and dbt projects with business users. You can learn more about data documentation in dbt's official docs.

Data lineage helps technical and business users understand how data flows from sources to consumption by visualizing it with directed acyclic graphs (DAGs). For technical users, lineage helps pinpoint the root cause of an error in the data flow. For business users, explaining projects and the value they deliver becomes much easier: a graph is far more intuitive than lines of code, so non-technical people can see what is happening to the data. dbt's official docs cover lineage as well.
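
For a feel of how lineage works mechanically, here is a toy sketch using Python's standard-library graphlib (the model names are invented): dependencies form a DAG, and a topological sort yields a valid build order, which is how tools like dbt decide what to run first.

    from graphlib import TopologicalSorter

    # Each model maps to the models it depends on (a tiny, invented lineage).
    lineage = {
        "stg_orders":      {"raw_orders"},
        "stg_customers":   {"raw_customers"},
        "customer_totals": {"stg_orders", "stg_customers"},
    }

    # Topological order: every model comes after its upstream dependencies.
    print(list(TopologicalSorter(lineage).static_order()))
    # e.g. ['raw_orders', 'raw_customers', 'stg_orders', 'stg_customers', 'customer_totals']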

Data Orchestration

What is data orchestration, and why do you need to learn it? Data orchestration is the process of gathering data from disparate sources and preparing it for analysis. As an Analytics Engineer you are tasked with automating these steps. Most of the time, you will use an orchestration tool to automate, schedule and monitor the creation of your dbt models in staging and production environments.

There are several data orchestration tools on the market, and the most popular is Apache Airflow. There are plenty of good guides on combining Airflow with dbt.

Other data orchestration tools, such as Prefect and Dagster, are worth looking into, but I recommend learning Airflow first, as most companies still prefer it over the others.
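
As a taste of what that looks like, here is a minimal, hypothetical Airflow DAG (Airflow 2.x; the project path is invented) that builds and then tests dbt models on a daily schedule.

    # pip install apache-airflow
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="dbt_daily",
        start_date=datetime(2022, 11, 1),
        schedule_interval="@daily",   # run once a day
        catchup=False,
    ) as dag:
        # Shell out to dbt Core inside the (hypothetical) project folder.
        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command="cd /opt/analytics/my_dbt_project && dbt run",
        )
        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command="cd /opt/analytics/my_dbt_project && dbt test",
        )

        dbt_run >> dbt_test   # test only after models build successfully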

CI/CD

CI, which stands for Continuous Integration, is a software development practice in which all developers merge code changes into a central repository. Each change triggers automated build-and-test steps to ensure it is merged reliably. The code is then delivered quickly and seamlessly to production as part of the Continuous Delivery (CD) process. An Analytics Engineer uses CI/CD to deploy their dbt models to production.

Some of you might ask: can't we use Airflow for that? Yes, we can, if you already have Airflow running in production. If you do not, CI/CD tooling (GitHub Actions, GitLab CI) can come to the rescue.
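
One common pattern is to have the CI job execute a small script on every pull request. Here is a sketch of such a script in Python; the dbt commands are typical for a dbt project, but treat the whole thing as an assumption to adapt, and have your GitHub Actions or GitLab CI job call it.

    import subprocess
    import sys

    # Commands a typical dbt CI job runs on every pull request:
    # install packages, then build and test the project.
    steps = [
        ["dbt", "deps"],
        ["dbt", "build"],   # runs models, tests, snapshots and seeds
    ]

    for step in steps:
        print("Running:", " ".join(step))
        result = subprocess.run(step)
        if result.returncode != 0:
            sys.exit(result.returncode)   # fail the CI job on any error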

Data Visualization

In some cases, an Analytics Engineer is responsible for developing dashboards with BI tools such as Tableau, Looker or Power BI. I believe most data analysts are already familiar with these tools, so this part of the transition should not be a big issue. Familiarity with open-source BI tools such as Metabase and Apache Superset is an advantage, especially if you are going to work with early-stage startups.

Here I would like to recommend a BI tool that has caught my attention lately: Lightdash. Lightdash is an open-source alternative to Looker, built to work directly on top of dbt. Since your data models are developed with dbt, a BI tool built specifically for dbt will help you quickly build dashboards and push them into production. Lightdash's getting-started guide covers the setup.

Communication Skills

As I have mentioned above, an Analytics Engineer acts as a bridge between data engineers and business users, hence communication is another important skill. For example, when data quality is compromised or a data pipeline breaks, they need to communicate this to the relevant people; otherwise, erroneous conclusions may be drawn from defective data. In this respect, data analysts who wish to transition into analytics engineering already possess this skill, having worked continuously with business users.

An Analytics Engineer requires the same level of communication skill as a data analyst, alongside the technical chops of a data engineer. Becoming a better communicator is a skill, not a talent, so it can be learned. Have a look at DataCamp's free course on Data Communication Concepts.

Some other tools and resources worth learning to take your skills to the next level:

  1. Data ingestion tools — I recommend Airbyte open-source as it is free.
  2. Terraform, a tool that you can use to automate various infrastructure tasks.
  3. Docker, a tool that simplifies the process of creating, running, managing and distributing applications by using containers.
  4. SQLFluff, a SQL linter and auto-formatter that also works with Jinja.
  5. Some basic Linux command-line knowledge, especially around file management.
  6. The Data Engineering Zoomcamp course by DataTalks.Club. Highly recommended! This course teaches the fundamentals of data engineering, and week 4 is dedicated entirely to Analytics Engineering.

I know this has been a long read, but I hope it was worth every second of your time, especially if you are looking to make a career transition into Analytics Engineering. If you have any feedback, please feel free to leave it in the comments. Thank you.
