A 6-month plan to become a well-established Azure Data Engineer

Thomson Dcruz
Apr 14, 2023

Data engineering is the practice of designing and building systems for collecting, storing, and analyzing data at scale. It is a broad field with applications in just about every industry. Data engineers are the backbone of any data-driven organization: they create and maintain the data pipelines and databases that let downstream applications, data scientists, and business users access and use data.

Becoming a data engineer can be rewarding and fulfilling for several reasons. First, data engineers are in high demand, because businesses rely on them to turn data into better decision-making and performance optimization. Second, data engineers are well paid, because they have specialized skills and expertise in data and technology. Third, data engineers have room for growth and advancement: they can take on more challenging projects and responsibilities as they gain experience and develop new skills. Fourth, the work is satisfying, because data engineers solve interesting problems and make a tangible difference in a world where data is abundant and valuable.

Azure is a leading cloud platform that offers a range of data engineering services and tools. Many businesses are moving their data to Azure to take advantage of its data and analytics capabilities and to improve their decision-making and performance. As a result, there is high demand for Azure Data Engineers who can design and implement data solutions using Azure services and who can work with the languages and tools commonly used on the platform, such as Python, SQL, Spark, Data Factory, and Synapse Analytics.

For those aspiring to be champion Azure data engineers, here is the detailed 6-month plan and roadmap that I followed myself. It worked for me, and I am confident that following it will put you on a solid path to success.

Month 1: Learn the fundamentals of data engineering and Azure. You should be familiar with the concepts and technologies involved in data engineering, such as data storage, data processing, data pipelines, data warehouses, and data lakes. You should also learn the basics of Azure services and frameworks that are relevant for data engineering, such as Azure Data Factory, Azure Synapse Analytics, Azure Stream Analytics, Azure Event Hubs, Azure Data Lake Storage, and Azure Databricks. You can use online courses or tutorials to learn these topics, such as:

Get started with data engineering on Azure
Data Engineering on Microsoft Azure

Months 2–3: Learn Python and Spark. You should be proficient in Python, which is one of the most popular programming languages for data engineering. You should also learn how to use Spark, which is a distributed computing framework for processing large-scale data, and be able to write Python code that runs on Spark clusters using PySpark. Finally, learn how to use Databricks, which is a cloud-based platform that integrates with Azure and provides a unified environment for working with Spark and Python. You can use online courses or tutorials to learn these topics, such as:

Python for Data Engineering (this assumes you know the basics of Python; if not, you'll need to spend additional time mastering Python using the Python Bootcamp course)
Spark: The Definitive Guide (thoroughly read and understand chapters 1–11 and 14–19)
Best Hands-on Big Data Practices with PySpark & Spark Tuning
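
Before you touch a real cluster, it can help to see the idea behind distributed processing in plain Python. The sketch below is not Spark: it uses only the standard library, with threads standing in for executors, and the sample records and function names are made up for illustration. It splits the data into partitions, aggregates each partition independently, and then merges the partial results, which is roughly the shape of a grouped aggregation in Spark.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: (region, amount) sales records.
RECORDS = [
    ("east", 10), ("west", 5), ("east", 7),
    ("north", 3), ("west", 8), ("east", 1),
]


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def sum_partition(partition):
    """'Map' stage: total the amounts per region within one partition."""
    totals = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals


def merge_totals(partials):
    """'Reduce' stage: combine the per-partition totals into one result."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


def total_by_region(records, num_partitions=3):
    """Aggregate amounts per region by processing partitions in parallel."""
    partitions = split_into_partitions(records, num_partitions)
    # Threads stand in for Spark executors in this toy version.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(sum_partition, partitions))
    return merge_totals(partials)


if __name__ == "__main__":
    print(total_by_region(RECORDS))
```

In PySpark, the same aggregation would typically be a one-liner along the lines of `df.groupBy("region").sum("amount")`, with Spark handling the partitioning, scheduling, and merging for you.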

Month 4: Learn ADF and Databricks. You should be able to use Azure Data Factory (ADF) and Azure Databricks to create and manage data pipelines and data warehouses. ADF is a cloud-based service that allows you to orchestrate and automate data movement and transformation across various sources and destinations. You should be able to use ADF and Databricks to ingest, transform, load, and analyze data from various sources such as relational databases, files, streams, etc. Additionally, when dealing with big data, file formats play a crucial role in determining the performance, efficiency, and cost of data processing and analytics. Different file formats have different characteristics and trade-offs that affect how data can be read, written, compressed, split, parsed, and queried. Understanding file formats and choosing the most appropriate format for your data pipelines and workflows plays a big role in the overall performance of your jobs. You can use online courses or tutorials to learn these topics, such as:

Create a Data Pipeline using Azure Data Factory
Databricks Lakehouse Overview Training
Making Apache Spark™ Better with Delta Lake
Understanding File Formats and Performance Optimization Opportunities
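
To make the file-format trade-off from Month 4 concrete, here is a toy comparison of a row-oriented layout (like CSV) and a column-oriented layout (the idea behind formats such as Parquet). This is a standard-library sketch only: the sample table, the function names, and the use of JSON for the column blocks are assumptions for illustration, not how any real format is implemented.

```python
import csv
import io
import json

# Hypothetical sample table with three columns.
ROWS = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Oslo", "amount": 80},
    {"id": 3, "city": "Pune", "amount": 200},
]


def to_row_format(rows):
    """Row-oriented: all fields of a record are stored together (CSV text)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "city", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_column_format(rows):
    """Column-oriented: each column is serialized as its own block."""
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    return {name: json.dumps(values) for name, values in columns.items()}


def sum_amount_from_rows(csv_text):
    """Has to parse every field of every row to reach one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["amount"]) for row in reader)


def sum_amount_from_columns(column_blocks):
    """Decodes only the 'amount' block; the other columns stay untouched."""
    return sum(json.loads(column_blocks["amount"]))


if __name__ == "__main__":
    print(sum_amount_from_rows(to_row_format(ROWS)))
    print(sum_amount_from_columns(to_column_format(ROWS)))
```

Both functions return the same total, but the columnar version reads only the data it needs. That is the main reason analytical queries that touch a few columns of a wide table tend to favor columnar formats, while row-oriented formats are often simpler for writing or exchanging whole records.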

Month 5: Learn DevOps and security. You should be able to apply DevOps principles and practices to your data engineering projects. DevOps is a set of practices that aims to improve collaboration and efficiency between development and operations teams. You should be able to use tools such as Git, GitHub, and Azure DevOps to handle version control, testing, deployment, and monitoring. You should also be able to secure your data pipelines and data stores by following best practices for data governance and security. That means using tools such as Azure Key Vault and Azure Active Directory (now Microsoft Entra ID) to manage secrets and encryption keys, authenticate and authorize users, and audit access to your data assets. You can use online courses or tutorials to learn these topics, such as:

Azure DevOps Fundamentals for Beginners
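
One practical way to bring the testing part of DevOps into a data project is to keep transformation logic in small, pure functions and cover them with automated tests that run on every commit. The sketch below is a minimal, hypothetical example: the record fields, the validation rule, and the function names are assumptions made for illustration.

```python
def clean_record(record):
    """Normalize one raw record; return None if it fails validation."""
    email = (record.get("email") or "").strip().lower()
    if "@" not in email:
        return None  # reject records without a plausible email address
    country = (record.get("country") or "unknown").strip().upper()
    return {"email": email, "country": country}


def clean_batch(records):
    """Clean every record and drop the ones that failed validation."""
    cleaned = [clean_record(record) for record in records]
    return [record for record in cleaned if record is not None]


def test_clean_batch_drops_invalid_and_normalizes():
    raw = [
        {"email": "  Ana@Example.com ", "country": "in"},
        {"email": "not-an-email", "country": "us"},
        {"email": "bo@example.com"},
    ]
    assert clean_batch(raw) == [
        {"email": "ana@example.com", "country": "IN"},
        {"email": "bo@example.com", "country": "UNKNOWN"},
    ]


if __name__ == "__main__":
    test_clean_batch_drops_invalid_and_normalizes()
    print("all tests passed")
```

A test runner such as pytest would pick up the `test_` function automatically, and a CI pipeline (for example in Azure DevOps or GitHub) could run it before any deployment step.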

Month 6: Prepare for certification exams. You should be ready to pass two major exams to establish yourself as an Azure Data Engineer: DP-203: Data Engineering on Microsoft Azure, which earns the Microsoft Certified: Azure Data Engineer Associate certification, and the Databricks Certified Associate Developer for Apache Spark exam. The first tests your knowledge and skills in designing and implementing data solutions using Azure services; the second tests your working knowledge of Spark. Review the exam objectives and skills measured for each exam, and take practice tests to assess your readiness. You can use online courses or tutorials to prepare for these exams, such as:

Databricks Certified Associate Developer for Apache Spark
Microsoft Certified: Azure Data Engineer Associate
Databricks Certified Developer for Spark 3.0 Practice Exams

All the very best!

Follow me on LinkedIn: Thomson D’Cruz

Thomson Dcruz

Senior Consultant & Azure Solution Architect | Data & AI Expert