Are You an Accidental Data Engineer?

Ryan Peden
Published in The Prefect Blog
6 min read · Jan 23, 2023
A blue duck trying to determine if it’s a data engineer

Data engineering is a crucial role that involves designing, building, maintaining, and troubleshooting the systems and infrastructure that help organizations store, process, and analyze large volumes of data. It includes tasks such as extracting, transforming, and cleaning data, building data pipelines to move data from one place to another, and setting up and maintaining data storage systems.

While data engineering is a distinct discipline, it heavily relies on software engineering skills. For example, data engineers often use programming languages like Python and SQL to build data pipelines and data processing systems. In addition, they need to design and architect these systems to be scalable, efficient, and reliable.

As a result, software engineers may find that they’re already doing data engineering work, even if they don’t identify as data engineers. If you’re one of them, you might be an accidental data engineer: a software engineer who spends much of their time on data engineering tasks without intending to.

This post will help you determine if you’re an accidental data engineer and highlight ways to engage more deliberately with data engineering. In doing so, you can become a better software engineer and open up attractive new career options.

Examples of Data Engineering Tasks

As a software engineer, there’s a good chance you’ve done tasks like extracting data from files or APIs, implementing processing logic that modifies the data in some way, and then writing it to a database. If so, you’ve already done data engineering!

Data engineering has evolved into a discipline separate from data science and data analysis. Data engineers often work closely with data scientists and analysts to ensure the data they need is available, accurate, and in the appropriate format.

Some examples of data engineering tasks that you may already be doing include:

Extracting data from various sources. Extraction may involve querying databases, calling APIs, or parsing flat files to extract data.

Transforming and cleaning data. Data engineers often need to transform and clean data to make it suitable for analysis or storage. Transformation tasks include deduplication, data type conversion, and handling missing or invalid values.

Loading data into a destination system. Sometimes, you can use transformed data immediately. But usually, you’ll want to load the transformed data into a database or data warehouse for future use. Loading may involve creating tables in the destination system, optimizing insert and update performance, as well as handling errors and failures.

Building data pipelines. Data pipelines are responsible for moving data from one place to another. They’re usually where the extraction, transformation, and loading tasks we’ve already described occur. Building data pipelines may involve designing the data flow, implementing data processing logic, and scheduling and monitoring pipeline runs.

Setting up and maintaining data infrastructure. Data engineers are responsible for setting up and maintaining the systems and infrastructure that store and process data. This may involve setting up and configuring databases, data warehouses, or data lakes, or building custom data storage solutions.
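To make the extract, transform, and load tasks above concrete, here is a minimal sketch of a pipeline using only Python's standard library. The sample data, the orders table, and the function names are all hypothetical and chosen for illustration. A real pipeline would read from an actual file, API, or database and would need more thorough error handling.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: a flat file containing a duplicate row,
# a missing amount, and numbers stored as text.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
2,bob,
3,carol,5.00
"""


def extract(raw_text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Deduplicate on order_id, convert types, and default missing amounts to 0.0."""
    seen = set()
    cleaned = []
    for row in rows:
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # skip duplicate orders
        seen.add(order_id)
        amount = float(row["amount"]) if row["amount"] else 0.0
        cleaned.append((order_id, row["customer"], amount))
    return cleaned


def load(rows, conn):
    """Create the destination table if needed and insert all rows in one batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run extract, transform, and load in order; return the number of rows loaded."""
    rows = transform(extract(raw_text))
    load(rows, conn)
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection), "rows loaded")
```

Keeping each stage in its own function is one possible design choice. It lets you test the transformation logic without touching a database, and it maps directly onto the pipeline steps described above.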

If you regularly perform these tasks, you’re already doing data engineering work, even if you don’t (yet) identify as a data engineer. Realizing that you’re doing data engineering can have many benefits, regardless of whether you want to move into data engineering or remain a software engineer.

Why Learn More About Data Engineering as a Software Engineer?

If you already enjoy your work, you might wonder why you should learn more about data engineering.

Data engineering knowledge can make you a better software engineer in several ways, including:

Better understanding of specific tools and practices. Data engineering involves a set of tools and practices that can help you work with data more effectively. For example:

Airbyte and Fivetran are popular data integration tools that can help you extract, transform, and load data from many popular databases and other data sources without needing to write your own data ingestion code. This means less code to write and maintain, and more time to focus on delivering business value.

dbt is a popular tool for building, testing, and deploying data transformations without needing to create tables and build merge logic.

Prefect is a popular tool that provides built-in retries, automatic scheduling, and observability for data pipelines. It’s like air traffic control for your data pipelines and workflows, so you don’t have to build homegrown schedulers or get frustrated when your data pipeline fails and you don’t know why.

Debezium helps you capture changes made to a database so you can react to them as they happen and apply them to other databases and data products.

Improved ability to communicate with data engineers and other stakeholders. Understanding data engineering can also help you communicate more effectively with many stakeholders involved in data projects and understand their goals. You may be familiar with the high-level goals of a data project, but knowing the specific tasks and challenges involved in data engineering can help you better understand the project as a whole and provide more informed input.

Greater appreciation for data engineering work. Recognizing and understanding data engineering work can also help you appreciate the value and complexity of data. In addition, it may help you see data projects in a different light and give you a deeper understanding of the challenges and impact of data engineering.
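To illustrate the point about built-in retries in orchestration tools, here is a rough sketch of the kind of homegrown retry wrapper you might otherwise end up writing and maintaining yourself. It uses only the standard library and is not Prefect's API. The names with_retries and flaky_extract are hypothetical, and the failing step is simulated.

```python
import time


def with_retries(func, max_attempts=3, delay_seconds=0.0):
    """Call func until it succeeds or max_attempts is reached.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            print(f"attempt {attempt} failed: {exc}")
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)


# Hypothetical flaky pipeline step: fails twice, then succeeds.
calls = {"count": 0}


def flaky_extract():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("source temporarily unavailable")
    return ["row-1", "row-2"]


if __name__ == "__main__":
    print(with_retries(flaky_extract))
```

Even this small wrapper leaves out backoff, logging, alerting, and scheduling, which is the sort of work a dedicated orchestration tool is meant to take off your plate.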

Understanding data engineering can ultimately make you a better software engineer, as you can tackle data-related challenges with ease. Your skills as a software engineer also put you in an ideal position to pursue data engineering roles.

Benefits of Being a Data Engineer and a Software Engineer

Don’t feel like you need to choose between being a data engineer and being a software engineer. Understanding both offers exciting ways to combine your software engineering skills with data engineering expertise.

As a software engineer, you already have a strong foundation in programming and problem-solving — skills that can make you highly sought after as a data engineer. Your software engineering background gives you a head start in several ways:

You know how to design and implement large-scale systems. Data engineering often involves building and maintaining complex data systems, and having a software engineering background can give you the skills and knowledge to build data systems that are scalable, reliable, and maintainable.

You can take on complex data projects that involve both data engineering and software engineering. Data engineering projects often involve integrating data systems with other software systems, and data engineers with software engineering backgrounds are well-suited to tackle these projects.

Your systems integration experience offers you a unique opportunity to apply your software engineering skills to data projects, building systems that can handle complex data flows and integrate seamlessly with other apps and APIs.

You know how to build products. As a data engineer with a software engineering background, you’ll likely have the skills and knowledge to build innovative data products that combine data and software.

Even better, you can apply full-stack software engineering skills to data projects, leveraging your expertise in building user-friendly and scalable software to create data products that are similarly user-friendly and scalable.

You understand data-related business needs. If you’ve worked on an Agile software team, you have already learned about meeting business needs via user stories and acceptance tests.

You can use this experience to understand the needs of business stakeholders and translate them into data solutions that meet business goals and objectives.

Software engineers with data engineering skills are in high demand and have more career opportunities. You’ll probably command a higher salary and will certainly have more flexibility in your career path.

There is, of course, risk in trying to take on too much at once. Fortunately, there’s a significant overlap between data engineering and software engineering, and the skills you need to succeed in one are useful in the other. So now is a perfect time to break into data engineering if you have a software engineering background.

And if you learn all about data engineering and decide to stay focused on software engineering, that’s fine too. Many companies are eager to hire software engineers with a deep knowledge of databases, data engineering, and data products.

Next Steps

As a software engineer, you already have a strong programming and problem-solving foundation.

Understanding data engineering and recognizing the data engineering work you’re already doing can help you become a more efficient and effective software engineer. It can also open up opportunities for you to specialize in data engineering or to combine your software engineering skills with data engineering expertise to work on more complex data projects.

Either way, you can benefit from the knowledge of both fields and the unique opportunities that come with combining them.

If you want to start exploring the world of data engineering, why not try Prefect? First, dive into the getting started guide to pick up the basics. Then, explore Prefect’s integrations with dbt, Fivetran, and Airbyte to learn more about the data engineering ecosystem.
