Data Quality Engineering: What’s the Difference?

Published in Kingfisher-Technology

As a Principal Quality Engineer, I’ve spent much of my career in software development, where the role is well established. There, the focus is on ensuring the code works as intended, detecting bugs early and delivering a seamless user experience through rigorous testing. When I transitioned into data engineering, my responsibilities expanded to include leading the integration of quality engineers into existing data engineering squads. This involved recruiting new colleagues, mentoring the team, defining our test strategy within data engineering, and promoting a quality-first mindset within the squads. It didn’t take long to realise that quality engineering in data comes with its own challenges, distinct from those in software development.

Building the Team and Understanding the Landscape

My first step was recruiting new colleagues to join our data engineering squads. I was fortunate to find talented people with strong data experience and, more importantly, a tremendously supportive mindset. I connected with our data engineers, data designers and other stakeholders to understand their pain points and where they felt quality could be improved. I also wanted to get a view of our existing ways of working. Once I had a good grasp of current practices, I needed to define the role of quality engineers within data engineering and start building the team.

The Unique Challenge of Data Quality Engineering

Image: Data engineers and a software engineer, a typical view of Data Quality Engineering and Quality Engineering

Our organisation has been growing, and so has our data, which flows in from many different sources. The first challenge for the quality engineers joining our squads was understanding the variety and complexity of that data. Quality engineers working in software development can usually assess a product and understand its intended functionality with relative ease. They ensure the code behaves as intended and performs well under various conditions, and much of this testing can be automated to support the fast pace of development.

In data, the ‘product’ is the data itself, and unlike code, which is more structured, data can be messy and unpredictable! It doesn’t always follow expected patterns, and it comes from different sources, each with its own quirks! For quality engineers in data engineering, the emphasis is on validating the data itself to ensure its accuracy, consistency and integrity. We use different tools and techniques to identify anomalies, which is essential because bad data can lead to poor business decisions.
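To make “validating the data itself” a little more concrete, here is a minimal Python sketch of rule-based checks covering completeness, uniqueness, validity, consistency and referential integrity. It is an illustration under stated assumptions, not our actual tooling: the order records, the field names (`order_id`, `customer_id`, `quantity`, `currency`), the allowed currency codes and the `validate_orders` helper are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical reference values; a real pipeline would load these from config or a lookup table.
ALLOWED_CURRENCIES = {"GBP", "EUR"}
REQUIRED_FIELDS = ("order_id", "customer_id", "quantity", "currency")


@dataclass(frozen=True)
class Issue:
    row: int      # index of the offending record
    check: str    # which quality dimension failed, e.g. "completeness"
    detail: str   # human-readable explanation


def validate_orders(orders, known_customer_ids):
    """Apply simple rule-based quality checks to a list of (hypothetical) order dicts.

    Returns a list of Issue objects in the order they were found.
    """
    issues = []
    seen_ids = set()
    for i, row in enumerate(orders):
        # Completeness: every required field must be present and non-empty.
        missing = [f for f in REQUIRED_FIELDS if row.get(f) in (None, "")]
        if missing:
            issues.append(Issue(i, "completeness", "missing " + ", ".join(missing)))
            continue  # the remaining checks need these fields, so skip them

        # Uniqueness: order_id should identify exactly one record.
        if row["order_id"] in seen_ids:
            issues.append(Issue(i, "uniqueness", f"duplicate order_id {row['order_id']!r}"))
        seen_ids.add(row["order_id"])

        # Validity: quantity must be a positive integer (bool is excluded explicitly).
        qty = row["quantity"]
        if isinstance(qty, bool) or not isinstance(qty, int) or qty <= 0:
            issues.append(Issue(i, "validity", f"quantity {qty!r} is not a positive integer"))

        # Consistency: every source should use the same currency code convention.
        if row["currency"] not in ALLOWED_CURRENCIES:
            issues.append(Issue(i, "consistency", f"unexpected currency code {row['currency']!r}"))

        # Integrity: each order must reference a customer that exists in the reference data.
        if row["customer_id"] not in known_customer_ids:
            issues.append(Issue(i, "integrity", f"unknown customer_id {row['customer_id']!r}"))
    return issues


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "customer_id": "C1", "quantity": 2, "currency": "GBP"},
        {"order_id": 1, "customer_id": "C2", "quantity": 0, "currency": "gbp"},
        {"order_id": 3, "customer_id": "C9", "quantity": 1, "currency": "EUR"},
        {"order_id": 4, "customer_id": None, "quantity": 5, "currency": "GBP"},
    ]
    for issue in validate_orders(orders, known_customer_ids={"C1", "C2"}):
        print(issue.row, issue.check, issue.detail)
```

On the sample records, the first row passes, the second fails three checks (a duplicate ID, a zero quantity and a lower-case currency code from a source with different conventions), the third references an unknown customer and the fourth is incomplete. Each of these anomalies is something a quality engineer would then trace back to its source. Dedicated data quality frameworks offer far richer versions of the same idea.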

While both roles aim to ensure quality, they focus on different aspects. Quality engineers in software development concentrate on functionality and performance, whereas data quality engineers focus on data integrity and on ensuring the data is fit for purpose. This becomes even more complex with massive datasets, which can arrive in various formats (structured, unstructured and semi-structured) with varying degrees of accuracy, each requiring different validation techniques.

Problem-Solving in Data Quality Engineering

For anyone who enjoys problem-solving, both roles are a bit like detective work: figuring out issues and finding the root cause. If you like building Lego sets or solving puzzles, you’ll probably find this role a good fit, as it calls for the same keen eye for detail and endurance!

Integrating Quality Engineering into Data Engineering Squads

Another challenge was integrating quality engineers into squads that had never had them before. This step was essential to ensure that quality engineers were integral to the team and could collaborate closely with the data engineers and designers, sharing insights and feedback regularly. Given that we only have one quality engineer per squad, we prioritise risk-based testing and document these risks in a RAID (Risks, Assumptions, Issues, Dependencies) log within each sprint’s test plan. The test plan is reviewed with data engineers, designers and product owners in a short call for approval before being shared with the full squad. As with most changes, there was some initial hesitation, but this quickly faded as our quality engineers demonstrated how their skill set could identify data inconsistencies that might otherwise have caused bigger problems for the business down the line.
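As a rough illustration of how risk-based prioritisation can work when one quality engineer covers a whole squad, here is a minimal Python sketch that scores RAID risk entries by likelihood × impact and keeps the highest-scoring ones that fit into a sprint. This is a common scoring heuristic, not necessarily the template we use: the `RaidEntry` fields, the 1 to 5 scales, the `prioritise_risks` helper and the `capacity` parameter are all hypothetical.

```python
from dataclasses import dataclass

RAID_KINDS = {"Risk", "Assumption", "Issue", "Dependency"}


@dataclass(frozen=True)
class RaidEntry:
    kind: str         # one of RAID_KINDS
    description: str
    likelihood: int   # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int       # 1 (minor) to 5 (severe business impact); hypothetical scale

    def __post_init__(self):
        # Reject entries that don't fit the log's conventions.
        if self.kind not in RAID_KINDS:
            raise ValueError(f"unknown RAID kind: {self.kind!r}")
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Classic likelihood x impact risk score.
        return self.likelihood * self.impact


def prioritise_risks(entries, capacity):
    """Return the highest-scoring Risk entries that fit in one sprint's testing capacity.

    `capacity` is a hypothetical count of risks one quality engineer can cover.
    sorted() is stable (even with reverse=True), so equal scores keep their log order.
    """
    risks = [e for e in entries if e.kind == "Risk"]
    ranked = sorted(risks, key=lambda e: e.score, reverse=True)
    return ranked[:capacity]


if __name__ == "__main__":
    log = [
        RaidEntry("Risk", "Late supplier feed leaves daily totals incomplete", 4, 4),
        RaidEntry("Assumption", "Source schema stays stable this sprint", 2, 3),
        RaidEntry("Risk", "Duplicate product codes after migration", 3, 5),
        RaidEntry("Risk", "Null prices in a rarely used region", 2, 2),
    ]
    for entry in prioritise_risks(log, capacity=2):
        print(entry.score, entry.description)
```

With a capacity of two, the sketch selects the late-feed risk (score 16) and the duplicate-code risk (score 15) for testing this sprint, while the low-scoring risk and the assumption stay in the log for review with the squad.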

Continuous Improvement and Future Plans

Even with the team now in place, there is always room for improvement. To that end, we regularly conduct blameless postmortems to identify where we can enhance our test strategy and ways of working, improving both quality and efficiency, while we continue to develop scalable testing and validation solutions that can handle the dynamic nature of our evolving data.

As we only have one quality engineer per squad, we are also recruiting a ‘federated’ quality engineer. This role will support our existing teams, particularly when sprint priorities shift and the testing scope expands or runs into delays. They will also help cover gaps during busy periods, such as holidays or sickness, and can pick up automation tickets that tend to slip into the backlog because of time constraints within sprints. This approach will help balance the workload and raise quality standards across the squads.

Building Expertise

Additionally, I have created learning pathways on Udemy for our existing quality engineers in data and for colleagues from the Chapters who would like to acquire new skills in testing data. This initiative will further build our expertise across the team and the Chapter and support the company’s culture of learning and development.

Conclusion

Integrating quality engineers into our data engineering team has been a challenging journey that required careful planning and collaboration, but it has been incredibly rewarding for me personally! While the end goal of ensuring high quality is the same in software development and data engineering, the path to achieving it requires different strategies, tools and mindsets. With the team now in place, we will keep refining our ways of working, learning from blameless postmortems, and developing scalable solutions that can handle the dynamic nature of our evolving data.

If you are interested in joining us on our journey, please check out our careers page.

Thanks for reading!