Agile Data Engineering at Miro

This is how we do it: at Miro, we apply Agile concepts to drive data engineering projects, which gives us improved predictability and a smoother validation process with our stakeholders.

Ronald Ángel
Miro Engineering

--

Introduction

Miro is in hypergrowth. More people use more data to inform more decisions, and they ask for more new features and functionality. The Data Engineering team is growing too, and we're modernizing the Data Platform. As Data Engineering, we constantly look for the sweet spot between running the shop efficiently and adding new capabilities. For this reason, it has become crucial for us to deliver based on priorities and to iterate fast.

We're taking you through our journey: from identifying our problems to achieving better results. We close with some takeaways about the rewards of applying Agile software delivery practices.

Data Engineering

Data Engineering builds and optimizes the data flow of an organization via data pipelines. What we do ranges from adding new integrations with external systems to creating a data lake from scratch. Tasks vary a lot in size and complexity, which makes it hard to plan and set expectations for delivery. This causes frustration:

  • Conflicts with other teams’ priorities can block us.
  • Some requirements are unclear, which slows things down.
  • Very large tasks take a long time to deliver, which diminishes their value.

As a growing team, we felt it was natural to have a conversation about how we wanted to work, to make sure we deliver what's needed. During the process, we identified some useful pointers that we thought might be helpful to other people, too.

1. Understand that data is different (#Empathy)

Data Engineering can be done iteratively; delivering incremental value to your stakeholders and getting feedback is crucial. It may sound like a standard Agile approach, but there’s this thing called “the data” that brings its own peculiarities and challenges, such as:

  • You don’t have access to it.
  • You don’t have a clear idea upfront of how it will perform on the existing infrastructure (data skew anyone?).
  • You don’t have a way to measure its quality for the intended use case.
  • You don’t have a clear idea of what compliance requirements should be applied.

These might look like obvious points, but not being explicit about them will cause wasted time and delays. We prioritize understanding our data and its limitations so we can focus on the ultimate goal: creating valuable tools for our stakeholders.
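One way to be explicit about the quality point is to turn the expectations agreed with stakeholders into checks that live next to the pipeline. Below is a minimal sketch in Python, assuming pandas; the function, column names, and thresholds are hypothetical illustrations, not Miro's actual tooling:

    import pandas as pd

    def check_quality(df: pd.DataFrame, required_columns: list[str],
                      max_null_fraction: float = 0.05) -> list[str]:
        """Return human-readable quality issues; an empty list means the
        data passes the checks agreed on for this use case."""
        issues = []

        # Schema check: every column the use case needs must be present.
        missing = [c for c in required_columns if c not in df.columns]
        if missing:
            issues.append(f"missing columns: {missing}")

        # Completeness check: flag columns with too many nulls
        # for the intended use case.
        for column in set(required_columns) & set(df.columns):
            null_fraction = df[column].isna().mean()
            if null_fraction > max_null_fraction:
                issues.append(f"{column}: {null_fraction:.1%} nulls exceeds "
                              f"the {max_null_fraction:.0%} threshold")
        return issues

Calling check_quality(events, required_columns=["user_id", "board_id"]) on a frame with too many missing user_id values then returns a readable list of issues early, instead of failing silently downstream.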

2. You need a plan (#Collaboration)

Any Agile methodology requires planning. It's much better to find issues at the planning stage than during development. Data Engineering has a long-term roadmap, defined at a high level and aligned with the company's OKRs. Then, when we plan specific stories, we collectively define the scope of work and set expectations for delivery, always trying to keep iterations short. Whether you use Scrum or Kanban, regular iterative delivery is crucial. We all know that software estimation is an unsolved problem; iterating faster saves time and money.

After defining what to work on, you should be able to identify possible blockers more easily. Some blockers are easy to identify and address; for example, you need DevOps support to make an infrastructure change. Others are harder: a task may first require the team to identify and understand the most appropriate solution. Handle blockers like you handle software delivery: define the scope, agree on a clear definition of done, and deliver incrementally. Understanding your blockers first enables a more confident delivery process.

3. Reduce time to feedback (#Iteration)

It’s tempting to supersize your Data Engineering delivery and follow a more Waterfall-like approach. Reducing time to feedback means you prioritize learning whether you are doing it right, and whether you can do it faster. Data Engineering work can be delivered and validated iteratively. To do this, the whole team has to understand what work needs to be done, and what it means for that work to be done. There are many ways to establish this; for example, story kick-offs and a shared definition of done.

Therefore, for each task, we define the scope of work together with the stakeholders, along with an approach that lets us iterate and validate the outcome often and incrementally.
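To illustrate, the agreed definition of done can be attached to each pipeline increment, so the increment is checked the moment it lands. Here is a minimal sketch, again assuming pandas; the Stage abstraction and names are hypothetical, not our production code:

    from dataclasses import dataclass
    from typing import Callable
    import pandas as pd

    @dataclass
    class Stage:
        name: str
        transform: Callable[[pd.DataFrame], pd.DataFrame]
        # The definition of done for this increment, expressed as checks
        # agreed on with stakeholders before the work started.
        checks: list[Callable[[pd.DataFrame], bool]]

    def run_stage(stage: Stage, df: pd.DataFrame) -> pd.DataFrame:
        """Run one increment and fail fast if its checks don't pass."""
        out = stage.transform(df)
        failed = [check.__name__ for check in stage.checks if not check(out)]
        if failed:
            raise ValueError(f"{stage.name} failed checks: {failed}")
        return out

Each increment then ships with its own acceptance criteria, so stakeholder validation happens per iteration instead of at the end of a long delivery.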

4. Understand the need (#Impact)

Data Engineering’s mission at Miro is to empower everyone to make data-driven decisions. Blindly following this mission, without considering how we deliver value toward the company's objectives, would be unwise. By weighing both, we can understand the effort required for the value we aim to deliver. For example, a proof of concept doesn't need to be perfect; it can have some limitations if that proves the value faster. Conversely, when building a data pipeline that delivers a key business metric, you want a higher level of quality, while keeping a steady, iterative delivery pace.

Final remarks

While Data Engineering is different, there are still many lessons we can learn from other areas of software delivery. Applying the principles discussed in this article is helping us scale and deliver a maturing Data Platform to a growing organization like Miro. We'd be glad to hear the lessons you've learned while doing Data Engineering.

At Miro, we:

  • Play as a team to win the world (#Collaboration)
  • Practice empathy to gain insight (#Empathy)
  • Focus on impact and make it happen (#Impact)
  • Learn, grow, and drive change (#Iteration)

If you'd like to join and help us build Miro’s Data Platform, check out our open positions.

--

Ronald Ángel
Miro Engineering

Software Developer focused on Big Data and Distributed Systems. @Amsterdam