The Secret to Speed and Stability

Omer Meshar
CyberArk Engineering
6 min read · Mar 3, 2022

Every product team wants to improve both its speed of development and its production stability. We all want to make fast changes while making sure our site is reliable and continues to serve our users. The question is: How do we do both? The answer, believe it or not, comes from a sport called “Matkot.”

Speed and stability in a nutshell

In my previous post on DORA metrics, I explained why it is important for software development teams to measure the four “DORA metrics” to understand the performance envelope of the team and identify where the team needs to improve.

The first two metrics, Deployment Frequency and Lead Time for Changes, collectively measure velocity, or Speed. The other two metrics, Time to Restore Service and Change Failure Rate, collectively measure Stability.
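To make the four metrics concrete, here is a minimal sketch of how a team might compute them from its own deployment records. The `Deployment` record, its field names, and the `dora_summary` function are hypothetical and invented for this illustration; they do not come from DORA or from any specific tool. The sketch also assumes, as a simplification, that time to restore is measured from the failed deployment until service is restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record layout)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_summary(deployments: list[Deployment], period_days: float) -> dict:
    """Summarize speed and stability for the deployments in a period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    count = len(deployments)

    # Speed: how often we deploy, and how long a change waits before it ships.
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    mean_lead_hours = sum(lead_times, timedelta()).total_seconds() / 3600 / count

    # Stability: how often a change fails, and how long recovery takes.
    failures = [d for d in deployments if d.failed]
    restore_times = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]
    mean_restore_hours = (
        sum(restore_times, timedelta()).total_seconds() / 3600 / len(restore_times)
        if restore_times
        else None  # nothing failed (or nothing was restored yet) in this period
    )

    return {
        "deployment_frequency_per_day": count / period_days,
        "mean_lead_time_hours": mean_lead_hours,
        "change_failure_rate": len(failures) / count,
        "mean_time_to_restore_hours": mean_restore_hours,
    }
```

The first two values in the result describe speed and the last two describe stability, so tracking them side by side shows whether a gain in one came at the expense of the other.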

Why development speed and production stability matter

Speed is very important because the faster we can get something new of value to our customers, the better we serve them. And the more the better — releasing a new feature once every quarter is not comparable to releasing several features every week.

Stability is critical because as long as our production is not reliable, our customers will not be able to use our product as intended — and might even lose their trust in it.

Speed vs stability?

Speed and stability may seem to be in conflict. If you want to keep your production stable, it would be wise to avoid making frequent changes, right? Also, having faster cycle times often means doing fewer regression tests, or tests in general, leaving room for instability.

It seems as if there is a trade-off between speed and stability, in which the operations team holds back on changes to gain stability and the development team increases its velocity by putting less focus on the reliability of the product.

However, one of the most important insights coming out of DORA’s research is that speed and stability can, and must, be achieved together to obtain elite performance.

Many professionals approach these metrics as representing a set of trade-offs, believing that increasing throughput will negatively impact the reliability of the software delivery process and the availability of services. For six years in a row, however, our research has consistently shown that speed and stability are outcomes that enable each other.

State of DevOps — 2019

The question then is how can you improve both speed and stability simultaneously?

Introduction to Matkot

Matkot is a well-known beach sport in Israel that is easy to understand. Typically, a team is comprised of two players — partners — who play the game with wooden rackets and a small ball, trying to pass the ball between them as many times as possible. Good teams can do that from a distance of 10 meters or further, while mediocre teams usually settle for less distance.

Matkot on the Herzeliya beach, photo by me :-)

The secret to Matkot

During my latest visits to the beach, I watched in amazement as a few dozen pairs playing Matkot achieved a lot of complete passes between them. It was clear they had mastered the game, and I noticed that most of them used the same tactic.

Their tactic was that one player was in charge of hitting the ball as fast as possible, while the other player responded with a much less intense hit, ensuring the ball would come back to the first player and enabling that player to hit it as hard as possible.

One player was in charge of the ball’s speed, while the other player made sure of its stability.

Side note: When my wife and I played, we weren’t as good. It was clear we did not implement this tactic, and instead, both of us tried to either stabilize the ball or hit it hard, making it very difficult to continue for a long time.

What does Matkot have to do with software development?

The player in charge of the speed of the ball represents the DevOps role. The DevOps engineer enables the product teams to reach high speed and short cycle times. They improve the foundations of the pipelines, making them lighter and faster.

On the other hand, the player in charge of the stability of the ball represents the SRE (Site Reliability Engineer) role. The SRE controls the production environments, enabling the product teams to deploy into production and monitor their services. They improve the production infrastructure, making it more stable and reliable.

The ball in this analogy represents the product development — on one hand gaining speed from the DevOps engineers and on the other hand gaining stability from the SREs.

Don’t get me wrong; a DevOps engineer is also concerned about the stability, and an SRE will also take speed into account. More on that later.

The DevOps engineer on the left, the SRE on the right…

Putting it all together

The roles in action

To better explain the difference between the roles, let me give an example of their work within CyberArk.

At CyberArk, we have DevOps engineers scattered in many product teams and also have a centralized DevOps team that works on frameworks. Our SREs are usually a part of the Cloud Engineering team, which also includes a few DevOps engineers.

Recently, our centralized DevOps team worked on a project that enabled the different product teams to create fast, customer-like environments for their development and testing needs. This significantly reduced the time it takes for the product teams to develop and test their new features.

Our Cloud Engineering team, consisting of the different SREs, has been working on gathering all production indicators into one system, allowing better monitoring and tracking of our services. Our product teams are now able to identify production impediments earlier and handle them faster.

The DevOps and SRE partnership

One important insight we can gain from Matkot is that DevOps and SREs are partners.

While the SRE is in charge of our MTTR (Mean Time to Restore), the DevOps engineer plays a key role in enabling fast deployments of production fixes.

While the DevOps engineer is in charge of the pipeline, the SRE is the one taking a close look at deployments to production.

These roles must learn to work together, allowing both speed and stability to improve.

Switching roles

Another insight we can gain from Matkot is that there are cases in which roles are switched.

In Matkot, there are instances when the partner in charge of speed will switch to the stability role and vice versa — depending on the wind, the situation and the current need. Good partners know how to do the switch quickly, adapt and switch back when appropriate.

The same goes for SREs and the DevOps engineers. They, too, need to switch things up from time to time and take into account current needs. Different product maturity, pipeline maturity and production needs may cause an SRE to concentrate on speed and the DevOps engineer to focus on stability. Here as well, good partners will know how to ensure that both speed and stability are considered, even if the roles are not black and white.

Improving both speed and stability

It goes without saying that product teams are in charge of their services, including their speed and stability. They are the ones who need to make sure their service is reliable enough and be able to deliver quickly, creating value for their customers.

The DevOps engineers, as well as the SREs, are key to obtaining and improving both speed and stability. As part of a product team, they help ensure the pipeline is fast and the production environment is taken care of. Together, they provide the framework for a better development environment.

When reaching a certain level of scale, having central teams that provide a framework and infrastructure is necessary. Having one team in charge of enabling speed and another in charge of enabling stability, and having them work together, is an excellent way to push the product teams toward high and elite performance.
