Playing Zone Offense: SRE Within Moore’s Zones Model

Sarah Butt
Published in Sarah on SRE
8 min read · Apr 21, 2021

Geoffrey Moore is perhaps best known for Zone to Win, his breakthrough work describing how established companies can harness the power of innovation, remain competitive in rapidly changing markets, and prove sustained value by catching the right innovation waves. As part of his theory, he recommends that mature organizations be divided into four zones, with a unique playbook appropriate for each zone. A brief description of Moore’s zones is below:

Graphic of the mission and focus of each of the four zones. Source: https://1p0dc61tan859681x151aldl-wpengine.netdna-ssl.com/wp-content/uploads/2019/05/4-zones.jpeg

In traditional organizations, technology systems are often maintained by IT Operations teams, normally aligned with the playbook of the productivity zone. These organizations generally apply a standardized approach to managing operations and keeping systems up and stable. Site Reliability Engineering, or SRE, is a relatively new school of thought, first formalized at Google, that has spread throughout the technology industry. Many organizations are choosing to modernize their IT Operations teams by either transforming them into SRE teams or implementing a hybrid model with both IT Ops and SRE. In many ways, SRE is closely tied to the DevOps movement, an engineering culture popularized by leaders such as John Allspaw, Gene Kim, and Patrick Debois. DevOps focuses on capturing synergies by unifying development and operations teams to enable IT agility and collaboration. When DevOps is applied in development teams, the principle of Service Ownership is often discussed. When DevOps is applied to traditional IT Ops, the natural progression is SRE: applying software engineering thinking to IT Operations to increase efficiency, reduce toil, and improve system reliability.

Organizations, particularly large ones, often struggle to move their teams fully from legacy IT Ops principles to an established SRE practice. Additionally, an SRE transformation is often a multi-year process, and it can be difficult to quantify the concrete business value it creates. Understanding that business value is vital to project success and leadership buy-in. As Westerman says in Leading Digital, “IT leaders need to have a mindset that extends beyond technology to encompass the process and drivers of business performance.” This article will examine how Moore’s zones model enables SRE leaders to align strategies and value to business needs in each of the zones.

SRE value normally falls into one of two veins. The first is improved operational efficiency or toil reduction (values consistent with Moore’s Productivity Zone, also called the “managing for value” concept). The second is sophisticated engineering that improves a complex system’s availability, performance, and reliability, which contributes greatly to the end user’s perception of meaningful availability, referred to here as reliability.

Why is reliability important? Reliability is considered a non-functional requirement, often competing with traditional features for development cycles. However, reliability should be the foundation of every system. At the end of the day, it doesn’t matter how wonderful a product is; if it is unreliable, users will be unhappy, and buyers will become wary. In general, once customers have selected a vendor or product, they do not want to change solutions. Change is disruptive to organizations and results in friction and lost productivity. However, if the system or product is down or degraded so often that it creates more friction and user pain than a change would, users will begin to consider leaving. Customers don’t want to leave a product, but, across all industries, poor availability is one of the main reasons they will.

Reliability has also become increasingly important because it is vital to a customer-centric digital transformation. In Leading Digital, Westerman explains how technology has transformed business in all industries, not just traditionally technical ones, with so-called “digital masters” performing significantly better than their peer companies, including up to 9% higher revenue and 26% higher profit (Leading Digital, pg 18). Starbucks, Nike, and Asian Paints are examples worth looking into. As a result, “anything that interferes with a compelling customer experience must be corrected immediately.” By creating more robust systems that degrade gracefully, managing major outages, and resolving lingering performance degradations that create customer pain points, SRE and service ownership disciplines can directly support customer-centric digital initiatives that increase revenue and margin.

Traditional ways to align SRE to business needs, expectations, and values include the concepts of error budgets and SLOs. To best support an organization’s needs, it may also be useful to break SRE out of the productivity zone and align SRE teams to the specific needs of each of Moore’s four zones. These zones align to the horizons in McKinsey’s Horizons model, designated in parentheses. A brief discussion of the alignment of developer experience and business value within the Horizons model can be found in Gene Kim’s The Unicorn Project. As seen in the graphic above, the four zones are:

  • Performance Zone (Horizon 1, Sustaining Revenue)
      • Defined as: Existing products and core competencies currently driving revenue
  • Transformation Zone (Horizon 2, Disruptive Revenue)
      • Defined as: Competitive response. Ideal for neutralizing, optimizing, and differentiating offerings based on products promoted from the Incubation Zone and competitive responses to changing market conditions
  • Incubation Zone (Horizon 3, Disruptive Investment)
      • Defined as: Innovative early-stage ideas, research and development
  • Productivity Zone (Horizon 1, Sustaining Investment)
      • Defined as: Supporting operations and functions

The performance zone hosts established products with known customer adoption and revenue generation. For these products, external customers’ experience should be paramount, and SRE teams should focus on the reliability of production systems, particularly regarding customer-impacting issues. Additional focus should be on having supportable and well-observed production systems. Stability should take priority over speed, and factors such as compliance and change control/change management for end-users will often be important. Processes may be implemented to encourage structure and promote reliability.
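The error budgets and SLOs mentioned earlier are one way to make this stability-first stance measurable for performance zone products. The sketch below is a minimal illustration rather than a prescribed implementation: the `ErrorBudget` class, its field names, the sample numbers, and the "ship"/"freeze" policy labels are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Error budget for a request-based availability SLO (hypothetical model)."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests served during the SLO window
    failed_requests: int  # requests that violated the SLO during the window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; 0.0 or below means it is spent.
        if self.allowed_failures <= 0:
            return 0.0  # a 100% target or zero traffic leaves nothing to spend
        return 1.0 - self.failed_requests / self.allowed_failures

    def release_policy(self, freeze_below: float = 0.0) -> str:
        # Assumed policy: keep shipping while budget remains, otherwise
        # pause feature releases and prioritize reliability work.
        return "ship" if self.remaining_fraction > freeze_below else "freeze"


budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"allowed failures: {budget.allowed_failures:.0f}")
print(f"budget remaining: {budget.remaining_fraction:.0%}")
print(f"release policy:   {budget.release_policy()}")
```

With a 99.9% target over one million requests, roughly 1,000 failed requests are allowed, so 250 failures leave about three quarters of the budget. A team could raise `freeze_below` for a performance zone product to slow releases earlier, which is one way to express "stability over speed" as a number.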

The incubation zone is where early-stage concepts, ideas, and prototypes are nurtured. This zone is often referred to as “managing for growth.” In this zone, speed is essential, as it is vital to prepare viable products to bring to market as quickly as possible. Even a few days’ delay can have a significant impact on finances and market share in a competitive market. The guiding theme for SRE in this zone should be speed: creating an environment where developers can work fast, deploy fast, and fail fast. Additional focus should be placed on having sufficient capacity and process to allow hardware for new products and ideas to be quickly commissioned, spun up, and potentially spun down and repurposed. In this zone, developer experience and the stability of non-production environments for testing may be more important than the stability of the applications being developed. Even in a company that is traditionally hesitant about CI/CD practices, this is the zone that may benefit most from fast and frictionless deployment strategies. Developer experience is vital in this zone because a slow developer experience or “death by 1000 cuts” can slow innovation and prevent a first-to-market success. Even within this “build fast and fail fast” culture, a focus should be on creating well-architected systems that adhere to agreed-upon architecture principles and can eventually scale as needed without significant rework.

The step between the incubation zone and the performance zone is the transformation zone. This zone is where products successfully graduate from concept to preparing for commercial markets. During this stage, SRE teams should focus on ensuring the applications are ready for a stable launch. As features may still be in testing, capabilities such as feature toggles, A/B testing, canary testing, and potentially even a blue/green environment approach provide the most business value. This is also the transitional time when change management, compliance certifications, and other production readiness work should be done. Speed and agility are still valued in this zone, but preparing the application for full customer load becomes equally important.
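One common way to implement a feature toggle that also supports canary-style rollouts is to hash each user into a stable bucket and enable the feature only for buckets below the current rollout percentage. The sketch below illustrates that idea under stated assumptions: the function names, the flag name `new-checkout`, and the user ID format are hypothetical, and a real deployment would more likely rely on an existing feature-flag service.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if the user falls inside the current rollout percentage."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    # Buckets are fixed per user, so raising the percentage only adds users;
    # nobody who already has the feature loses it as the rollout widens.
    return rollout_bucket(flag_name, user_id) < rollout_percent


# Hypothetical usage: start with a small canary, then widen the rollout.
users = [f"user-{i}" for i in range(1000)]
canary = [u for u in users if is_enabled("new-checkout", u, 5)]
wider = [u for u in users if is_enabled("new-checkout", u, 50)]
print(f"canary users: {len(canary)}, wider rollout users: {len(wider)}")
```

Because the bucket depends only on the flag name and user ID, a user sees consistent behavior across requests, and setting the percentage back to 0 acts as a quick rollback switch while the product is still being hardened for full customer load.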

Businesses will also have non-customer-facing functions such as HR and payroll systems. These systems are considered part of the productivity zone. In this zone, SRE teams should prioritize cost-effectiveness, reliability, and meeting known needs.

As applications or services move between the zones, their SRE requirements, needs, and priorities may change. Leading Digital introduces this concept in the form of “multi-speed units” within a broader organization. An early concept in the incubation zone may require strong support from a developer experience SRE team. Still, it may not have a dedicated SRE team for the product, instead relying heavily on service ownership with an experienced SRE team acting in a “consulting” role on best practices for architecting for reliability. As a product moves to production readiness and moves ahead on the horizon, it may require increased dedicated SRE support. The product should also be significantly more stable as it prepares to run in production with an end-user load. As products are promoted between zones, budgets and people may need to move with them, and headcount may need to be expanded (potentially funded by the business). This helps prevent the traditional pitfall of “lobbing” a new application over the proverbial fence into production for an IT Ops team to support with little or no new resources or funding.

As noted by both Moore in Zone to Win and Kim in The Unicorn Project, the performance zone, home of Horizon 1 products, will “eat” all of the resources if you let it. To this end, it is advisable to have separate SRE teams (reporting into a broader SRE organization) to avoid competing priorities and allow for better alignment to the distinct business needs in each of the zones. As Westerman notes, this zoned approach also reduces natural tensions between the different cultures of each zone/horizon, such as controlling vs. innovating and orchestrating vs. unleashing.

In SRE work, it is easy to focus on the performance zone. Issues in production, particularly critical incidents, are often all-hands-on-deck affairs with significant, tangible, and immediate external customer impact. Moore strongly emphasizes that companies need to focus not just on the performance zone, where Horizon 1 products exist, but also on the “pipeline” of new product development in the transformation and incubation zones (Horizons 2 and 3). Failure to do so will slowly lead a company into obscurity (the classic cautionary tale is Kodak; an example in the tech space is Yahoo). To this end, SRE organizations should seek to understand which zone the products they support are in and the business needs, behaviors, and priorities of that particular zone. It should be noted that when equal attention is given to the performance zone and the incubation zone, production and non-production environments become equally important, and developer experience, such as the reliability of internal tools, becomes as important as the reliability of external products. Adopting this approach allows a balance that supports both the company’s immediate revenue-generating products and its long-term success, which depends on the company catching the “next wave” as it progresses through the zones.

A discussion of SRE needs and potential optimizations in the productivity zone, along with the concept of core vs. context, is out of this article’s scope. The final chapters of The Unicorn Project cover this in detail. Wardley Maps are a helpful tool for core vs. context analysis and for evaluating potential optimization and outsourcing.


Sarah Butt
Sarah on SRE

SRE Strategist // Technical Product Manager // MBA Candidate