Crafting SRE Business Value Alignment

Sarah Butt
Published in Sarah on SRE
8 min read · Apr 21, 2021

One of the most common challenges in leading Reliability Engineering or Platform Engineering organizations is aligning to and quantifying the business value created by SRE efforts and projects. This article draws heavily on Westerman and Hunter’s The Real Business of IT: How CIOs Create and Communicate Value, and this write-up is an effort to take the four-step framework introduced in that book and apply it specifically to the SRE discipline. Additional insight was gained from numerous industry interviews, in an attempt to create a set of principles that work for reliability engineering teams across various industries.

Before the public cloud era, many companies spent over half of their Capex dollars on IT-related equipment and projects. While public cloud computing has converted a significant amount of this spend to Opex at many companies, the sentiment remains the same: software engineering work and IT are expensive, both in raw dollars spent and in their potential to add value to (or even subtract value from) a company. SRE is no exception, especially when transforming from an IT Operations organization to a Reliability Engineering organization. Thus, it is beneficial for every reliability engineering leader to communicate business value in terms understood by leaders inside and outside IT.

Principle 1: Challenge Your Thinking: It’s Not Actually About IT or SRE

If you put IT leaders together in a room, they will inevitably begin to “talk shop”. If you put SRE leaders together in a room, you’re likely to hear about the latest in auto-scaling technology, monitoring tooling, or CI/CD developments. To someone outside of technology, it might as well be a foreign language. When that same language is carried into leadership meetings with teams such as sales and marketing, it often falls flat.

A fundamental mindset shift must be made: when speaking with audiences outside of IT or SRE, it’s not actually about the technology; it is about how the technology affects business outcomes and performance. A leader must be able to speak the same “business language” as other executives to truly become part of the conversation, or else risk being seen as an “order taker” instead of a “strategic business partner”. This begins with orienting towards business outcomes and performance, not the machines that drive that performance. As Butch Leonardson once said while reflecting on how to best communicate with other disciplines, “We’re 99.99% on uptime, and we’re fast. The plumbing is wonderful… but nobody cares.”

That is not to say that uptime, the industry-standard “Golden Signals”, or other SRE metrics are not valuable. They are useful and absolutely should be measured. However, while these metrics are appropriate and helpful internally, such as in service reviews with Service Owners or when discussing initiatives with other reliability engineering teams, they are less useful when talking to a broader audience. Finally, when deciding to embark on a project or program (be it an IT Ops to SRE transformation or another program), remember that value comes when teams embark on business projects enabled by technology, not IT projects for the sake of IT projects.

Principle 2: Build Trust by Showing Value for Money

Westerman and Hunter define value as the right services provided at the right level of quality for the right price. For example, to an SRE leader, this may mean providing the right work (be it project work or incident handling type work), delivered to a level where both the SLA and error budget meet but do not overly exceed the business need, for the right cost. By showing the value of the existing money being spent on IT Ops and SRE before attempting to embark on significant transformations, reliability engineering leaders establish trust and credibility through transparency and delivering value. As the business unit leader at an F500 company once told an IT leader, “If you can’t run your own business, why would I let you touch mine?” In other words, if you don’t have the established credibility of providing value on existing investments, you haven’t yet earned a seat at the table to be transformative and drive long-term strategic shifts.

One of the ways to create transparency is to share data. A strong starting point is to turn SRE metrics into business terms. For example, instead of talking about the golden signals or uptime of services, talk about which customer journeys or business functions were affected when an error budget or SLA was missed. JM Enterprises offers a powerful example: the company built a dashboard that tied standard IT and SRE metrics, such as network uptime and service performance, to transactional sales for customers and to the performance of business processes, which were then linked to business unit performance and overall company impact. The dashboard updated in real time and was displayed prominently in company offices. As Tom Holmes said of the effort: “Instead of saying that the servers are up, the routers are up, and so on it says that the contract sales are up.” This is powerful because it ties reliability engineering effort to tangible business outcomes through a common language. Other companies, including Dell, have created similar dashboards to help translate IT effort into tangible impact.
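As a rough illustration of turning an SRE metric into a business statement, the sketch below converts an availability target into an error budget for a 30-day window and then phrases a miss in terms of a customer journey. Everything in it is an assumption made for illustration: the `ServiceWindow` fields, the 30-day window, the order rate, and the idea of estimating “orders at risk” as downtime multiplied by an average order rate. A real dashboard would draw on actual transaction data.

```python
from dataclasses import dataclass

MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes in a 30-day window


@dataclass
class ServiceWindow:
    """Hypothetical record for one service over a 30-day window."""
    customer_journey: str     # business function the service supports
    slo_target: float         # e.g. 0.999 for "three nines"
    downtime_minutes: float   # observed downtime in the window
    orders_per_minute: float  # assumed average order rate


def error_budget_minutes(slo_target: float,
                         window_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Downtime the availability target allows within the window."""
    return (1.0 - slo_target) * window_minutes


def business_summary(s: ServiceWindow) -> str:
    """Describe the window in business terms rather than uptime terms."""
    budget = error_budget_minutes(s.slo_target)
    status = "within" if s.downtime_minutes <= budget else "over"
    # Crude estimate: assumes orders arrive at a steady average rate.
    orders_at_risk = round(s.downtime_minutes * s.orders_per_minute)
    return (f"{s.customer_journey}: {s.downtime_minutes:.1f} min unavailable, "
            f"about {orders_at_risk} orders at risk "
            f"({status} error budget of {budget:.1f} min)")


if __name__ == "__main__":
    checkout = ServiceWindow("Online checkout", 0.999, 50.0, 12.0)
    print(business_summary(checkout))
```

The point of the sketch is the last function: the same underlying numbers are reported, but the sentence leads with the customer journey and the orders affected rather than with the service’s uptime percentage.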

Another way to foster trust and transparency is to report on business metrics. For example, when displaying cost, instead of providing a single number for chargebacks, give a breakdown of chargebacks that includes details such as fixed cost vs. consumption. It may also help to benchmark these metrics against peer organizations, showing that the value delivered for the money is in line with industry standards.
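A minimal sketch of such a breakdown follows. The function name, the cost figures, and the simple model of “fixed platform cost plus a unit price times units consumed” are all assumptions for illustration; real chargeback models vary widely between organizations.

```python
def chargeback_breakdown(fixed_cost: float,
                         unit_price: float,
                         units_consumed: float) -> dict:
    """Split a chargeback into fixed and consumption-based components.

    Hypothetical model: total = fixed_cost + unit_price * units_consumed.
    """
    consumption_cost = unit_price * units_consumed
    total = fixed_cost + consumption_cost
    return {
        "fixed": fixed_cost,
        "consumption": consumption_cost,
        "total": total,
        # Share of the bill the business unit cannot influence through usage.
        "fixed_share": fixed_cost / total if total else 0.0,
    }


if __name__ == "__main__":
    # Illustrative numbers only: a monthly platform fee plus per-request pricing.
    bill = chargeback_breakdown(fixed_cost=6000.0,
                                unit_price=0.02,
                                units_consumed=200_000)
    for item, value in bill.items():
        print(f"{item}: {value:,.2f}")
```

Showing the fixed share alongside the total gives a business unit leader something to act on: the consumption portion responds to their usage, while the fixed portion is a shared platform cost.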

Finally, it should be noted that showing value for money applies best to “Run the business”, or Horizon 1, activities. For any leader seeking to embark on an IT Ops to SRE transformation, you must not get stuck at this stage. Being only a Horizon 1 play, running the business without helping grow or transform it, can quickly turn “value” into a race to the bottom, leaving you with a well-managed and financially sound organization that is fiscally responsible but lacking innovation.

Principle 3: Link IT/SRE to Business Outcomes

The way to avoid turning an IT organization into a value “race to the bottom” is to affect not only Horizon 1 activities but also Horizon 2 and 3 activities. That is to say, become involved and demonstrate impact not only in “run the business” activities but also in “grow the business” and “transform the business” activities. In The Real Business of IT, this is described through a framework called Westerman’s Virtuous Cycle. I will present a consolidated version but encourage anyone seeking a fuller understanding to read Westerman’s work on the topic. The consolidated framework is a circular process: needs identification > transparent investment > change (business process redesign, organizational change, application development, etc.) > harvesting.

Step 1: Needs Identification

Needs identification takes the form of knowing both the company’s business strategy and what key metrics business leaders are reporting on. This discovery process allows a deep dive into key company initiatives, pain points, and metrics. Understanding these needs in a company-specific context is vital to fostering alignment and creating customized programs and solutions that help grow and transform the business. Gartner’s Business Value Framework is another valuable source for developing this analysis.

Step 2: Transparent Investment

Transparent investment refers to the criteria used to decide what projects to invest in. There will always be a “buzzy” tool, technology, or concept. To avoid being tossed about by the hype waves, it is critical to clearly define investment criteria. Investments should focus on finding new sources of value, where value is determined based on the needs and priorities identified in Step 1. The Four Sources of New Value from IT framework, which zones initiatives based on Source of Value x Scope of Change, is a useful framework for this effort. The Four Zones are: Optimizing (using IT to streamline processes via efforts such as automation), Reshaping (improving business performance by reducing pain points), Internal Informing (providing information and data analysis that affects how other internal teams act), and External Informing (providing customers with information other companies cannot offer).

Once a project or initiative’s value has been identified in one of these zones, it should be evaluated against standard criteria. At Intel, all projects are assessed in a six-section grid along the Business Value and IT Efficiency axes. Projects are given a positive, negative, or neutral rating for both categories and placed appropriately on the grid as a result. Projects identified as “win-win” (positive business value and IT efficiency) are put through further scrutiny in the form of a standard questionnaire. This questionnaire provides composite scores from 0–100 in the areas of IT efficiency, business value (a measure of strategic alignment), and financial attractiveness. This allows all projects to be placed on the same x-y graph, with the additional metric of financial attractiveness dictating the “bubble size”. From there, strategic decisions can be made. For further reference, see Intel’s 2004 paper Managing IT for Business Value.

Similarly, a leader from another prominent hardware manufacturer described a process in which they gave each potential project “t-shirt sizing” on various metrics, including revenue impact, margin impact, productivity impact, likelihood of success, and financial risk. They then weighted each of these factors relative to the current business needs and priorities (discovered in Step 1) and produced a composite score that allowed projects to be stack ranked. A data-driven decision on which projects to pursue could then be made based on the standardized ranking, which also ensured alignment with the broader business strategy.
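The weighted t-shirt sizing approach described above can be sketched in a few lines. This is not the manufacturer’s actual method, which the interview did not detail; the point values per size, the factor names, the weights, the sample projects, and the choice to invert “lower is better” factors such as financial risk are all assumptions made for illustration.

```python
# Assumed mapping from t-shirt size to points.
SIZE_POINTS = {"S": 1, "M": 2, "L": 3, "XL": 4}

# Assumed: for these factors a bigger size is worse, so the points are inverted.
LOWER_IS_BETTER = {"financial_risk"}


def composite_score(sizes: dict, weights: dict) -> float:
    """Weighted average of size points, on the same 1-4 scale as SIZE_POINTS."""
    low, high = min(SIZE_POINTS.values()), max(SIZE_POINTS.values())
    weighted_sum = 0.0
    for factor, weight in weights.items():
        points = SIZE_POINTS[sizes[factor]]
        if factor in LOWER_IS_BETTER:
            points = low + high - points  # XL risk scores like S benefit
        weighted_sum += weight * points
    return weighted_sum / sum(weights.values())


def stack_rank(projects: dict, weights: dict) -> list:
    """Project names ordered from highest to lowest composite score."""
    return sorted(projects,
                  key=lambda name: composite_score(projects[name], weights),
                  reverse=True)


if __name__ == "__main__":
    # Weights reflect hypothetical current priorities from Step 1.
    weights = {"revenue_impact": 4, "productivity_impact": 2,
               "likelihood_of_success": 2, "financial_risk": 2}
    projects = {
        "self-service deploys": {"revenue_impact": "M", "productivity_impact": "XL",
                                 "likelihood_of_success": "L", "financial_risk": "S"},
        "observability revamp": {"revenue_impact": "L", "productivity_impact": "M",
                                 "likelihood_of_success": "M", "financial_risk": "M"},
        "datacenter exit": {"revenue_impact": "XL", "productivity_impact": "S",
                            "likelihood_of_success": "S", "financial_risk": "XL"},
    }
    for name in stack_rank(projects, weights):
        print(f"{name}: {composite_score(projects[name], weights):.2f}")
```

Because the weights live in one place, re-ranking the portfolio when business priorities shift is a matter of changing the weights rather than re-scoring every project.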

Step 3: Change

The topic of change is overwhelmingly broad and out of scope for this paper. Keep a lookout for an upcoming article focused solely on change, including Kotter’s model for change!

Step 4: Harvesting

Harvesting is a period of accountability, reflection, and celebration. One of the essential parts of harvesting is to measure the value delivered. Without measuring value, the focus turns to IT cost, perpetuating the “race to the bottom” warned of in Principle 2. During a harvesting phase, SRE leaders should work with key transformation stakeholders within the broader technology organization and in areas such as finance or marketing to quantify the value the program or effort has delivered. This value should be expressed both in terms of the original framework used to make the initial investment (accountability) and in KPIs that are meaningful to the company (as identified in Step 1). While absolute quantitative numbers are normally best, in some cases directionally correct numbers, relative percentages or multipliers off a baseline, and sizing groups can also be used.

For extended-duration projects, there should be both periodic harvest reviews (to help perpetuate the “fail fast” mentality) and a harvest review at completion. Using principles from blameless postmortems can be beneficial in these reviews and can produce future learnings (reflection). Finally, achievements should be documented, shared, and celebrated. This allows for team member recognition and for SRE orgs to communicate the value their initiatives have brought to the broader business. Much like a sales leader sharing and celebrating after a good quarter, technical organizations should celebrate and evangelize their wins. As these wins accumulate, the broader IT org is more likely to be seen as a strategic business component.

For example, in the case of Nordstrom shared in Gene Kim’s book Accelerate, a DevOps transformation that promoted faster releases to production allowed for significant business agility, especially in the online space. While the SRE/DevOps spend increased during this project, the value delivered by the teams far outweighed the cost. By clearly aligning to the broader company strategy and communicating the value of a CI/CD implementation in business and customer-centric terms, the Nordstrom team was able to clearly show the benefits of “managing for value” instead of “managing for cost” and of including DevOps teams in growth and transformation initiatives.
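Where absolute numbers are hard to obtain, the relative percentages and multipliers off a baseline mentioned above are straightforward to compute. The sketch below assumes a baseline value and a current value for each KPI; the KPI names and figures are invented for illustration and are not taken from the Nordstrom case.

```python
def relative_change(baseline: float, current: float) -> float:
    """Fractional change versus the baseline (0.25 means +25%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


def multiplier(baseline: float, current: float) -> float:
    """How many times the baseline the current value represents."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return current / baseline


def harvest_line(kpi: str, baseline: float, current: float) -> str:
    """One line of a harvest review, expressed relative to the baseline."""
    return (f"{kpi}: {baseline:g} -> {current:g} "
            f"({relative_change(baseline, current):+.0%}, "
            f"{multiplier(baseline, current):.1f}x baseline)")


if __name__ == "__main__":
    # Hypothetical before/after values for a delivery-focused program.
    print(harvest_line("Releases per week", 2, 10))
    print(harvest_line("Lead time (days)", 10, 4))
```

Reporting both the percentage and the multiplier lets each audience read the figure in the form they are used to, while the baseline keeps the claim anchored to where the effort started.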

The Virtuous Cycle is just that, a cycle. Each completed cycle (and each fast failure handled responsibly) builds momentum. By reframing thinking, building trust by showing value, and linking to business outcomes, SRE leaders can clearly quantify and demonstrate value throughout an organization.

Sarah Butt
Sarah on SRE

SRE Strategist // Technical Product Manager // MBA Candidate