Heritage Reliability Engineering: How to Make Heritage Legacy More Reliable

Published in

The Humans of DevOps

Jun 3, 2020

Collectively, we are witnessing a time of unprecedented change and disruption in industry. The world is moving towards radical systemic change that will influence every sphere of our lives. This is not only a technical change, in which organizations change processes and infrastructure, but also a cultural and behavioral one. Transformation, cost savings, and user experience are targets for every industry, no matter how small or large, IT or non-IT, newcomer, experienced player, or legend.

At the same time, a negative narrative has emerged around the legacy world. Too many people use the term ‘legacy’ with negative connotations: that it’s expensive and poorly adapted to modern agile and DevOps ways of working. Also, whenever an organization adopts agile and DevOps principles, the legacy workforce starts to feel isolated and slow to gain agile and DevOps capability.

Ultimately, this results in anxiety and fear in the legacy workforce: the baby boomers, Gen X, and Gen Y (millennials) with legacy skills start leaving the workforce. Gen Z has shown no willingness to work on legacy. This happened because everyone wants to be Google and has forgotten the heavy load their legacy systems carry.

But the reality is:

71 percent of Fortune 500 companies have tons of legacy
Legacy handles more than 85% of credit card transactions
Legacy still runs 68% of the IT world’s production workload

Here I am not advocating for legacy. The ultimate goal is to achieve a cost-effective way to run production and provide a best-in-class user experience along with a healthy collaborative culture, and this will require a modern approach to dealing with heritage systems. Because the heritage of the past is the seed that brings forth the harvest of the future, it’s time to apply digital thinking to large heritage services that still serve a huge volume.

Heritage Reliability Engineering can help here! It’s inspired by Site Reliability Engineering and adds focus to the effort involved in making existing legacy services more reliable.

Heritage Reliability Engineering is an approach to operations that ensures that continuously delivered applications run efficiently and reliably by using software engineering and automation solutions. The key concept is engineering, which includes a data-driven approach to operations, a culture of automation to drive efficiency and reduce risk, and hypothesis-driven methodology in incident, performance, and capacity tasks.

Another core principle is a focus on improving things. Workforces and engineers don’t only ‘do’ automation and restore failed services. Their job is to make sure that failures don’t happen again. A blameless postmortem identifies the root cause, or causes, of an incident and results in a balanced action plan to address them.

Unlike traditional legacy operations teams, which are typically risk-averse, heritage reliability engineers embrace risk in a controlled fashion. They use the concept of error budgets to determine acceptable risk and make informed decisions about when changes should be made. The error budget is a limit on how much time the system is allowed to be down, derived from the contracted service-level agreement (SLA) or the intended service-level objective (SLO). Error budgeting goes a step further and encourages testing and releasing only while some of that allowed downtime remains unspent. If a system has been unstable and the budget is used up, changes are restricted; if it’s stable, engineers can take the opportunity to innovate or upgrade.
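As a rough illustration of the error budget idea, here is a minimal Python sketch. The class name `ErrorBudget`, the 99.9% target, the 30-day window, and the downtime figures are all assumptions chosen for the example, not values from any particular SLA.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: tracks allowed downtime for one service."""
    slo_target: float        # e.g. 0.999 means 99.9% availability
    window_minutes: float    # length of the measurement window
    downtime_minutes: float  # downtime observed so far in the window

    @property
    def allowed_minutes(self) -> float:
        # Total downtime the SLO permits over the window.
        return (1.0 - self.slo_target) * self.window_minutes

    @property
    def remaining_minutes(self) -> float:
        # Budget left after subtracting observed downtime.
        return self.allowed_minutes - self.downtime_minutes

    def can_release(self) -> bool:
        # Changes are allowed only while some budget remains.
        return self.remaining_minutes > 0


# Assumed example: a 30-day window with a 99.9% availability target.
window = 30 * 24 * 60  # minutes in 30 days
stable = ErrorBudget(slo_target=0.999, window_minutes=window, downtime_minutes=10.0)
unstable = ErrorBudget(slo_target=0.999, window_minutes=window, downtime_minutes=50.0)

for name, budget in (("stable", stable), ("unstable", unstable)):
    print(name, round(budget.remaining_minutes, 1), budget.can_release())
```

With these assumed numbers the budget is about 43 minutes per window, so the service with 10 minutes of downtime can keep shipping changes, while the one with 50 minutes would pause releases until the window rolls over.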

How to Transform Your Legacy Operations to Heritage Reliability Engineering

Identify opportunities to improve and modernize legacy software delivery using progressive practices like agile and DevOps
Develop a pragmatic approach for the adoption of progressive practices that build on existing initiatives and capabilities
Plan for intensifying agility requirements: legacy operations must accommodate new or changed code faster and more frequently while ensuring quality performance under variable workloads.
Address the urgent generational shift: legacy operations experts are retiring, and next-generation operations professionals can’t magically achieve expertise using the traditional processes and tooling those experts have relied upon
Go for shared ownership; remember you are in transition, with a team developing microservices on one side and a team holding the legacy fort on the other
Shared ownership is impossible without including every person, team, and organization
Establish service level objectives that are more relevant to heritage services, for example, the time it takes to process batch, CPU consumption by batch, transactions, payment success rate, and data quality.
Eliminate toil (value-subtracting, repetitive work) by investing time in engineering: increase capability, train the workforce in modern engineering approaches, and allow the workforce to apply those approaches to heritage services
Measure everything. Improve telemetry (remote measurement) to make more service operation data available. Track system health, behavior, cost efficiency, speed, and any other useful data. By making these heritage services more observable, you can spot potential problems before they cause disruptions
Stop being only reactive. Stop reacting only when a problem occurs and fixing problems manually one at a time. Begin proactively understanding your business needs and aligning them with the goals of the rest of the organization. Use your resources and talents to preempt problems from the ground up by monitoring, engineering, and automating, so that less time is wasted fixing issues one by one
Stop creating SPOFs (single points of failure). When you transition, you remove yourself from the center of many processes, so you are no longer the bottleneck. You will be needed in ways that add more value to the business and are more fun for you as well: being proactive, creating new tech, and engaging more with the rest of the organization
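To make the idea of heritage-specific service level objectives more concrete, here is a small Python sketch that checks two of the examples mentioned in the list, batch processing time and payment success rate. The `BatchRun` record, the function names, the 60-minute limit, and the 95% and 99.5% targets are all hypothetical values picked for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BatchRun:
    """Hypothetical telemetry record for one nightly batch run."""
    duration_minutes: float
    payments_attempted: int
    payments_succeeded: int


def batch_duration_sli(runs: List[BatchRun], limit_minutes: float) -> float:
    # Fraction of runs that finished within the time limit.
    if not runs:
        return 1.0
    on_time = sum(1 for r in runs if r.duration_minutes <= limit_minutes)
    return on_time / len(runs)


def payment_success_rate(runs: List[BatchRun]) -> float:
    # Successful payments divided by attempted payments across all runs.
    attempted = sum(r.payments_attempted for r in runs)
    if attempted == 0:
        return 1.0
    return sum(r.payments_succeeded for r in runs) / attempted


def slo_report(runs: List[BatchRun],
               limit_minutes: float = 60.0,
               duration_target: float = 0.95,
               payment_target: float = 0.995) -> Dict[str, bool]:
    # Compare each indicator with its (assumed) objective.
    return {
        "batch_duration_met": batch_duration_sli(runs, limit_minutes) >= duration_target,
        "payment_success_met": payment_success_rate(runs) >= payment_target,
    }


# Made-up sample data for four batch runs.
runs = [
    BatchRun(45.0, 1000, 998),
    BatchRun(70.0, 1000, 990),
    BatchRun(50.0, 1000, 999),
    BatchRun(55.0, 1000, 997),
]
report = slo_report(runs)
print(report)
```

In this made-up data set one of four runs exceeds the 60-minute limit, so the batch duration objective is missed while the payment success objective is still met. A report like this gives the team a data-driven reason to prioritize batch performance work over new changes.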

In the end, HRE (Heritage Reliability Engineering) can be adopted faster than other progressive practices because it stays close to operations. It can help improve the confidence of the legacy workforce and give them an opportunity to transform themselves over time, which helps your organization run more smoothly. By combining operations with software engineering, the two fields’ incentives can be more easily aligned. HRE remains flexible and malleable to suit a variety of needs, setting your company up for success while converting the legacy workforce’s fear into self-motivated, value-driven fire.

DevOps Institute is dedicated to advancing the human elements of DevOps success through the SKIL Framework: Skills, Knowledge, Ideas, and Learning.

Originally published at https://devopsinstitute.com on June 3, 2020.

Written by Advancing the Humans of DevOps