This is an overview for organizations starting the transition from DevOps to Site Reliability Engineering.
The digital economy has transformed how financial transactions are conducted, and DevOps is seen as an active contributor to this new paradigm. There has always been a need for faster deployment of IT infrastructure, and organizations should consider implementing DevOps to accelerate application delivery. IT professionals also promote another approach to managing IT infrastructure, popularly known as Site Reliability Engineering (SRE). SRE shares the core principles of DevOps; however, its significance is seen mainly in large-scale environments such as Google's.
Let’s take a closer look at DevOps
DevOps follows agile project development methodologies rather than a typical waterfall approach. DevOps teams focus on continuous delivery: usable digital products are delivered iteratively, and automation makes this a reality for enterprises. The approach bridges the gap between the domain of development (Dev) and the domain of operations (Ops). Programming and delivery fall under the development teams, while the operations teams monitor the platform and maintain its stability. Ops puts pressure on Dev to work beyond the routine tasks of testing and quality assurance, and Dev responds to this pressure by delivering the product in releases and phases. In some methodologies, these phases are called sprints.
An automated approach to agile software development lets Dev focus on building new functionality based on feedback from already developed prototypes and working models. The foundational principles of DevOps promote three things. The first is continuous delivery of artifacts with a high release frequency. The second is an automated approach to development. The third is shared responsibility: responsibility for the whole project is shared between Ops and Dev. The method not only enhances the potential of the IT department but also creates an enabling environment for getting work done across the organization. A successful move to a DevOps culture shifts corporate thinking toward embracing change, reducing resistance to it, and having all stakeholders take ownership of the project.
Organizations run multiple applications and modules and need an integrated environment for all of them. DevOps serves as the glue that holds these components together through an agile and lean approach to project development. A significant benefit of DevOps is faster time to market. In turn, it can increase ROI, because applications are released more frequently and buyers get highly usable digital products that are ready to be distributed to end users. DevOps also builds a culture of mutual trust and cooperation in which Dev and Ops work as a cohesive unit and maintain steady progress.
Enter Site Reliability Engineering
SRE, on the other hand, takes a different approach: it lets developers build the operational framework as well. The concept originated at Google, when an engineer named Ben Treynor was assigned to supervise a team responsible for improving the reliability of Google's sites. The team concluded that no existing architectural method met the requirements of such large-scale systems, and the need for a new methodology gave birth to SRE. Proponents of SRE argue that traditional Ops and Dev teams can end up in conflict because they have different incentives around releasing software to production. The Ops team is mainly interested in keeping the software running smoothly without outages, while the Dev team is more inclined to launch new features that could see wider adoption by end users.
SRE teams divide their working time among different kinds of tasks. Roughly half of the time is devoted to operational work such as on-call duties; Google caps this share at 50 percent. The remaining time is used for software development, configuration management, and adding program features. Although SRE is considered an extension of DevOps, there are several marked differences between the two approaches. A significant difference is that SRE is an operationally driven methodology that treats operations as a software problem. DevOps focuses on a culture of aligning goals and initiatives across departments, whereas SRE aims to reduce interdepartmental communication and relies instead on the input of lead engineers. Engineers are selected for an Ops mindset and background, because the SRE team also handles Ops tasks. Site reliability engineers aim for a holistic understanding of the systems and of the relationships between modules and applications, giving the interconnections between systems as much attention as the components themselves. Services are actively managed against service level objectives (SLOs), and the cost of failure is reduced by addressing issues proactively. Since operational control also lies with the development team, boundaries shrink and work moves faster. SRE also avoids using a different set of tools in each unit: the same tooling is shared regardless of job function or job title.
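To make "managing against SLOs" a little more concrete, here is a minimal sketch of the arithmetic behind an availability SLO. It assumes a time-based SLO (a target fraction of uptime over a fixed window); the `Slo` class and its method names are hypothetical and not taken from any particular tool. The gap between the target and 100 percent is often called the error budget: the amount of unavailability the service can absorb before the objective is breached.

```python
from dataclasses import dataclass


@dataclass
class Slo:
    """Hypothetical time-based availability SLO over a fixed window."""

    target: float          # e.g. 0.999 for "three nines"
    window_minutes: float  # length of the SLO window in minutes

    def error_budget_minutes(self) -> float:
        # Allowed unavailability in the window: (1 - target) * window length.
        return (1.0 - self.target) * self.window_minutes

    def budget_remaining(self, downtime_minutes: float) -> float:
        # Fraction of the budget still unspent; a negative value means
        # the observed downtime has already breached the SLO.
        budget = self.error_budget_minutes()
        return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    slo = Slo(target=0.999, window_minutes=30 * 24 * 60)  # 30-day window
    print(round(slo.error_budget_minutes(), 1))            # 43.2
    print(round(slo.budget_remaining(downtime_minutes=10.8), 2))  # 0.75
```

With a 99.9 percent target over 30 days, the budget works out to 43.2 minutes, so 10.8 minutes of downtime would leave three quarters of it unspent. Real SLOs are often defined over request success rates rather than minutes of downtime; the time-based form is used here only because the arithmetic is easy to follow.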
SRE is a more scalable approach than DevOps. It creates a balance between continuous development and continuous improvement. Under SRE, when there are requests to modify existing programs, the development team owns and honors them, but they are considered and implemented in subsequent releases of the project. SRE benefits businesses that have to manage large-scale systems; organizations that have embraced this framework include Google, Netflix, and Dropbox. The key to success in SRE is having experienced professionals who can lead development teams from an operational standpoint.
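One common way SRE teams strike this balance between shipping changes and improving reliability is an error budget policy, as described in Google's SRE book: feature releases continue while the budget lasts, and the focus shifts to reliability work once it is spent. The sketch below assumes a request-based SLI (the share of successful requests); the function names and the default zero threshold are hypothetical choices for illustration, not a standard API.

```python
def error_budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Hypothetical helper: fraction of the error budget left for a request-based SLI.

    Returns 1.0 when nothing has been spent and a negative value once the
    SLO is breached.
    """
    if total_events <= 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total_events
    actual_failures = total_events - good_events
    if allowed_failures <= 0:
        # A 100% target leaves no budget at all: any failure exhausts it.
        return 1.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures


def next_release_focus(remaining: float, freeze_threshold: float = 0.0) -> str:
    """Hypothetical policy: ship features while budget remains, else prioritize reliability."""
    return "features" if remaining > freeze_threshold else "reliability"


if __name__ == "__main__":
    # A 99.9% target over one million requests allows roughly 1,000 failures.
    healthy = error_budget_remaining(0.999, good_events=999_500, total_events=1_000_000)
    breached = error_budget_remaining(0.999, good_events=998_500, total_events=1_000_000)
    print(next_release_focus(healthy))   # features
    print(next_release_focus(breached))  # reliability
```

The freeze threshold is a policy decision; a team might, for instance, start slowing releases before the budget reaches zero. The idea is that an agreed SLO, rather than an ongoing negotiation between Dev and Ops, decides when reliability work takes priority over new features.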
While there are significant procedural differences between SRE and DevOps, the two approaches share some common ground. Both attempt to reduce organizational silos. In both, failures are expected, and risk management strategies are used to manage and mitigate risk. Both also rely on gradual, incremental, and iterative change.
Other common ground includes the use of automation and the monitoring of success, which I will address with my two cents in another post. Until then, I’ll see you online.