Kubernetes Rant: Why DevOps/SREs Need To Empower Themselves With Application Knowledge.
This rant is based on tailwinds.ai's interactions with SRE/DevOps teams across different organizations. A few common trends we notice are:
DevOps and SREs are either part of one team or folded into a single hierarchy, so I use DevOps and SREs interchangeably in this blog.
DevOps/SREs tend to be passive and allow application teams to override them on many deployment design choices.
In this blog post, I would like to make the case that SREs need to empower themselves with knowledge of the applications in order to run the infrastructure better.
Before getting to the crux, I would like to cover a few definitions.
Functional vs Non-Functional Aspects.
As the picture shows, functional aspects focus on features, business logic workflows, etc., while non-functional aspects are system attributes such as security, reliability, performance, maintainability, and scalability.
Moreover, the two are interconnected, as shown in Figure-2 below.
Software Architects vs Deployment Architects.
Software architects are concerned with functional requirements, while deployment architects, also called DevOps/SREs, worry about the non-functional aspects of the application.
In many small organizations or small teams in large organizations, both roles are typically played by a single architect.
Architects In Olden Days.
In the good old days, there were no SREs, only software architects who pretty much did everything and so took care of both functional and non-functional requirements.
For example, the architect's job was to understand the requirements, come up with a design that was scalable and extensible, automate the requirements, keep an eye on security, and plan for failures, among many other aspects. In short, a software architect had to understand both functional and non-functional aspects while designing the application.
This model was fine because applications handled all the functional and non-functional aspects themselves, and not many innovations were happening in the underlying infrastructure from the application deployment perspective.
While there were no SREs, there were compute, network, and storage administrators, whose primary job was to set up the infrastructure and make sure it was up and running all the time.
Fast Forward To Today.
Many innovations are happening on both the software and deployment side.
Applications are becoming more and more distributed as they are broken into smaller components called services. These services are laser-focused on features and business logic, otherwise called functional aspects, while platforms like Kubernetes provide tools for many non-functional aspects such as availability, scaling, logging, monitoring, security, blue-green deployments, etc.
What we observe in many organizations is that software architects who focus on functional aspects miss out on the innovations happening on the Kubernetes side of the story. As a result, they continue to keep non-functional bits in the application, making it heavyweight and bulky.
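As a rough illustration of what moving non-functional bits out of the application can look like, the sketch below builds a Kubernetes Deployment manifest as a plain Python dict. The replica count, resource requests, and liveness probe are declared to the platform instead of being coded into the service. The function name `build_deployment`, the service name, the image, the `/healthz` path, the port, and all the numbers are assumptions made up for this example, not recommendations.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Return a Deployment manifest as a plain dict (illustrative values only)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # Availability: the platform keeps this many pods running.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Resource needs are declared, not hard-coded in the app.
                            "resources": {
                                "requests": {"cpu": "250m", "memory": "256Mi"},
                                "limits": {"cpu": "500m", "memory": "512Mi"},
                            },
                            # Failure detection: the kubelet restarts the
                            # container when this probe fails (path and port
                            # are hypothetical).
                            "livenessProbe": {
                                "httpGet": {"path": "/healthz", "port": 8080},
                                "periodSeconds": 10,
                            },
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so this output could be saved to a file.
    manifest = build_deployment("orders", "example.com/orders:1.0")
    print(json.dumps(manifest, indent=2))
```

Choosing these values well still requires knowing how the application behaves, which is the point of the rest of this post.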
Add to the mix the difficulty of hiring SREs, or organizations that don't see a need for experienced SREs, and software architects end up making deployment decisions, leading to inefficiency and missing many key aspects.
SREs are responsible for the non-functional requirements of the application, so they have to equip themselves to understand the application from the perspective of scaling, reliability, security, resource consumption, etc.
SREs have to get involved in the design decisions of applications, as non-functional requirements lead to functional aspects as per Figure-2.
Also, SREs have to be a lot more assertive, given that the platform (Kubernetes) and the non-functional requirements are their responsibility.
Here are a few pointers to get started on understanding the application:
What type of workload is it?
What are the components of the workload, and how do these components interact with each other?
What are the reliability and availability requirements, that is, scaling, HA, performance, etc.?
What are the resource requirements for the workload?
How should the workload and its components be secured?
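One way to keep track of these pointers is a small per-workload record that lists which questions are still unanswered. The sketch below is a minimal, hypothetical example: `WorkloadProfile`, its field names, and the sample values are assumptions for illustration, not part of any Kubernetes or tailwinds.ai API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkloadProfile:
    """What an SRE knows so far about one workload (hypothetical structure)."""

    name: str
    workload_type: Optional[str] = None        # e.g. "stateless API", "batch job"
    components: List[str] = field(default_factory=list)
    availability_target: Optional[str] = None  # e.g. "99.9%"
    cpu_request: Optional[str] = None          # e.g. "250m"
    memory_request: Optional[str] = None       # e.g. "256Mi"
    security_notes: List[str] = field(default_factory=list)

    def open_questions(self) -> List[str]:
        """Return the pointers that have not been answered yet."""
        questions = []
        if not self.workload_type:
            questions.append("What type of workload is it?")
        if not self.components:
            questions.append("What are the components and how do they interact?")
        if not self.availability_target:
            questions.append("What are the reliability and availability requirements?")
        if not (self.cpu_request and self.memory_request):
            questions.append("What are the resource requirements?")
        if not self.security_notes:
            questions.append("How is the workload secured?")
        return questions


if __name__ == "__main__":
    profile = WorkloadProfile(
        name="orders",
        workload_type="stateless API",
        components=["api", "postgres"],
    )
    for question in profile.open_questions():
        print(question)
```

With only the workload type and components filled in, the three remaining questions (availability, resources, security) are printed, which gives the SRE a concrete agenda for the next conversation with the application team.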