Embrace Algorithmic IT Operations in 2017
DevOps has altered the dynamics of infrastructure provisioning, application builds, and release processes. However, at many companies it is still largely confined to configuration management and automated deployments, while a large chunk of day-to-day operational problems remains a sore point for engineers.
The rise of cloud, distributed architectures, containers, and microservices has further increased data overload, as different systems are required to monitor and manage these new-age applications. With an ever-growing number of alerts, tools, and automation scripts, engineers are experiencing fatigue. In an ideal world, every DevOps engineer should be focusing on Apps instead of Ops.
So, here is a thought: “What if humans could solve new, complex problems while we let machines resolve known, repetitive, and identifiable problems?”
As part of the DevOps community, I want to share how we can bring Algorithmic IT Operations (AIOps) to our companies to reduce stress and workload fatigue by eliminating alerts and repetitive events, improve business agility through intelligent management layers, and respond to production incidents 10X faster.
- Adopt NoOps Philosophy: It’s very important for engineering teams to adopt a culture of NoOps, which essentially means saying NO to manual operations. Nurture the belief that “machines should solve known problems and engineers can focus on solving new problems.”
- Deploy Automated Actions for Known Events: Anyone who has managed production infrastructure, business services, or applications, or has architected systems, knows that most problems are caused by known events or identifiable patterns. Encourage your team to deploy automated actions (response mechanisms) with embedded business logic for these known events, so the team can sleep peacefully and never sweat again.
- Create Diagnostics for Operational Issues: When events or alerts are triggered, most current tools just describe what happened instead of providing context on what is happening or why it’s happening. So as DevOps engineers, it’s important to create diagnostic scripts or programs that supply that context: Why did CPU or latency spike? Why did the application go down? Essentially, they help you get to the root cause faster.
- Use Code as a Weapon for Cloud Operations: The only magic wand for operational problems is code. As a team and as DevOps engineers, you need to focus on using CODE as the mechanism for resolving problems. If you are building CI/CD today, you should add a trigger to your pipeline that monitors each deployment’s health metrics and invokes a rollback if it detects issues. Simple remedies like this can save hours after every deployment and handle failures gracefully!
- Adopt Intelligent DevOps Tooling: The era of static tooling for deployments, provisioning, packaging, monitoring, APM, and log management is over. Deploying applications at scale with Docker, microservices, the cloud, and API-driven approaches, while ensuring high reliability, requires a different take. So it’s important to use intelligent tools for cloud management instead of reinventing the wheel every time. I believe that with the rise of ML and AI, we will see more DevOps tooling vendors incorporate intelligence into their offerings to further simplify engineers’ work.
For example, imagine a monitoring tool that raises alerts using dynamic thresholds derived from the history of observations, instead of expecting users to configure thresholds. Wouldn’t that be awesome for engineers? I really hope more vendors move in this direction.
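To make the dynamic-threshold idea concrete, here is a minimal Python sketch. It is not how any particular vendor does it; it assumes one simple statistical rule: alert when a new value exceeds the rolling mean plus `k` standard deviations of the last `window` observations. The class `DynamicThreshold` and its parameter defaults are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


class DynamicThreshold:
    """Hypothetical helper: learns an alert threshold from recent observations
    instead of relying on a fixed, user-configured value."""

    def __init__(self, window: int = 30, k: float = 3.0, min_history: int = 10) -> None:
        self.history: deque = deque(maxlen=window)  # only the last `window` values are kept
        self.k = k                                  # how many standard deviations count as abnormal
        self.min_history = max(2, min_history)      # stdev() needs at least two data points

    def observe(self, value: float) -> bool:
        """Return True if `value` should raise an alert, then add it to the history."""
        alert = False
        if len(self.history) >= self.min_history:
            threshold = mean(self.history) + self.k * stdev(self.history)
            alert = value > threshold
        self.history.append(value)
        return alert


# Illustrative latency samples in milliseconds (made-up data).
latency = DynamicThreshold(window=20, k=3.0, min_history=5)
for sample_ms in [100, 102, 98, 101, 99, 100, 103, 97, 250]:
    if latency.observe(sample_ms):
        print(f"alert: latency {sample_ms} ms is far above recent history")
```

In this sketch, values that trigger an alert are still appended to the history, so a sustained shift eventually becomes the new baseline; excluding alerting values instead would keep the baseline stable at the cost of repeated alerts after a legitimate change.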
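The "Deploy Automated Actions for Known Events" practice can be sketched as an event router that maps known event types to remediation functions and escalates anything unrecognized to a human. This is a minimal Python illustration under an assumed event shape (a dict with a `type` key); `EventRouter`, `restart_service`, `clean_disk`, and the three-restarts rule are hypothetical stand-ins for your own runbooks and business logic.

```python
from typing import Callable, Dict


# Hypothetical remediation actions. In practice these would call your cloud
# provider's API or configuration-management tooling; here they only
# return a description of what they would do.
def restart_service(event: dict) -> str:
    # Assumed business rule: stop auto-restarting after 3 restarts in an hour
    # and hand the problem to a human instead.
    if event.get("restarts_last_hour", 0) >= 3:
        return f"escalated {event['service']}: restart limit reached"
    return f"restarted {event['service']}"


def clean_disk(event: dict) -> str:
    return f"rotated logs on {event['host']}"


class EventRouter:
    """Routes known event types to automated actions; unknown ones go to on-call."""

    def __init__(self) -> None:
        self.actions: Dict[str, Callable[[dict], str]] = {}

    def register(self, event_type: str, action: Callable[[dict], str]) -> None:
        self.actions[event_type] = action

    def handle(self, event: dict) -> str:
        action = self.actions.get(event["type"])
        if action is None:
            # Not a known pattern: this is a new problem for an engineer.
            return f"escalated {event['type']} to on-call"
        return action(event)


router = EventRouter()
router.register("service_down", restart_service)
router.register("disk_full", clean_disk)
print(router.handle({"type": "service_down", "service": "api"}))  # restarted api
```

Keeping the escalation path inside the router reflects the NoOps split described above: machines take the known events, and everything else reaches a person.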
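For the "Create Diagnostics for Operational Issues" practice, one simple form of context is correlating an alert with recent changes. The sketch below assumes you already collect a change log (deploys, config edits, scaling events); `Change`, `diagnose`, and the 30-minute lookback are hypothetical choices, and real root-cause analysis would combine this with metrics and logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Change:
    timestamp: datetime
    kind: str         # e.g. "deploy", "config", "scaling"
    description: str


def diagnose(alert_time: datetime, changes: List[Change],
             lookback: timedelta = timedelta(minutes=30)) -> List[Change]:
    """Return changes made within `lookback` before the alert, most recent
    first, as candidate causes for an engineer to inspect."""
    window_start = alert_time - lookback
    suspects = [c for c in changes if window_start <= c.timestamp <= alert_time]
    return sorted(suspects, key=lambda c: c.timestamp, reverse=True)


# Illustrative change log (made-up data).
alert_at = datetime(2017, 1, 10, 12, 0)
log = [
    Change(alert_at - timedelta(hours=2), "deploy", "release v1.4"),
    Change(alert_at - timedelta(minutes=10), "config", "DB pool size 50 -> 10"),
    Change(alert_at - timedelta(minutes=5), "deploy", "release v1.5"),
]
for c in diagnose(alert_at, log):
    print(c.timestamp.time(), c.kind, c.description)
```

Attaching this list to the alert notification gives the on-call engineer a "why" to start from, rather than only the "what".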
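The post-deployment rollback trigger from "Use Code as a Weapon for Cloud Operations" might look like the following sketch. It assumes your pipeline can supply two callables: one that reads the current error rate from your monitoring system and one that performs the rollback (for example by running `kubectl rollout undo`, if you deploy to Kubernetes). `watch_deployment` and its default limits are hypothetical.

```python
import time
from typing import Callable


def watch_deployment(get_error_rate: Callable[[], float],
                     rollback: Callable[[], None],
                     max_error_rate: float = 0.05,
                     checks: int = 5,
                     consecutive_failures: int = 2,
                     interval_s: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll a health metric after a deploy. Roll back and return False if the
    error rate exceeds `max_error_rate` on `consecutive_failures` checks in a
    row; return True if the deployment stays healthy for all `checks`."""
    bad_streak = 0
    for _ in range(checks):
        if get_error_rate() > max_error_rate:
            bad_streak += 1
            if bad_streak >= consecutive_failures:
                rollback()
                return False
        else:
            bad_streak = 0  # a healthy reading resets the streak
        sleep(interval_s)
    return True
```

Requiring consecutive failures rather than a single bad reading avoids rolling back on one transient spike, and the injectable `sleep` lets the function be exercised in tests without waiting.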
At Botmetric, we are excited to be building an intelligent, event-driven platform for managing incidents and operations in the Cloud world. We are building Botmetric as a platform that can handle most operational problems for engineers using application discovery, alert data, cloud configuration, historic patterns, and known events. We believe it will help our customers move from a DevOps to a NoOps philosophy by bringing Algorithmic IT Operations to incident management in the Cloud.
We are looking for passionate engineers to join us on this journey. If you are interested, drop a note to email@example.com!
P.S. The original post was published on LinkedIn.