Application Monitoring @ IF

Oscar Wånggren
Published in If Technology
Nov 22, 2022

At If we believe in DevOps.
As part of our DevOps journey, there are several initiatives and platforms whose sole purpose is to enable teams on their path to self-ownership and to raise the quality of our products. One of these initiatives is the Monitoring platform, a cross-functional virtual team that governs the area of application monitoring.

My name is Oscar Wånggren. I started at If some years ago with the almost unbelievable title of Monitoring evangelist and Monitoring platform responsible. Together with my team of monitoring engineers, site reliability engineers and tool administrators, I strive to make complex things easier. We support a simplified adoption of tools and ways of working to harmonize the observability aspect of If’s application flora. In the organization we go by the name of the MaaS (Monitoring as a Service) team.

Centralization of tools.

We understand that all teams have different problems and needs. Our ambition is to standardize as much as possible to lower the barriers between our teams, without forcing our engineers to use technology that they would not otherwise choose to solve their problems.

We believe in and encourage autonomy. That leaves us with only one option: to promote a solution that is too good to ignore, whilst also keeping in view what is best for the organization and minimizing cost and waste. There is no “one size fits all” solution on the market. Just as in many other companies, we have a mix of front-running cloud-first initiatives and business-critical legacy applications, all highly dependent on each other. So we have collected an array of different options to guide teams to an observability strategy that fits them and their needs. Evaluation of the tools is done together with teams in the trenches, so we can be sure that we are solving actual problems.

The platform focuses its efforts on application instrumentation, log management, synthetic availability checks, customer satisfaction KPIs, and observability and alert strategies.

We have learned through experience that the more harmonized the use of tools is, the broader and more holistic the observable perspective becomes.

We offer to own the governance aspects of the tools so that teams free up time. We are the ones that handle contracts and distribute licenses, so the teams can focus on product development.

Coaching, and paved roads.

The bigger part of the continuous implementation of monitoring is done by the teams themselves. Even if we state that we supply Monitoring as a Service, we can do just a minority of what is needed. The MaaS team tries to mitigate this in several ways.

First, we coach teams and help lay the foundation of a monitoring/observability strategy, for example by finding super users or holding lectures and technical demonstrations on frequently asked questions.

Paved roads are in the form of documentation or collections of relevant information that can guide teams without having to do all the research themselves.
They usually cover basic implementation how-tos and strategies, which frees up our time to coach teams in more edge-case scenarios.

Packaged services

The most frequent implementations and problems that we aid in are targets for automation or packaging. These are what we categorize as packaged services. If a service is not fully automated, we still take on the workload of implementing or onboarding the strategic resource and treat it as a delivery to the team making the request.

Examples of these services are the 24/7 alert process and the Application performance score report. In the 24/7 service, when requested we do all implementations of critical alerts and set up the escalation paths with our service desk and incident managers. The team only supplies the information, and we deliver the rest.

The APS (Application Performance Score) is a public report that uses the Apdex KPI (Key Performance Indicator) to calculate end-user satisfaction. The team requesting to onboard only needs to attend implementation workshops, where they get a private introduction to the report and guidance on correcting any findings before going live in the company.
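To make the KPI concrete, here is a minimal sketch of how an Apdex-style score is commonly calculated. It assumes the KPI mentioned above follows the standard Apdex (Application Performance Index) formula; the function name, the threshold value and the sample data are hypothetical and do not describe how the APS report is actually built.

```python
from typing import Iterable


def apdex(response_times_s: Iterable[float], threshold_s: float) -> float:
    """Standard Apdex formula: (satisfied + tolerating / 2) / total samples.

    A sample is "satisfied" if it is at or below the threshold T,
    "tolerating" if it is at or below 4 * T, and "frustrated" otherwise.
    """
    if threshold_s <= 0:
        raise ValueError("threshold_s must be positive")

    satisfied = tolerating = total = 0
    for t in response_times_s:
        total += 1
        if t <= threshold_s:
            satisfied += 1
        elif t <= 4 * threshold_s:
            tolerating += 1
        # anything slower counts as frustrated and adds nothing to the score

    if total == 0:
        raise ValueError("at least one sample is required")
    return (satisfied + tolerating / 2) / total


# Hypothetical response times in seconds, with an assumed threshold of 0.5 s:
# two satisfied, one tolerating, one frustrated.
score = apdex([0.2, 0.4, 1.0, 3.0], threshold_s=0.5)
print(score)  # 0.625
```

The score always lands between 0 (all users frustrated) and 1 (all users satisfied), which is what makes it easy to compare across applications in one report.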

Categories and baselines.

The newest of the platform initiatives addresses the perceived complexity of observability. Many teams are new to monitoring technology and methodology, and focus purely on the problem at hand rather than seeing the bigger picture. That bigger picture is not only technical implementation, but also ways of working.

With Categories and baselines we aim to cut the elephant into pieces. We look holistically at the target application, its components and its business criticality to define a “good enough” maturity level for the team to reach incrementally.

By framing it as maturity levels, we can narrow the most immediate scope down to something consumable and start implementing and adopting incrementally.

The process is easy. We start with defining the criticality of the components and set the expected maturity level or Category.
All categories have defined implementation steps ranging from tools selection and instrumentation to full utilization, public reporting, observability, and alert strategy.
The categories act as milestones that teams can translate into a roadmap of actual user stories.
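As an illustration of the process described above, the sketch below turns a target category into an ordered list of remaining steps that could become user stories. The category names, the grouping of steps per category and the idea that categories build on each other are all assumptions made for this example; only the step names are taken from the description above.

```python
from enum import IntEnum
from typing import Iterable


class Category(IntEnum):
    """Hypothetical maturity levels; higher values mean higher maturity."""
    BASIC = 1
    STANDARD = 2
    CRITICAL = 3


# Hypothetical grouping of implementation steps per category.
STEPS = {
    Category.BASIC: ["Select tools", "Instrument the application"],
    Category.STANDARD: ["Full tool utilization", "Public reporting"],
    Category.CRITICAL: ["Observability strategy", "Alert strategy"],
}


def roadmap(target: Category, completed: Iterable[str] = ()) -> list[str]:
    """Return the remaining steps, in order, needed to reach the target category.

    Assumes categories are cumulative: reaching a category requires the
    steps of every lower category as well.
    """
    done = set(completed)
    stories = []
    for category in sorted(Category):
        if category > target:
            break
        for step in STEPS[category]:
            if step not in done:
                stories.append(f"[{category.name}] {step}")
    return stories


# A component whose expected category is STANDARD and which has already
# selected its tools.
for story in roadmap(Category.STANDARD, completed=["Select tools"]):
    print(story)
```

Keeping the steps as plain data makes it easy to adjust what each category requires without touching the logic that builds the roadmap.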

The art of being an Enabler.

To work with tasks purely related to aiding others is a tremendous privilege. It is rewarding to constantly get the opportunity to be the savior and the just-in-time expert. But getting to that stage is not as easy as it might seem. Knowledge and technical depth alone are not always sufficient.
Being an Enabler means that you need to make complex things easy, but you always risk doing the opposite. It is also easy to fall into the trap of creating more processes, bureaucracy, and obstacles with every implemented principle.

Quality and productivity are not the easiest mix. You must recognize the uniqueness of every engineering team, focus on the large problems, and be able to prioritize deliveries that save time for teams so they can focus on their value delivery.

Most important is to know and trust the teams. In the end you won’t be able to help everyone, and you can’t let anyone become dependent on you to succeed. The focus should always be to teach your audience to help themselves.
