FinOps Engineering — Visibility — Trends & Emergencies


Nick Gibbon
Published in Pareture
May 15, 2024


Data visibility is a core element of FinOps. It is one of the six FinOps principles: FinOps data should be accessible and timely.

“If you can’t measure it, you can’t manage it!” is the adage. Whilst things may be a bit more nuanced than that, we can all agree that knowing how much money we are spending is a good thing.

Big IT projects with lots of people involved can be quite amorphous, and it can be difficult to understand what is really going on. FinOps visibility provides an important objective measure to consider.

Visibility at different levels of granularity allows us to understand what is being spent where, and what the current trend will mean for the future, which supports planning, decisions and actions.

Overarching visibility lets us see exactly what the cloud spend is and which direction it is going in at a high level. In practice, this data enables FinOps professionals to negotiate commitment-based discounts. More importantly, it lets us consider whether we are getting commensurate value, which can be difficult to know without a further breakdown.

Visibility at the team / product level is incredibly useful. It shows what proportion of resources each initiative is really consuming. Is there any misalignment with expectations? How is each individual initiative trending, and is that trend sustainable? Over time, in this environment, there should be no surprises: each leader's cost story should align with their product's lifecycle and goals. This is where unit economics becomes an important concept.
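
Unit economics divides cost by a measure of business value so that growth and efficiency can be told apart. A minimal sketch follows, assuming a product's monthly cost and a single business driver are already known; the `MonthlyUsage` type, the choice of monthly active users as the unit, the 5% tolerance and all figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MonthlyUsage:
    month: str         # e.g. "2024-01"
    cloud_cost: float  # cost attributed to the product that month
    units: int         # hypothetical business driver, e.g. monthly active users

def unit_costs(history: list[MonthlyUsage]) -> dict[str, float]:
    """Cost per unit of business value for each month (skips months with no units)."""
    return {m.month: m.cloud_cost / m.units for m in history if m.units > 0}

def unit_cost_rising(history: list[MonthlyUsage], tolerance: float = 0.05) -> bool:
    """True if the latest unit cost exceeds the earliest by more than `tolerance`."""
    costs = list(unit_costs(history).values())
    if len(costs) < 2:
        return False
    return costs[-1] > costs[0] * (1 + tolerance)

history = [
    MonthlyUsage("2024-01", 10_000.0, 5_000),
    MonthlyUsage("2024-02", 12_000.0, 5_500),
    MonthlyUsage("2024-03", 15_000.0, 6_000),
]
print(unit_costs(history))
print(unit_cost_rising(history))
```

Here total cost rises by 50% while users rise by 20%, so the cost per user climbs and the check flags it. Growing cost alone would not be a concern if the unit cost held steady.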

FinOps professionals can use granular visibility for chargeback, which helps with both accounting and accountability. They can also use it to identify opportunities for optimisation and for collaboration with teams.
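
One common complication in chargeback is cost that carries no owner tag. The sketch below allocates tagged cost directly to teams and spreads untagged (shared) cost across teams in proportion to their tagged spend. Proportional allocation is an assumption chosen for illustration (even splits or usage-based keys are also used), and the team names and amounts are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional

def chargeback(line_items: Iterable[tuple[Optional[str], float]]) -> dict[str, float]:
    """Allocate cost to teams by tag, spreading untagged (shared) cost
    across teams in proportion to their directly tagged spend."""
    direct: dict[str, float] = defaultdict(float)
    shared = 0.0
    for team, cost in line_items:
        if team is None:
            shared += cost  # no owner tag: treat as shared cost
        else:
            direct[team] += cost
    total_direct = sum(direct.values())
    if total_direct == 0:
        raise ValueError("no tagged spend to allocate shared cost against")
    # Each team pays its own cost plus its proportional share of the shared pool
    return {team: cost + shared * cost / total_direct for team, cost in direct.items()}

# Hypothetical billing line items: (team tag or None, cost)
items = [("search", 600.0), ("checkout", 200.0), (None, 200.0), ("search", 200.0)]
print(chargeback(items))
```

The allocated totals always add back up to the full bill, which keeps the chargeback reconcilable with accounting.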

Visibility concretely shows when FinOps interventions and changes have succeeded, and this can be very satisfying and motivating for everyone involved.

Members of individual teams can use visibility to gain greater context on what their work is costing, and they can help identify and act on anything anomalous. They can look at their own split of cost by service, see what is expensive, and use that to decide what deserves more of their attention. For example, through visibility I found that getting metric data out of the cloud is very expensive. This led us to deprecate a certain feature in an internal product and to perform this activity only when it was truly justified.
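
A team's service split can be as simple as ranking its costs by service and showing each one's share of the total. The sketch assumes the per-service totals have already been pulled from a billing export; the service names and figures are hypothetical.

```python
def service_split(costs_by_service: dict[str, float]) -> list[tuple[str, float, float]]:
    """Rank services by cost and show each one's percentage of the total."""
    total = sum(costs_by_service.values())
    if total == 0:
        return []
    ranked = sorted(costs_by_service.items(), key=lambda kv: kv[1], reverse=True)
    return [(service, cost, 100.0 * cost / total) for service, cost in ranked]

# Hypothetical monthly costs for one team
team_costs = {"storage": 200.0, "compute": 500.0, "monitoring": 300.0}
for service, cost, share in service_split(team_costs):
    print(f"{service:<12} ${cost:>8,.2f} {share:5.1f}%")
```

A surprisingly large line near the top of this list is often the prompt to ask whether a feature is worth what it costs.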

They can also look at trends. If a team has not grown in some way (scope, users, developers, quality), then its cost probably should not be increasing. Even when a team is growing, it needs to manage cost consciously and make sure it is scaling appropriately. For example, even though our team is stable and has implemented many cost optimisations, our dev infrastructure cost still creeps up over time and requires some manual housekeeping.

Emergencies

Cloud consumption can be unforgiving. Large organisations need to react quickly to unexpected cost events because of the sheer amount of cash that can be wasted, but it is even more important for individuals and small organisations, where the impact would be felt the most. This is an important risk to manage.

There are many different tools that can help, but the goal is to have close-to-real-time visibility of cost data and to alert when budgetary thresholds have been breached over hourly, daily and weekly time periods. Automated anomaly detection is also worth experimenting with. Here FinOps crosses over with security and incident management, so you will want a plan for the steps you might need to take and the support you can muster.
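
The sketch below is a minimal take on both ideas. It assumes hourly cost totals are already available, for example from a billing export; collecting them is outside the scope of the sketch. `breached_windows` checks rolling hourly, daily and weekly budgets, and `is_anomalous` is a deliberately simple z-score test that stands in for the more sophisticated anomaly detection offered by dedicated tools. The budget values, function names and data are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev

# Hypothetical budgets per rolling window; tune these to your own spend profile.
BUDGETS = {
    timedelta(hours=1): 50.0,
    timedelta(days=1): 800.0,
    timedelta(weeks=1): 5_000.0,
}

def breached_windows(hourly_costs: dict[datetime, float], now: datetime,
                     budgets: dict[timedelta, float] = BUDGETS
                     ) -> list[tuple[timedelta, float, float]]:
    """Return (window, spend, budget) for each rolling window ending at `now`
    whose spend exceeds its budget. Keys of hourly_costs are hour start times."""
    alerts = []
    for window, budget in budgets.items():
        spend = sum(cost for start, cost in hourly_costs.items()
                    if now - window <= start < now)
        if spend > budget:
            alerts.append((window, spend, budget))
    return alerts

def is_anomalous(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Simple z-score check: is `latest` far above the mean of `history`?"""
    if len(history) < 2:
        return False
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return latest > mu  # flat history: any increase stands out
    return (latest - mu) / sigma > z_threshold

# Hypothetical data: a steady $10/hour for a week, then a $120 spike in the last hour.
now = datetime(2024, 5, 15, 12)
hourly = {now - timedelta(hours=h): 10.0 for h in range(1, 169)}
hourly[now - timedelta(hours=1)] = 120.0

for window, spend, budget in breached_windows(hourly, now):
    print(f"ALERT: ${spend:,.2f} spent in the last {window} (budget ${budget:,.2f})")
print(is_anomalous([9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 10.0], 120.0))
```

Only the hourly budget fires in this example. The spike is still too small to breach the daily or weekly budgets, which is why checking several window lengths is worthwhile.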

Test your alerting and response flow end to end by temporarily setting lower cost thresholds so that the alerts actually trigger. Imagine how unhappy you would be to find that a misconfiguration or attack had cost you $1,000 in a few hours before you could remedy it. Now imagine having no visibility or alerting at all, and opening the e-mail with your cloud bill at the end of the month to find it is $100,000.
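
To compare the two scenarios side by side, the money lost is roughly the runaway rate multiplied by the time until someone notices and acts. The sketch assumes a steady runaway rate and reads "a few hours" as four hours. Both are assumptions made only for illustration, as are the function names.

```python
def exposure(rate_per_hour: float, hours_until_stopped: float) -> float:
    """Money lost to a runaway cost that continues until it is stopped."""
    return rate_per_hour * hours_until_stopped

def hours_to_reach(amount: float, rate_per_hour: float) -> float:
    """How long a runaway cost takes to accumulate `amount`."""
    return amount / rate_per_hour

# Assumption: "a few hours" = 4 hours, so the runaway costs $250/hour.
rate = 1_000 / 4
print(exposure(rate, 4))              # caught quickly by hourly alerting
print(hours_to_reach(100_000, rate))  # hours of unnoticed spend behind a $100,000 bill
print(exposure(rate, 30 * 24))        # left running for a whole 30-day month
```

At that rate, the $100,000 bill corresponds to about 400 hours (roughly 17 days) of unnoticed spend. The time to detection, not the rate itself, is what separates a bad afternoon from a disastrous month.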

Be safe out there everyone!

Software reliability engineer & manager in cloud infrastructure, platforms & tools.