finops: an introduction

Samrat Kar
building high performance software systems
8 min read · May 26, 2024

What is finops?

Finops is a set of processes and practices for making data-driven spending and usage decisions about cloud resources.

1. The language of finops

i. cost allocation

Splitting the overall cloud cost and associating each part with the granular cost center that is actually causing the expense. These entities might be scrum teams, product teams, project teams, etc.
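As a rough illustration of the idea, the sketch below sums billing line items per cost center using a resource tag. The record layout, the `team` tag key and the `unallocated` bucket are assumptions made for this example, not the format of any real billing export.

```python
from collections import defaultdict

def allocate_costs(line_items, tag_key="team", fallback="unallocated"):
    """Sum cost per cost center, using one resource tag as the allocation key."""
    totals = defaultdict(float)
    for item in line_items:
        # Untagged resources land in a fallback bucket so they stay visible.
        owner = item.get("tags", {}).get(tag_key, fallback)
        totals[owner] += item["cost"]
    return dict(totals)

# Hypothetical billing line items (costs in dollars).
line_items = [
    {"resource": "vm-1", "cost": 120.0, "tags": {"team": "checkout"}},
    {"resource": "vm-2", "cost": 80.0, "tags": {"team": "search"}},
    {"resource": "bucket-1", "cost": 30.0, "tags": {"team": "checkout"}},
    {"resource": "vm-3", "cost": 45.0, "tags": {}},
]

allocation = allocate_costs(line_items)
print(allocation)
```

Keeping an explicit bucket for untagged spend is a design choice: the size of that bucket is itself a useful measure of how complete the tagging is.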

ii. wasted usage

Cloud resources that are procured but not actively used. The cloud service provider still charges for these resources even if they are not used.

iii. right sizing

Ensuring that the size of the provisioned resources is neither under-sized (a product quality issue) nor over-sized (a wasted resource).

iv. workload management

Running resources in the cloud only at the times when they are needed.
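A small worked example shows why this matters. The hourly rate and the 12-hours-a-day, 5-days-a-week schedule below are made-up numbers chosen only to illustrate the arithmetic.

```python
def weekly_cost(hourly_rate, hours_per_day, days_per_week):
    """Cost of one resource for a week under a simple on/off schedule."""
    return hourly_rate * hours_per_day * days_per_week

HOURLY_RATE = 0.50  # hypothetical on-demand rate, in dollars per hour

always_on = weekly_cost(HOURLY_RATE, hours_per_day=24, days_per_week=7)
office_hours = weekly_cost(HOURLY_RATE, hours_per_day=12, days_per_week=5)
savings_pct = 100 * (always_on - office_hours) / always_on

print(f"always on: ${always_on:.2f}, scheduled: ${office_hours:.2f}, saved: {savings_pct:.1f}%")
```

Under these assumptions, a development environment that is switched off at night and on weekends costs roughly a third of one that runs around the clock.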

v. on-demand rate

The base rate paid for a cloud resource when it is bought directly, with no commitment or discount.

vi. rate reduction

Paying less than the on-demand rate for the same resources by using other purchasing options: savings plans (SPs), reserved instances (RIs), committed use discounts (CUDs), flexible CUDs, BYOL (bring your own license), or a commercial agreement between the organization and the cloud service provider that grants a lower rate for the resources used.

vii. cost avoidance

Reducing cloud cost by reducing resource usage: removing a resource altogether, moving it to a cheaper tier, or right-sizing it.

viii. coverable usage

When a resource's charge is discounted by a reservation, that usage is called covered. What matters for saving money is whether a set of resources together provides enough consistent usage for the commitment to be utilized, not whether any single resource runs consistently.
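The point about pooled usage can be made concrete with two ratios. In this sketch, coverage is the share of usage that received the discount, and utilization is the share of the purchased commitment that was actually consumed. Both definitions and the sample numbers are assumptions for illustration.

```python
def coverage_and_utilization(hourly_usage, committed_units):
    """Return (coverage, utilization) for a pool of resources sharing one commitment.

    hourly_usage: units (for example instances) running in each hour, summed over the pool.
    committed_units: units per hour that the commitment pays for.
    """
    # In each hour, the commitment covers usage up to the committed amount.
    covered = sum(min(used, committed_units) for used in hourly_usage)
    total_usage = sum(hourly_usage)
    purchased = committed_units * len(hourly_usage)
    coverage = covered / total_usage if total_usage else 0.0
    utilization = covered / purchased if purchased else 0.0
    return coverage, utilization

# Hypothetical pool: instances running per hour across all teams.
pool_usage = [4, 6, 5, 3]
coverage, utilization = coverage_and_utilization(pool_usage, committed_units=4)
print(f"coverage: {coverage:.1%}, utilization: {utilization:.1%}")
```

Individual instances in the pool may start and stop freely. Only the hourly total decides how much of the commitment is used.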

ix. unblended rates (AWS specific)

Some resources are charged at decreasing rates the more they are used (usage or volume discounts). As a result, identical resources can show different unit costs on the bill: usage that fell into a cheaper, higher-volume tier is billed at a lower rate than usage that fell into an earlier tier, even if the resources are identical in shape and size. When rates are presented this way, they are called unblended rates.

x. blended rates (AWS specific)

In the billing data, AWS also provides a blended rate. This standardizes the rate paid for resources of the same type by distributing the charges evenly across them, except where the usage is “covered” by commitment-based discounts.
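The difference between the two views can be sketched with a toy volume-discount schedule. The tier sizes, rates and team names are invented, and the averaging below is a simplification of the idea rather than the way AWS computes its blended rate.

```python
def tiered_cost(units, tiers):
    """Charge units through volume tiers given as (tier_size, rate) pairs.

    A tier_size of None means the tier has no upper limit.
    """
    remaining, total = units, 0.0
    for size, rate in tiers:
        in_tier = remaining if size is None else min(remaining, size)
        total += in_tier * rate
        remaining -= in_tier
        if remaining <= 0:
            break
    return total

# Hypothetical schedule: first 1000 units at $0.10, everything after at $0.05.
TIERS = [(1000, 0.10), (None, 0.05)]
usage_by_team = {"team-a": 1000, "team-b": 500}

total_units = sum(usage_by_team.values())
total_cost = tiered_cost(total_units, TIERS)

# Blended view: every unit is charged the same average rate.
blended_rate = total_cost / total_units
blended_cost = {team: units * blended_rate for team, units in usage_by_team.items()}
print(round(blended_rate, 4), blended_cost)
```

In the unblended view, whichever usage happened to land in the first tier would be billed at $0.10 per unit and the rest at $0.05. The blended view spreads the same total evenly.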

xi. amortized cost

Some cloud resources and commitment-based discounts can be purchased with an up-front fee. The amortized cost of a resource takes this initial payment into account and divides it out, attributing a prorated share to each hour of actual billing. Amortized figures are what belong in “showback” (reporting each team's costs back to it) and “chargeback” (actually billing those costs to the team's budget), because they reflect when the resource was used rather than when the fee was invoiced.
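The proration can be sketched in a few lines. The up-front fee, the one-year term and the recurring hourly rate are illustrative numbers, not prices from any provider.

```python
def amortized_hourly_rate(upfront_fee, term_hours, recurring_hourly_rate=0.0):
    """Spread an up-front fee evenly over the term and add any recurring hourly charge."""
    return upfront_fee / term_hours + recurring_hourly_rate

TERM_HOURS = 365 * 24  # one-year term

# Hypothetical commitment: $876 paid up front plus $0.05 per hour.
rate = amortized_hourly_rate(upfront_fee=876.0, term_hours=TERM_HOURS, recurring_hourly_rate=0.05)

# Amortized cost attributed to a 720-hour month of usage.
monthly_cost = rate * 720
print(f"amortized rate: ${rate:.2f}/hour, month: ${monthly_cost:.2f}")
```

Without amortization, the month of purchase would show the whole $876 and every later month would look artificially cheap.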

xii. fully loaded costs

Fully loaded costs are amortized, reflect the discounted rates actually paid, and are fully allocated, that is, mapped to whoever is driving the cost.

xiii. commitment-based discounts

Discounts applied to the resource cost in return for pre-committing to a set amount of resource usage with the cloud service provider, using SPs, RIs or CUDs.
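Because the committed amount is paid whether or not it is used, a commitment only saves money above a certain level of use. The sketch below works out that break-even point. The rates are invented, and the model assumes usage never exceeds the commitment.

```python
def break_even_utilization(on_demand_rate, committed_rate):
    """Fraction of the commitment that must be used before it beats on-demand pricing."""
    return committed_rate / on_demand_rate

def net_savings(on_demand_rate, committed_rate, committed_hours, used_hours):
    """Savings versus on-demand for the hours actually used.

    Assumes used_hours <= committed_hours; the commitment is paid in full either way.
    """
    on_demand_cost = used_hours * on_demand_rate
    commitment_cost = committed_hours * committed_rate
    return on_demand_cost - commitment_cost

# Hypothetical rates in dollars per hour.
ON_DEMAND, COMMITTED = 0.10, 0.06

print(break_even_utilization(ON_DEMAND, COMMITTED))
print(net_savings(ON_DEMAND, COMMITTED, committed_hours=720, used_hours=720))
print(net_savings(ON_DEMAND, COMMITTED, committed_hours=720, used_hours=360))
```

With these numbers the commitment pays off only if at least about 60% of it is used. At half usage it costs more than staying on-demand.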

2. The anatomy of a cloud bill

i. invoice

The invoice is a high-level summary of the overall cost of cloud usage. It serves finance for general accounting purposes, but it is not sufficient for finops: it is too summarized and provides neither the granularity nor the freshness of data that cost decisions need.

ii. cloud native cost tools

All major cloud providers offer solid native cost tools to get one started on analyzing spend at a high level. But these tools have limitations in areas such as handling bills from multiple clouds, handling custom negotiated rates, container cost allocation, etc.

iii. cloud billing complexity

In 2021 AWS had over 791,000 individual SKUs, and the number keeps increasing every month. At times a single detailed bill might have billions of individual charges in a month! Detailed billing data typically arrives in multiple updates each day, with complex interconnections between the charges behind the spending, such as instance hours, gigabytes of storage, data transfer, etc.

Although tools are available to slice and dice these cost allocations and reveal patterns, it is imperative to have expert finops practitioners in the teams who have a deep understanding of the billing data and can decipher its complexity. Billing experts have been known to identify discrepancies in billing data and report them back to their cloud service providers for billing adjustments and/or refunds!

3. Six principles

i. Teams need to collaborate

Finops moves teams beyond finger-pointing and shaming to a culture of collaboration. The focus is on breaking down the silos between engineering, finance, product management, executives and the rest.

The product managers fine-tune their application scaling forecasts to accommodate expected income from new features.
The finance team uses language and reporting that moves at the speed and granularity of the cloud.
Engineering teams consider cost as a new efficiency metric.

At the same time, the finops team works to continuously improve agreed-upon metrics for efficiency. They help define governance and parameters for cloud usage that provide some control, but that focus first on ensuring that innovation and speed of delivery can flourish alongside cost efficiency.

Cost efficiency becomes a shared responsibility, and every team and team member is held accountable for their own resource usage and the business value it returns.

ii. Decisions are driven by the business value of cloud

The cloud is not a cost center. It is a value creator. Instead of focusing on the overall cost per month, focus on the cost per business metric (unit cost), and always make decisions with the business value in sight.
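A unit-cost calculation makes the distinction visible. The months, spend figures and the choice of "orders processed" as the business metric are hypothetical.

```python
def unit_cost(total_cloud_cost, business_units):
    """Cloud cost per unit of a business metric (orders, users, transactions...)."""
    if business_units <= 0:
        raise ValueError("business_units must be positive")
    return total_cloud_cost / business_units

# Hypothetical months: (total cloud spend in dollars, orders processed).
months = {
    "march": (50_000.0, 1_000_000),
    "april": (60_000.0, 1_500_000),
}

cost_per_order = {name: unit_cost(cost, orders) for name, (cost, orders) in months.items()}
for name, value in cost_per_order.items():
    print(f"{name}: ${value:.3f} per order")
```

In this example the monthly bill grew by 20%, yet the cost per order fell from five cents to four. Judged by unit cost, the second month was the more efficient one.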

iii. Everyone takes ownership of their cloud usage

The overall cloud cost is allocated down to the most granular level, that of individual teams and engineers, who are then given the information and guidance they need to manage their cloud resources better. Cloud costs are based on cloud use, which gives a straightforward correlation: if you are using the cloud, you are incurring costs, and you are therefore accountable for that spending.

iv. Finops reports should be accessible and timely

In a world of per-second, or even finer-grained, compute resources, unlimited cloud storage, shared Kubernetes clusters, automated deployments, and services that can incur costs based on externally controlled triggers, monthly or quarterly reports of cloud spending are not good enough. Real-time decision-making is about getting data, such as spend changes or anomaly alerts, quickly to the people who deploy and manage cloud resources.
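One simple form an anomaly alert could take is sketched below: a day is flagged when its spend exceeds the trailing average by more than a few standard deviations. The 7-day window, the 3-sigma threshold and the spend figures are assumptions for the example. Real tools typically use more robust models.

```python
from statistics import mean, stdev

def find_spend_anomalies(daily_spend, window=7, sigmas=3.0):
    """Return indexes of days whose spend exceeds mean + sigmas * stdev of the prior window.

    window must be at least 2 so that a standard deviation can be computed.
    """
    anomalies = []
    for day in range(window, len(daily_spend)):
        history = daily_spend[day - window:day]
        threshold = mean(history) + sigmas * stdev(history)
        if daily_spend[day] > threshold:
            anomalies.append(day)
    return anomalies

# Hypothetical daily spend in dollars; the last day contains a spike.
daily_spend = [100, 102, 98, 101, 99, 103, 100, 180]
print(find_spend_anomalies(daily_spend))
```

The value of such a check comes from running it daily and routing the result to the team that owns the resources, not from the statistics themselves.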

Finops decisions should be based on fully loaded and properly allocated costs. The cost should be amortized to include any prepayments made as part of commitment programs and should reflect the actual discounted rates a company is paying for cloud resources. They should also equitably factor in shared costs and be mapped to the business org structure.

v. A centralized finops governance team

To implement finops practices, the entire organization must take responsibility for efficient cloud usage, from the executives to the most junior engineer. It is equally important to have a centralized finops governance team. This team is made up of the SMEs who continuously do the following:

Improve available data via better tooling.
Modify business processes to enable better finops.
Maximize rate optimization by centralizing usage.
Enable and guide the teams in usage optimization.
Benchmark internally, comparing teams to each other in key areas such as optimization.
Benchmark externally against industry standards, comparing the company as a whole to others like it.

The idea is to de-centralize responsibility to use less and centralize responsibility to pay less.

vi. Take advantage of the variable cost model of the cloud

Reserved instances (RIs), committed use discounts (CUDs) and savings plans (SPs), spot instances that buy low-cost capacity when it is needed, and cloud-native services that scale with demand all make the cost of the cloud highly variable. How much you pay depends on how creatively you manage both the rate and the usage.

4. The finops lifecycle

The finops lifecycle is an iterative, circular approach in which teams continuously improve by cycling through the following phases:

i. Inform

This phase gives visibility into how cost is allocated, down to the level of teams and engineers, and helps practitioners take data-driven decisions that optimize cost while continuing to enhance business value. It empowers individuals, who can now see the impact of their actions on the bill.

Mapping spending data to the business, that is, to the teams who are actually incurring the costs, is of prime importance in the inform phase. This is done by resource tagging.

Create showback and other reporting that reaches the edges of the organization, to drive the right behaviors at all levels.

Defining budgets and forecasts: the finops team should provide the data that teams need to generate forecasts of cloud usage for their projects and to propose a budget for each. These budgets and forecasts should consider all aspects of cloud architecture, including cloud-native services, containers, and related costs.

Creating scorecards, dynamically computing custom rates and amortization, analyzing trends and variance, benchmarking internally and externally, and identifying anomalies are a few other important goals of the inform phase.

ii. Optimize

This phase empowers teams to identify and measure efficiency optimizations, like right-sizing, storage access frequency, or improving RI coverage, etc.

Analyzing the KPIs and setting goals is key in this phase, as are finding and reporting under-utilized resources and evaluating centralized commitment-based discount options.

iii. Operate

This phase defines and implements processes that achieve the goals of technology, finance, and business.

Whereas the optimize phase sets the goals for improvement, the operate phase sets up the processes for taking action to achieve those goals. This phase also stresses the continuous improvement of processes. This includes delivering spend data to stakeholders, making culture changes to align with goals, right-sizing instances and services, defining governance and controls for cloud usage, continuously improving efficiency and innovation, automating resource optimization, integrating recommendations into the workflow, and integrating chargeback into internal systems.

5. Cost allocation

Cost allocation is the process of attributing the overall cloud cost, through tagging, to the teams and individuals who use the cloud resources and incur that cost. This puts accountability at the right level and makes it easier to take action in real time.

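Amortization is an important mechanism that enables better cost allocation, because it attributes cost according to when the resources were used instead of when the cost was invoiced. The allocated costs can then be reported back to the teams (showback) or actually billed to their budgets (chargeback).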

6. Usage optimization

Usage optimization is brought about largely by right-sizing. The right size cannot be judged from the mean or the peak of usage alone. There has to be a performance model that forecasts the usage pattern of the resources.
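A full performance model is beyond a short example, but the sketch below shows one common step up from mean-or-peak sizing: size for a high percentile of observed usage plus some headroom. The 95th percentile, the 20% headroom, the sample data and the available sizes are all assumptions, and this is a stand-in for a forecast, not a replacement for one.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def recommend_vcpus(vcpus_used, sizes, pct=95, headroom=1.2):
    """Pick the smallest available size that covers the chosen percentile plus headroom."""
    target = percentile(vcpus_used, pct) * headroom
    for size in sorted(sizes):
        if size >= target:
            return size
    return max(sizes)  # nothing is large enough; fall back to the largest size

# Hypothetical samples of vCPUs in use: mostly idle, with one short spike.
samples = [1.0] * 18 + [1.5, 6.0]
AVAILABLE_SIZES = [2, 4, 8]

print(recommend_vcpus(samples, AVAILABLE_SIZES))           # sized on the 95th percentile
print(recommend_vcpus(samples, AVAILABLE_SIZES, pct=100))  # sized on the peak
```

Sizing on the single spike would pick the largest machine here, while the percentile-based choice picks the smallest. Whether clipping that spike is acceptable is a product quality decision, which is why the usage pattern has to be understood rather than just summarized.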

Moving and removing resources is of prime importance in usage optimization. Moving data to the right storage tier can reduce cost: data that is not actively retrieved can be moved to “cold storage”, a tier that costs significantly less. Similarly, removing orphaned resources that are no longer in use is of immense importance.

Spot instances vs on-demand: spot instances are spare capacity that the cloud provider is not currently using, offered to anyone at a much lower price. On-demand capacity is available whenever it is requested. Spot capacity, by contrast, might not be available, and the provider can reclaim a running spot instance at short notice when it needs the capacity back for on-demand customers.

Going beyond compute and looking for potential cost savings in databases and storage is of key importance. Managed databases such as Amazon RDS (Relational Database Service), managed SQL services and Cloud SQL, and block storage such as Azure managed disks and Amazon Elastic Block Store (EBS), are some key pockets of potential savings.

Identifying resource “shape” along with “size” is of key importance.
