Platform Engineering KPIs
Platform as a product is becoming an increasingly popular approach to building internal platforms in engineering organisations. While software-driven companies are competing for market share, there is another, more subtle competition on the rise: who can enable their engineers to ship new features fastest; who has the most effective internal platform?
In this post, we will share our approach to building out KPI trees for our platform engineering teams at Wise (formerly TransferWise). Starting with the product development process, we will explain how we shaped our platform vision, resulting in a set of actionable KPIs that we use to identify our biggest problems and continuously measure our platform’s performance.
Product Development Process
The purpose of a KPI tree is to help us implement a product development process that is based on hypothesis-driven experimentation and validated learning. While there is extensive literature on implementing a product development process, the more difficult part is identifying the metrics that apply to the platform engineering domain.
As depicted in the diagram above, the product development process begins with a platform vision and model. These are the constant parts that should rarely change. A model is a representation of metrics or levers and their relation to each other. A KPI tree is the type of model representation that we chose for this exercise. Let’s start by defining our platform vision, which ultimately informs the metrics or KPIs that we think the platform can influence and is accountable for.
The platform vision constitutes our highest-level goals and without them, we don’t know what we are measuring ourselves against. Especially with Wise’s autonomous team culture, a vision is critical for creating alignment and accountability. Within our product management team, we discussed extensively whether our company vision can act as our platform vision.
Money without borders — instant, convenient, transparent and eventually free
Although the Wise vision (or mission) is what ultimately motivates us, we came to the conclusion that it doesn’t serve us well as a platform vision. Our platform contributions bring Wise closer to achieving its vision, but the connection between, for example, convenience and our platform engineering work is not obvious. Hence, we decided to define a more relatable vision.
Provide foundations that underpin Wise’s stability, enable teams to ship with confidence, faster and more efficiently than everyone else
Although this vision is not specific to Wise, it fulfils our most important requirements: it is ambitious, motivating and it reveals our levers for success. Every team and squad should be able to identify with this vision and it should be clear to platform engineers how their daily work contributes to it. Based on the levers, we are able to derive North Star KPIs that will serve as the roots of our KPI trees.
Platform KPI Trees
As shown below, we added one additional North Star KPI called Risk, which can’t be derived directly from the vision. Risk constitutes an invisible constraint, meaning that productivity, stability and efficiency have to be achieved while staying inside Wise’s risk appetite.
Based on the KPI tree roots, we can now start to derive the model. If our goal is to make Wise more stable, we need to understand what our levers are to improve Wise’s reliability. For creating those models, we relied heavily on existing frameworks and research around developer productivity and SRE. The most important ones are listed below.
- Accelerate Book and Four Key Metrics
- SPACE Framework for Developer Productivity
- Engineering Effectiveness Handbook
- Google SRE Books
While the readings above are a good starting point, few comprehensive examples are available for the platform engineering domain. We spent quite some time brainstorming and developing models ourselves. By sharing our approach, we hope to speed up this process for other platform teams. Before we present our KPI trees, a few caveats apply:
- KPI trees are models and are inherently imperfect. There are more formal ways of creating KPI trees where the branches are inputs to a function of the parent. For us, it is sufficient if a metric has a significant impact on its parent to be considered a branch.
- Good metrics should be actionable, have reproducible results, and represent reality accurately. Several of our platform engineering KPIs have shared responsibility between our product engineering teams and the platform. Therefore, platform sometimes can’t intentionally reproduce the results.
- We are only sharing a subset of KPI trees, and the ones we are sharing are incomplete. They do, however, provide enough information to convey our approach and help you get started with this exercise.
Note that KPI trees don’t replace thorough user research. Metrics will help you identify areas worth investigating, enabling targeted and more efficient user research. However, you will still need to invest time in interviewing your customers to complement the insights you gained through metrics.
In the following sections, we present the KPI trees for our platform engineering North Star KPIs: productivity, stability, efficiency and risk.
Developer productivity is a controversial topic and it is important that metrics are not misused to measure individual performance. The Productivity KPI tree has three branches which are split into separate sections for easier visualisation: Lead Time, Deployment Frequency and Developer Happiness.
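To make the structure concrete, here is a minimal sketch of how the roots and the Productivity branches named so far could be represented in code. The `KpiNode` class and the `render` helper are our own illustrative names, not part of any tooling described in this post, and the tree is deliberately incomplete.

```python
from dataclasses import dataclass, field


@dataclass
class KpiNode:
    """One metric in a KPI tree; children are metrics that influence it."""
    name: str
    children: list["KpiNode"] = field(default_factory=list)

    def walk(self, depth=0):
        """Yield (depth, node) pairs depth-first, starting with this node."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# The four North Star KPIs are the roots; only Productivity is expanded.
north_stars = [
    KpiNode("Productivity", [
        KpiNode("Lead Time"),
        KpiNode("Deployment Frequency"),
        KpiNode("Developer Happiness"),
    ]),
    KpiNode("Stability"),
    KpiNode("Efficiency"),
    KpiNode("Risk"),
]


def render(roots):
    """Return one indented line per metric, two spaces per tree level."""
    return [
        "  " * depth + node.name
        for root in roots
        for depth, node in root.walk()
    ]


if __name__ == "__main__":
    print("\n".join(render(north_stars)))
```

A plain parent-child structure like this matches the informal definition from the caveats: a branch only claims a significant impact on its parent, not an exact functional relationship.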
Lead time is what we consider the time between a code change and the release of this change to our end customers. This KPI tree mostly measures friction in the CI/CD process.
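As a rough illustration of this definition, lead time can be computed from two timestamps per change. The sketch below assumes that each change is available as a pair of commit and release times; the function names and the sample data are made up, and using the median as the aggregate is our assumption rather than something prescribed here.

```python
from datetime import datetime
from statistics import median


def lead_time_hours(committed_at, released_at):
    """Hours between a code change and its release to end customers."""
    return (released_at - committed_at).total_seconds() / 3600


def median_lead_time_hours(changes):
    """Median lead time over (committed_at, released_at) pairs."""
    return median(lead_time_hours(c, r) for c, r in changes)


# Made-up sample data, purely for illustration.
sample_changes = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 13, 0)),   # 4 hours
    (datetime(2024, 1, 8, 10, 0), datetime(2024, 1, 9, 10, 0)),  # 24 hours
    (datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 16, 0)),  # 2 hours
]

if __name__ == "__main__":
    print(median_lead_time_hours(sample_changes))
```

A median is less sensitive to a single slow change than a mean, which makes it a reasonable default when a few outliers would otherwise dominate the number.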
Simplified, the Deployment Frequency KPI tree comprises metrics that capture friction before a code change is made. For example, before a developer can change a service, they need to read the documentation to learn how it works.
Considering developer satisfaction or happiness as part of productivity can have an unethical flavour. Fortunately, it turns out that developer productivity and developer satisfaction are positively correlated and mutually dependent. Some argue that platform engineering as a domain doesn’t have enough impact on developer happiness since we have no influence on factors like compensation or individual growth opportunities. We think that, although we don’t exclusively own developer happiness as a problem domain, much of our work contributes to it. Due to the strong correlation with productivity, keeping a close eye on relevant metrics is vital.
Delivering changes fast and often is only half the job. The Stability KPI tree measures our ability to enable product engineers to make changes confidently and without breaking the end customer experience. It reflects the overall availability of Wise services and acts as a counterweight to change. Our stability levers include providing a reliable cloud-native platform as well as consulting product teams with guardrails and best practices.
While productivity aims for more output with the same input, efficiency aims for less input while maintaining the same output. In our approach, the Efficiency KPI tree covers cost-related metrics. This is the cost of cloud resources, infrastructure and licenses, as well as the cost of our platform engineering teams.
As a financial service provider, risk management is one of our top priorities. As a platform engineering organisation, we are responsible for the foundations of Wise and have a special responsibility to implement change controls and adequate security measures. We also consider our product teams’ deviation from best practices and the golden path to be part of risk.
KPI Tree Levels
As mentioned in the caveats, we only depicted a subset of KPIs on each level and limited the depth to three to four levels for the purpose of this post. To illustrate how deep a KPI tree can go, below you can see a vertical slice for the example of Lead Time.
Now that we have a good understanding of relevant levers and metrics for the platform domain, we need to implement the tooling that enables us to collect and analyse them. In the next section we will share how we are visualising our KPI trees to make them consumable and actionable.
Platform KPI Dashboards
Many of the KPIs identified in the platform space have a shared responsibility between product and platform teams. We need to keep this in mind when implementing their visualisation. The wireframe below depicts the Global Engineering view, which is mainly consumed by platform but also provides filters for the various organisational entities. The Engineering Comparison view is tailored towards individual engineering teams, helping them to benchmark and eventually optimise their performance and health.
The total scope of all KPI trees includes over 200 different metrics, not all of which are measured today. Prioritising by potential impact helps us decide where to invest our time to get additional insights. Wireframes as shown above serve us in two ways. Firstly, they set expectations and provide a basis for discussion with stakeholders outside of platform. Secondly, they create alignment inside the implementing team and visualise milestones of the KPI dashboards’ roadmap.
Measuring the identified KPI trees from the top down will help you understand areas that need your attention and quantify the impact of platform work. While we have good insights into the first two levels of our KPI trees, the metric coverage down the branches is still patchy. This means that we see KPIs changing but lack the underlying data to infer why. To accelerate the coverage of our platform KPIs, we need to make it much easier for platform and product teams to ingest their data and make it actionable.
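One way to track this gap is to compute, per tree level, the share of metrics that are measured today. The sketch below assumes a flat list of metrics with parent references; the third-level metric names and all of the "measured" flags are hypothetical and only serve to illustrate the calculation.

```python
from collections import defaultdict

# Each entry: (metric name, parent name or None for a root, measured today?).
# Third-level names and all "measured" flags are hypothetical examples.
metrics = [
    ("Productivity", None, True),
    ("Lead Time", "Productivity", True),
    ("Deployment Frequency", "Productivity", True),
    ("Developer Happiness", "Productivity", True),
    ("Build Duration", "Lead Time", True),
    ("Review Wait Time", "Lead Time", False),
]


def depth_of(name, parents):
    """Return the tree level of a metric, counting the root as level 1."""
    depth = 1
    while parents[name] is not None:
        name = parents[name]
        depth += 1
    return depth


def coverage_by_level(metrics):
    """Map each tree level to the fraction of its metrics that are measured."""
    parents = {name: parent for name, parent, _ in metrics}
    measured = defaultdict(int)
    total = defaultdict(int)
    for name, _, is_measured in metrics:
        level = depth_of(name, parents)
        total[level] += 1
        measured[level] += int(is_measured)
    return {level: measured[level] / total[level] for level in sorted(total)}


if __name__ == "__main__":
    for level, share in coverage_by_level(metrics).items():
        print(f"level {level}: {share:.0%} measured")
```

Levels with a low share point to the branches where a change in a parent KPI cannot yet be explained by underlying data.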
Developer Productivity Engineering is still a very new approach for many engineering organisations. Big tech companies have set the bar when it comes to collecting and analysing data from developer behaviour and developer tools. This shows us that we are still at the very beginning of our journey.
Benchmarking Wise’s product and market share is much easier than benchmarking our internal platform. This is why we think it is important to share our approach and hope for an exchange with other platform teams in the future. Feel free to reach out with any questions or feedback.