Why do you need a Platform Engineering team

Published in CodeX · 6 min read · Jun 21, 2021

Platform engineering teams are teams solely focused on improving the lives of their customers: the product engineering teams. If you don’t think that your engineering department has a platform engineering team, odds are you already have one, albeit a hidden one, built from individuals across your existing software engineering teams. Is there a particular person, or group of people, that everyone goes to when there is a build issue? Is there one person you all reach out to for help when there’s an infrastructure problem? These individuals are performing the functions of platform engineers, but they’re not able to focus on that work. Instead, they can only chip away slowly at platform problems and are never able to make the deep progress needed to unblock all your other engineering teams.

So what even is platform engineering? Before we go any deeper, I’ll acknowledge that everyone has a slightly different view of what platform engineering is. I’m going to provide my perspective on what I’ve seen work well when building and growing platform engineering teams. For me, platform engineering teams are those that work on the common platform that supports all of your other teams. A good test is this: if your customers are other engineers, you’re on a platform engineering team.

Platform != Infrastructure

People are often surprised when I make this point. It’s not that the two aren’t related: your platform team should be the one doing research, prototyping, stress testing and releasing good infrastructure primitives to your product engineering teams. It’s that this isn’t the only thing they should be doing: front-end engineering and mobile engineering are also core components of platform engineering. Many engineers have difficulty seeing this until you point out that you use a common framework or system (a.k.a. platform) across your product engineering teams.

A good example of a front-end platform engineering challenge is a component library. Let’s set the stage for this by imagining that your engineering department has four product engineering teams and two platform engineering teams.

Scenario A: no front-end engineering in the platform teams or no platform team

Let’s imagine you’re building your component library according to atomic design. Your design team has built seven atoms, seven molecules and six organisms, 20 components in total, for engineering to implement. Like many other companies without a platform engineering function, you decide to build your component library iteratively, taking on new components over the course of new projects and spreading the load across the teams.

Each component is completed over the course of a single two-week sprint, so it will take 20 team-sprints to complete the library. If we maximize execution and have each product engineering team take one component per sprint, it will take us five sprints to complete the component library. If you think this is long, wait for the next part.

A component developed in isolation from the other product engineering teams is unlikely to be feature complete for all teams, and it has a separate probability of causing UI regressions when used in a new context. We can assign both of these probabilities: let’s go with a 25% chance of not being feature complete and a 50% chance of inducing a UI regression in another product engineering team’s view. Factor in how long resolving these issues will take, and you’re looking at a minimum of one more sprint of work per team. This means that developing our component library takes up 24 team-sprints (20 to build, plus four for rework), pushing out other work that the product engineering teams could have been doing.
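The arithmetic for Scenario A can be sketched in a few lines of Python. This is only a rough model under the assumptions stated above (20 components, four product teams, one component per team per sprint, at least one extra sprint of rework per team). The function name and parameters are hypothetical, and the calendar figure additionally assumes that the teams do their rework in parallel.

```python
import math


def scenario_a(components=20, teams=4, rework_sprints_per_team=1,
               p_incomplete=0.25, p_regression=0.50):
    """Rough cost model for building the library across product teams."""
    # One component occupies one team for one sprint.
    build_team_sprints = components
    # Teams work in parallel, one component each per sprint.
    build_calendar_sprints = math.ceil(components / teams)
    # Assumed: every team loses at least one more sprint to fixes.
    rework_team_sprints = teams * rework_sprints_per_team
    return {
        "team_sprints": build_team_sprints + rework_team_sprints,
        # Assumption: the rework sprints run in parallel across teams.
        "calendar_sprints": build_calendar_sprints + rework_sprints_per_team,
        # Expected number of components hitting each kind of problem.
        "expected_incomplete": components * p_incomplete,
        "expected_regressions": components * p_regression,
    }


print(scenario_a())
```

With the default values this gives 24 team-sprints spread over six calendar sprints, with an expected five components that are not feature complete and ten that cause a regression somewhere.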

Scenario B: platform team handles its share of front-end challenges

Now let’s imagine a different scenario, one where we have one of our two platform engineering teams focus on building our component library, distributing it and training our other teams to use it.

If each team (product and platform alike) has four team members and each component takes three engineering days to develop, then each engineer can finish three components in a two-week sprint, so the team can complete 12 components per sprint. This means that our platform engineering team will complete development of the 20-component library in two sprints.
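The Scenario B estimate can be checked the same way. This is a minimal sketch, assuming a two-week sprint has ten working days and that an engineer only counts components they finish within the sprint; the function name and parameters are hypothetical.

```python
import math


def scenario_b(components=20, engineers=4, days_per_component=3,
               sprint_working_days=10):
    """Rough throughput model for a dedicated platform team."""
    # Whole components one engineer can finish within a single sprint.
    per_engineer = sprint_working_days // days_per_component
    # Team throughput per sprint.
    per_sprint = engineers * per_engineer
    # Sprints needed to finish the whole library, rounded up.
    sprints_needed = math.ceil(components / per_sprint)
    return per_sprint, sprints_needed


per_sprint, sprints_needed = scenario_b()
print(per_sprint, sprints_needed)
```

With the defaults, the team completes 12 components per sprint and needs two sprints for all 20 components.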

While we then need to invest in training the product engineering teams to use the component library (a topic for another time), we’ll be able to roll out our component library to our product after two sprints (about one month) instead of six sprints (about three months). Additionally, by having a single team that has each of the four product engineering teams as its customers, we’re far less likely to miss any edge cases in how the component library is used.

Developer experience

Similarly, few teams focus on developer experience, including, but not limited to:

Local development
Quality, testing and performance
Observability
Security
Build systems

All of these areas, if supported well through developer tooling and support systems, can drastically increase the speed of high-quality development. Without delving too deeply, because each area deserves its own deep-dive, I’ll ask a few questions. If you can answer all of these, then you haven’t yet hit the point where you need focused platform engineering teams. Otherwise, it might be time to start identifying which areas are slowing down your product engineering teams.

How long does it take to set up a developer environment for a new engineer?
How often does the local development environment break or become inconsistent?
How easy is it to develop high-fidelity test data? Or does everyone make it themselves manually?
How quickly can you debug a customer issue? Are there specific types of issues that make your team groan with frustration?
Can you quickly identify how many customers are impacted by a given issue?
Do you have well-known, internally publicized SLOs for all services and subsystems? For any SLO breach, is it easy for any engineer to quickly diagnose the problem?
Do you know which services will break next? What about when you have 10x your current customer base? Better yet, do you have plans to address these future issues?
Do you know which customers have the worst experience with your product today?
Does your team have to think about how to implement code up to your encryption standards (e.g. do you have an internal abstraction layer that wraps known standard crypto primitives, NOT rolling your own crypto)?
Do you have security scanning and testing systems (vuln scanning, HIDS, SAST, DAST, secret scanning, etc.)?
Do you have dynamic fuzzing or other automated security testing tools?
Do developers complain about how long builds for certain systems take?
Do builds for certain systems fail randomly?
Are your deploys all-or-nothing or do you have blue/green deploys?

This list could be much longer, but it captures a few of the major items for each of the previously listed sections.
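One of the security questions above asks whether you have an internal abstraction layer that wraps standard crypto primitives. As a minimal sketch of what such a layer could look like, the module below wraps HMAC-SHA256 message signing from Python’s standard library. The function names and the 32-byte key size are hypothetical choices for illustration, and signing is only one of the operations such a layer would cover.

```python
import hashlib
import hmac
import secrets

# Hypothetical internal standard: 256-bit keys.
KEY_BYTES = 32


def new_key() -> bytes:
    """Generate a key from the OS's cryptographically secure RNG."""
    return secrets.token_bytes(KEY_BYTES)


def sign_payload(key: bytes, payload: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_payload(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(sign_payload(key, payload), tag)
```

Product engineers would then call `sign_payload` and `verify_payload` without choosing a hash function, a key size or a comparison method themselves; those decisions live in one place that the platform team owns.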

Solutions to any of the above problems are often difficult to execute on without a platform team, because you run into the same knowledge-sharing and ownership problems that we encountered in the front-end engineering example. If you have a platform engineering team focused on these problems, though, you have engineers who specifically own the implementation of far-reaching solutions that will supercharge your product engineering teams.

So what does this look like in practice?

In practice, this starts with a list: what are the major issues slowing your teams down today? With that prioritized list in hand, you’ll want to spin off a single team dedicated to executing on the items in that list. One thing you don’t want to miss here is that you need someone acting in a product-manager-style role. They’ll need to be aware of the product roadmap and conduct user interviews with other engineers to ensure that what you’re building does solve their problems. You’re building solutions, not science projects.

After that single team has been executing well for a while, if their backlog keeps growing or there are other initiatives that would unlock your product engineering teams, then it’s time to scale horizontally. This is best done by identifying the primary focus areas for your platform teams: is it infra and front-end, or is it reliability and developer tooling? With those focus areas in mind, grow and split your teams accordingly, never starting with fewer than three engineers on the new seed teams. Why three? Because three allows a group to build a strong sense of identity and camaraderie. On a team of two, as soon as someone takes a week-long vacation, everything is on the other engineer’s back.

Wrapping up

This is only the very beginning of your team’s platform engineering journey. I’m hoping to share more specific playbooks of mine for building such teams in the future (yes, there will be metrics) and techniques for showing departments outside engineering why platform engineering is an essential investment. In the meantime, hopefully this was helpful to someone!