Scaling Engineering Teams

I recently gave a talk on “Scaling the Team and Tech” at Blue Run Venture’s CTO Summit. I’m posting it publicly to start a larger conversation.

Have you ever watched your best engineer turn into your weakest link? Have you seen key team members get constantly sidetracked by critical but disruptive tasks? Have you ever worked with a team that’s kicking ass only to hear from other departments that the team’s ability to get stuff done is being questioned?

Every engineering manager has felt some of these. For the conference, I was asked to explicitly address questions ranging from performance tuning to effective management practices at scale, balancing quality vs expedience, and managing team growth.

What all of these questions have in common is simple. You're overwhelmed because everything appears to be on fire, or you're worried everything might be on fire, or you're pointing at the fire and being told "that's not fire." In short, someone in the company is in this situation:

It’s not fine. But it might not be fire, either.

And you want to put that fire out so your team can execute.

There’s good — even great — news here, and the goal of this post is to help you understand how to locate and manage the sources of these fires.

Without getting into customer experience yet, this is my definition of engineering excellence. If you

  • hire, train, and mentor your staff well
  • foster open and honest conversations
  • provide tools and clear measurements of success
  • constantly and consistently set expectations

your team will succeed.

At this point, you’re almost certainly saying “We’re doing all this, so clearly you’re wrong.”

There aren’t easy solutions.

Here's the truth about scaling engineering teams: there are no simple answers. I've watched hundreds of teams solve scaling problems in completely opposite ways. Scaling solutions are anything but one size fits all.

So what the hell am I here talking about if I can’t solve your scaling problems for you? If I can’t give you specific answers to all your questions?

Our goal today is to take knowledge you already have about scaling and package it in such a way that it becomes actionable. My hope is to give you frameworks for

  • talking about scaling issues.
  • isolating the hard problems.
  • having meaningful conversations around scale.

So let’s look back at that fire and our definition of engineering excellence.

All our questions — all our fears about fires raging — are actually symptoms of inflection points that were missed (generally for the best of reasons!) during the growth of the team. And the problem most managers have is that they cannot consistently and constantly set expectations around scale and team growth if they don’t understand what this growth looks like and how operations change over time.

That's right: our issues are caused by communication problems around team inflection points, not by haphazard fires that need to be doused. So let's talk about…

TEAM INFLECTION POINTS

Here’s the thing about the first phase of a team: people know what’s going on through osmosis. There’s no policy, there’s no procedure, there’s no day-to-day management. There’s a north star and people operating in different swim lanes doing their best to figure out what it takes to get there. But everyone is on the same team with the same metric for success.

At this phase, the most important thing engineers can do to move the business forward is just ship code, learn, course correct, and repeat.

This is the “making mistakes” phase. And it’s where you’ll ship technical debt as product because that is a necessary feature of moving fast while not quite knowing where you’re going or how to get there yet.

Technically, you’re most likely to have a single app with a single datastore.

The people who thrive in Phase 1 are prototypers and other entrepreneurial types who need very little direction or structure and are comfortable breaking things.

Since this phase is almost always pre-product-market fit, there's very little downside to getting things wrong, especially if you do it fast and learn from it. But also, because it's pre-product-market fit, there is no definition of quality yet. If you don't know who the customer is or what their problems really are, what is quality? How do you define it? This is where more senior engineers trip themselves up, because they want to build quality code now, before quality actually has a meaning. We'll come back to this later.

And finally, this is the phase where you hire for culture fit. When everyone’s in a boat together, you want people who agree that it’s a good boat, that it’s pointed in the right direction, and won’t ask pesky questions like “shouldn’t we plug that leak right now instead of paddling so hard?”

If you haven't heard these, I'm interested in moving into your cave.

Now it gets ugly. I bet you've walked face first into some of the above questions before, or overheard them being discussed.

What’s happened? Where did these disruptive questions come from? What’s going on?

That’s right — your team is no longer one team. Some people are now heads down in other worlds and not picking up the day-to-day through osmosis any longer.

The red circles on the slide represent people moving off the original team into more isolated realms.

You now need more formal lines of communication — daily standups, weekly status updates, weekly team debriefing, whatever is right for your org.

But the point is, these questions are getting asked because people no longer have the information they used to (because of healthy growth and becoming more focused!), and now it’s your job to push that knowledge to them at the right time in a form they’ll consume.

Now the engineering team’s starting to coalesce, and the business has started to more fully understand who its customers are and how to serve them.

The company is now more focused on product and specific features to fill customer needs. You probably still have a single backend data store, but you might now have an app or two interfacing with it and maybe a cache or search service to get around inefficient queries you never expected to need when you first designed your data structures and schemas.

And with a clear customer, now you can define quality. But mistakes start having consequences that the engineering team is not the first to hear about. Thankfully, you can finally start to tackle technical debt that creates blockers against your roadmap.

You'll quickly notice that a very different type of engineer starts to thrive now. Engineers who can think from the customer's point of view, and who can communicate with people both inside and outside your organization, start to add more value than the raw prototypers (who still have a vast playground of new features).

But at this point, the critical skill of your most valuable engineers is knowing where the line falls between moving fast and breaking things and going slow and getting them right. This isn't the strength of a prototyper, nor should it be.

Now is when you start hiring for culture add. Different points of view, different experiences, different professional backgrounds: all of these make your team stronger now and fill strategic blind spots that you willingly accepted to move fast in Phase 1.

As you start to bust out of Phase 2, you'll start to hear the phrases above. I'm certain you've heard some, if not all, of these.

So, why are you hearing them?

As you're realizing by now, communication is breaking down again. Why? Because Phase 2 started with a single team (engineering!) and scattered individual contributors in specific roles. So you built communication between individuals and your team.

But now other teams are forming. The red arrows in this slide show individual contributors grouping together naturally. A growth marketer and a content marketer have joined up to form a marketing department and are about to hire someone for lead gen. The CEO is underwater with customer requests and hires a few account executives. Whatever the reason, at the outset of Phase 2 you had a team tasked with pushing information to individuals, and now you need to relearn how to communicate between your team and other teams.

For instance, you probably have one person on your team who built and really understands your core data system. Let's call her Mary. Data issues probably flowed to her naturally, and as you entered Phase 2 you almost certainly told people to just speak to Mary as needed. This was good and efficient. But now it's not a single person sending infrequent, specific tasks; it's whole teams of people, and Mary can't juggle the inbound requests alongside her actual day-to-day commitments.

So you need to rebuild this process. Maybe it's a bug board that you rotate engineers through. Maybe it's training other engineers on how to fix data issues so you can spread the load. Or maybe it's building tooling and automation so other teams can fix issues themselves. Regardless, a new process is needed at this point, because what was fine and good at the beginning of this phase doesn't scale into the next phase.

Another tricky situation that can happen at this point is represented by the red dot on the slide — a new engineering manager or technical lead joining the team from the outside. This is good and healthy, but the tricky bit is that this type of person tends to join just as the phase transition begins, so everything they learn about how to communicate on day 1 will be different from how they need to communicate a short while later.

But the biggest problem with the transition from Phase 2 to Phase 3 is one of trust. Up until this point, trust in the organization was between the engineering team, or specific individuals in engineering like Mary, and individual contributors. Trust like this does not scale to teams.

And that’s why we get the concerns voiced in the slides — people trusted individuals, but are unable to transfer that trust to the team as a whole.

Trust for other teams must be rebuilt from scratch by building trust in the processes of those teams. “I trust Sally in Sales” is fundamentally different than “I trust Sales” or “I believe the entire Sales team is doing what I believe to be right.” Where you need to end up is “We trust how Sales makes decisions because we make decisions the same way” or “We trust how Sales operates because we understand their process and have visibility into it.”

At Phase 3 you’ve moved from a collection of individuals to a collection of teams. Communication has become a push and pull of information between them. Now you’re firing on multiple cylinders, might have a suite of products with competing KPIs and differing incentives for different teams, and your technology is starting to get complex as you likely now have a suite of apps and data.

Now that there are actually teams interacting with other teams, those who thrived in Phases 1 and 2 start getting replaced as key players by more methodical engineers. This is where engineering velocity starts to feel like it’s slowing down because you’re trading speed for quality — but what you gain is long term stability and far fewer hours and sleepless nights spent fighting fires.

And you start to hear a new phrase that was once a warning sign of lack of culture fit — “that’s not my job.” This used to be the opposite of what you wanted to hear from anyone, but now it means “I’ve got my swim lane, and you’ve asked for something outside it.” This means you’ve hired people with focus who are actually focusing. This is good. But hearing it the first time can be quite the jolt.

This is also the phase where prototypers start to suffer. Everything they love doing can now raise concerns among the more stability-minded engineers. There's still room for a few prototypers, especially in new product or feature development or moonshot side projects. But if you had 5–10 by the end of Phase 2, now you probably only need 2 or 3. There's a morale landmine here if their careers are not handled carefully.

And finally, this is where you start hiring for domain expertise. As your customers’ needs become much clearer and your technology specializes, you need to bring in staff with a deep understanding of the technical domain at play, and cultural concerns take a back seat except as a last pass if you’re having trouble choosing between otherwise equally qualified candidates (hint: break the tie on culture add, not culture fit).

On the flip side, at this phase it also becomes easier to hire specialists and consultants for one-off projects instead of bringing on a full-time employee. Sometimes it’s far cheaper to pay a consultant for one specific project and have them train your entire team in the process.

Problems abound as you start to push the boundaries of Phase 3. I don't know any larger company that doesn't have at least some of these concerns echoed from time to time.

So why do these concerns come up now, as we start to break out of Phase 3?

Because all your teams were part of one group, but now you have multiple groups emerging. Within a group the teams may not need to relearn to communicate with one another, but outside the group we go through another phase transition where the groups need to relearn how to communicate with other groups and individual teams.

The red arrows and dotted lines in this slide show teams expanding, teams being broken up into multiple teams, individual contributors wanting to try their hands on different teams (like an early engineer wanting to take a stab as a product manager), and new managers joining or being promoted.

As these moves happen, communication must be relearned and rebuilt amongst everyone.

This is also the point where technical leads who are the de facto managers of engineering teams start to have problems. At this point, communication — not technical excellence — is the critical value of an engineering manager. And as organizational complexity rises, the leader who can communicate between groups, champion her teams’ needs and clear their roadblocks, is the leader set up for success. Most technical leads chose a lead role over a manager role because they cared more about the tech, but at this phase that gets in their way unless paired with a strong engineering manager.

This is also where many companies make a critical mistake for the above reason: they try to hire engineering managers who are also individual contributors because "the team won't respect someone who's not shipping code." Through Phase 3, a technical lead who also sort of manages is fine, because the most important output of the team is code. But now the team needs someone who will protect them from all the wasted effort that concerns like "we can't ship on time" trigger. To thrive, the team needs someone focused on communicating with others externally and improving process internally (hey, maybe you really do need to rebuild your product development process and build out a product management group, but that should be done deliberately, not reactively). Engineering respect now needs to come from successfully championing the team within the company, not from code shipped.

We’ll only spend a moment talking about Phase 4. Simply put, as workers are no longer in the same office or there are more people than is easy to keep in touch with, additional degrees of communication are needed to make sure everyone’s on the same page and doesn’t feel isolated.

Stack Overflow manages the best remote-first culture I've ever seen. And they do it with the policy of "if one person's remote, everyone's remote." This simply means that if you're in a video chat with one remote person, you never get a conference room full of people chatting with that one person; everyone joins the chat individually, which creates the feeling of a level playing field.

This is also the phase where you start to see a more robust executive team coming about, and some software engineers really focusing on systems architecture in a way that’s not been needed before.

There are many other phases an organization can transition through that require communication to be rebuilt. Also, some parts of an organization, like a new group built out under a general manager, might start going through the phases from scratch. By now, you should be able to walk through on your own why these phase changes happen and what they could look like.

Organizations ultimately take on many different shapes and sizes, but they all evolve as solutions to communication issues.

By now, you’re probably thinking:

That was a lot of information. But there's a really, really hard part to all of this. How do we actually manage these inflection points?

Firstly, you need to recognize that you're in an inflection point. I am not aware of any consistent indicators (team size, funding amount, revenue, etc.) that correlate with inflection points, except the warning signs we worked through.

And as a corollary, if you’re not seeing the warning signs, you probably don’t need to worry about inflection points! This is shockingly important. The warning signs show when you’ve failed to manage an inflection point and rebuild communication. But if you’re not seeing warning signs, your team probably fixed the communication on their own whether they were aware of it or not.

Secondly, managing an inflection point does not mean accepting that you’ll be moving from one phase to another. It could mean intentionally holding your company at one phase because you prefer how you operate there. And as long as you understand the tradeoffs this creates and do it deliberately, there’s nothing wrong with this!

But managing inflection points and rebuilding communication is hard. Why?

Let me answer you by inviting you to hold your breath for 30 seconds:

Hold your breath. I said hold it!

You probably didn't even try. Right? Or you did it for a few seconds and then moved on? Yeah, that's what I thought.

But here’s the thing: not holding your breath was normal and human. Protecting the status quo is normal and human. To your team, the way it’s communicating now is the status quo, even if it’s starting to get frustrating.

Telling your team "change everything about how you work and communicate now" is the same as me asking you to "hold your breath a little longer." It's not about right and wrong. It's not about obstinacy toward what's best for the team or the business. It's not about resisting authority. It's about human nature.

Managing inflection points is about getting people to do unnatural things.

So I'm going to give you a tool that I find works surprisingly well with engineers: talk about the space shuttle.

Getting a rocket ship from Earth to Mars requires a fundamentally different engine and source of fuel at different stages. What you need to get from Earth to Low Earth Orbit is very different from what you need to get from Low Earth Orbit to Mars. (And yes, you can dive even deeper here into changing orbits, landing on Mars, etc. And yes, I realize the shuttle doesn't go to Mars. And I also realize the shuttle program is over. Sue me.)

The point is, how things were done before was good and right and necessary. But now things have changed, and a new engine and source of fuel are needed. Not to get rid of what you built, but so we can build what's next.

Using the framework of the phase changes we've gone through, you should be able to have conversations with your team about how the organization is changing and how communication needs to change as a result. Now you can have the hard conversations with your team around changing direction.

The key thing here is that you should always be aware of who thrives in your current and next phases of growth, so that you can hire people into situations where they're set up to succeed well into the future. The obvious problem with this comes in Phase 1, when you hire prototypers, because you can't guarantee there will ever be a future phase.

But if you treat your people well, manage their careers (hey, maybe some of those prototypers would make great product managers or UX researchers), and set expectations for growth, you can set people up for success and avoid morale landmines. It’s far better to give an early prototyper who no longer has a role they can excel in a great severance package and a glowing recommendation than let them stick around and be a thorn in the side of your new slow and steady engineering team.

If your team trusts you because you listen to and trust them, promote and train your high achievers, and re-engage people over your core values and mission, you can change direction far more fluidly.

And I really mean this. The warning signs we’ve talked about are wrenches being thrown that can destroy the effectiveness and morale of an engineering team. But you can catch those wrenches in midair if you know what to look for. And if you focus on fixing the communication issues that arise as your team grows, then these wrenches don’t get thrown in the first place. The result is that your team can focus on doing what they do best: building great tech and figuring out how to scale it effectively.

I highly recommend the following books for grappling with many different angles of the topics covered in this post that go into far more depth than I ever could:

Thank you. I’d love your feedback.