Move Fast, Reliably

Tanooj Luthra
Brex Tech Blog
Dec 18, 2020

Over the past year, Brex has significantly grown its engineering team and launched an entirely new product (Brex Cash), and we’re onboarding new customers faster than ever. As with any growing company, there’s a fine balance between developer speed and reliability. But as a fintech company, we can’t treat reliability (and system availability) as a lever to pull. So how do we make sure we’re delivering what our customers need while still maintaining our reliability? Enter SLOs.

According to the Google SRE Book, an SLO is “a service level objective: a target value or range of values for a service level that is measured by an SLI (service level indicator)”. Put more simply, an SLO takes a metric (latency, uptime, etc.) and puts bounds on it: for example, 99% of requests should complete in under 100ms. Picking the right SLOs can be complex, but having SLOs lets us state explicitly what matters to us and quantify our availability. We can then use them to decide whether we can move faster on new initiatives or should double down on solidifying our existing platform.
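To make the latency example concrete, here is a minimal sketch, assuming we have a window of per-request latencies in milliseconds. The SLI is the fraction of requests faster than 100ms, and the SLO is met when that fraction is at least 99%. The "error budget" (the 1% of requests allowed to be slow) is a concept from the Google SRE Book rather than something named in this post; it is used here only to illustrate the "move faster vs. double down" decision. `LatencySLO`, `sli`, `meets_slo`, and `error_budget_remaining` are hypothetical names, not Brex internals.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LatencySLO:
    """Hypothetical latency SLO: `target` fraction of requests under `threshold_ms`."""
    threshold_ms: float = 100.0  # "less than 100ms"
    target: float = 0.99         # "99% of requests"


def sli(latencies_ms: Sequence[float], slo: LatencySLO) -> float:
    """SLI: fraction of requests that completed in under the threshold."""
    if not latencies_ms:
        return 1.0  # assumption: no traffic counts as meeting the objective
    good = sum(1 for ms in latencies_ms if ms < slo.threshold_ms)
    return good / len(latencies_ms)


def meets_slo(latencies_ms: Sequence[float], slo: LatencySLO) -> bool:
    """True if the measured SLI is at or above the SLO target."""
    return sli(latencies_ms, slo) >= slo.target


def error_budget_remaining(latencies_ms: Sequence[float], slo: LatencySLO) -> float:
    """Share of the error budget left; negative means the SLO was missed."""
    allowed_bad = 1.0 - slo.target           # e.g. 1% of requests may be slow
    actual_bad = 1.0 - sli(latencies_ms, slo)
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    slo = LatencySLO()
    # Illustrative window: 995 fast requests and 5 slow ones.
    window = [40.0] * 995 + [250.0] * 5
    print(f"SLI={sli(window, slo):.3f}, meets SLO: {meets_slo(window, slo)}, "
          f"budget left: {error_budget_remaining(window, slo):.0%}")
```

With half the budget left, a team might reasonably keep shipping; a negative value would argue for pausing new work and investing in reliability. How to set those thresholds is a policy choice, not something this sketch decides.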

A few months ago we started our journey into SLOs, and we want to share how we’re thinking about them internally. It’s a long process, one we’re still in the middle of and will continue to iterate on. Along the way, we’re putting together a series of blog posts covering a few of the bigger steps and questions:

  • Getting buy-in from the company. How do we get the rest of the engineering team on board and committed to implementing SLOs? How do we bring in product, design, and the rest of the company so they see the value?
  • SLO implementation and adoption. How do we minimize the overhead for developers to start using them? What’s the best way to collect and surface this data? How do we get the right SLOs on every service?
  • Making decisions based on SLOs. Now that we’ve got a lot of the data in place, and we’re surfacing that to the whole company, how do we drive engineering decisions from these SLOs? How does our planning and prioritization process change based on meeting or missing these objectives?

In this series, we’ll answer these questions and dive much deeper into how Brex tackled each of these challenges, as well as the lessons we learned along the way.



Eng @BrexHQ. Prev. Founder @Elph (YCW19) → @BrexHQ, Crypto Engineer @Coinbase, Founder @Streem (YC S12) → @Box