Connect(ing) Chick-fil-A

Bradley Buckland
Published in chick-fil-atech · Sep 6, 2023

Background

In 2018, Chick-fil-A’s Customer Technology team was having a bit of an API problem. The team was adept at building APIs that did remarkable things, such as sending a customer’s order from their phone through a labyrinth of networking down to our legacy Point-of-Sale system, but interacting with those APIs required heavy collaboration between teams. To put it more precisely, what we had was an API contract problem. Teams rarely had explicit, well-documented contracts; most were either poorly documented or completely implicit. This was to be expected in the very startup-y environment we worked in for the first few years of our e-commerce program. The original, scrappy team launched a mobile app and e-commerce platform back in 2016, and thanks to accelerated growth from marketing efforts and the COVID pandemic, we saw it grow from a very small portion of our sales to a significant one.

As we matured, the lack of explicit, well-documented contracts became a problem. Front-end developers often had to reach out directly to back-end developers to figure out whether a value should be passed up as a string or an integer, or whether a field was required. The back-end developers, in turn, either relied on memory or had to dig into the source code to work out what the back-end was doing, what was required, what type something was, and so on. As the number of teams scaled with our growing scope, the problem only got worse. Our goal was to optimize towards teams operating in “X-as-a-service” mode, which requires only limited cooperation between teams to integrate. Instead, we found all of our teams operating in “Collaboration” interaction mode (re: Team Topologies).

Fixing it

So you might be thinking to yourself, “Why don’t you just write better documentation of your contracts?” Fair point. We certainly led with that. We started with teams writing simple markdown files that would describe their APIs in depth in a fashion ripped straight from Basecamp’s API docs (thanks Jason Fried). This mostly worked for us. It especially worked for teams that had technical analysts who could help offset some of the burden on the developer who had to clean up historical documentation. However, the process wasn’t evenly applied across teams and relied heavily on human memory to make sure documentation was kept up to date. As an example, our Location API team had a strong technical analyst keeping the documentation, but many teams didn’t have that same skill set. This experiment put several of our key APIs in a much better place, but it added a decent amount of overhead to teams and still allowed for an unacceptable amount of human error.

After this experiment had been underway for several months, we were hungry for a better fit for our organization. We started looking at other options, viewing them broadly through the lens of the following heuristics:

  • Proximity to code: This was a must. The further from the code (the source of truth) the documentation lived, the further it would be on the curve of fragility. We found that fragile documentation can lead to worse outcomes than having to rely on collaboration alone.
(Figure: documentation fragility over time)
  • Functional ownership: If the “experience” that product owners of technical back-end systems own is the API (emphasis on the “I” for Interface), then it follows that their customers are front-end teams and other back-end teams, rather than the traditional Chick-fil-A customer. To best serve those customers, these API teams’ SLOs should span everything from measurable things such as P95 latency and availability to harder-to-measure objectives like developer experience and ease of integration. These “squishier” things are harder to influence, though, and if the Product Owner has no visibility into them, they have no real ownership. Bringing them out of the code and into a view that Product Owners can visualize allows them to own the entirety of their product.
  • Contract-first, canonical interface definitions: This was the biggest deviation from what we had tried to date. We needed to lead with the contract, not simply derive it from the implemented classes on the back-end. When dealing with implicit contracts, it’s very easy for copypasta to take place and for the front-end team to think, for instance, that the customer identifier field is named customerId when in actuality the back-end team named it userId. Oops. Not saying anything as silly as this ever happened to us, but a birdie told us it could. If we could drive towards canonical definitions of interfaces, that would be a big step. Generated SDKs would be preferred as they don’t allow errors to be introduced in the first place.
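
As a sketch of what a contract-first, canonical definition looks like (the package, message, and field names below are hypothetical, not our actual schema), the Protobuf file becomes the single source of truth for naming:

```protobuf
syntax = "proto3";

package customer.v1; // hypothetical package

// One canonical definition of the customer profile. Every generated
// SDK (Java, Go, TypeScript, ...) derives its field names from this,
// so a consumer can't accidentally invent "userId".
message Customer {
  string customer_id = 1; // proto3's JSON mapping exposes this as "customerId"
  string display_name = 2;
}
```

Because the field name lives in exactly one place, the customerId/userId mismatch described above can’t be introduced by hand.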

With these heuristics in mind, we started evaluating several technologies that have helped hundreds of organizations solve this same problem.

GraphQL

There was much to like here. It certainly is a compelling technology with many bells and whistles, especially if your APIs are already using Node.js. Because our APIs are mostly built in Java and Go, the tooling support wasn’t as polished for those languages, and the ergonomics of server-to-server calls felt uncomfortable. Also, the touted benefit of flexibility is less valuable when you have control over both the front-end and back-end implementations.

OpenAPI Specification

OpenAPI almost won us over. We were using it in other parts of our DTT (Digital Transformation & Technology) department. We had already been using Swagger via Springfox; however, we didn’t fully utilize the Swagger annotations and it left our documentation feeling lackluster. The sprinkling of annotations can also leave the code feeling a bit muddied and put all the onus on the developer to update. This could be solved with a bit of process and overall desire from the team, but it didn’t give enough control to the functional owner of the API.

gRPC

gRPC is a remote procedure call framework designed by Google, and it has some pretty compelling benefits. You start with a protocol buffer file that contains definitions of your interfaces and services: data shapes are defined as messages, and the callable operations are grouped into a service. Overall, it’s a relatively simple syntax to pick up, and it reads far more easily than YAML.
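
To make the syntax concrete, here is a minimal sketch of a proto file (the service and message names are illustrative, not one of our real APIs):

```protobuf
syntax = "proto3";

package orders.v1; // illustrative package

// Request and response shapes are defined as messages.
message GetOrderRequest {
  string order_id = 1; // the field number, not the name, goes on the wire
}

message GetOrderResponse {
  string order_id = 1;
  int64 total_cents = 2;
}

// The callable operations are grouped into a service.
service OrderService {
  rpc GetOrder(GetOrderRequest) returns (GetOrderResponse);
}
```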

There was a lot to like here:

  • ✅ Canonical interface definitions through generated code.
  • ✅ Proximity to code and the ability to commit contracts right alongside the implementing APIs.
  • ✅ Functional ownership with an easy-to-understand syntax allowing functional folks to both understand and contribute to the contract. Seemed like a win on all fronts.

That was until we started using it beyond a Proof-of-concept. Then the dialogue went something like this:

‘protoc’, was this CLI built by some amateur shop? Oh, wait, no, it was built by Google. Why is it so clunky? No idea. Fine, we can figure it out. Wait, how on earth do I get ‘protoc’ to generate Java code? Okay, got it, through Maven plugins, that seems different than how every other language works with ‘protoc’. Okay, now we can generate some code, let’s put this in one of our Spring APIs. Oh, the support here seems sort of lackluster. Let’s pivot to trying this in a Go API then.
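
For a sense of that per-language divergence (the commands and plugin choices below are illustrative; exact flags depend on your setup):

```shell
# Most languages: install a plugin binary and invoke protoc directly.
protoc --go_out=. --go-grpc_out=. orders/v1/orders.proto

# Java: the idiomatic route is not protoc on the command line but a
# build plugin (e.g. the protobuf-maven-plugin) wired into pom.xml,
# so generation happens as part of the normal build:
mvn generate-sources
```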

Finally, we get a service stood up in Go, running in our environment.

Okay, let’s route some external traffic to this thing. Hmmm, ALB support for gRPC in AWS is brand new and not super well documented. Let’s see what we can do to get this stood up. Oh shoot, we’re finally getting traffic to this thing from a front-end client! Phew.

To say the least, it wasn’t straightforward getting all the networking hops stood up. On top of that, gRPC has been built inside a walled garden. You can’t use the same middleware you did before, you can’t use curl, you can’t use your normal debugging proxy, and you can’t use your same HTTP library. You are LOCKED in. All the touted benefits are traded in for a worse developer experience. Not exactly a desirable tradeoff. But we still pressed on. Running this in production, we ultimately concluded that while we liked many of the benefits that Protobuf brought, we couldn’t live with the poor developer experience that protoc and gRPC brought along with it for both the front-end and back-end teams.

Connect

Finally, we came across a newer technology offering, Connect. What a day that was. At the time, Buf had a slick CLI tool that promised to be faster than ‘protoc’ and more importantly, have better ergonomics than their competitor. They also had a crazy vision to make the whole world of Protobuf better, cleaner, easier to approach, and anything but a walled garden. They certainly delivered on that promise.

Buf ultimately delivered on the grander vision of bringing the world Connect. Connect is a protocol that allows for three modes of interaction:

  • gRPC interoperability: Backwards compatibility with existing gRPC clients and, when Connect is used as a client, existing gRPC servers.
  • HTTP POST + Protobuf: Gives you the serialization benefits of Protobuf with the well-understood nature of a POST (or now, optionally, GET) request.
  • HTTP POST + JSON: Gives you the visibility of JSON while still having an enforced contract with breaking-change detection. This is a perfect option for lower environments, where traceability is far more important than latency.
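
The JSON mode is also what keeps Connect debuggable with everyday tooling. A Connect endpoint is an ordinary HTTP POST to a /<package>.<Service>/<Method> path, so a plain curl works (the host and service here are hypothetical):

```shell
curl \
  --header "Content-Type: application/json" \
  --data '{"orderId": "12345"}' \
  https://api.example.com/orders.v1.OrderService/GetOrder
```

The same route serves Protobuf-encoded bodies under a different Content-Type, so clients can pick the encoding per request.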

Using Connect alongside Buf has allowed us to adopt a process that has worked extremely well for us and has moved us away from “Collaboration” mode and far closer to “X-as-a-service” interaction mode (re: Team Topologies).

The process looks like this:

  1. Teams define the API models and contracts in Protobuf. While a new API is being worked on, the Proto files can stay in a branch. That branch automatically gets synced to the Buf Schema Registry as a “draft” every time it is pushed to GitHub. This allows front-end clients or back-end consumers to generate the code in “beta” mode while the contract is still being developed, and lets teams parallelize their efforts if needed.
  2. A pull request is opened for the owning team and consuming team(s) to review. This runs breaking change detection and verifies some linting rules. Breaking change detection has to be our favorite benefit by far: Buf ensures backward and forward compatibility for your APIs. Back-end teams can ship with confidence, knowing their contract changes aren’t going to break their consumers.
  3. Once teams feel good about a contract, it gets merged into the main branch of the implementing API team. It then syncs with Buf Schema Registry. This allows the consuming teams to view API documentation and instructions on using the generated code.
  4. Repeat as needed.
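
The linting and breaking-change checks in step 2 are driven by a small configuration file committed next to the proto files. A minimal sketch (the module layout and rule sets shown are illustrative defaults, not our exact config):

```yaml
# buf.yaml, alongside the .proto files
version: v1
lint:
  use:
    - DEFAULT   # standard naming/package/style rules
breaking:
  use:
    - FILE      # flag changes that would break generated code or wire compatibility
```

In CI, `buf lint` checks style and `buf breaking --against '.git#branch=main'` compares the proposed contract against what is already on main, which is the gate that lets back-end teams ship with confidence.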

Buf + Connect gave us the many benefits that Protobuf and gRPC promised, with none of the downsides. Many of our teams now practice “Contract-first API design,” and it has drastically improved how we build APIs and how teams interact.

In Summary

As the Chick-fil-A Customer Technology team grew, we experienced exponential growth in the number of communication channels because teams were reliant on close “collaboration” (re: Team Topologies) as their mode of interaction. It became clear we needed to work towards teams being able to self-service their needs where possible, operating as “X-as-a-service”. This led us to the discovery of Protobuf and the tooling that Buf provides in their Buf Schema Registry (BSR) and Connect. These tools helped move our teams towards providing a self-service pattern of interaction and helped shape how we build, document, and integrate our APIs in the complex world that is digital commerce at Chick-fil-A.

When you’re operating as a small shop, you can get by with inefficiencies in team interactions because your communication matrix is small and simple. As your organization scales, however, you should strongly consider investing in tools like Protobuf, Buf, and Connect that help streamline your teams’ interaction modes.

About The Author

My name is Bradley. I’m a Sr. Lead Engineer over the back-end systems that power our loyalty program and personalize our communication channels. I have always had a passion for optimization and learned to channel that passion to optimize our tech, process, and communication patterns within the Customer Technology Engineering organization that I sit in. In discovering how we best document our APIs at Chick-fil-A, I was able to lean into the trifecta of all 3: people, process, and technology. If this type of problem-solving excites you, we are hiring!
