Maintaining Quality at Scale
When I joined Airbnb in the fall of 2012 things were a little chaotic to say the least. At some point in the preceding year the growth of the company had kicked into high gear, or hyper-growth, as some call it. Along with the increase in site traffic and transaction volume, the engineering team also expanded very rapidly. When I arrived we were a team of about 40, compared to 16 one year before (and over 130 today). Looking back now, I think we were in the midst of a make-or-break period in Airbnb’s history.
Growing a website and engineering team as quickly as Airbnb presents both technical and cultural challenges, and in many cases those challenges are intertwined. In the old days, we were struggling with many issues, but broadly those issues were related to scaling a monolithic Ruby on Rails application (which wasn’t built with scaling in mind) and scaling the engineering team in a way that facilitated the former.
Today, I think it’s fair to say that we’ve overcome the existential challenges we faced back then. This is not to say we don’t have numerous new challenges and hard work ahead of us, but we survived the kick into hyper-growth without tearing apart. There are a lot of different things we can look back on, but in this post I’d like to focus on how we have transitioned to writing code in a more maintainable (and thus scalable way).
I work on the Payments Team at Airbnb. For us, code quality, robustness, and maintainability are of the utmost importance. With the amount of transactions flowing through our system these days, we literally can’t afford even small mistakes. Over the past year, we’ve put a lot of work into figuring out how our team can continue to iterate quickly while still producing quality code and avoiding technical debt. The three things that I think have been most helpful at this level are forming and adhering to specific code style and conventions, practicing a comprehensive and collaborative peer review process, and testing.
Code conformity relates both to syntactic/textual style and to applying conventions and best practices when writing code. It has been said many times before, but if you have a codebase where you can open up a file and tell which person on your team wrote it, that’s a very bad sign. Formulating a standard way of writing code and getting everyone on board has been highly beneficial for us.
Computer code is generally consumed by two classes of entities, machines and humans. Machines generally don’t care about what your code looks like (assuming it compiles), but humans on your team probably do. Having a wide variety of styles and approaches to similar problems in your codebase will create unnecessary cognitive load for everyone, and eat up a lot of time as people attempt to grok new terrain. On the other hand, if everyone on the team is writing code in a more-or-less identical manner, everyone on the team should be able to grok, debug, and/or maintain one another’s code with close to the same level of effort (once they get through the initial ramp-up). Put another way, it’s important to favor the human reader, not the machine, when crafting your code, and it’s even more important to be consistent. This is not to say you should chose an O(N) solution over an O(1) solution for readability or simplicity’s sake, but there are many steps you can take to facilitate your fellow humans through code conformity.
Some style rules we’ve settled on are completely arbitrary. Take for example indentation characters (tabs vs. spaces), or which line the opening brace goes on. It’s hard to argue for one over the other in cases like these. The important thing is settling on something and then having everyone stick to it. On the other hand, some seemingly simple rules can have vast ramifications.
Consider line length. We like to keep lines short and sweet. Shorter lines are not only more friendly to other engineers’ eyes (and editors), but they also tend to result in cleaner, simpler statements (especially when paired with descriptive variable names). Methods that consist of a series of short, simple statements are easier for other team members to comprehend and modify. Diffs (in version control tools such as Git) also end up showing more clearly what behavior was changed between commits. And, since we’re also writing testable code (see “Testing” below) which promotes small, concise methods, you start to develop a very clean, modular, and easy to understand codebase.
Consider this example:
Imagine we wanted to tweak the behavior of this line by switching from downcase to upcase. The diff would end up looking something like this:
It’s hard to tell which part changed without parsing the line in your head. Had the code been written more like this:
The diff would also be much cleaner, showing exactly what behavior changed:
The funny thing is that those of us who started promoting this faced a good deal of pushback in the beginning. Like testing, however, we were persistent and we’ve gotten to a place where this has become second nature. To see some extreme examples of this mentality, I suggest checking out the Joint Strike Fighter C++ coding guide or the JPL C Coding guide. Obviously these standards are overkill for most consumer web applications, but always keep in mind what type of application and goals you have when settling on rules. It’s about picking the right balance.
As mentioned before, we’ve also begun to create style guides for higher-level concepts such as API and service design. Although prescribing how to build things like this can be a bit more of a gray area, it’s been incredibly useful to consolidate the collective knowledge of the team into single resources that are easy to digest. When coupled with a peer review process and a collective insistence on conforming, we’ve eliminated a lot of pain points and needlessly repeating conversations.
Another way we have been able to achieve more consistent code is by having a comprehensive and collaborative code review process. Code reviews come with many benefits. The obvious case that comes to mind is a colleague catching a horrible bug before it goes out and brings the site down. But there are many more subtle benefits. Over the past year or so, we’ve instituted code reviews for every change and we like to see at least one other person to sign off on each diff before it goes out.
Having just one other person sign off, however, is the bare minimum. We encourage everyone who has context to get involved, and most non-trivial pull requests have at least one or two substantive conversations pop up. Sometimes these can get pretty deep and some of us end up learning something we didn’t know about (credit card processing, database internals, and cryptography come to mind). If there are big insights we are also sure to record those in the relevant guide/documentation. Most importantly, there is no authority in code reviews based on seniority or title. Everyone’s input is valid and yet best practices always tend to prevail and new team members tend to learn quickly.
It’s important to remember that just because a style guide exists doesn’t mean that it will always be followed (or even read) or that it addresses every case. Peer reviews are what really help people get on the same page. They are also an effective way to learn and teach the art of programming itself, which can’t be as easily captured in pre-written material. In addition, having an environment where people learn a lot on the job helps keep them engaged and boosts the quality of their work.
As an illustration of something that might not belong in a style guide because it falls more into the wisdom category, take this example.
We used to run into a lot of examples like the first one in our codebase. While they both output the same thing, the first example is far more expensive, and at scale could actually cause a catastrophe. Adding select to the User.active proxy object means that Rails will go to MySQL, fetch every active user, instantiate them, put them into an array, then iterate through that array and select the users with country equals ‘US’. All this just to get the count.
In the second example, we start with the same proxy object, Users.active, but then we use where to filter on that object. The first line does not trigger any DB queries! On the next line, when we ask for the count instead, Rails knows to just do a SELECT COUNT(*) query, and not bother fetching any rows or instantiating any models.
Knowing the language and framework well is very important, along with spreading that knowledge to everyone involved. We have been diligent at Airbnb about fixing these types of common mistakes and promoting the best practice of pushing work to the DB when it makes sense. Take another example. How much did we pay out today?
The first example could be worse, we do at least narrow the scope to a single day, but it still causes a large number of payout objects to get fetched and instantiated, just to sum a numeric field. In the second example, we have MySQL do the summing during the lookup and just return the number we care about. This comes at essentially no extra cost to MySQL yet we avoid both unnecessary network traffic and a potentially vast amount of processing in Ruby.
So one benefit to having a healthy review process is to formulate and teach conventions and best practices that will benefit everyone. These seemingly trivial examples make our site more responsive (and can even prevent it from going down). Things like this also might have the side effect of striking up conversations about database internals or financial reporting. The thing about this kind of knowledge is that it’s shared and it would be hard to consolidate it into a single resource (and expect everyone to read that resource). It gets introduced by new team members, or veterans who have been bitten, and then passed down through the generations. It can only live on, however, if there is an active, healthy dialog surrounding the production of code. Personally, I feel as though the rate at which I’m learning in this environment is much greater than in previous positions I’ve held where code reviews and quality weren’t taken as seriously.
I’m not going to go too deep into testing in this post because my colleague Lou recently published an excellent post of his own on this blog. What I will say though, and I don’t think I’m saying anything new, is that I believe testing is extremely important to writing quality, maintainable code. However, it’s not just about raw coverage. It’s about creating culture in which testing becomes second nature to everyone on the team. When writing tests becomes second nature, writing testable code becomes second nature. At this point you really start to see dividends.
So concluding this little overview, I’ll summarize the three main topics again. Code conformity. On a professional team of any size, it’s important for everyone to write code the same way. This helps spread good practices and makes maintaining other peoples’ code much easier. Style guides are a great starting point. Even seemingly simple style choices can impact code quality, robustness, and ease of refactoring. Active, healthy review process. Above maybe all else, it’s important to foster an active and healthy review process. At least one person besides the author should look at each diff, and suggestions should be gladly given and welcomed by team members of any level. This helps spread wisdom. It helps newer team members learn and develop good habits. It helps more senior members teach and stay honest. Ultimately it can keep your application from failing spectacularly. Testing. It’s important to have a strong testing ethic, and not to compromise or take shortcuts. The more we test, the more second nature it becomes, and we might eventually even learn to love it. It doesn’t have to start by decree, but can grow organically in a few spots until it begins sprouting up everywhere.
Check out all of our open source projects over at airbnb.io and follow us on Twitter: @AirbnbEng + @AirbnbData
Originally published at nerds.airbnb.com on November 18, 2014.