Large Scale Payments Systems and Ruby on Rails
I’ve been writing code for fun and for work at both small startups and large companies for the past 30 years.
Over the years I’ve become fascinated with the payments industry, and I have spent the past decade working in payments related companies. Before coming to work on Payments at Airbnb, I worked for PayPal, for a few startups and then was part of the NFC Wallet project at Google.
The payments space is interesting because it combines a strong need for hard core computer science with an industry that is built on an aging foundation. The basic way payments work around the world has essentially stayed the same even as technologies such as location-awareness and strong encryption have become ubiquitous. The industry is ripe for change but extremely resistant to it. It is a fun technical problem and a tough business problem.
Over the years I’ve seen payments systems written in a variety of languages. I’ve noticed that the way payments work translates to requirements that affect the language and framework selection trade offs. For example, payments applications typically require strong transactional integrity, a robust audit trail and very predictable failure behavior.
Ruby on Rails is well known for its quick iteration cycle and the plethora of magical tools that speed up development and simplify prototyping. Those benefits are focused on improving the development process, but in some cases make maintenance of production system more difficult. In terms of the way payments work, some of its shortcomings mean trouble.
The ActiveRecord pattern favors large, monolithic model classes that contain database access logic and business logic. Testing these classes is difficult. It is not easy to describe their dependencies above and beyond the database table they depend on, and it it is therefore hard to write good, comprehensive unit tests for them. This makes it difficult to reason about their behavior. E.g.: Can you really guarantee that this 3000 line model that touches our transaction table does not contain off-by-one-cent errors if it doesn’t have really good unit tests?
Maintaining a log of ‘who did what when’ for every change involving money or user information is an important part of any payments system. ActiveRecord makes it trivially easy to make database changes, making it hard to ensure that every part of the code that mutates data actually records an audit trail. Furthermore, in some cases (e.g. with mass updates such as update_all), there is no way to enforce that every database change creates an audit trail. The only way to programmatically capture all data mutation is via database triggers.
Predictable Failure Behavior
It is usually a good idea to explicitly check for expected conditions and parameter values at the start of public interface methods. For example, if a parameter can’t be nil, raising an exception if nil is passed. This translates into very loud failure scenarios, but ultimately ensures that problems bubble up and get fixed, and makes bugs easier to track down.
Ruby’s weak typing makes it more difficult to enforce these rules. In strongly typed languages, the compiler complains if you pass in an int where a string is expected.
Furthermore, Ruby has lots of cases where nil is used as a sentinel to signify ‘no value returned’. This is dangerous because nil is also the default value of an uninitialized variable. This makes it difficult to identify bugs where a variable name is mistyped or otherwise not properly initialized.
The Airbnb way
Taking into account those issues, payments might seem more suited for a strongly typed language where the database access model is more restrictive. Yet, at Airbnb we use Rails for our payments stack. Instead of ditching Rails for its drawbacks, we’ve come up with solutions for its main issues, making it work well for payments. This enables us to continue to enjoy the many benefits it provides.
Reducing ActiveRecord’s surface area
A key requirement for proper audit trails is reduced access of code to the raw database data. To achieve this we’re starting to use a layer on top of ActiveRecord we call ProtectedAccess which controls changes to the underlying table and ensures an audit trail is written. It also greatly reduces the ActiveRecord surface area by not exposing all of its data mutation methods by default, and thus forces developers to reason about what methods they call, making it more likely that unwanted data mutations are caught during code reviews.
ProtectedAccess is designed to replace an existing ActiveRecord::Base class. It exposes some of the same interfaces, but actually hides most of the ways to mutate an object. The idea is to disallow most mutations to a Model that can’t easily be audit-trailed, and for those that can, create a shim that transparently creates the audit trail.
For example, suppose you have a Payment model that has an amount field. By virtue of extending ActiveRecord::Base, it exposes a setter amount=. Any piece of code can call:
which would create an un-audited modification of the payment record.
In comparison, ProtectedAccess creates a shim between the surface area, which still contains amount= and save, and the ActiveRecord model. When save is called it instead creates and saves a new version of the record instead of overwriting the existing version. There is no way for anyone to call the actual model that has access to the Payments table because it does not exist, save for a private member of the class Payment < ProtectedAccess object.
Parameter Validation — Fail Fast (and loud!) and Explicit Dependencies
We use a declarative framework for parameter validation that enables service objects (as well as methods) to check that parameters are of the correct type and value range, ensure mandatory parameters are present and also ensure that no unexpected parameters are passed in. It is loosely inspired by Guava’s Preconditions class.
An example might be an implementation of the validate method of a service object:
This is not an attempt to make ruby strongly typed (though that might not be such a bad idea…), but rather a way to explicitly declare and enforce the dependencies of an object or a method. It is also not dealing with instantiation, as some dependency injection frameworks do. But it can make it clearer to someone reading a piece of code what its dependencies are. It also causes faster failures. Without validation a bug would cause an unexpected value to propagate through the code until (hopefully) something blew up or (more likely) a weird result was displayed to a user with no indication as to the source. With validation an exception is thrown, a process crashes, and developers get alerted to the problem. Arguably, it is better to tell the user something went wrong rather than to present something that might be completely off.
Freezing constants is another related method we use, above and beyond the warnings the ruby interpreter issues when modifying a constant.
Parameter validation is very generic, and can be used with any Ruby application. We’re currently evaluating when the right time might be to open source it.
Instead of placing business logic in models, we keep models lean and focused only on data retrieval and storage. All business logic goes in service objects, which are single use business logic objects that are initialized with a clearly defined set of parameters, and then perform an action on them. They tend to be smaller in size and very focused on executing a particular task. Because they have a well defined set of parameters, they are much easier to test, and can be tested primarily using mocking (as opposed to creating test database entries or having test network services).
As an example, suppose you have some code that makes a call to some external gateway for completing a transaction. It might look like
A service object might look like:
Calling this service object would be done like this:
The service object framework will do some pre flight logging and then call perform, following it with post flight logging.
Note that validation, logging and other administrativia are now segregated to their own methods, and the perform method can focus on pure business logic. It can make lots of assumptions on the parameters, as they have been strongly validated. The interesting part of the code is also more easily accessible for someone trying to figure out what it is doing.
Because the o.validate must declare every parameter that can be passed to the service object, it is really clear what it depends on. This is a form of code-documentation that does not go out of date.
Strict Code Style and Code Complexity Filters
Rubocop is great for helping teach newcomers the ‘lay of the land’, and also keep everyone in sync as to what is the expected style of code. Cane’s ability to measure ABC code size metric is very useful in forcing the creation of small, easy to read and understand methods. We’ve occasionally seen cases where the cane threshold we chose (15 ABC score) was too restrictive, but generally it has pushed us to remain clear and concise in our code.
Take for example the following condition:
This has the ABC score of 12.
Compare it with the equivalent:
which has an ABC score of 8.
The differences are simple:
- Edge case conditions are evaluated early, and processing is halted instead of having an else clause.
- The email parameter is extracted rather than dereferencing through params and messages.
Cane does not propose those changes, it simply alerts you that there is too much complexity. It is up to the team to establish rules of thumb on how to reduce complexity when that happens.
DRY things up — Gems and Modules
The practice of DRY-ing things up is used quite heavily in the way of gems or mixins.
Gems are typically used to extract commonly used functionality that needs to be shared among more than one internal app. Some of our gems are also open source. Aside from those open source examples, we also have internal gems for securely sending credit card information to our credit card processor, for keeping currency information consistent among our various applications and for argument validation.
In cases where functionality needs to be reused within the same application, we sometimes use module mixins — putting functionality in a module and then including it where it is needed.
Disable replica reads
A common scaling strategy in a MySQL environment is to use database replicas for reads, thus reducing the traffic on the master MySQL. This has the disadvantage that your code might sometimes get stale data. For payments, we mostly disable replica reads, trading off load on the main database for data integrity. E.g, it is not acceptable for a new transaction you make to not include the results of the previous one.
Over the past several years we have learned a lot about building and maintaining large scale payments systems. Some of what we have learned is already implemented in our code base and is part of our daily best practices. Other learnings are still fairly new, and the frameworks they rely on still under construction. As always, our system continues to evolve and we’re constantly looking for ways to make it better and more robust.
If you’re interested in taking a more active part in our journey towards a better, more scalable and robust system, I’d love to hear from you — firstname.lastname@example.org
Check out all of our open source projects over at airbnb.io and follow us on Twitter: @AirbnbEng + @AirbnbData
Originally published at nerds.airbnb.com on February 26, 2015.