Secret Ingredients to Building Airbnb’s International Payments Platform

By Ian Logan

Airbnb has become a trusted global marketplace for finding the most interesting places to stay in the world. Part of what has helped build trust in the community has been our safe and secure payments platform. Paying through Airbnb takes the awkwardness out of the guest and host interaction, while still adding security for guests, as hosts are not paid until 24 hours after guests arrive. Security, reliability, and convenience are expectations from our community across all payment and payout methods. Each month Airbnb collects payment in 32 currencies and remits payment in 65 currencies in 192 countries. Building and improving the global pipes that collect and distribute money are daily challenges for payments engineering at Airbnb.

Before joining Airbnb, I primarily worked in investment banking building trading algorithms and applications in an environment with over 3 trillion dollars in assets. Bugs or optimizations resulted in literally hundreds of millions of dollars lost or gained. I once wrote a fractional penny optimization algorithm akin to the one from Office Space. This is not the approach at Airbnb. Our mindset is to ask how we can build payments into a platform that really helps our guests and hosts. We’re not building payments for money managers; we’re building a payments experience that is simple, easy to use, and creates trust for our community. Oftentimes we hunker down and knock out bugs that we know are impacting real user experiences rather than focusing strictly on downstream, backend optimizations. We’ve also been known to set Airbnb free by refactoring our payments platform in about 5 hours to help victims of Hurricane Sandy.

Our community-minded approach to engineering goes further than the product. Unlike most banks which use proprietary and expensive software, at Airbnb, we use (and contribute to) open-source technology. Although there are differences from a community-minded and technology perspective, there are many similarities from a design pattern and practices perspective. I believe the engineering secrets to success in payments are more about design patterns and practices, and less about specific technologies (although I will later mention an example of improved performance achieved through the use of a different technology).

When asked about our payments platform, many people are surprised to see how fast we build and iterate with a small team. Airbnb’s payments team is responsible for everything related to a dollar flowing through the site: reservation booking, collection & distribution of money, pricing, iOS & Android payment experiences, financial reporting, internal payment APIs, and more. Our growth and scale have been exciting and have created unexpected challenges. We’ve learned a lot of practical engineering patterns and practices along the way which I would like to share. As with all things engineering related, we are learning and evolving each day, but we hope the tips shared here help you continue to build innovative systems.

So, what are these supposed secret ingredients you ask? Well good friend, you have a first-class seat into my delicious explosion of ideas organized into a few themes.

Never Stop Simplifying

An ongoing goal (for all software systems) should be to reduce complexity. Complex systems lead to slower turnaround and a higher likelihood of bugs. Eliminating unused features and proactively refactoring code should be the norm, not the exception.

De-duplication is easy to forget when creating utility and helper modules. In other words, you should constantly sweep the code base for duplicate logic and reduce that logic into a common place.

Database schema denormalization is sometimes good to help avoid nested, snowflake-like structures, especially in an online booking system when compared to an offline accounting system (see the loose coupling section below for more on booking vs accounting systems). For example, most use cases don’t require a fully normalized currency relationship:

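# normalized: currency is a separate, joined record
payment = Payment.last
payment.currency.code
payment.currency.rate

# denormalized: currency fields are stored at the transaction level
payment = Payment.last
payment.currency_code
payment.currency_rate
# rate can be derived if you store cent values relative to a base currency like USD
# (to_f avoids truncating integer division when both amounts are integer cents)
payment.amount_native.to_f / payment.amount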
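As a rough sketch of the denormalized layout, the snippet below derives the rate with exact rational arithmetic. `PaymentRow` and `derived_rate` are hypothetical names standing in for the real model, and the sketch assumes both currencies use two decimal places, with `amount` in base-currency (USD) cents and `amount_native` in native cents.

```ruby
# Hypothetical stand-in for a denormalized payments row (not Airbnb's real schema).
PaymentRow = Struct.new(:currency_code, :amount, :amount_native) do
  # Native currency units per one base currency unit, as an exact Rational.
  def derived_rate
    raise ArgumentError, 'base amount must be positive' unless amount.positive?

    Rational(amount_native, amount)
  end
end

# 100.00 USD was charged as 132.50 CAD (illustrative numbers only).
payment = PaymentRow.new('CAD', 10_000, 13_250)
puts payment.derived_rate.to_f
```

Keeping the rate as a `Rational` until the last moment avoids both integer truncation and floating point drift; convert to a float only for display.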

Loose Coupling in Every Dimension

Loose coupling and modularity are known design patterns in software engineering. One thing to remember is that the pattern should be applied to all levels or dimensions of a system: functions within a class or module, class or modules themselves, entire directory structures, interactions between services, boundaries across the stack, etc.

To be more specific, a payments platform should almost always separate the booking system from the accounting system. The pattern held true both at the investment banks that I previously worked at and at Airbnb. Online systems (where transactions are processed) often have very different data and application requirements from offline systems (where internal reports are generated).

From a class or module perspective, encapsulating rounding, foreign exchange (FX) conversion, and more can help identify new opportunities. Since Airbnb is so international, we've abstracted our two-sided transactions, which can involve any combination of two currencies, into a complete, directed graph with self loops (e.g. a Canadian guest transacting in CAD can travel to an Australian host transacting in AUD, or a guest can travel to a host in the same country, or vice versa in each case).

[Figure: currencies]
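One way to picture that graph in code is the minimal sketch below. The `Fx` module, its currency list, and its rates are all assumptions made for illustration (the rates are not market data). Every ordered pair of currencies is an edge, same-currency pairs are self loops with a rate of exactly 1, and rounding happens in a single place.

```ruby
# Hypothetical FX module: every ordered currency pair is a directed edge,
# and same-currency pairs are self loops.
module Fx
  CURRENCIES = %w[USD CAD AUD].freeze

  # Illustrative units per 1 USD; not real market data.
  PER_USD = {
    'USD' => Rational(1),
    'CAD' => Rational(5, 4),
    'AUD' => Rational(3, 2)
  }.freeze

  # All ordered pairs, including self loops: n * n edges for n currencies.
  EDGES = CURRENCIES.product(CURRENCIES).freeze

  def self.rate(from, to)
    raise ArgumentError, "unsupported pair #{from}->#{to}" unless EDGES.include?([from, to])

    PER_USD.fetch(to) / PER_USD.fetch(from)
  end

  # Convert integer minor units, rounding in exactly one place.
  def self.convert(cents, from, to)
    raise ArgumentError, 'cents must be an Integer' unless cents.is_a?(Integer)

    (cents * rate(from, to)).round
  end
end

puts Fx::EDGES.size
puts Fx.convert(10_000, 'CAD', 'AUD')
puts Fx.convert(500, 'USD', 'USD')
```

Because conversion and rounding live behind one method, a change in rounding policy touches one line instead of every call site.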

Protect Yourself at All Times

When coding in the present, it is always a useful exercise to think about an engineer (not necessarily yourself) in the future. Keep in mind that code should be easily comprehensible and extensible when structuring and designing it.

When building a payments system, self-defense is something to keep at top of mind. Defensive programming is a great strategy to help maintain predictability. Here is a simple example:

# before: no sanity check on the inputs
def self.compute_average_price(total_amount, start_date, end_date)
  return {:price => 0, :type => :nightly} if !start_date || !end_date
  nights = (end_date - start_date).to_i
  # ... compute avg_price and type
  {:price => avg_price, :type => type}
end

# after: invalid input fails loudly
def self.compute_average_price(total_amount, start_date, end_date)
  return {:price => 0, :type => :nightly} if !start_date || !end_date
  nights = (end_date - start_date).to_i
  # sanity check
  raise ArgumentError.new('Invalid dates') if start_date >= end_date || nights > STAY_MAX_NIGHTS
  # ... compute avg_price and type
  {:price => avg_price, :type => type}
end

This might come off as obvious but validate EVERYTHING. Enumerations, values, and relationships between values should be validated before the time of persistence, computation, and retrieval.
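To make the three kinds of validation concrete, here is a minimal plain-Ruby sketch. The `PayoutRecord` class, its fields, and its rules are hypothetical and only illustrate checking enumerations, values, and relationships between values before an object is ever used.

```ruby
# Hypothetical payout record; field names and rules are illustrative only.
class PayoutRecord
  STATES = %i[pending sent failed].freeze
  CURRENCIES = %w[USD CAD AUD].freeze

  attr_reader :state, :currency_code, :amount_cents, :charged_at, :paid_out_at

  def initialize(state:, currency_code:, amount_cents:, charged_at:, paid_out_at: nil)
    @state = state
    @currency_code = currency_code
    @amount_cents = amount_cents
    @charged_at = charged_at
    @paid_out_at = paid_out_at
    validate!
  end

  def validate!
    # Enumerations: only known states and currencies are accepted.
    raise ArgumentError, "unknown state: #{state.inspect}" unless STATES.include?(state)
    raise ArgumentError, "unknown currency: #{currency_code.inspect}" unless CURRENCIES.include?(currency_code)

    # Values: integer minor units, strictly positive in this sketch.
    unless amount_cents.is_a?(Integer) && amount_cents.positive?
      raise ArgumentError, 'amount_cents must be a positive Integer'
    end

    # Relationships between values.
    raise ArgumentError, 'a sent payout needs paid_out_at' if state == :sent && paid_out_at.nil?
    raise ArgumentError, 'paid_out_at precedes charged_at' if paid_out_at && paid_out_at < charged_at

    self
  end
end

record = PayoutRecord.new(
  state: :sent, currency_code: 'USD', amount_cents: 12_500,
  charged_at: Time.utc(2013, 8, 1), paid_out_at: Time.utc(2013, 8, 2)
)
puts record.state
```

Calling `validate!` again before computation or after retrieval is cheap, and it catches rows that were written by older code paths.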

In many programming languages, there may be features that you should avoid using. Adhering to a style guide and minimizing unexpected magic helps prevent bugs. For example, default scopes in Ruby on Rails are a feature we don't like using with payment models, so we define custom class methods instead.

At the database layer, it's great to have read replicas alongside a master, but in the payments world you need to watch out for replica lag since it is likely to cause data inconsistencies. Maintaining database consistency is crucial in avoiding duplicate processing of the same action (e.g. paying a host out more than once for the same transaction).
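One common guard against duplicate processing is to claim each transaction atomically before acting on it. The sketch below is an in-memory stand-in: `PayoutLedger` and `pay_out_once` are hypothetical names, and in a real system the claim would typically be a write against the primary database protected by a unique constraint rather than a Ruby `Set`.

```ruby
require 'set'

# In-memory stand-in for an atomic "claim" on the primary database.
class PayoutLedger
  def initialize
    @claimed = Set.new
    @mutex = Mutex.new
  end

  # Returns true only for the first caller to claim a transaction id.
  def claim(transaction_id)
    @mutex.synchronize { !@claimed.add?(transaction_id).nil? }
  end
end

# Pays out at most once per transaction id, however often it is invoked.
def pay_out_once(ledger, transaction_id)
  return :already_paid unless ledger.claim(transaction_id)

  # ... the call to the payout provider would go here ...
  :paid
end

ledger = PayoutLedger.new
puts pay_out_once(ledger, 'txn-42')
puts pay_out_once(ledger, 'txn-42')
```

The important property is that the decision to pay is based on the claim itself, not on a read that a lagging replica could answer with stale data.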

Economies of Scale

Batch processing is at times preferable to stream processing especially when we talk about payments. PayPal provides a mass payments API which is great for sending money to multiple recipients at a better transaction fee rate. If you need to bulk insert records into a database with ActiveRecord, I highly recommend the activerecord-import gem.
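As a small illustration of batching, the sketch below groups pending payouts by currency and slices each group into fixed-size batches. `build_batches`, the hash keys, and `MAX_BATCH_SIZE` are assumptions made for this example; the real limit depends on the provider, and no provider API is called here.

```ruby
# Illustrative batch size; check your provider's actual limit.
MAX_BATCH_SIZE = 3

# Groups payouts by currency, then slices each group into batches.
def build_batches(payouts, batch_size: MAX_BATCH_SIZE)
  payouts.group_by { |payout| payout[:currency] }.flat_map do |currency, group|
    group.each_slice(batch_size).map do |slice|
      {
        currency: currency,
        items: slice,
        total_cents: slice.sum { |payout| payout[:amount_cents] }
      }
    end
  end
end

payouts = [
  { host_id: 1, currency: 'USD', amount_cents: 5_000 },
  { host_id: 2, currency: 'USD', amount_cents: 7_500 },
  { host_id: 3, currency: 'CAD', amount_cents: 6_000 },
  { host_id: 4, currency: 'USD', amount_cents: 2_500 },
  { host_id: 5, currency: 'USD', amount_cents: 1_000 }
]

batches = build_batches(payouts)
batches.each do |batch|
  puts "#{batch[:currency]}: #{batch[:items].size} payouts, #{batch[:total_cents]} cents"
end
```

Carrying a per-batch total alongside the items gives you a cheap control figure to reconcile against whatever the provider reports back.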

Following the secret ingredients listed in this post truly enables Airbnb's engineering team to move at the speed it does today. As alluded to above, when Hurricane Sandy struck, we partnered with the City of New York to quickly create a platform for New Yorkers to provide free housing to those who were in need. Check out the story on CNNMoney. Zero-dollar transactions are not a common feature of most payment systems, but I was able to implement the ability in about 5 hours without disrupting regular payment flows. Remembering to Protect Yourself at All Times, I implemented heavy validation throughout all relevant models to control places where zero should and should not be. We've since generalized this code so we can respond to a disaster anywhere at any time.

Big Bangs are for the Universe, not Code Releases

One trick to staying agile is to always release in phases by staging rollouts. At Airbnb, we deploy to production more than 10 times a day, which means we don't have big release cycles. Automated tests are great for avoiding drawn-out manual QA cycles.

When introducing new services or porting logic, parity testing is a powerful strategy to help compare values online and prove correctness. For example, when we ported high throughput pricing logic from Ruby to Java (explained more in the performance section below), we ran both services sequentially and compared the results in flight:

old_response = ruby_client.get_pricing(request)
begin
  new_response = java_client.get_pricing(request)
  compare_get_pricing_responses(old_response, new_response)
rescue StandardError => e
  log_pricing_exception(e)
end
return old_response

Can you spot a potential problem with the code example just above? A big bang has already been introduced since each pricing request incurs a call to both services. Instead, the new pricing request can be incrementally ramped up by percentage (check out Airbnb’s open source feature launching tool Trebuchet) or the individual service invocations can be made in parallel (although it's a bit tricky in a language like Ruby).
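A percentage ramp-up can be sketched as follows. This is not Trebuchet's API: `in_rollout?` and `get_pricing_with_parity` are hypothetical helpers, the services are passed in as plain callables, and hashing the request id with CRC32 is just one simple way to get a stable bucket per request.

```ruby
require 'zlib'

# Deterministically maps a key to a bucket in 0...100 and compares it to the
# rollout percentage, so the same key always gets the same answer.
def in_rollout?(key, percent)
  Zlib.crc32(key.to_s) % 100 < percent
end

# Always serves the old response; the new service is only shadow-called for
# the configured share of requests, and its failures never reach the caller.
def get_pricing_with_parity(request_id, percent, old_service:, new_service:, on_mismatch:)
  old_response = old_service.call(request_id)
  if in_rollout?(request_id, percent)
    begin
      new_response = new_service.call(request_id)
      on_mismatch.call(request_id, old_response, new_response) unless new_response == old_response
    rescue StandardError => e
      on_mismatch.call(request_id, old_response, e)
    end
  end
  old_response
end

old_service = ->(_id) { 100 }
new_service = ->(_id) { 101 }
mismatches = []
on_mismatch = ->(*details) { mismatches << details }

price = get_pricing_with_parity('req-1', 100, old_service: old_service,
                                              new_service: new_service,
                                              on_mismatch: on_mismatch)
puts price
puts mismatches.size
```

Starting at a small percentage keeps the extra load on the new service bounded, and raising it gradually turns the parity check into a load test as well.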

/* Begin Special Case */

In the process of building an international payments platform we've learned to store data in special ways. Date times should be stored as UTC epoch timestamps. Amounts should be stored in cents (integer column types are nicer to deal with than floating point).
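The following sketch shows why integers are the safer representation. The helper names `to_cents`, `to_epoch`, and `from_epoch` are made up for this example, and the sketch assumes a currency with two decimal places.

```ruby
# Parse a decimal amount string into integer cents without touching Float.
def to_cents(decimal_string)
  (Rational(decimal_string) * 100).round
end

# Store instants as integer seconds since the Unix epoch.
def to_epoch(time)
  time.to_i
end

# Rebuild a UTC Time from a stored epoch value.
def from_epoch(epoch_seconds)
  Time.at(epoch_seconds).utc
end

# Floating point sums drift, integer cents do not.
puts((0.1 + 0.2) == 0.3)
puts(to_cents('0.10') + to_cents('0.20') == to_cents('0.30'))

# An epoch value round-trips to the same UTC instant.
puts from_epoch(to_epoch(Time.utc(2013, 8, 14))).year
```

Parsing through `Rational` rather than `Float` means the string "19.99" becomes exactly 1999 cents, with rounding applied once and explicitly.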

In addition to precision at the database layer, one must never forget at the application layer to convert to specific time zones and convert to specific numeric values (e.g. ceil, floor, round, to_i, to_f). Precision and granularity matter and they can bite you in subtle ways:

# before: the current system time can be in an arbitrary time zone!
reservation = Reservation.last
reservation.start_time > Time.now

# after: explicitly convert to a specific time zone
reservation = Reservation.last
reservation.start_time > Time.now.utc

Another special case is race conditions between systems (internal, external, or a combination of both). For example, when our system asynchronously interacts with PayPal's system, we process data only after a certain time buffer has passed to make sure the database is in a consistent state. Even with synchronous interactions we highly suggest keeping track of activity before and after main points of contact (e.g. keeping track of the number of attempts).

Background processing (scheduled tasks) can also bite you in subtle ways. Remember to guarantee mutual exclusion in places where it is needed. For example, if you have a frequent payout task, then appropriate locks should be put in place in the event that the task runs over and a subsequent run tries to execute in parallel. In addition to mutual exclusion, background processing may also require a specific ordering. This can only be achieved by task chaining. Airbnb's open-source replacement for cron, called Chronos, supports arbitrarily long dependency chains.
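For a task that runs on a single host, one simple way to get mutual exclusion is a non-blocking advisory file lock, sketched below. `with_exclusive_lock` and the lock file name are hypothetical, and this is not how Chronos works; tasks spread across several machines would need a shared lock (for example in the database) instead.

```ruby
require 'tmpdir'

# Runs the block only if no other run currently holds the lock file.
# Returns :skipped instead of waiting when the lock is already taken.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |file|
    # LOCK_NB makes flock return false immediately rather than block.
    return :skipped unless file.flock(File::LOCK_EX | File::LOCK_NB)

    begin
      yield
    ensure
      file.flock(File::LOCK_UN)
    end
  end
end

lock_path = File.join(Dir.mktmpdir, 'payout_task.lock')

overlapping = nil
result = with_exclusive_lock(lock_path) do
  # Simulates a second run starting while the first is still in progress.
  overlapping = with_exclusive_lock(lock_path) { :ran }
  :ran
end

puts result
puts overlapping
```

Skipping rather than queueing is a deliberate choice here: for a frequent payout task, the next scheduled run will pick up whatever the skipped one would have processed.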

Oftentimes payments logic requires many distinct cases when handling a dollar. For example, Airbnb provides a rebooking feature in the event that a host cancels on you, in addition to providing bonus credits at the time of rebooking. It turns out there are 6 unique cases which force our code implementation to get quite complex. The tip here is to over-invest in documentation in places where complexity is unavoidable.

Cautiously Calculated Performance Optimization

The peril of premature optimization is not a secret. Part of keeping things simple is knowing when to save optimization for later. At Airbnb, we allow hosts to set pricing at a daily, weekly, and monthly level for specific dates and in general with proration or without. When guests view our site they can browse pricing for any future start date. Price computation was previously written in Ruby as part of our main stack but we soon noticed that performance was an issue when both stack trace and object allocation analysis identified specific bottlenecks. Tight loops were hard to get around in Ruby so we decided to port the logic to Java where we can invoke things in parallel. The end result yielded a 4.5X performance boost and the pricing service now returns results in a matter of milliseconds.

It's often better to lazily optimize so that you can focus on what matters today. For example, just recently we hit performance bottlenecks with a high volume email that is sent out to hosts when they get paid. Airbnb already had a high throughput email service which was built previously for unrelated use cases. We decided to port the email to the new service only after observing that it was an issue.

Test Under All Weather Conditions and Environments

Payment gateway interactions should not only be tested for correctness. Response failures should also be stubbed and mocked to make sure your code handles all scenarios. Assumptions are especially dangerous in production, where things can happen that will never occur in a non-production environment unless they are explicitly simulated.
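A hand-rolled stub is often enough to simulate failures. In the sketch below, `StubGateway`, `GatewayDeclined`, and `charge` are hypothetical names and do not correspond to any real gateway client; the point is that declines and timeouts get an explicit, tested outcome.

```ruby
require 'timeout'

# Hypothetical error raised when the gateway refuses a charge.
class GatewayDeclined < StandardError; end

# Stub that replays a scripted behavior instead of calling a real gateway.
class StubGateway
  def initialize(behavior)
    @behavior = behavior
  end

  def charge(_amount_cents)
    case @behavior
    when :decline then raise GatewayDeclined, 'card declined'
    when :timeout then raise Timeout::Error, 'gateway timed out'
    else true
    end
  end
end

# Code under test: maps every gateway outcome to an explicit result.
def charge(gateway, amount_cents)
  gateway.charge(amount_cents)
  :charged
rescue GatewayDeclined
  :declined
rescue Timeout::Error
  # The charge may or may not have gone through; reconcile before retrying.
  :unknown
end

%i[ok decline timeout].each do |behavior|
  puts "#{behavior}: #{charge(StubGateway.new(behavior), 1_000)}"
end
```

The timeout case deserves its own result because blindly retrying after an ambiguous response is a classic way to charge someone twice.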

In addition to writing automated tests, running tests in a production environment obviously requires an A/B testing strategy. One idea that's easy to forget is that kill switches (i.e. an on/off switch) should be used when introducing new functionality in the event that unexpected things happen.
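A kill switch can be as small as the sketch below. `KillSwitch` and `quote_price` are hypothetical, and the in-memory hash stands in for whatever shared configuration store would let the switch be flipped without a deploy.

```ruby
# Minimal in-memory kill switch; features are on unless explicitly disabled.
class KillSwitch
  def initialize
    @disabled = {}
  end

  def disable(feature)
    @disabled[feature] = true
  end

  def enable(feature)
    @disabled.delete(feature)
  end

  def enabled?(feature)
    !@disabled.key?(feature)
  end
end

# Falls back to the legacy path the moment the switch is flipped.
def quote_price(switches, legacy:, experimental:)
  switches.enabled?(:new_pricing) ? experimental.call : legacy.call
end

switches = KillSwitch.new
legacy = -> { :legacy_price }
experimental = -> { :new_price }

puts quote_price(switches, legacy: legacy, experimental: experimental)
switches.disable(:new_pricing)
puts quote_price(switches, legacy: legacy, experimental: experimental)
```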

Maintain Control of Controls

Bookkeeping and row level tracking are characteristics of a solid payments platform. It's good to remember that various dates can be associated with a single object (e.g. created at, updated at, charged at, booked at, state changed at, etc). Also, immutability is great for data management, but if you don't have pure immutability, storing all changes to a record definitely helps. Time series data is ideal for accounting systems (also known as data warehouses).
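Storing every change can be sketched as an append-only history kept next to the current values. `AuditedRecord` is a hypothetical class for illustration; in practice the history would usually live in its own table, and the injectable `clock` is only there to make the timestamps testable.

```ruby
# Keeps current attributes plus an append-only log of every change.
class AuditedRecord
  attr_reader :history

  def initialize(attributes, clock: -> { Time.now.utc })
    @attributes = attributes.dup.freeze
    @clock = clock
    @history = []
  end

  def [](field)
    @attributes[field]
  end

  # Applies changes and records old value, new value, and time per field.
  def update(changes)
    changes.each do |field, new_value|
      old_value = @attributes[field]
      next if old_value == new_value

      @history << { field: field, from: old_value, to: new_value, at: @clock.call }.freeze
    end
    @attributes = @attributes.merge(changes).freeze
    self
  end
end

payout = AuditedRecord.new({ state: :pending, amount_cents: 12_500 })
payout.update(state: :sent)
payout.update(state: :sent) # no-op: nothing changed, nothing logged
payout.history.each { |entry| puts "#{entry[:field]}: #{entry[:from]} -> #{entry[:to]}" }
```

With the history in place, questions like "what state was this payout in when the report ran?" can be answered from data rather than from memory.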

From an application perspective, explicit use of whitelists and blacklists helps control data flows and helps future engineers easily extend code. It’s good to remember to take the effort to define reusable constants instead of hardcoding in various places. For example:

# whitelisting
ALLOWED_ERROR_CODES = [ '1234 Processor Declined', '5678 Processor No Longer Exists' ]
# blacklisting
COUNTRIES_TO_EXCLUDE = ['US', 'CA']

Embrace the Adventure

Being truly international leads you to unexpected and challenging places when integrating with local payment providers. Creating abstractions and APIs definitely helps hide implementation details. But sometimes APIs simply aren't available, as highlighted by our integration with Western Union for host payouts to South America and other parts of the world. Airbnb loves to use Apple laptops, but there is exactly one Windows laptop that we own which was required by the Western Union integration where we process and generate fixed width format files. Ah, the wonders of byte order mark (BOM).

Do you get excited about payments and have more ideas to improve the experience for our guests and hosts? Comment here or send me your thoughts at ian (at) airbnb (dot) com.




Originally published at nerds.airbnb.com on August 14, 2013.