Unlocking Horizontal Scalability in Our Web Serving Tier
by Liang Guo
Airbnb’s web application is powered by Ruby on Rails and Java. The web-facing application is a monolithic Rails application that runs on many web servers; it handles web requests and talks to several Java services, such as the search service and the listing pricing service. In Airbnb’s technology stack, MySQL databases play the critical role of storing core business data. We partition databases by application function for ease of capacity planning. For example, users’ messaging threads and listing calendar management are separate from the core booking flow, so they are managed in their own databases. In 2015, we performed a couple of database operations in the spirit of partitioning core databases by functionality (see this engineering blog post for how we did it). As site traffic grew at an amazing rate every year, the infrastructure team responded by horizontally scaling the application server tier for compute capacity and vertically partitioning databases for database headroom. Through the summer peak season of 2015, this two-tier architecture worked nicely. However, one notable resource issue with the MySQL databases was the growing number of database connections from application servers.
We use AWS’s Relational Database Service (RDS) to run our MySQL instances. RDS uses the community edition of MySQL server, which employs a one-thread-per-connection model of connection management. This threading model makes MySQL susceptible to the famous C10K problem: beyond a certain number of concurrent client connections, the server must run so many threads that its performance degrades severely. In MySQL, the Threads_running counter measures the number of queries executing concurrently. Because InnoDB limits storage engine thread concurrency, a spike in this metric really means client queries are piling up inside the MySQL server. When that happens, MySQL query latency increases, requests queue up across the entire stack, and the error rate spikes.
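MySQL exposes this counter via SHOW GLOBAL STATUS LIKE 'Threads_running'. As a minimal sketch of the kind of alerting this enables (the thresholds and function name here are illustrative, not Airbnb's actual monitoring code), one might flag a sustained spike over consecutive samples rather than a single blip:

```python
def threads_running_alert(samples, threshold=40, consecutive=3):
    """Return True if Threads_running samples stay at or above `threshold`
    for `consecutive` polls in a row (hypothetical alerting heuristic)."""
    streak = 0
    for value in samples:
        streak = streak + 1 if value >= threshold else 0
        if streak >= consecutive:
            return True
    return False
```

Requiring a sustained streak filters out the momentary spikes that a healthy server absorbs on its own.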
In production, we have had several severe database incidents that manifested as a large spike in active running MySQL server threads. The graphs above illustrate a Threads_running spike and the simultaneous application error rate spike. While the root cause could be attributed to a poorly written query or an underlying storage system outage, it usually took a long time for the MySQL servers to recover. Often, engineers had to resort to manually killing connections as a stabilization trick.
As bad as the threads-running spikes were, the most pressing problem was not fighting database incidents when they brought site downtime; it was the database connection limitation. Application servers running the web-facing application held direct connections to the core RDS databases. MySQL server allocates a thread stack and other resources for each client connection. Although the thread stack size can be tuned to allow more client connections, memory is a limited resource, and a large number of threads causes scheduling and context-switching issues as well. When an RDS MySQL server hits this resource limit, clients fail to establish new connections. The connection limitation therefore caps the application server capacity we can deploy to handle growing traffic. At the end of summer 2015, foreseeing a scaling bottleneck for 2016 summer traffic, the engineers on the infrastructure scalability team started to look into viable solutions.
Airbnb MaxScale Database Proxy
To be fair, MySQL has a dynamic thread pool feature, but it is only available in MySQL enterprise edition; Percona Server for MySQL and MariaDB have similar offerings. Since Airbnb uses AWS MySQL RDS, we do not have access to the MySQL thread pool feature, so we had to look at an external proxy to address the connection limitation. We investigated several open source technologies and chose MariaDB MaxScale, a MySQL database proxy that supports intelligent query routing between client applications and a set of backend MySQL servers. On its own, however, MaxScale did not solve the connection limitation problem, because it establishes one backend MySQL server connection for each client connection. Since what we needed was connection pooling, we decided to fork MariaDB MaxScale and implement it ourselves.
In Airbnb MaxScale, connection pooling is implemented by multiplexing N client connections over M connections to a backend MySQL server. After a client connection completes authentication with a backend MySQL server, the proxy severs the link between the backend connection and the client connection and parks the backend connection in that server’s connection pool. The server connection pool size is configurable and is typically a small number. Because we forked the MariaDB MaxScale 1.3 developer branch, we were able to leverage its persistent connections feature to implement the server connection pool. When a query arrives on a client connection, MaxScale picks a backend connection from the pool, links it with the client connection, and forwards the query to the backend MySQL server. MaxScale understands the transaction context of a client session, so it knows to keep the linked backend connection until the transaction commits. The link must also be kept in order to forward the query result back to the client.
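The link/unlink bookkeeping described above can be sketched in a few lines. This is an illustrative model only (the real implementation lives in the C codebase of the proxy, and the class and method names here are made up), but it captures the core invariants: a client inside a session reuses its linked backend, an idle backend is parked in the pool, and an empty pool is the exhaustion signal:

```python
from collections import deque

class BackendPool:
    """Toy model of multiplexing N clients over M backend connections."""

    def __init__(self, connections):
        self._idle = deque(connections)   # parked backend connections
        self._linked = {}                 # client -> linked backend

    def link(self, client):
        # A client with an open transaction keeps its existing link.
        if client in self._linked:
            return self._linked[client]
        if not self._idle:
            raise RuntimeError("pool exhausted")  # throttling signal
        backend = self._idle.popleft()
        self._linked[client] = backend
        return backend

    def unlink(self, client):
        # Called once the full query response has been forwarded and
        # no transaction is open: park the backend for reuse.
        backend = self._linked.pop(client)
        self._idle.append(backend)
```

With M much smaller than N, most client connections sit unlinked most of the time, which is exactly what keeps the backend connection count low.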
One challenge in this connection pooling implementation is knowing when to unlink a backend connection and return it to the pool. A query response consists of one or more MySQL packets. In the original MaxScale, which keeps a one-to-one link between a client connection and a backend connection, response packets are simply forwarded as they arrive. In connection pooling mode, unlinking a backend connection prematurely would cause the client to wait indefinitely for the rest of the response. For correct query response forwarding, we implemented MySQL packet parsing following the MySQL client/server protocol for COM_QUERY_RESPONSE. That way, Airbnb MaxScale does not unlink a backend connection until it has seen the complete set of MySQL packets for a query response. Besides correct forwarding, this also lets us measure query response sizes for monitoring.
The typical server connection pool size is configured to 10 in production. With many instances of Airbnb MaxScale proxy servers, the number of database connections on a MySQL server is in the several hundreds. In a normal situation, only a small portion of those connections are actively in use. When an underlying storage outage happens or an expensive query lands, query execution slows down and the server connection pool visibly runs dry on each MaxScale proxy instance. We take this symptom as a signal that the backend MySQL server may be running into the concurrent-threads-running spike problem, and proactively throttle client requests by killing client connections. In production, request throttling has proven very useful in preventing database incidents caused by transient storage system outages.
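One plausible shape for such a throttle (a hypothetical sketch, not the fork's actual heuristic) is to treat sustained pool exhaustion, rather than a momentary dip, as the trigger for rejecting client connections:

```python
import time

class Throttle:
    """Hypothetical back-pressure heuristic: reject new client work
    when the backend pool has been exhausted beyond a grace period."""

    def __init__(self, grace_seconds=5, clock=time.monotonic):
        self._grace = grace_seconds
        self._clock = clock              # injectable for testing
        self._exhausted_since = None

    def record_pool_state(self, exhausted):
        if not exhausted:
            self._exhausted_since = None
        elif self._exhausted_since is None:
            self._exhausted_since = self._clock()

    def should_reject(self):
        return (self._exhausted_since is not None
                and self._clock() - self._exhausted_since >= self._grace)
```

The grace period distinguishes a transient burst, which the pool absorbs, from a genuinely struggling backend that needs load shed from it.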
MaxScale uses an embedded MySQL parser for query routing; its query classifier module builds a parse tree for every MySQL query. We leveraged this parse tree for bad-query blocklisting in Airbnb MaxScale. The motivation for this feature was to protect us from Ruby VM heap memory corruption, which can corrupt MySQL query statements generated by Rails ActiveRecord in such a way that their conditional predicates are completely removed. This blog post has a detailed explanation of the nasty Ruby heap corruption problem.
The parse tree makes it easy to inspect the predicate list of a MySQL query. The query blocklisting feature uses it to look for malformed predicate conditions in update and delete statements and rejects such statements. For broader protection, we also block MySQL update and delete statements that carry no predicate condition at all. Since deploying Airbnb MaxScale with the query blocklist feature, it has protected us from at least one corrupted query that could have caused damage to one of our core database tables.
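The simplest of those rules, blocking writes with no predicate, can be illustrated without a full parser. The real feature walks MaxScale's parse tree and also catches malformed predicates, which a sketch like this cannot; the snippet below only shows the missing-WHERE case, with string matching standing in for tree inspection:

```python
import re

def should_block(sql):
    """Naive illustration of one blocklist rule: reject UPDATE/DELETE
    statements that carry no WHERE clause at all."""
    stmt = sql.strip().rstrip(";").lower()
    if re.match(r"^(update|delete)\b", stmt):
        return " where " not in f" {stmt} "
    return False
```

A predicate-free DELETE or UPDATE on a core table is almost never intentional in an OLTP workload, which is what makes this coarse rule safe to enforce at the proxy.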
Database Proxy as a Service
MariaDB MaxScale supports a multi-worker-thread model. For practical reasons, we chose to run a single worker thread in each Airbnb MaxScale proxy server and deploy many instances to achieve concurrency. At Airbnb, we use SmartStack for service discovery. We deploy a cluster of Airbnb MaxScale servers as a database proxy service for each core production MySQL database; applications discover and connect to the database proxy service instead of the MySQL database. With the Airbnb MaxScale database proxy service between application servers and MySQL servers, the web application tier can be scaled horizontally per capacity demand. As a new tier in the core architecture, a database proxy service can be scaled horizontally as well, simply by launching new Airbnb MaxScale proxy server instances.
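For readers unfamiliar with MaxScale's configuration format, a single-worker-thread instance of this shape might be configured roughly as follows. This is an illustrative fragment only: the section names, hostnames, and credentials are hypothetical, and the options follow the MaxScale 1.3 INI format (including the persistent-connection settings that the Airbnb fork builds its server connection pool on), not Airbnb's actual production config:

```ini
[maxscale]
threads=1                 ; single worker thread per proxy instance

[db-main]                 ; hypothetical backend MySQL server
type=server
address=db-main.example.internal
port=3306
protocol=MySQLBackend
persistpoolmax=10         ; persistent-connection pool, per MaxScale 1.3
persistmaxtime=3600

[Proxy-Service]
type=service
router=readconnroute
servers=db-main
user=proxy_user
passwd=...

[Proxy-Listener]
type=listener
service=Proxy-Service
protocol=MySQLClient
port=3306
```

Each such instance registers with SmartStack, so adding proxy capacity is just launching more instances of the same config.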
The diagram illustrates the 3-tier architecture in which different database proxy services are deployed in front of different core MySQL databases.
The Airbnb MaxScale database proxy introduces an additional network hop. The computation in the proxy is lightweight: it does only request routing and response forwarding, and the connection pooling feature we added is straightforward and adds little overhead. Concern about possible latency impact from the extra hop led us to implement availability-zone-aware request routing in SmartStack, which previously routed requests to backend servers in random availability zones (AZs). AZ-aware routing lets application servers send requests to database proxy servers in the same availability zone. We did extensive stress testing using our database workload replay framework and found the added latency minimal, if not negligible. By the time we were ready to launch, we had high confidence in Airbnb MaxScale.
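The selection logic behind AZ-aware routing amounts to a preference with a fallback. SmartStack implements this in its HAProxy configuration generation; the function below is a hypothetical standalone sketch of the same idea, with made-up field names:

```python
def pick_backends(backends, local_az):
    """Prefer backends in the caller's availability zone;
    fall back to all backends if none are local (illustrative sketch)."""
    local = [b for b in backends if b.get("az") == local_az]
    return local or backends
```

The fallback matters: if a whole AZ's proxy instances disappear, clients in that AZ degrade to cross-AZ routing instead of losing service.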
Earlier in 2016, we deployed Airbnb MaxScale into production. In a smooth operation, we switched the monolithic Rails application from direct connections to MySQL servers over to the database proxy service tier. Because MaxScale is MySQL protocol compliant, switching to Airbnb MaxScale did not require any client application changes. The following graph shows the drastic drop in the number of database connections on one core database when we performed the switch.
Correspondingly, the application’s connections shifted to the new Airbnb MaxScale database proxy service. At the time we deployed it, some of our MySQL databases were on the verge of the database connection limit, so the proxy made it into production right at the critical time.
With the connection pooling database proxy, we were able to scale the application server tier by adding more servers without increasing MySQL server threads. We went into the summer of 2016 with peace of mind that we would have the service capacity required to handle yet another record level of traffic at the peak of the summer.
Today, we have more than 15 Airbnb MaxScale database proxy services in production, each fronting a different core MySQL database with its own provisioned capacity. Many hundreds of MaxScale server instances are churning through requests between web application servers and MySQL databases. The automatic request throttling has worked effectively as a back-pressure mechanism, essentially replacing engineers killing connections manually. The proxy has been running reliably and has not crashed once in production.
At Airbnb engineering, we strongly believe in open source. We have had a great production experience with Airbnb MaxScale, and we would like to share it with the community, so we are announcing that we have open sourced Airbnb MaxScale. The complete source and documentation can be found on GitHub. Please try it out, and feel free to share your comments and pull requests with us. If you find working on these kinds of problems interesting, we are hiring talented infrastructure engineers, so please come work with us.