Within the Eagle Eye Networks cloud, we have a series of application microservices that we consider our middleware. Our middleware provides services for both internal systems and our public API. Because of the large volume of requests handled by the middleware, the Eagle Eye Networks Middleware Team designed the service for performance using Django+Gevent with an OpenLDAP-backed persistence model.
Gevent’s pywsgi server has been shown to handle nearly 5,000 requests/sec in Mixpanel’s benchmarks. In our production environment for the Austin datacenter, we run 30 instances of the middleware services with the ability to scale horizontally. Our top goal was maximum concurrency.
We didn’t understand how bad our connection management was
In early 2016 Eagle Eye Networks achieved a critical growth milestone. With the additional load on our system, we did not observe the performance we expected. Our metrics were disastrous. We were frequently saddled with sluggish performance and request timeouts. We suspected slow queries as the cause and began analysis, looking for bad indexing or long-running queries. Our LDAP logs indicated blazing fast queries but also revealed an unusually high number of open connections from our middleware. Our LDAP server was periodically reaching the open file descriptor limit.
Increasing the server’s connection limit only bought us a little time; we knew it would not be a viable scaling solution going forward. We quickly exhausted the connections again. Another outage! After extensive research we determined there were two issues at play. The first was a connection leak (look for an upcoming post). The bigger problem was Django+Gevent’s connection management.
Unlimited concurrency’s hidden cost
Why were we seeing massive numbers of LDAP connections from the middleware? We began to dive into how the middleware manages connections. Gevent’s WSGIServer hands off each incoming request to a new Django WSGIHandler greenlet. The Django framework opens a new database connection on the first query and reuses that connection for the life of the request. When the request completes, the connection is cleaned up and life goes on. This was really bad news: for each concurrent request, there was one open connection to LDAP.
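That lifecycle is easy to simulate (a hypothetical sketch; the counters below stand in for real LDAP handles, e.g. what `ldap.initialize` would return). Because every request holds its own connection from first query to completion, peak open connections track peak concurrency one-for-one:

```python
# Hypothetical sketch of the per-request connection model described above.
open_connections = 0
peak_connections = 0

def open_ldap_connection():
    global open_connections, peak_connections
    open_connections += 1
    peak_connections = max(peak_connections, open_connections)
    return object()  # stand-in for a real connection handle

def handle_request():
    # Django opens a connection on the first query and holds it
    # until the request completes.
    return open_ldap_connection()

def finish_request(conn):
    global open_connections
    open_connections -= 1  # connection cleaned up at end of request

# 100 in-flight requests: none has completed yet, so 100 connections are open.
in_flight = [handle_request() for _ in range(100)]
for conn in in_flight:
    finish_request(conn)
print(peak_connections)  # 100: one LDAP connection per concurrent request
```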
How could we run without limiting concurrency on the WSGIServer? Better yet, how do we avoid limiting our concurrency by the number of connections supported by our LDAP Server?
Per-request latency was also killing us. The latency of opening an LDAP connection grew dramatically worse as concurrency rose; within the system’s LDAP client library, opening a new connection appeared to be synchronous and sequential.
We needed better connection management
The obvious solution was to implement some sort of connection pooling and simply hand open connections to new requests as they arrive. That would reduce latency, but since connections are held for the life of the request, we could only serve as many concurrent requests as there were connections in the pool. Effectively, concurrency would be capped by the pool size. We needed to maximize the usage of our connections across all the active requests.
Request time breakdown
To understand where the time was going for a request, we looked at a simple endpoint: GET /user, which makes two LDAP queries to fulfill the request. Historically, a GET /user request would complete in 150ms. According to the LDAP logs, the corresponding queries take 3–8ms each, roughly 12ms in total. For the entire time the endpoint holds that connection, the connection is actually working only 8% of the time. We were wasting 92% of a precious resource. We needed a connection pool that multiplexed each connection across all the concurrent requests to maximize our connections.
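Those numbers make the waste easy to check:

```python
# Utilization arithmetic from the measurements above.
request_ms = 150  # how long GET /user holds its connection
query_ms = 12     # time the connection spends on the two LDAP queries

busy = query_ms / request_ms
print(f"busy {busy:.0%}, idle {1 - busy:.0%}")  # busy 8%, idle 92%
```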
Multiplexed LDAP Connection Pool
Excited by the prospect of increasing our capacity by 92%, we forked the Django LDAP library and added pooling per query. The pooling library was simple. We gave it options to limit the size of the pool, a timeout so that we could scale down the number of idle connections, and a minimum number of connections to always keep open. And of course, as standard practice, we outfitted it with metrics so that we could monitor the free pool size using Prometheus. The pool provided methods to acquire and release connections.
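A minimal sketch of such a pool might look like the following (illustrative names, not our production code). Under gevent’s monkey-patching the stdlib queue becomes cooperative, so a consumer blocking on an exhausted pool only parks its greenlet:

```python
import queue
import time

class LDAPConnectionPool:
    """Bounded pool with an idle timeout and a minimum size.
    Illustrative sketch, not the production implementation."""

    def __init__(self, connect, max_size=40, min_size=2, idle_timeout=300):
        self._connect = connect      # factory, e.g. lambda: ldap.initialize(uri)
        self._max_size = max_size
        self._min_size = min_size
        self._idle_timeout = idle_timeout
        self._idle = queue.Queue()   # holds (connection, last_used) pairs
        self._created = 0            # live connections (idle + acquired)

    @property
    def free_space(self):
        # "Free pool space" metric: connections that could still be created.
        return self._max_size - self._created

    def acquire(self):
        while True:
            try:
                conn, last_used = self._idle.get_nowait()
            except queue.Empty:
                if self._created < self._max_size:
                    # Safe without a lock under gevent: no yield point
                    # between the check and the increment.
                    self._created += 1
                    return self._connect()
                # Pool exhausted: block until a connection is released.
                conn, last_used = self._idle.get()
            # Scale down: drop connections idle past the timeout,
            # but always keep min_size connections around.
            if (time.time() - last_used > self._idle_timeout
                    and self._created > self._min_size):
                self._created -= 1  # a real pool would unbind conn here
                continue
            return conn

    def release(self, conn):
        self._idle.put((conn, time.time()))

# Usage: with max_size=2, a released connection is reused
# instead of opening a third.
pool = LDAPConnectionPool(connect=object, max_size=2)
a, b = pool.acquire(), pool.acquire()
pool.release(a)
reused = pool.acquire()
print(reused is a, pool.free_space)  # True 0
```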
We integrated the pool into the django-ldapdb backend to acquire and release connections for each LDAP (query) operation, NOT at the start and end of a request. To accomplish this, we extended the django-ldapdb backend’s base.py, which provides the LDAP Django DatabaseWrapper. The DatabaseWrapper implements methods that hand a database cursor (connection) to the framework when needed. By extending the _cursor method to acquire a connection from the pool, and the close method to release the connection, we were able to integrate with the LDAP pool. To complete the task we modified the add_s, delete_s, modify_s, rename_s, and search_s methods to acquire a connection from the pool and release it when the operation completed.
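In outline, the per-operation integration looks something like this (hypothetical class and attribute names; the real change lives in our fork of the django-ldapdb backend). Each python-ldap operation borrows a connection only for the duration of that single call:

```python
# Sketch of per-operation pooling: a connection is held for one LDAP call,
# not for the life of the request. Names are illustrative.

def pooled(operation):
    """Wrap an LDAP operation so it acquires and releases a pooled connection."""
    def wrapper(self, *args, **kwargs):
        conn = self.pool.acquire()
        try:
            return operation(conn, *args, **kwargs)
        finally:
            self.pool.release(conn)  # released as soon as the call returns
    return wrapper

class PooledLDAPBackend:
    def __init__(self, pool):
        self.pool = pool

    # Each python-ldap synchronous operation is multiplexed over the pool.
    search_s = pooled(lambda conn, *a, **kw: conn.search_s(*a, **kw))
    add_s    = pooled(lambda conn, *a, **kw: conn.add_s(*a, **kw))
    modify_s = pooled(lambda conn, *a, **kw: conn.modify_s(*a, **kw))
    delete_s = pooled(lambda conn, *a, **kw: conn.delete_s(*a, **kw))
    rename_s = pooled(lambda conn, *a, **kw: conn.rename_s(*a, **kw))
```

With this shape, a request that makes two queries acquires a connection twice for a few milliseconds each, instead of holding one for the full 150ms.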
Integration With Django
To measure the performance, we ran simple load tests against the server with and without pooling enabled. The load simply ran a specific number of threads, each making 10 GET /user requests. Clearly, the server with pooling didn’t flinch as concurrency went beyond 30, unlike the non-pooling server. The benchmarks were generated by running the server and load in docker containers on a laptop, so the absolute times are not what we see in production. The important takeaway from the data was that pooling gave us consistent and acceptable performance as concurrency increased. Good enough to try in production.
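The load generator is simple enough to sketch (hypothetical; `do_request` stands in for an HTTP GET of /user against the server under test):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load(do_request, concurrency, requests_per_worker=10):
    """Run `concurrency` workers, each issuing `requests_per_worker`
    calls to do_request, and return the observed per-request latencies.
    do_request would be e.g. lambda: requests.get(base_url + "/user")."""
    latencies = []  # list.append is atomic in CPython, so this is safe

    def worker():
        for _ in range(requests_per_worker):
            start = time.monotonic()
            do_request()
            latencies.append(time.monotonic() - start)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        for f in futures:
            f.result()  # propagate any request errors
    return latencies
```

Sweeping `concurrency` upward and comparing the latency distributions of the pooled and non-pooled servers reproduces the comparison described above.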
We Promised Metrics
In the Austin datacenter, we run 30 middleware docker instances across 3 hosts (s141, s142, s143). Each instance’s pool is configured to use up to 40 LDAP connections. That’s a total of 1200 connections allowed, or 400 per host. Below is a snapshot of our pooling performance. Each vertical in the graph represents a host system, which aggregates the metrics for all 10 instances.
Row 1: Free Pool Space. The number of free slots (40 max) for connections. When this is zero, new connections cannot be created on demand. Connections are closed and pool space is reclaimed when a connection has not been used in 5 minutes.
Row 2: Acquired Connections per Minute. When any query in the middleware is made, a connection is acquired from the pool and released when the query is complete. The graphs below show the number of connections used per minute.
Row 3: Connections Created per Minute. New connections created per minute. When all connections are in use, the pool manager will create new connections if there is free pool space; otherwise, the consumer will block until a connection is released.