With tens of millions of members and billions of profiles viewed every day, Tinder already operates at scale. We expected to exceed our normal traffic levels with the launch of Swipe Night, an interactive, apocalyptic-themed in-app video experience. The first episode launched in the United States on Sunday, October 6th, between 6pm and midnight local time. Normally we launch features with a gradual rollout, releasing first to a smaller market. However, that was not an option given the secrecy surrounding the feature and our desire for the experience to be available to everyone as soon as the time window opened.
We planned to send multiple rounds of push notifications to increase awareness of the experience, and we know push notifications can cause spikes in traffic. Most components of the Tinder backend use an autoscaler; however, autoscalers are reactive: they scale up only after traffic has increased, in preparation for the potential need to sustain that load. They respond best to gradual changes in traffic, not to the sudden spike expected to follow a nationwide push notification announcing new content. With the combination of a high-traffic day of the week, expected wildly irregular traffic patterns, and the release of a high-profile feature, some might call this the ultimate load test.
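To see why a reactive autoscaler lags behind a spike, it helps to look at the kind of rule it applies. The Kubernetes Horizontal Pod Autoscaler, for example, documents its core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that rule once per evaluation interval to a hypothetical traffic series; the per-pod capacity, target utilization, and traffic numbers are assumptions for illustration, not Tinder's configuration.

```python
import math


def desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    """HPA-style rule: scale replicas by the ratio of observed to target utilization."""
    return max(1, math.ceil(current_replicas * current_util / target_util))


def simulate(traffic, per_pod_capacity, target_util, start_replicas):
    """For each evaluation interval, measure utilization with the replicas already
    running, record it, then rescale for the *next* interval.

    Returns (load, replicas_serving, utilization) per interval.
    """
    replicas = start_replicas
    history = []
    for load in traffic:
        util = load / (replicas * per_pod_capacity)
        history.append((load, replicas, util))
        replicas = desired_replicas(replicas, util, target_util)
    return history


if __name__ == "__main__":
    # Hypothetical numbers: steady load, then a push-notification spike to 3x.
    for load, replicas, util in simulate([1000, 1000, 3000, 3000], 100, 0.7, 15):
        print(f"load={load:>5} replicas={replicas:>3} utilization={util:.0%}")
```

In the spike interval, the 15 replicas sized for steady traffic face 200% of their capacity; the additional replicas only arrive for the following interval. The real HPA also applies a tolerance band and stabilization windows, which this sketch omits.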
The intent to drive so many people into the app simultaneously raised the question: could Tinder handle the traffic? Initially, we estimated an average of 1.2x normal Sunday traffic, based partly on historical data and partly on intuition. It was our best guess, but as an average it did not account for the large, instantaneous spikes created by push notifications. We needed assurance that this release would not take down the app, so we decided to run load tests to gain confidence that we could scale well above the expectation. Our goal was to handle 3x the current load.
We started with the services called on app open, where most of the load would be felt. We reached out to each backend team responsible for these services to review how each service worked, learn the best ways to test it, and understand any potential scaling issues. To create a more accurate testing environment, we set up one of the staging environments at 1/30 the scale of production, for both Kubernetes and Dynamo provisioning. One key difference between the environments is that production uses Dynamo autoscalers while the testing environment does not; although this prevented the system from scaling itself, it would help expose scaling vulnerabilities and any problems caused by spikes in traffic.
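Because the staging environment was provisioned at 1/30 of production, the traffic a test needs to generate is the production target divided by the same factor. A minimal sketch of that arithmetic, assuming hypothetical production figures (the `Load` type and all numbers are placeholders, not Tinder's data):

```python
from dataclasses import dataclass


@dataclass
class Load:
    requests_per_second: float
    dynamo_read_units: float
    dynamo_write_units: float


def scaled_test_target(production: Load, multiplier: float, scale_divisor: float) -> Load:
    """Load to generate in an environment provisioned at 1/scale_divisor of
    production in order to mimic `multiplier` x production traffic."""
    def scale(value: float) -> float:
        return value * multiplier / scale_divisor

    return Load(
        requests_per_second=scale(production.requests_per_second),
        dynamo_read_units=scale(production.dynamo_read_units),
        dynamo_write_units=scale(production.dynamo_write_units),
    )


# Hypothetical production figures; 3x goal in a 1/30-scale environment.
print(scaled_test_target(Load(60_000, 30_000, 9_000), multiplier=3, scale_divisor=30))
# Load(requests_per_second=6000.0, dynamo_read_units=3000.0, dynamo_write_units=900.0)
```

Since 3/30 is 1/10, a 1/30-scale environment needs only one tenth of current production traffic to represent the 3x goal.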
With the testing environment in place, the next step was to produce the desired level of traffic. We needed both a way to create hundreds of accounts and an automated login process, so we added automated account creation to this test environment. We then wrote a series of JMeter scripts to log these accounts in and simulate the behavior for each test. Besides scripts for existing services, we also needed scripts targeting the public routes for Swipe Night, following the expected traffic pattern. With the environment set up and a way to generate traffic established, we were ready to begin testing.
Our first objective was to load test the major services individually. At first, we ran these scripts locally; however, that limited the amount of traffic we could generate, so we moved them into Kubernetes. We spun up a pod and resized it a few times to reach the desired level of CPU. During each test, we paid attention to the first thing that broke. Were Dynamo tables throttling? Were there socket hang-ups? We addressed problems individually as they arose and reran the tests until we could sustain 3x traffic; it was an iterative process. By the end, each test had produced a list of items to address in production.
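The "raise the load until something breaks" loop can be sketched as a step-load driver. This is not the team's JMeter setup; it is a hypothetical outline in which `run_step` stands in for whatever actually generates load at a given multiple of baseline and reports the symptoms described above (Dynamo throttling, socket hang-ups).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    multiplier: float
    throttled_requests: int = 0  # e.g. Dynamo throttling errors observed
    socket_hangups: int = 0

    @property
    def healthy(self) -> bool:
        return self.throttled_requests == 0 and self.socket_hangups == 0


def step_load(run_step: Callable[[float], StepResult],
              multipliers: list[float]) -> tuple[float, Optional[StepResult]]:
    """Run increasing load levels and stop at the first unhealthy one.

    Returns the highest multiplier sustained and the first failing step
    (None if every step passed).
    """
    sustained = 0.0
    for multiplier in multipliers:
        result = run_step(multiplier)
        if not result.healthy:
            return sustained, result
        sustained = multiplier
    return sustained, None


# Stand-in for a real load generator: pretend a table starts throttling above 2x.
def fake_run_step(multiplier: float) -> StepResult:
    return StepResult(multiplier, throttled_requests=0 if multiplier <= 2.0 else 42)


sustained, failure = step_load(fake_run_step, [1.0, 1.5, 2.0, 2.5, 3.0])
print(sustained, failure)
# 2.0 StepResult(multiplier=2.5, throttled_requests=42, socket_hangups=0)
```

After fixing whatever the failing step exposed, the same run is repeated; the iteration ends when `sustained` reaches 3.0 and `failure` is None.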
Although testing public modules individually gave a good idea of how each service would perform, many downstream services are called by more than one of these public modules. These downstream services were therefore tested with only a fraction of the aggregate expected traffic. Consequently, we ran downstream tests on the most heavily used services with the same 3x traffic goal.
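The gap is easy to see with a little arithmetic: a downstream service's load is the sum of what every caller sends it. A minimal sketch, using hypothetical module names, fan-out ratios, and request rates:

```python
def downstream_load(public_rps: dict[str, float],
                    fanout: dict[str, dict[str, float]]) -> dict[str, float]:
    """Total requests/second reaching each downstream service.

    public_rps[module]: requests/second arriving at a public module.
    fanout[module][service]: downstream calls per request to that module.
    """
    totals: dict[str, float] = {}
    for module, rps in public_rps.items():
        for service, calls_per_request in fanout.get(module, {}).items():
            totals[service] = totals.get(service, 0.0) + rps * calls_per_request
    return totals


# Hypothetical public modules that all read from one shared downstream service.
fanout = {
    "profile": {"user-store": 1.0},
    "recs": {"user-store": 2.0},
    "matches": {"user-store": 1.0},
}
at_3x = {"profile": 900.0, "recs": 600.0, "matches": 300.0}

recs_alone = downstream_load({"recs": at_3x["recs"]}, fanout)
all_modules = downstream_load(at_3x, fanout)
print(recs_alone["user-store"], all_modules["user-store"])  # 1200.0 2400.0
```

In this example, testing `recs` by itself at 3x exercises `user-store` at only half the load it would see with all three modules at 3x.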
We conducted multiple rounds of testing and shared the results with the engineering owners after each round. We discussed potential improvements and began acting on them, including adding several caches.
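The post does not say which caches were added or how they were configured. As one common pattern for taking repeated reads off a hot downstream service, here is a minimal in-process read-through cache with a time-to-live; the `load_profile` loader and the 30-second TTL are hypothetical.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Read-through cache: return a cached value younger than ttl_seconds,
    otherwise call the loader (e.g. a downstream service) and cache the result."""

    def __init__(self, loader: Callable[[Hashable], V], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get(self, key: Hashable) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no downstream call
        value = self._loader(key)  # miss or expired: go downstream once
        self._entries[key] = (now, value)
        return value


# Usage with a fake clock so expiry is deterministic.
calls = []
fake_now = [0.0]


def load_profile(member_id):
    calls.append(member_id)
    return {"id": member_id}


cache = TTLCache(load_profile, ttl_seconds=30, clock=lambda: fake_now[0])
cache.get("m1")
cache.get("m1")
fake_now[0] = 31.0
cache.get("m1")
print(len(calls))  # 2
```

A production cache would also need a size bound, eviction, and thread safety; those are left out to keep the sketch short.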
While JMeter scripts were a successful way to load test a significant chunk of the Tinder infrastructure, a few teams did not need them because they already ran regular performance tests using production traffic. For example, when iterating on the matching algorithm, the recommendations team forks a small percentage of requests to a scaled-down ElasticSearch cluster to test performance. To test our 3x traffic goal, they simply increased the percentage of traffic sent to that cluster and validated that it could handle traffic well above our goal.
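Forking an adjustable percentage of requests needs a per-request sampling decision. One way to keep that decision stable is to hash an identifier into a fixed bucket, so raising the percentage only adds members and never reshuffles existing ones. The sketch assumes a hypothetical member ID as the key and uses SHA-256 for bucketing; the recommendations team's actual mechanism is not described in the post.

```python
import hashlib

BUCKETS = 10_000  # 0.01% granularity for the rollout percentage


def bucket_for(key: str) -> int:
    """Map a key (e.g. a member ID) to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def should_fork(key: str, percent: float) -> bool:
    """True if this key's requests should also be copied to the test cluster.

    Raising `percent` only adds keys: a key forked at 5% is still forked at 20%.
    """
    return bucket_for(key) < percent / 100 * BUCKETS


members = [f"member-{i}" for i in range(10_000)]
for pct in (1, 10, 30):
    share = sum(should_fork(m, pct) for m in members) / len(members)
    print(f"{pct}% requested -> {share:.1%} forked")
```

The forked copy would typically be sent asynchronously with its response discarded, so the member-facing request never depends on the test cluster.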
Once we had tested existing public services and select downstream services and infrastructure, it was time to test the new components built to support Swipe Night. Because this was a new experience with no historical traffic patterns, the expected level of traffic was unclear. We based our estimates on the expected number of members on the app that Sunday and an estimated participation rate.
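An estimate like this reduces to a short calculation: members expected online, times the fraction expected to participate, times requests per participant, spread over the window in which they arrive. The sketch below uses entirely hypothetical inputs and adds a peak-to-average factor for arrivals that bunch up after a push notification.

```python
def expected_rps(members_online: float, participation_rate: float,
                 requests_per_participant: float, arrival_window_seconds: float,
                 peak_to_average: float = 1.0) -> float:
    """Back-of-the-envelope request rate for a new feature.

    Participants x requests each, spread evenly over the arrival window,
    then scaled by a peak-to-average factor for bursty arrivals.
    """
    participants = members_online * participation_rate
    total_requests = participants * requests_per_participant
    return total_requests / arrival_window_seconds * peak_to_average


# All inputs are hypothetical placeholders.
average = expected_rps(1_000_000, 0.2, 10, 2_000)
peak = expected_rps(1_000_000, 0.2, 10, 2_000, peak_to_average=3.0)
print(round(average), round(peak))  # 1000 3000
```

Every input is uncertain for a brand-new feature, which is why the tests aimed well above the point estimate.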
Testing the new services in a scaled-down environment helped us determine how to provision CPU and memory for the Kubernetes pods. However, we felt the downstream infrastructure they affected required additional testing. When a member made decisions or reached the end of an episode of Swipe Night, the public module wrote to a Dynamo table. A Dynamo stream worker then picked up the changes and sent them through Kafka to be consumed downstream, triggering updates to ElasticSearch. Our top concern was ElasticSearch update latency: if updates lagged, we could not satisfy the product requirement that members receive relevant recommendations of other Swipe Night participants upon completing the experience. It was critical for us to understand the limitations of the system, and to have a way to speed it up if needed, so we decided to test with production traffic.
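The quantity that matters here is end-to-end lag: the time from the Dynamo write to the moment the ElasticSearch document reflects it. The minimal in-process model below stamps each record at write time and measures lag when the index is updated. Deques stand in for the Dynamo stream and the Kafka topic, dicts stand in for the table and the index, and all names are hypothetical; it shows what to measure, not how the real workers are built.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Record = Tuple[str, str, float]  # (member_id, choice, written_at)


class PipelineModel:
    """In-process stand-in for table -> stream worker -> Kafka -> index."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.table: Dict[str, str] = {}        # stand-in for the Dynamo table
        self.stream: Deque[Record] = deque()   # stand-in for the Dynamo stream
        self.topic: Deque[Record] = deque()    # stand-in for the Kafka topic
        self.index: Dict[str, str] = {}        # stand-in for ElasticSearch
        self.lags: List[float] = []            # write-to-index lag per record

    def write_decision(self, member_id: str, choice: str) -> None:
        # Public module writes; the stream captures the change with its write time.
        self.table[member_id] = choice
        self.stream.append((member_id, choice, self.clock()))

    def run_stream_worker(self) -> None:
        # Forward every pending change record to the topic.
        while self.stream:
            self.topic.append(self.stream.popleft())

    def run_consumer(self) -> None:
        # Update the index and record how long each change took to become searchable.
        while self.topic:
            member_id, choice, written_at = self.topic.popleft()
            self.index[member_id] = choice
            self.lags.append(self.clock() - written_at)


fake_now = [0.0]
p = PipelineModel(clock=lambda: fake_now[0])
p.write_decision("m1", "left")
fake_now[0] = 1.5
p.write_decision("m2", "right")
p.run_stream_worker()
fake_now[0] = 4.0
p.run_consumer()
print(p.lags)  # [4.0, 2.5]
```

In a real deployment the stream worker and consumer run continuously, and this lag grows whenever either stage falls behind the write rate.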
We made a series of non-blocking, fire-and-forget requests from one of the services called on app open; these calls represented a member progressing through the interactive experience. For each member, 12 calls were made over a 3-minute period. We also added the metrics needed to understand the latency. Because we were forking traffic from a major service, there was a concern that keeping threads of execution open longer could degrade its performance and potentially take it down. For that reason, we included a kill switch that let us stop the process immediately if we saw degradation.
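A fire-and-forget call guarded by a kill switch can be sketched with a bounded thread pool: the request path submits the synthetic call and returns without waiting on it, and a shared flag lets an operator stop new calls at once. `KillSwitch`, `FireAndForgetSender`, and the pool size are hypothetical; the post does not describe the service's language or concurrency model.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class KillSwitch:
    """Operator-controlled flag; once tripped, no new synthetic calls are made."""

    def __init__(self) -> None:
        self._tripped = threading.Event()

    def trip(self) -> None:
        self._tripped.set()

    def is_on(self) -> bool:
        return not self._tripped.is_set()


class FireAndForgetSender:
    def __init__(self, send: Callable[[str], None], switch: KillSwitch,
                 max_workers: int = 4) -> None:
        self._send = send
        self._switch = switch
        # A bounded pool caps how many extra threads the synthetic traffic can occupy.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def _send_quietly(self, member_id: str) -> None:
        try:
            self._send(member_id)
        except Exception:
            pass  # a real service would count this in metrics; it must never reach the caller

    def maybe_send(self, member_id: str) -> bool:
        """Queue a synthetic call and return immediately without waiting for it."""
        if not self._switch.is_on():
            return False
        self._pool.submit(self._send_quietly, member_id)
        return True

    def close(self) -> None:
        self._pool.shutdown(wait=True)


sent = []
switch = KillSwitch()
sender = FireAndForgetSender(sent.append, switch, max_workers=2)
sender.maybe_send("m1")
sender.maybe_send("m2")
switch.trip()  # e.g. an operator sees latency degrade
print(sender.maybe_send("m3"))  # False
sender.close()
print(sorted(sent))  # ['m1', 'm2']
```

Tripping the switch stops new submissions, while already-queued calls still drain; for a harder stop, `ThreadPoolExecutor.shutdown(cancel_futures=True)` (Python 3.9+) cancels queued work that has not started.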
Because we were using production traffic, we could significantly dial up traffic with the click of a button. For the first test, we ramped up slowly, starting with 1% of traffic triggering the calls, then 10%, then 25%, growing exponentially until we reached 100%. We closely watched the 50th, 90th, and 99th percentile latencies, noting the point at which they became unacceptable. We learned that we could handle our expected level of traffic, but that at 4x traffic we should expect a slowdown and be prepared to scale up those systems.
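The ramp-and-watch procedure can be sketched as a loop over rollout percentages that computes p50, p90, and p99 at each stage and stops at the first stage whose p99 exceeds a budget. The nearest-rank percentile method, the stage list, the 50 ms budget, and `fake_measure` are assumptions for illustration; the post does not state the team's thresholds or how their metrics system computes percentiles.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Percentiles = Tuple[float, float, float]  # (p50, p90, p99)


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def ramp(stages: Sequence[float],
         measure_latencies: Callable[[float], Sequence[float]],
         p99_budget_ms: float) -> Tuple[float, Dict[float, Percentiles]]:
    """Raise the rollout percentage stage by stage and stop at the first stage
    whose p99 exceeds the budget. Returns the last acceptable percentage and
    the percentiles observed at every stage that was run."""
    observed: Dict[float, Percentiles] = {}
    last_ok = 0.0
    for pct in stages:
        latencies = measure_latencies(pct)
        stats = (percentile(latencies, 50), percentile(latencies, 90), percentile(latencies, 99))
        observed[pct] = stats
        if stats[2] > p99_budget_ms:
            break  # in practice: trip the kill switch or dial traffic back down
        last_ok = pct
    return last_ok, observed


# Stand-in measurement: latency (ms) that rises with the share of traffic enabled.
def fake_measure(pct: float) -> list:
    return [20.0 + pct * 0.5 + i * 0.1 for i in range(100)]


last_ok, observed = ramp([1, 10, 25, 50, 100], fake_measure, p99_budget_ms=50.0)
print(last_ok, observed[last_ok])
```

Here the 50% stage breaches the budget, so the ramp stops and reports 25% as the last acceptable level.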
After running individual tests of existing and new, public and downstream services, and making the improvements and pre-scaling necessary to handle 3x traffic, we were confident that Swipe Night would go smoothly, but we wanted to eliminate any remaining doubt. As one final sanity check, particularly of how we handled a traffic spike, we sent a push notification, spread over a 5-minute period, to all members in the United States who had push notifications enabled. To monitor and respond to any problems, we gathered engineers from across the company, including the Operations, Engagement, Identity, and Connections teams. We noticed that a few Dynamo tables exceeded their burst capacity and started throttling. This showed us that our Dynamo autoscaler, which took 10 minutes to react to consumed read and write units, needed to be faster. Overall, the test was a success: we had a tangible grasp of which services to watch during launch, and we improved a major component of our backend.
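Why a 10-minute reaction time can be too slow follows from how DynamoDB burst capacity works: AWS documents that DynamoDB retains up to 300 seconds (5 minutes) of unused read and write capacity for bursts. The simplified model below assumes the full burst pool is available when the spike starts, that consumption jumps instantly to a constant level, and it ignores other throttling causes such as hot partitions. The capacity numbers are hypothetical.

```python
BURST_WINDOW_SECONDS = 300  # DynamoDB retains up to 5 minutes of unused capacity


def seconds_until_throttling(provisioned_units: float, consumed_units: float) -> float:
    """How long a constant load above provisioned capacity can be served from
    burst credits alone, assuming the full burst pool is available at the start."""
    excess = consumed_units - provisioned_units
    if excess <= 0:
        return float("inf")  # within provisioned capacity: never drains the pool
    burst_pool = provisioned_units * BURST_WINDOW_SECONDS
    return burst_pool / excess


def throttles_before_scale_up(provisioned_units: float, consumed_units: float,
                              autoscaler_reaction_seconds: float) -> bool:
    return seconds_until_throttling(provisioned_units, consumed_units) < autoscaler_reaction_seconds


# Hypothetical table provisioned at 1,000 write units; autoscaler reacts in 10 minutes.
for spike in (1.2, 1.5, 3.0):
    consumed = 1_000 * spike
    t = seconds_until_throttling(1_000, consumed)
    print(f"{spike}x: burst lasts {t:.0f}s, throttles before scale-up: "
          f"{throttles_before_scale_up(1_000, consumed, 600)}")
```

Under these assumptions a sudden 3x spike drains the burst pool in 150 seconds, well inside a 600-second reaction window, which is the kind of gap a faster autoscaler closes.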
Upon Swipe Night's release, the traffic pattern was as expected, with spikes boosted by push notifications. Traffic greatly exceeded our initial expectations, reinforcing the value of load testing well above expected levels. Because we were prepared, the launch went smoothly.
Load testing provided value for the Swipe Night release and Tinder in general. We were able to launch with confidence, existing components of our backend were improved, resource allocation was reevaluated, and we now have a greater degree of automation in our load testing infrastructure.