Choice of DB for supporting 50M+ players in a match at My11Circle

Games24x7 Blogs
9 min read · Mar 20, 2023


We at My11Circle have witnessed immense growth in the number of users on our platform. Building and maintaining our platform to successfully support this growth while also adding exciting new features has always been challenging. We use a variety of databases to support different use cases in our system. One of the main use cases in a fantasy sports app is enabling users to create fantasy teams and join different contests with those teams, and we have been using MySQL since day one to support it. To make our system future-proof in terms of handling scale, we re-architected our core microservices last year. One area of focus during that exercise was eliminating the bottleneck that MySQL imposed on our system because of its lack of support for horizontal scaling.

Our Requirements

In fantasy sports, a large number of players want to join contests with limited seats in the short span of time between the toss and the start of the match. For simplicity, this can be understood as N players needing to be allocated M seats according to some rules. To avoid offloading this concurrency to the database, we have a separate system that handles the allocation. Its output, the player-seat allocation, needs to be persisted in the database. So our expectations from the database were as follows:

Throughput

Support for peak write throughput of up to 400K writes per second and read throughput of about 250K QPS.

Latency

99th percentile latency under 15 ms.

Strong consistency

For some use cases we need to know at a particular moment whether a transaction has succeeded, so we need support for strongly consistent reads.

Multi-attribute query

For example, the ability to know which contests a player is playing in as well as which players are participating in a particular contest. In MySQL we can achieve this by adding indexes on the relevant columns, as illustrated below.
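As a minimal sketch, the two access patterns could be served by a composite primary key and a composite secondary index; the table and column names here are assumptions for illustration, not our actual schema:

```python
# Illustrative MySQL DDL, kept as a string constant for reference. The
# composite primary key and the secondary index each serve one direction
# of the player <-> contest lookup.
PLAYER_CONTEST_DDL = """
CREATE TABLE player_contest (
    player_id  BIGINT NOT NULL,
    contest_id BIGINT NOT NULL,
    team_id    BIGINT NOT NULL,
    PRIMARY KEY (player_id, contest_id),           -- contests for a player
    KEY idx_contest_player (contest_id, player_id) -- players in a contest
);
"""

# "Which contests is player 42 in?"  -> served by the primary key
Q_BY_PLAYER = "SELECT contest_id FROM player_contest WHERE player_id = 42"
# "Which players are in contest 7?"  -> served by idx_contest_player
Q_BY_CONTEST = "SELECT player_id FROM player_contest WHERE contest_id = 7"
```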

Scalability

We receive high traffic only during specific matches and events like the IPL, so it was desirable to scale the database infrastructure out and in to stay cost efficient.

Based on these requirements, we evaluated the various database systems available.

Dynamo-based DBs:

Since the system is write-intensive, we first evaluated Dynamo-based DBs such as Cassandra, ScyllaDB and Amazon DynamoDB. ScyllaDB offers a drop-in replacement for Cassandra and claims to be a more performant implementation while maintaining compatibility with the Cassandra API. Going by the available benchmarks, all of them seemed to offer the performance to meet our needs. An important consideration while modeling your schema for these DBs is the choice of partition key: queries that don't specify the partition key perform poorly and can also impact the performance of the entire cluster. Let us look at the options we have if we want to query data without specifying the partition key.

  1. Redundancy: As a best practice when modeling data for these DBs, if you want to query your data by multiple columns you keep multiple copies of the data with different primary keys (see the sketch after this list). This works because these DBs are very efficient at writes. The caveat is that you have to maintain the consistency of the copies at the application level, which is not straightforward, so we avoided this approach.
  2. Materialized views: This is essentially the DB handling the redundancy for you. As you insert or update data, the DB replicates the changes or updates some metadata asynchronously, depending on the implementation. The dealbreaker for us was the lack of strong consistency: the DBs that offered this functionality either did not provide strong consistency or, if they did, write performance suffered immensely.
  3. Global secondary indexes: The underlying principle is similar to materialized views. We ran a performance test of this with ScyllaDB, and it showed poor write throughput.
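To make the redundancy approach concrete, here is a minimal sketch using the Python cassandra-driver. The keyspace, table and column names are assumptions for illustration; the point is that every allocation becomes two writes whose consistency the application must manage itself:

```python
from cassandra.cluster import Cluster

# Assumed keyspace "fantasy"; the contact point is illustrative.
session = Cluster(["127.0.0.1"]).connect("fantasy")

# Two copies of the same data, keyed for the two query directions.
session.execute("""
    CREATE TABLE IF NOT EXISTS contests_by_player (
        player_id bigint, contest_id bigint, team_id bigint,
        PRIMARY KEY (player_id, contest_id))""")
session.execute("""
    CREATE TABLE IF NOT EXISTS players_by_contest (
        contest_id bigint, player_id bigint, team_id bigint,
        PRIMARY KEY (contest_id, player_id))""")

def save_allocation(player_id, contest_id, team_id):
    # Two writes per allocation: if one succeeds and the other fails, the
    # copies diverge, and detecting/repairing that is on the application.
    session.execute(
        "INSERT INTO contests_by_player (player_id, contest_id, team_id) "
        "VALUES (%s, %s, %s)", (player_id, contest_id, team_id))
    session.execute(
        "INSERT INTO players_by_contest (contest_id, player_id, team_id) "
        "VALUES (%s, %s, %s)", (contest_id, player_id, team_id))
```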

A hack that we tried (and why you shouldn't):

Let's understand this with the example we referred to earlier, in which we need to support two queries:

Q1: Given (player_id, contest_id), find the details of this association.

Q2: Find which players are participating in a particular contest.

To support these queries you can create a table with primary key (contest_id, player_id). But what if there is a contest with a very large number of players? Since contest_id is the partition key, all the rows for that contest will be stored on one node, which can become hot. To avoid this, we considered dividing the data for each contest into a fixed number of buckets (say, 1000). The table would then look like this:

Primary Key = (bucket_id, contest_id, player_id), where bucket_id is a function of player_id, i.e. bucket_id = player_id % 1000. Now, how can we support both queries Q1 and Q2 with this?

For Q1: We have both player_id and contest_id, so bucket_id can be computed from player_id and the row can be queried directly.

For Q2: We can issue 1000 separate select queries, one per bucket, each specifying our contest_id. Since the data for a particular contest is divided across buckets on different nodes, this avoids the problem of a single node becoming hot. A sketch of both queries follows.
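Here is a minimal sketch of the scheme, again using the Python cassandra-driver with illustrative names:

```python
from cassandra.cluster import Cluster

NUM_BUCKETS = 1000  # fixed bucket count from the scheme above
session = Cluster(["127.0.0.1"]).connect("fantasy")

def bucket_of(player_id):
    return player_id % NUM_BUCKETS

# Q1: both keys are known, so the partition (bucket) is computable directly.
def get_allocation(player_id, contest_id):
    return session.execute(
        "SELECT * FROM allocations "
        "WHERE bucket_id = %s AND contest_id = %s AND player_id = %s",
        (bucket_of(player_id), contest_id, player_id)).one()

# Q2: fan out one query per bucket so no single node serves a whole contest.
# (A real client would bound the concurrency instead of firing all 1000.)
def players_in_contest(contest_id):
    futures = [
        session.execute_async(
            "SELECT player_id FROM allocations "
            "WHERE bucket_id = %s AND contest_id = %s",
            (bucket, contest_id))
        for bucket in range(NUM_BUCKETS)
    ]
    return [row.player_id for f in futures for row in f.result()]
```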

What is the issue with this approach?

In Cassandra and Scylla, the partition is the basic unit of data storage. Data for a table is divided into partitions, which are stored across different nodes. When we query any data, we must specify the partition key, from which the partition that holds the data is determined.

Large partitions:

In Cassandra, partitions are the basic units of data that are loaded from or flushed to disk when data is retrieved. So if your partitions are large (a few hundred MBs), your read performance takes a big hit, which is why it is recommended to keep partitions lean. With the primary key (bucket_id, contest_id, player_id), bucket_id is the partition key, which means we have only 1000 partitions in total; each one accumulates rows from every contest and grows very large, leading to performance problems. Recent versions of ScyllaDB have implemented a different mechanism to tackle this problem.

Updates:

Even if your database implementation can handle large partitions, there is one more thing to keep in mind: the frequency of updates. All data on disk is stored in immutable files called SSTables. Each row in an SSTable isn't necessarily a full row of data. Rather, it is just a mutation: a list of changed (added or deleted) columns and their new values (or a "tombstone" for a deleted column), along with a timestamp for each change (used for reconciling conflicting mutations). The full data row needed by a request is therefore composed from potentially multiple SSTables and/or the in-memory table(s).

If a partition receives frequent updates, its data spreads across multiple SSTables. Retrieving data belonging to that partition then requires reading multiple SSTables, which puts a lot of pressure on disk IO. We observed exactly this behavior during a load test with ScyllaDB: SSTable reads would queue up, leading to high read latency (4–5 seconds).

For these reasons we discarded Cassandra, Scylla and DynamoDB for our use case.

Aerospike

We went through the docs for Aerospike and found that it checked most of the boxes for our requirements. However, Aerospike is a proprietary DB and has a licensing cost. Since we receive high traffic only during certain tournaments and seasons, we did not find it cost effective for our use case and continued exploring other options.

Vitess

Vitess itself is not a database. It is a solution that provides clustering and sharding capabilities on top of MySQL. It was originally developed at YouTube and is now a CNCF-graduated project that also has an enterprise version. It has three main components:

  1. VTGate: A stateless proxy between MySQL and your application. It is aware of your schema and cluster topology and routes each query to the corresponding MySQL instance(s).
  2. VTTablet: A daemon process that runs beside each MySQL process. It is responsible for initializing the MySQL instance and acts as the interface between MySQL and VTGate.
  3. VTAdmin: Vitess stores all configuration and topology info in a key-value store such as Consul or Zookeeper. VTAdmin provides a CLI and a web interface to manage this configuration.

(Architecture diagram: https://vitess.io/docs/15.0/overview/architecture/)

Sharding

Vitess offers multiple sharding techniques. Suppose we have configured Vitess with 3 shards and designated one of the table's columns, such as player_id, for sharding. When we insert a row, VTGate first computes a hash of the player_id value, determines the target shard from that hash, and executes the write on the corresponding shard. A sketch of such a configuration follows.
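As a minimal sketch, hash sharding is described to Vitess through a VSchema; the keyspace layout below is an assumption for illustration, using Vitess's built-in hash vindex on player_id:

```python
import json

# Hypothetical VSchema for the keyspace holding the player-contest table.
# The built-in "hash" vindex routes each row to a shard by hashing the
# value of the player_id column.
vschema = {
    "sharded": True,
    "vindexes": {
        "hash": {"type": "hash"},
    },
    "tables": {
        "player_contest": {  # illustrative table name
            "column_vindexes": [
                {"column": "player_id", "name": "hash"},
            ],
        },
    },
}

# The JSON form of this dict is what gets applied to the cluster
# (e.g. with vtctldclient ApplyVSchema).
print(json.dumps(vschema, indent=2))
```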

Resharding

Let's say we are running 1 shard and need to increase the number of shards to 3 to handle more traffic. Resharding in Vitess doesn't work by simply adding two new nodes to the cluster and expecting it to rebalance. Instead, we create 3 new shards and initiate a resharding process that streams the data from the previous shard evenly into the new shards. The previous shard can then be discarded.

Other Benefits

Vitess not only adds sharding capability, it turbo-charges your MySQL with many features. The ones of interest to us were:

  • Managed schema migrations: It comes with its own VReplication-based implementation and also integrates with tools like gh-ost to allow schema changes without downtime.
  • Materialized views: It supports materialized views that include aggregation expressions, in which case Vitess creates a rolled-up version of the source table that can be used for real-time analytics.
  • Managed switchover: Since the client application does not communicate with MySQL directly but via VTGate, this allows a seamless switchover between the primary and a replica for maintenance or for implementing a highly available MySQL setup.
  • Observability: All Vitess components expose endpoints with relevant metrics that can be scraped by Prometheus. There are also debug UIs that expose detailed query-level stats.

Performance

In most cases performance is the same as you would get from MySQL directly, plus around 1–5 ms of added latency from the extra hop through VTGate. We confirmed this by running load tests for our scenario.

Why did we choose Vitess?

Performance: It offered predictable performance similar to our existing MySQL setup. We chose hash-based sharding on the player_id column, which allowed us to scale almost linearly by increasing the number of shards.

Cost effective: We are cost effective not only because we can change the number of shards, but also because managed switchover makes vertical scaling easy. And since our application connects to VTGate instead of MySQL, we can handle a large number of MySQL connections by scaling only VTGate. We became even more cost efficient by integrating Vitess scaling with our in-house automated infra-scaling platform that uses ML.

No client-side changes: We did not have to make any code changes on the clients, as the application is unaware that Vitess is present in the system; it communicates with VTGate using the standard MySQL protocol, as the sketch below shows.
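For instance, an existing MySQL client keeps working unchanged when pointed at VTGate; the host, credentials, and query below are illustrative:

```python
import pymysql  # any standard MySQL driver works

# Connect to VTGate exactly as you would to a MySQL server; only the
# endpoint changes. A Vitess keyspace appears to the client as a database.
conn = pymysql.connect(host="vtgate.internal", port=3306,
                       user="app_user", password="app_password",
                       database="fantasy")

with conn.cursor() as cur:
    # VTGate hashes player_id and routes the query to the owning shard;
    # queries without the sharding key are scatter-gathered across shards.
    cur.execute("SELECT contest_id FROM player_contest "
                "WHERE player_id = %s", (42,))
    print(cur.fetchall())

conn.close()
```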

Quick adoption: Since Vitess adds only a few components on top of MySQL, which everyone in the team was already familiar with, it was easy for developers, DBAs and DevOps to learn about Vitess and migrate to it in a relatively short period of time.

Our Vitess setup in production

Vitess offers a Kubernetes operator that lets you get up and running on Kubernetes in no time. We, however, deploy Vitess on EC2 machines.

As a general practice, we regularly archive data from MySQL to another data store. This keeps the transactional database lean for maximum performance.

When should you consider using Vitess?

If you are struggling with MySQL's scaling capabilities, you should definitely consider evaluating Vitess before jumping into the NoSQL realm. If it turns out to be suitable and meets your needs, it can save you a lot of time and effort. If you have questions, you can post them directly in the Vitess Slack channel.

About the author

Sulabh Kumar is a backend engineer at Games24x7. He has been in the software space for around 5 years, mostly working on distributed systems running at high scale and concurrency.

Find him on LinkedIn here : https://www.linkedin.com/in/sulabh-kumar-94757b12a/

