Going against the grain — Why we chose MySQL for building Social Graph

Anoosh C Nayak · Published in Gameskraft · 6 min read · Apr 30, 2021

by Anoosh C Nayak & Arjun Tomar

Social engagement is key to increasing the time users spend on a platform these days. A couple of months back, we decided to build a social network into our games. The product goals were simple to start with:

  • User Profile (Social attributes: name, status, bio, logo, total friends, followers, and more)
  • Friends ecosystem (Friend request lifecycle through to blocking friends)
  • Follow ecosystem (Follow request lifecycle through to blocking followers/following)
  • Groups (User groups on the platform)
  • Chat ecosystem (Chat request lifecycle & accompaniments)

We were clear from the beginning that friends-of-friends and other graph traversals requiring multiple hops were not part of the requirements for the foreseeable future. The queries would be single-hop ONLY.

After looking at the product requirements, our first intuition was that we needed to “Build a Social Network”: a Social Graph Service incorporating a graph database to establish relationships between users, with all the other components centered around it. But hang on!! We didn’t want to build Facebook out of this. All we wanted was to satisfy the social needs of the product without having to manage a full-blown stack.

Choosing datastore

One of the crucial aspects of building this feature was the choice of data store. Realizing late in the development cycle that we had chosen the wrong datastore would cost us dearly, so we made sure to spend sufficient time evaluating the options out there with some specific criteria in mind. Below were our evaluation criteria:

  • Performance: Low latency & high throughput.
  • Maintenance: Easy to maintain, with in-house expertise to solve any production issues.
  • Open source: Preferred, because of the community support that comes along with it.
  • Scalability: The DB needs to scale sufficiently with little to no effort.
  • Extensibility: Not a primary requirement.
  • Cost: Not a major factor if the DB really performs well.

Graph DB

Having in-house expertise with Neo4j powering our fraud detection engine, the first and obvious choice was to go with it. It is a feature-rich DB for building and maintaining a graph, but we knew its limitations from the start: lack of scalability (the community edition only supports 1 node), reliability (1 node means less reliable), and not-so-great community support (when things go south). A few of these limitations could be overcome by shelling out extra $ for licenses, so we decided to try out alternatives before making a decision.

Referring to G2Crowd & DB-Engines, we decided to evaluate the following three Graph DBs:

  1. Neo4j
  2. AWS Neptune
  3. ArangoDB

Load testing


Based on the requirements and query patterns, it boiled down to serving the below queries efficiently.

  1. Get User Profile by user ID
  2. GetUserIdsByFriendRelationship: Get user IDs by relationship (Friend/Follow/Group, etc). Example: Get the list of user IDs that are in a friend relationship
  3. GetUsersByFriendRelationship: Get users by relationship. This is the same as the above query, but it returns the actual user objects with a few given attributes. Example: Get the list of users in a friend relationship, along with attributes such as display name, display picture, friend_since, etc
  4. Get relationships between two users: All relationships b/w user ‘A’ & user ‘B’, i.e., say A & B have the following relationships: friend, follower, chat blocked

DataSet

For the load tests, our data set was about 1 MN unique users with 40 MN relations among them. The relationships were distributed as follows: 5% of users have 200 relationships, 10% have 150, 15% have 125, 20% have 90, 30% have 50, and 20% have 25. (That averages out to roughly 82 relationships per user, i.e., about 41 MN unique pairs, consistent with the 40 MN figure.)

DB schema

DB schema of GraphDB vs SQL/NoSQL DB
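
The schema diagram above is an image, so here is a rough sketch of what the relational side of it could look like. The table, column, and relation names are our own illustration, not necessarily the actual Gameskraft schema:

```sql
-- Hypothetical single-hop social-graph schema (names are illustrative).
CREATE TABLE users (
    user_id      BIGINT UNSIGNED PRIMARY KEY,
    display_name VARCHAR(64) NOT NULL,
    status       VARCHAR(140),
    bio          TEXT,
    logo_url     VARCHAR(255),
    created_at   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- One row per directed edge; `relation` covers friend/follow/block/chat states.
CREATE TABLE relationships (
    from_user_id BIGINT UNSIGNED NOT NULL,
    to_user_id   BIGINT UNSIGNED NOT NULL,
    relation     ENUM('FRIEND', 'FOLLOWER', 'BLOCKED', 'CHAT_BLOCKED') NOT NULL,
    created_at   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (from_user_id, relation, to_user_id),
    KEY idx_to_user (to_user_id, relation)  -- reverse lookups, e.g. "who follows me"
);
```

The key point is that a single-hop social graph reduces to one edge table with a composite index in each direction.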

DB configurations

All the DBs were tuned to production configurations. The AMIs were also tuned: max file descriptors, transparent huge pages, heap sizes maximized, and the file system chosen per each DB's preference. All the machines were AWS EC2 r5.xlarge instances with SSD storage.

  • Neo4j: Single master only
  • ArangoDB: Single master, master-slave and master-master
  • AWS Neptune: Single master and master-slave

Benchmark-1: 1K qps / GetUserIdsByFriendRelationship²

p99 latency of GetUserIdsByFriendRelationship query across chosen DB

For a single-master setup, ArangoDB & Neptune performed similarly, with Arango ahead at an average p99 of 256ms versus Neptune's 609ms, whereas Neo4j was way off, averaging around 3 seconds.

Switching both Neptune and Arango to a master-slave setup rendered similar results.

With ArangoDB's master-master architecture, latencies went for a toss, as it had to search every node in the cluster for the relationships (the data was sharded by userId).

Winner so far

At this point, ArangoDB was the clear winner. However, we had one more critical query, where we wanted users by relationship. For this, we had two approaches: either get the users directly by relationship from the graph DB, or get only user IDs from the graph DB and run an additional query to fetch the users for those IDs.

Approaches to get the users by relationships
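
Since that figure is also an image, here is a rough sketch of the two approaches in SQL terms, reusing the illustrative schema above (the user ID 42 and the ID list are made-up values):

```sql
-- Approach 1: a single query that returns user objects directly.
-- In a relational store this is one indexed join:
SELECT u.user_id, u.display_name, r.created_at AS friend_since
FROM   relationships r
JOIN   users u ON u.user_id = r.to_user_id
WHERE  r.from_user_id = 42
  AND  r.relation = 'FRIEND';

-- Approach 2: fetch only the IDs from the graph store...
SELECT to_user_id
FROM   relationships
WHERE  from_user_id = 42 AND relation = 'FRIEND';

-- ...then hydrate the user objects from a secondary store:
SELECT user_id, display_name
FROM   users
WHERE  user_id IN (101, 102, 103);  -- IDs returned by the previous query
```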

Note: ArangoDB had the added advantage of being a DocumentDB w.r.t its data modeling of nodes, i.e., nodes are a document collection that can store varied data types, have indexes, etc.

To evaluate the above approaches, we needed a secondary store that could fetch an object given an ID. Our preferred candidates were MySQL & MongoDB, given the in-house expertise and with maintenance in mind.

Benchmark-2: 1K qps / GetUsers¹

Secondary DB configurations

  • MySQL was running on an r5.xlarge machine.
  • Mongo (master-master, sharded) was running on 3 r5.large machines.
  • Arango was running on an r5.xlarge machine.

At this point, we started asking ourselves: why not use MySQL or Mongo as our graph database? Our requirements were pretty basic and did not need complex built-in graph-traversal queries. So we benchmarked our queries against them, and below were the results.

Comparison of MySQL/Mongo/Arango for all queries

MySQL was the clear winner, as MongoDB had to search all nodes for the getUsersByRelationship query.
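
This result is easier to see when you consider what the remaining benchmark queries become on a relational schema: with the illustrative tables sketched earlier, each one is a primary-key point lookup or a short indexed scan bounded by a user's relationship count (again our own sketch, not the actual production queries):

```sql
-- Query 1, Get User Profile by user ID: a primary-key point lookup.
SELECT user_id, display_name, status, bio, logo_url
FROM   users
WHERE  user_id = 42;

-- Query 4, relationships between two users: scans only user 42's edges
-- (at most a couple of hundred rows in this data set) and filters by target.
SELECT relation
FROM   relationships
WHERE  from_user_id = 42 AND to_user_id = 7;
```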

Further questions:

  • Do we need multiple hops or graph traversals, which would not be practical with SQL data modeling?

We were clear from the beginning that we would not require this in the near future.

  • How often would the user attributes change? This would impact running ALTER TABLEs & other maintenance overhead for SQL (MySQL was used).

This solution would require us to gather enough fields for the foreseeable future and add the columns to the schema up front. If an additional field is just metadata about the user, we need not add a new column for every field; we can store it in a single column as stringified JSON or as a blob. However, if we want to add a field that we need to search on, then we have to add a new column.
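
As a concrete illustration of that trade-off, assuming MySQL 5.7+ (which ships a native JSON type and generated columns; the field name below is hypothetical):

```sql
-- Non-searchable metadata can live in one JSON column,
-- so adding such a field needs no schema change:
ALTER TABLE users ADD COLUMN meta JSON;

UPDATE users
SET    meta = JSON_SET(COALESCE(meta, '{}'), '$.favourite_game', 'rummy')
WHERE  user_id = 42;

-- A field we need to search on gets promoted to a real, indexed column,
-- here via a STORED generated column extracted from the JSON:
ALTER TABLE users
  ADD COLUMN favourite_game VARCHAR(32)
      AS (meta->>'$.favourite_game') STORED,
  ADD INDEX idx_fav_game (favourite_game);
```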

Final DB Comparison Chart (the Performance metric is w.r.t our use case):

Conclusion:

Building a social platform does not always call for a graph database; one needs to spend enough time analyzing the actual access patterns and workloads for which the features are built. Choosing the wrong DB just because it's new and looks fancy can prove to be a maintenance nightmare down the line, and when that production issue hits, I bet you don't want to be googling around for a solution.

If you are looking for working on some exciting stuff, please send your resumes to careers@gameskraft.com. We are hiring!!
