How we scaled our MySQL to handle 3x customers

Arun Thulasidharan
Hevo Data Engineering
6 min read · Feb 12, 2021

At Hevo, we use MySQL as our primary datastore. It holds a broad range of data: critical user data such as pipeline settings and encrypted source/destination configurations, tracking data such as task manager metadata and metadata about files to be uploaded to the destination, and even debugging data such as user activity logs. MySQL was not a bottleneck at the outset, as the data stored was predominantly metadata and the customer base was small. As the number of customers grew manifold over the last few months, we started seeing CPU spikes, an increase in the number of connections, and subsequent database/application slowdowns.

We went through the application flows that were contributing the most to this load and made some optimizations:

  1. Added/modified indexes for tables that were causing slow queries.
  2. Revamped some of the backend flows to reduce the number of queries made to the database, e.g. caching frequently queried data and batching write requests (see the sketch after this list).
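
As a rough illustration of the write-batching part (the table, columns, and TaskEvent type below are hypothetical, not our actual schema), batching inserts over JDBC looks something like this:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class TaskEventWriter {

    // Instead of issuing one INSERT per event, queue the rows and flush them together.
    public static void writeBatch(Connection conn, List<TaskEvent> events) throws Exception {
        String sql = "INSERT INTO task_events (task_id, status, created_at) VALUES (?, ?, NOW())";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (TaskEvent e : events) {
                ps.setLong(1, e.taskId);
                ps.setString(2, e.status);
                ps.addBatch();          // buffered locally, no round trip yet
            }
            ps.executeBatch();          // sent to MySQL in one go
        }
    }

    static class TaskEvent {
        long taskId;
        String status;
    }
}
```

With MySQL Connector/J, setting rewriteBatchedStatements=true on the JDBC URL additionally collapses the batch into multi-row INSERT statements, which is where most of the savings come from.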

Even after these tweaks, we were still seeing two major bottlenecks with respect to database queries:

  1. Slow “tracking” table writes.
  2. Expensive bulk deletes.

Slow tracking table writes

When we say “tracking” tables, we mean tables whose records have a lifecycle, or a state. Take an order entity in an e-commerce domain: it goes through different states in its lifecycle (CREATED, DISPATCHED, SHIPPED, DELIVERED, etc.) and eventually reaches a terminal state. Once there, it is queried far less frequently, or not at all.

Some of our bigger tables fall into this category. For instance, we have tables that store the metadata for files which are written locally and eventually uploaded to a remote object store (S3/GCS), so that they can be tracked, recovered in case of failures, and queried later. As the number of customers increased, these tables grew rapidly, which slowed down writes considerably. Cleaning up records from these tables also became a huge bottleneck as the table sizes increased.

Also, for most of these tables, we realized that the bulk of the data processing happens locally to a node, so the tracking entries do not have to live in a remote store like MySQL. In other cases, the records actually used by a distributed workflow were far fewer than the total number generated and could be uploaded to a remote store on demand (e.g. backup files for data recovery). That's when we decided these records could be stored in an embedded datastore instead of a remote one, and deleted from the embedded store as and when they reach their terminal state.

RocksDB as the embedded store

We evaluated different embedded databases, and RocksDB emerged as the clear winner among the available key-value stores, given factors like its strong community support and the read/write/delete latencies reported in publicly available benchmarks. The decision then came down to choosing between a key-value store and a relational embedded store like SQLite. We went with RocksDB because we did not need strict ACID guarantees for these tables, nor did they require advanced indexing; SQLite also does not perform as well under multi-threaded workloads. RocksDB ticked all the boxes for our use case:

  1. Optimized for multi-threaded workloads.
  2. Efficient point lookups and range scans.
  3. High-performance prefix scans using Bloom filters, which can be used to support secondary indexes, in case that’s ever required.

[Benchmark charts: RocksDB write, read, and delete latencies]

RocksDB uses LSM trees under the hood and gave us sub-millisecond latencies on reads, writes, and deletes. With it, we were able to move some of the bigger tracking tables out of MySQL entirely.
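
To give a feel for how this looks in practice, here is a minimal sketch of a local tracking store built on RocksDB's Java bindings; the key layout and the FileTrackingStore wrapper are illustrative, not our actual implementation:

```java
import java.nio.charset.StandardCharsets;

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class FileTrackingStore implements AutoCloseable {

    static { RocksDB.loadLibrary(); }

    private final Options options;
    private final RocksDB db;

    public FileTrackingStore(String path) throws RocksDBException {
        this.options = new Options().setCreateIfMissing(true);
        this.db = RocksDB.open(options, path);
    }

    // Track a locally written file; the value could be any serialized metadata blob.
    public void track(String fileId, byte[] metadata) throws RocksDBException {
        db.put(key(fileId), metadata);
    }

    // Returns null once the entry has been cleaned up.
    public byte[] lookup(String fileId) throws RocksDBException {
        return db.get(key(fileId));
    }

    // Once the file reaches its terminal state (e.g. uploaded to S3/GCS), drop the entry.
    public void markTerminal(String fileId) throws RocksDBException {
        db.delete(key(fileId));
    }

    // A shared key prefix keeps related entries together for prefix/range scans.
    private static byte[] key(String fileId) {
        return ("file:" + fileId).getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void close() {
        db.close();
        options.close();
    }
}
```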

Expensive bulk deletes

We have an auxiliary service called service-keeper, which runs periodically and cleans up entries from the above-mentioned tracking tables, based either on age or on the entries having reached a terminal state. Using RocksDB, we were able to control the sizes of some of the bigger tracking tables. But we still had huge tables, such as user activity logs and table history snapshots, which were mostly append-only and whose records were deleted purely based on time. For instance, we display user activity logs for only 15 days, so data older than that can be deleted. These deletes were getting so expensive for some of these tables that they would sometimes bring the database server down when the service-keeper ran.
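
For context, the cleanup amounted to something like the batched, time-based delete sketched below (table name, retention, and batch size are illustrative). Even with LIMIT-based batching, every expired row has to be located, locked, and purged, which is what kept the database busy for so long:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class ActivityLogCleaner {

    // Delete rows older than the retention window in batches, looping until nothing is left.
    // Even batched, this touches every expired row and generates a lot of purge/undo work.
    public static void purgeOldRows(Connection conn, int retentionDays) throws Exception {
        String sql = "DELETE FROM user_activity_logs "
                   + "WHERE created_at < DATE_SUB(NOW(), INTERVAL ? DAY) LIMIT 5000";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int deleted;
            do {
                ps.setInt(1, retentionDays);
                deleted = ps.executeUpdate();   // each batch is a separate, still-expensive statement
            } while (deleted > 0);
        }
    }
}
```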

Database load when the service-keeper runs

The daily spikes in the above graph are attributed to the service-keeper runs. It was touching around 20 average active sessions (AAS) on weekdays.

How we solved it

We created time-based partitions for some of the bigger tables and started dropping whole partitions instead of deleting rows from them. Dropping a partition consumes very few database resources, as MySQL simply removes the partition's underlying data files. We built our own Partition Manager to monitor existing partitions, create reserve partitions for the future, and drop partitions older than a configured retention period.
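
A simplified sketch of the DDL such a partition manager issues, assuming a table RANGE-partitioned on TO_DAYS(created_at) (the table and partition naming scheme are illustrative):

```java
import java.sql.Connection;
import java.sql.Statement;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PartitionManager {

    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Add a reserve partition for a future day. New range partitions can always be
    // appended at the end, which is exactly what reserve partitions are.
    public static void addReservePartition(Connection conn, String table, LocalDate day) throws Exception {
        String ddl = String.format(
            "ALTER TABLE %s ADD PARTITION (PARTITION p%s VALUES LESS THAN (TO_DAYS('%s')))",
            table, day.format(FMT), day.plusDays(1));
        execute(conn, ddl);
    }

    // Drop a partition that is past the retention period. This removes the partition's
    // data files directly instead of deleting rows one by one.
    public static void dropExpiredPartition(Connection conn, String table, LocalDate day) throws Exception {
        execute(conn, String.format("ALTER TABLE %s DROP PARTITION p%s", table, day.format(FMT)));
    }

    private static void execute(Connection conn, String ddl) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(ddl);
        }
    }
}
```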

Can all tables be partitioned by time?

No, not all tables can be blindly partitioned by time. Queries that do not filter on the partitioning column have to scan multiple partitions and multiple indexes, so their latency is much higher than on a non-partitioned table. For instance, if we configure the Partition Manager to create a partition daily and we want to retain data for 15 days, any query that does not include the partitioning timestamp in its WHERE clause has to scan through all 15 partitions. Tables that are queried less frequently are therefore the ideal candidates for partitioning.
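
To make the pruning behavior concrete, here are two hypothetical queries against the same illustrative user_activity_logs table as above. MySQL can skip partitions only when the partitioning column is constrained in the WHERE clause:

```java
public class PartitionPruningExamples {

    // The partitioning column (created_at) is constrained, so MySQL prunes the scan
    // down to the last couple of daily partitions.
    static final String PRUNED =
        "SELECT * FROM user_activity_logs "
      + "WHERE user_id = ? AND created_at >= NOW() - INTERVAL 1 DAY";

    // No time filter: with daily partitions and 15-day retention, all 15 partitions
    // (and their indexes) are scanned on every call.
    static final String NOT_PRUNED =
        "SELECT * FROM user_activity_logs WHERE user_id = ?";
}
```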

With this solution in place, along with RocksDB for the tracking tables and a few other optimizations, the database load during service-keeper runs dropped drastically, to fewer than 6 average active sessions.

What's next?

  1. Some of the frequently queried tracking tables could also be moved to time-based partitioning if we build a way to mark partitions as logically expired, i.e. once all of their entries have reached a terminal state, and query only the active partitions.
  2. As the number of customers increases further, we might move to a “distributed relational database”, which shards the data across nodes based on a tenant id, while still providing ACID guarantees within a particular tenant.
