If you check MySQL Customers the list includes Facebook and Twitter too in the Web: Social Network section. When I read the same I wondered about the scaling issues in MySQL and how tech giants like Twitter and Facebook did it for themselves.
If you are not familiar with the term scaling, let me get you a brief idea before getting started with process.
In simple words scaling is making your database handle more traffic or load for read and write queries.
Methods of scaling
The process and method of scaling is categorised into below major types
Vertical Scaling or Scale-up
In Vertical Scaling, we increase the CPU, RAM and Storage or buy a more robust server in order to support more traffic and load on our database server.
- Less power consumption than running multiple servers
- Cooling costs are less than scaling horizontally
- Generally less challenges to implement
- Less licensing costs (only if using paid databases)
- It’s way more costly
- There is a limit to increasing hardware on a single system
- Greater risk of hardware failure causing bigger outages
Horizontal Scaling or Scaling-out
In Horizontal Scaling we add more systems with the smaller configuration in terms of CPU, RAM and Storage to distribute the traffic or load across these systems.
- Much cheaper than scaling vertically
- Easier to run fault-tolerance
- Easy to upgrade
- Bigger footprint on data centre
- Higher utility cost
- More licensing fees (only if using paid databases)
When we use these Vertical and Horizontal Scaling at the same time we call it the Hybrid Scaling approach.
Scaling MySQL database
By default, the MySQL can be scale either using Vertical or Hybrid approaches but not fully Horizontal approach.
Generally, we tend to create a Master-Slave architecture and route all the write queries on the Master instance and all the read queries on the slave instance which are replicated from the Master. We can have multiple Slave instances running at one and scale our read operations horizontally. But the Master can only be scale Vertically.
Problems with the Master-Slave approach
Every record inserted in MySQL tables have one primary key to index the records for faster select operations. Generally, we have mysql_insert_id as our primary key on the table and having multiple masters this key will be duplicated across the databases and the data will be redundant to use.
In order to solve the duplicate ID issue, we can have one ID generator which generates the unique IDs and we can use the same ID as our primary key, but this will add one extra point of failure in the system and also very prone to fail in concurrency. If your ID generator is down or slow, the whole database will fail or become slow.
The Solutions and Approaches
When I have faced the same issues with MySQL in one of my system, I started researching in the same direction to figure out the best possible approach to solve the multiple master issue in MySQL with almost 0 tradeoffs in terms of extra time, my research was focused to solve the below issues
- Multiple masters in MySQL to scale my write operations horizontally
- Handle concurrent requests with no duplicate IDs
- No extra waiting time added to the write operations
To solve the duplicate ID issue in multiple master architecture I have used one ID generator server which generates the unique time-based serial hashes which can be used as a primary key in MySQL table. The benefit of having time based serial hashes is the time relation between to two IDs are possible to measure.
To solve the concurrency issue have used a pool of IDs stored in a cache generated by ID Generator in an Asynchronous manner. One a write request comes to the master it takes one ID from the cache and inserts the same as a primary key.
I have a small cache on every master instance stored almost 1,000,000 IDs to be consumed by the master, If the IDs stored in the cache reaches to a certain limit it makes an Asynchronous request to fill up the cache with new IDs.
Approximately 0 extra time
As these requests to the ID generator are Asynchronous so there is no Extra time tradeoff on the write requests except the time taken in IO within the same server cache which was negligible. Check this out for more info.
So there are ways by which we can achieve Horizontal scaling in MySQL too we just need to find the right approach suites our application best. There are some tradeoffs in every possible scaling approach but when our application and users demand it we must do it by carefully measuring these tradeoffs to see where we can take the leverage.
If performance is the only concern then you should also consider the followings on the database and application level
InnoDB Buffer Pool Size
- Holds the data and indexes of tables in memory.
- Bigger buffer results in faster row lookups.
- The bigger the better, the default is 8Mb
- Use InnoDB where it requires
- Keeps the result of queries in memory until they are invalidated by writes.
- query_cache_size — total size of memory available to query caching, the default is 128Mb
- query_cache_limit — the maximum number of kilobytes one query may be in order to be cached, the default is 8Mb
- If you write lousy queries, you will get lousy performance no matter how much you scale.
- Use EXPLAIN to profile the query execution plan
- Use DISTINCT not GROUP BY
- Never use an indexed column with a function
- Avoid using functions in WHERE clause
Use Solid State Drives (SSD) for database servers as they are better for latency and access time than regular HDD, a bit costly but can achieve 7x read and write performance on IO.
Hopefully! now you will be able to scale your MySQL based applications, let me know your approaches in the comments and don’t forget to share this with your friends.