Architecting Databases for High Availability
Today is the age of the cloud, automation, fault tolerance and high availability. If an Ansible playbook can do the job, you do not want expending human resources on a manual and repeatable task.
This similar philosophy applies to databases as well. Gone are the days, where one would have a database and then several human DBAs to manage it, along with support and operations on call 24*7. We have not completely gotten rid of the human element yet, but have successfully reduced the role it plays in managing databases. Architecting your database to be fault tolerant, self healing and highly available plays a major component in it. The exact steps might differ from database to database, but this post hopefully gives you the general idea for point of failures to take into account, and the possible technologies which can help you achieve your goal.
Step One: Create thy Database
First step to building your highly available database, is the existence of a database. You can host this on bare metal, the cloud, etc. The closer to your application, the better. Network latency can play a huge role in your read/writes time. Since this is a one node set up, you will perform both your reads/writes to the same address.
The point of failure here is obvious, if your database crashes, or there is a hardware or network failure, you cannot read from your database nor write to it.
Step Two: Let’s add a Replica or two
Depending on the type of database/use case of your application, it is always wise to have at least one replica. There are several reasons for it:
(1) It is a hot backup. You could have hourly/daily manual/automated backups set up for your database. But it is never the most up-to-date version of your data. There is always going to be some data loss. Replication is a great way to have a hot backup, because it is tries to maintain a latest copy of the information on the master. Different databases do replication differently and also allow you to configure it according to your needs.
For example, MySQL 5.6 allows you to do async, sync, semi-sync replication. “Sync” replication ensures that the data is written to both the master and the replica before the transaction is complete. “Async” replication writes it to the master and completes the transaction and the copying of relay logs to the replica continues at its own pace. “Semi-sync” replication is a balance where it ensures that the relay log has reached the replica, but does not worry about the execution of it before it completes the transaction. If you are familiar with CAP theorem, this is a perfect example of that. Depending on what is more important for your application — data consistency or faster writes, you can pick the option.
(2)It allows you to split your reads and writes. This is useful for some applications which may be read heavy or want to maintain a copy of data for the analytics team to use.
It is advisable to place your master and replica on separate hosts or racks or aggregates. It helps reduce point of failure. In case one host goes down, you still have a copy of the data on the replica which you can convert to a master. In case you want additional redundancy, or believe in the Law of Threes, you can have an architecture with two replicas. That way if a node fails, you have two nodes — one master and one replica always available.
Connecting to the data:
(1) The user could have separate connection strings to master and individual replicas and in case a failure occurs, manually change the connection strings in the application configs/code.
(2) The user could use DNS strings which apply to a generic master and replica. And in case of failure, change the DNS string to point to the new master/replica. The upside is you do not have to change anything in your application code. But the downside is DNS propagation takes time and you do not have access to your data for that undetermined time period.
The point of failures include time taken to detect that your master is down, time taken to update your application code once failure is detected, or DNS update propagation time.
Step Three: Lets monitor replication
Depending on the database technology you have chosen, you might be lucky to have some supported tools which monitor replication health of your master and replicas. Most of these tools also do automated promotion of replicas to master once a failure is detected. Some of examples of these tools include Sentinel for Redis, MHA for MySQL etc. These tools mostly have a similar check mechanism where they check whether they can access the database. And if they cannot access the master, they try “x” times in interval of “y” seconds before they declare it a failure scenario. “x” and “y” are always configurable and are dependent on your network latency and setup.
They then promote a healthy replica to master and point the other replicas to this new master and begin monitoring the new set up. Some of the databases have several tools which perform this operation. Look for the ones which are well supported in the open source community.
These tools also generally come with post hooks, which you can point to a script to alert you of a failover, or even automatically update your application config or DNS. It is also recommended to run your monitor tool in quorum or using some tool/technology which ensures its uptime like supervisord or systemd, and informs you if it is not running.
The point of failures have been now narrowed down to time taken to update DNS/application.
Step Four: Reduce the time
In most scenarios, it is unacceptable to be unable to predict how long a failover takes and couple of minutes is too long and too vague. This is where I would encourage you to look at other technologies which can assist you in your goal. A good example is using HAProxy. Your application could connect to HAProxy front end, and the HAProxy backend could connect to your master and replicas. You could utilize different ports in the front end to connect to different backends to split reads and writes. Any failover post hooks could update your backend configs in HAProxy and trigger a restart or reload, depending on how your application driver does connection handling. The downtime in this scenario is basically time taken to detect failover and inform HAProxy about it, which is far more predictable than DNS propagation. I am a personal fan of HAProxy cause it understands Layer 4 protocol (TCP), but you could choose any other load balancing/proxy technology your team has expertise in.
The point of failure, is the single proxy node in this case.
Step 5: One is never enough
Like in the case of databases, and your monitoring tools, one is not enough when it comes to your proxy node as well. You need to have a backup plan. It could be a live node with the same config managed by KeepAlived and VIP(virtual IP) or could be something which is spun up on demand as soon as your proxy node dies and is populated with the configs using consul or some other config management tool or API.
The point of failure here could be network partition and law of threes can help you balance that in such a situation.
Hopefully this post gives you a starting point to think about architecting your database beyond your native database technology. No matter what technologies or architecture you design, being aware of points of failure and having monitoring and alerts in place to detect them is crucial.
This post, is an expert from my talk on “Redis in a Multi-tenant Environment” at Redis Conf 2016. Slides — http://www.slideshare.net/itsmeduh/redis-high-availability-and-fault-tolerance-in-a-multitenant-environment
Originally published at http://www.icchasethi.com/architecting-databases/