MariaDB Galera Cluster 10.1 Installation on DigitalOcean Ubuntu 14.04
When dealing with applications on the cloud, it is often best to have replications and redundancies in place. For relational databases, replication in a cluster allows data to be transferred between cluster members (server nodes) automatically so that the applications that depend on these databases will still be functional should part of the servers go offline for whatever reason.
A simple master-slave replication is most common in the SQL world. This allows you to use one master server to handle all of the application writes, while multiple slave servers are used for data retrieval (reads).
While master-slave replication is useful, it is not as flexible as master-master replication. In a master-master configuration, each node is able to allow both reads and writes to be distributed throughout the cluster.
Starting from MariaDB version 10.1, the popular MySQL fork comes packaged with Galera Cluster, allowing synchronous master-master replication.
This guide covers how to setup master-master data replication across an arbitrary number of nodes deployed on servers running Ubuntu 14.04.3 LTS. Once master-master data replication is configured, our applications will be able to connect to MySQL databases on any of our nodes.
This guide will not cover the configuration of load balancing software like HAProxy which can take our cluster one step further by providing access to our cluster nodes under one IP / URL endpoint (i.e. setting Wordpress’s MySQL address to my-loadbalancer-name.bbi.io which traffic load across any one of our database nodes that are currently online).
In this guide, we will be creating droplets based on the Ubuntu 14.04.3 LTS x64 image. We will also be using DigitalOcean’s snapshots services to reduce the amount of installations we will need to run during our cluster node deployments.
MariaDB 10.1 is the current stable (GA) release of MariaDB. It is built on MariaDB 10.0 with features from MySQL 5.6 & 5.7, as well as new features not found anywhere else. New in MariaDB 10.1, the Galera Cluster packages used for clustering have been combined into the core installation. All the packages required for clustering will be installed by default but will remain dormant until configured.
To achieve high availability architecture, it is a good idea to create droplets (servers) across multiple datacenter regions so that our data and services can survive even during datacentre-wide hardware level outages as well as natural disasters like earthquakes in San Francisco.
For server hardware, each node requires at a minimum:
- 1GHz single core CPU
- 512MB RAM
- 100 Mbps network connectivity
Preparing the Server
We will be installing and configuring MariaDB Galera Cluster on one droplet and duplicating it across different regions via the DigitalOcean’s snapshot feature.
Create a new droplet
In DigitalOcean’s control panel, start a new droplet using the Ubuntu 14.04.3 x64 image.
Note: It is a good practice to have the backup option enabled.
Important: It is good practice to give your server a descriptive host name like mariadb-sg01 to make it easier for you to reference different servers when your cluster becomes large and spanning across multiple datacenter regions. In this tutorial, the servers will be named using the convention of mariadb-[REGION][COUNT]. For example: mariadb-sg01, mariadb-sg02, mariadb-sf01, mariadb-ny01, mariadb-ny02…
Once the machine is ready, access the machine via SSH.
Installing MariaDB Galera
Adding the MariaDB Repositories
The MariaDB and Galera packages are not available in the default Ubuntu repositories. However, the MariaDB project maintains its own repositories for Ubuntu that contain all of the packages that we need.
Let’s first install the python-software-properties package which will give us the commands we need to administer our additional repositories.
sudo apt-get update
sudo apt-get install python-software-properties
Now we can add the key files for the official MariaDB repository:
sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xcbcb082a1bb943db
This will accept the key file. Now that we have the trusted key in the database, we can add the actual repository for MariaDB 10.1:
sudo add-apt-repository 'deb [arch=amd64,i386] http://mariadb.biz.net.id/repo/10.1/ubuntu trusty main'
Installing MariaDB 10.1
We can now install MariaDB 10.1 through the apt interface. We will have to update the apt repository first for MariaDB to be available:
sudo apt-get update
sudo apt-get install mariadb-server
Important: During setup, you will be prompted to enter a password for MariaDB’s root user. Make note of the password as you will need it to access your database.
MariaDB Galera Cluster Configuration
Now that we have installed MariaDB on our server, we can begin the configuration.
The default config file used by the MariaDB is located at /etc/mysql/my.cnf. It allows additional settings to be defined in .cnf files inside the /etc/mysql/conf.d/ directory. We will create a file at /etc/mysql/conf.d/cluster.cnf to define settings required by our cluster:
sudo nano /etc/mysql/conf.d/cluster.cnf
Copy and paste the following configuration into the file:
[mysqld]# Cluster node configurations
innodb_buffer_pool_size=400M# Mandatory settings to enable Galera
#bind-address=0.0.0.0# Galera synchronisation configuration
Let’s look at the configuration file. The config file is broken into three section. The Cluster node configurations contains the 2 variables we will have to define based on our cluster setup. The Mandatory settings are the settings MariaDB required to have Galera Cluster plugins enabled. The Galera synchronisation configuration section control the method used by our cluster to control data synchronisation. For simplicity, we will only cover the settings of the Cluster node configurations settings in this tutorial. It only contains three variable wsrep_cluster_address, wsrep_node_address and innodb_buffer_pool_size.
wsrep_cluster_address contains a list of address for nodes in our cluster. It tells a cluster node where to find other cluster members. Once defined, MariaDB will try to join the cluster upon startup.
Note: While it is a good idea to have the addresses of ALL our nodes defined here, in reality, our cluster node will only need to connect to one cluster member in this list. Upon joining a cluster, a node will discover and learn the addresses of all other nodes that form the cluster. Hence, variable wsrep_cluster_address does not need to be identical on all nodes, it’s just a best practice because on restart the node will try all other nodes in that list and look for any that are currently up and running the cluster.
wsrep_node_address defines the IP address and port of the node. By default, the node pulls the address of the first network interface on your system.
Important: While the offical document mentioned that this variable is optional and should be auto populated in most cases, DigitalOcean droplets require this to be set explicitly in each of the cluster node. Because of this, while we can install and configure MariaDB Galera in one server and duplicate it using DigitalOcean image, we are still required to have to manually login to each cluster node to modify this settings. A separate tutorial on setting up the Maria Galera Cluster as Docker is in the work to future simply this.
Innodb Buffer Pool is by far the most important option for Innodb Performance and it must be set correctly. MariaDB recommends setting it to around 80% of the total memory available on a dedicated database server. For a 512MB droplet, it roughly translates to 400M.
Note: If set to 2 GB or more, you will probably want to adjust innodb_buffer_pool_instances as well. See the XtraDB/InnoDB Buffer Pool for more on setting this variable.
Deploying Cluster Nodes
We have so far created a droplet running Ubuntu 14.04.3 and have MariaDB Galera Cluster installed and configured on it. It is now time for us to duplicate and expand our cluster (since a cluster of 1 is not really a cluster)!
Minimal cluster size
In order to avoid a split-brain condition, MariaDB’s online document recommends a minimum of 3 nodes in a cluster.
Snapshot and Deploy
- In DigitalOcean’s Control Panel, power off the droplet we have created.
2. Create a snapshot and call it mariadb-galera-ubuntu14.04 (you can call it whatever you want).
3. Go grab some coffee, this will take around a minute…
4. Our first droplet will be automatically powered back on when the snapshot is completed.
5. Go to the “Images” section in DigitalOcean’s Control Panel.
6. Distribute the newly created image across different datacentre regions. This will make our new snapshot image available for use in different regions.
7. Deploy new droplets from image in different regions.
Note: DigitalOcean provide a really smart widget should you required to deploy multiple droplets of the same image in the same region:
SSH into each of the new nodes to update the wsrep_node_address in the /etc/mysql/conf.d/cluster.cnf to the IP / domain it was assigned. For example:
Kickstarting the Cluster
Bootstrapping the First Node
By default, all the nodes we have deployed is configured to try and connect to other cluster nodes upon startup to form a cluster. They will hang and fail if they couldn’t find anything to connect to. This is in fact the current the state of all our nodes right now. You can check the status of MariaDB by calling service mysql status and it should return with * MariaDB is stopped.
To start off a new cluster, a special command is required to bootstrap MariDB on one of our nodes so that other nodes can connect to it. SSH into one of the nodes, e.g. mariadb-sg01.bbi.io, kill off the existing mysql process and relaunch it with the special bootstrap command:
pkill -9 mysqld
service mysql bootstrap
This will bootstrap MariaDB on mariadb-sg01.bbi.io, making it ready for other nodes to join in and form a cluster.
Forming a Cluster
Now that we have successfully started the first node of our cluster, we can reboot all the other member nodes and they will connect to each other forming a cluster automatically upon startup.
Note: New nodes can be added to the cluster simply by launching new droplets from the image and having the value for wsrep_node_address updated follow by a quick reboot. You might also want to consider adding the new node to the wsrep_cluster_address but it is not compulsory.
Cluster nodes can be restarted simply by doing a reboot as long as there is at least one other node from the wsrep_cluster_address list still running. Should all the nodes of cluster are offline, the cluster will need to bootstrapped again by running the bootstrapping commands on the last online cluster node to ensure the latest database state of the cluster is used.
Test Master-Master Replication
We’ve gone through the steps up to this point so that our cluster can perform master-master replication. We need to test this out to see if the replication is working as expected.
On one of our nodes, we can create a database and table by logging into mysql client using:
mysql -u root -p
You will be prompted to enter the password for the mysql root user.
Once connected, run the following lines to create a database called playground and a table inside of this called equipment:
CREATE DATABASE playground;
CREATE TABLE playground.equipment ( id INT NOT NULL AUTO_INCREMENT, type VARCHAR(50), quant INT, color VARCHAR(25), PRIMARY KEY(id));
Execute the following SQL statement to insert a row of data into the equipment table:
INSERT INTO playground.equipment (type, quant, color) VALUES ("slide", 2, "blue");
Use the following SQL statement to confirm the new row has been successfully inserted:
SELECT * FROM playground.equipment;
| id | type | quant | color |
| 1 | slide | 2 | blue |
Master-master replication can be confirmed by SSHing into other cluster node and logging into mysql client. The same results should be shown across all cluster nodes if everything has been configured correctly.
This guide covered how to setup master-master data replication across an arbitrary number of nodes deployed on servers running Ubuntu 14.04.3 LTS using MariaDB 10.1. If you plan on using a Galera cluster in a production situation, you may want to take a look setting up HAProxy which helps to load balance traffic across cluster node behind a fixed endpoint.