Recover a MariaDB Galera Cluster after a Full Crash
This procedure assumes a 3-node MariaDB Galera cluster with a confirmed full crash: none of the nodes in the cluster can start the MariaDB service.
Nodes
Node 1: 192.168.1.38, node 2: 192.168.1.39 and node 3: 192.168.1.40
Confirm that it is a full crash: the contents of /var/lib/mysql/grastate.dat on every node should look similar to the following, with seqno set to -1 and safe_to_bootstrap set to 0:
# GALERA saved state
version: 2.1
uuid: 80ece24a-a040-11eb-9a95-87a87e7c890d
seqno: -1
safe_to_bootstrap: 0
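The check above can be scripted so that it gives the same answer on every node. The following is a minimal sketch, not an official Galera tool: grastate_status is a hypothetical helper name, and the default path is simply the one used in this guide.

```shell
#!/bin/sh
# grastate_status is a hypothetical helper, not part of MariaDB or Galera.
# It prints "unclean" when the state file shows seqno: -1 together with
# safe_to_bootstrap: 0 (the pattern described above), otherwise "other".
grastate_status() {
    state_file=${1:-/var/lib/mysql/grastate.dat}
    # Pull the value that follows each key in the state file.
    seqno=$(awk '$1 == "seqno:" {print $2}' "$state_file")
    safe=$(awk '$1 == "safe_to_bootstrap:" {print $2}' "$state_file")
    if [ "$seqno" = "-1" ] && [ "$safe" = "0" ]; then
        echo "unclean"
    else
        echo "other"
    fi
}
```

Running grastate_status on each of the three nodes should print "unclean" everywhere if the cluster really is in the fully crashed state.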
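On one of the nodes (ideally the one holding the most recent data), edit /var/lib/mysql/grastate.dat and set safe_to_bootstrap to 1. This allows the node to start first and bootstrap the cluster. The file should then read: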
# GALERA saved state
version: 2.1
uuid: 80ece24a-a040-11eb-9a95-87a87e7c890d
seqno: -1
safe_to_bootstrap: 1
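If you prefer not to edit the file by hand, the same change can be made with sed. This is a sketch under a few assumptions: mark_safe_to_bootstrap is a hypothetical helper name, the in-place flag -i is the GNU sed form, and the .bak backup is an extra precaution that is not part of the original procedure.

```shell
#!/bin/sh
# mark_safe_to_bootstrap is a hypothetical helper; it assumes GNU sed (-i).
mark_safe_to_bootstrap() {
    state_file=${1:-/var/lib/mysql/grastate.dat}
    # Keep a copy so the original state can be restored if needed.
    cp -p "$state_file" "$state_file.bak"
    # Rewrite only the safe_to_bootstrap line; uuid and seqno stay untouched.
    sed -i 's/^safe_to_bootstrap:.*/safe_to_bootstrap: 1/' "$state_file"
}
```

Run it as root on the chosen node only; the other two nodes must keep safe_to_bootstrap: 0.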
On the same node, move the cluster cache out of the way (renaming it instead of deleting it keeps a copy):
mv /var/lib/mysql/galera.cache /var/lib/mysql/galera.cache.ori
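On the same node, edit the MariaDB configuration (/etc/mysql/my.cnf): comment out the line that lists the cluster nodes and add an empty gcomm:// address, so that the node starts a new cluster instead of trying to join the other nodes: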
#wsrep_cluster_address=gcomm://192.168.1.38,192.168.1.39,192.168.1.40
wsrep_cluster_address="gcomm://"
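The configuration change can also be scripted. The sketch below is one possible way to do it, not the only one: use_bootstrap_address is a hypothetical helper name, it relies on GNU sed (the -i flag and \n in the replacement), and it assumes the active setting starts at the beginning of a line in the file you pass in.

```shell
#!/bin/sh
# use_bootstrap_address is a hypothetical helper; it assumes GNU sed.
use_bootstrap_address() {
    cnf=${1:-/etc/mysql/my.cnf}
    # Keep a copy of the original configuration.
    cp -p "$cnf" "$cnf.bak"
    # Comment out the active address line (kept as "#<original line>")
    # and add the empty gcomm:// address on the line right after it.
    sed -i 's|^wsrep_cluster_address=.*|#&\nwsrep_cluster_address="gcomm://"|' "$cnf"
}
```

Run it only once per recovery: a second run would comment out the bootstrap line as well and add another copy of it.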
You can now start MariaDB on this node:
systemctl start mariadb.service
Confirm that the service is now running:
systemctl status mariadb.service
On each of the remaining two nodes, move the cluster cache out of the way and start MariaDB:
mv /var/lib/mysql/galera.cache /var/lib/mysql/galera.cache.ori
systemctl start mariadb.service
Confirm that the service is now running on both nodes:
systemctl status mariadb.service
If MariaDB is now running on the two remaining nodes, perform the following on the first node so that it uses the full cluster address again.
Stop the MariaDB service:
systemctl stop mariadb.service
Restore the original cluster address in the MariaDB configuration (/etc/mysql/my.cnf) and comment out the empty one:
wsrep_cluster_address=gcomm://192.168.1.38,192.168.1.39,192.168.1.40
#wsrep_cluster_address="gcomm://"
You can now start MariaDB on this node again:
systemctl start mariadb.service
Confirm that the service is now running:
systemctl status mariadb.service
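A running service does not by itself prove that all three nodes are in the same cluster. As an additional check that is not part of the original steps, you can look at the wsrep_cluster_size status variable. The sketch below assumes the mysql client can log in without extra options (for example as root over the local socket); cluster_size is a hypothetical helper that only extracts the value from the client output.

```shell
#!/bin/sh
# cluster_size is a hypothetical helper. It reads the two-column output of
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"
# on standard input and prints only the value.
cluster_size() {
    awk '$1 == "wsrep_cluster_size" {print $2}'
}

# Example usage on any node (login options are an assumption):
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'" | cluster_size
```

For the cluster in this guide the value should be 3 once every node has rejoined; a smaller number means at least one node is not part of the cluster yet.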