Recover a MariaDB Galera Cluster after a Full Crash

Published in Crane Cloud, Jan 30, 2022

This guide assumes a 3-node MariaDB Galera cluster that has suffered a confirmed full crash, meaning none of the nodes in the cluster can start the MariaDB service.

Nodes

Node 1: 192.168.1.38, node 2: 192.168.1.39 and node 3: 192.168.1.40

Confirm that it is a full crash: the contents of /var/lib/mysql/grastate.dat on every node should look similar to the following, with seqno set to -1 and safe_to_bootstrap set to 0.

# GALERA saved state
version: 2.1
uuid: 80ece24a-a040-11eb-9a95-87a87e7c890d
seqno: -1
safe_to_bootstrap: 0
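
To run this check the same way on each node, a small shell sketch like the one below can extract the two fields of interest. The helper name `grastate_field` is hypothetical (it is not part of MariaDB or Galera), and the path is the one used in this post.

```shell
# Print the value of one field (for example seqno) from a grastate.dat-style file.
# grastate_field is a hypothetical helper name used only for illustration.
grastate_field() {
    # $1 = path to grastate.dat, $2 = field name
    # Split each line on the colon (plus any following spaces) and match the key exactly.
    awk -F': *' -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Example use on a node (path as used in this post):
# grastate_field /var/lib/mysql/grastate.dat seqno
# grastate_field /var/lib/mysql/grastate.dat safe_to_bootstrap
```

If every node prints -1 for seqno and 0 for safe_to_bootstrap, the situation matches the full crash described here.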

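On one of the nodes, change safe_to_bootstrap to 1 in /var/lib/mysql/grastate.dat so that the node is allowed to start a new cluster. Choose the node most likely to hold the latest data, since the other nodes will sync from it. The file should then read: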

# GALERA saved state
version: 2.1
uuid: 80ece24a-a040-11eb-9a95-87a87e7c890d
seqno: -1
safe_to_bootstrap: 1
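
The same change can be made without opening an editor. The following is a minimal sketch, assuming the file has the layout shown above; `mark_safe_to_bootstrap` is a hypothetical helper name, and the `.bak` copy is an extra precaution that is not part of the original steps.

```shell
# Set safe_to_bootstrap to 1 in a grastate.dat-style file.
# mark_safe_to_bootstrap is a hypothetical helper name used only for illustration.
mark_safe_to_bootstrap() {
    # $1 = path to grastate.dat
    # Keep a copy first, then rewrite the original in place so that its
    # ownership and permissions stay unchanged.
    cp -p "$1" "$1.bak" &&
    sed 's/^safe_to_bootstrap:.*/safe_to_bootstrap: 1/' "$1.bak" > "$1"
}

# Example use on the chosen node (path as used in this post):
# mark_safe_to_bootstrap /var/lib/mysql/grastate.dat
```

Only the safe_to_bootstrap line is rewritten; the uuid and seqno lines are left as they were.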

On the same node, move the Galera cache file out of the way (renaming it rather than deleting it keeps a copy):

mv /var/lib/mysql/galera.cache /var/lib/mysql/galera.cache.ori

On the same node, edit the MariaDB configuration (/etc/mysql/my.cnf) so that the original cluster address is commented out and replaced with an empty one. This lets the node start without waiting for the other cluster nodes:

#wsrep_cluster_address=gcomm://192.168.1.38,192.168.1.39,192.168.1.40
wsrep_cluster_address="gcomm://"
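
As an alternative to editing by hand, the address can be switched with a short shell function. This is a sketch under a few assumptions: the setting sits at the start of a line in the file, there is a single active wsrep_cluster_address line, and the helper name `set_cluster_address` is hypothetical. Unlike the manual edit above, it rewrites the active line instead of commenting it out, and keeps the previous version of the file as a `.bak` copy.

```shell
# Replace the active wsrep_cluster_address line in a MariaDB config file.
# set_cluster_address is a hypothetical helper name used only for illustration.
set_cluster_address() {
    # $1 = config file, $2 = new address (for example gcomm:// for bootstrapping)
    # The .bak copy holds the file as it was before this call; a later call
    # overwrites it.
    cp -p "$1" "$1.bak" &&
    sed "s|^wsrep_cluster_address=.*|wsrep_cluster_address=\"$2\"|" "$1.bak" > "$1"
}

# Example use on the bootstrap node (path and addresses as used in this post):
# set_cluster_address /etc/mysql/my.cnf "gcomm://"
# Later, to restore the full node list:
# set_cluster_address /etc/mysql/my.cnf "gcomm://192.168.1.38,192.168.1.39,192.168.1.40"
```

Lines that are already commented out are not touched, because the pattern only matches lines that begin with the setting name.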

You can now start MariaDB on this node:

systemctl start mariadb.service

Please confirm that the service is now running:

systemctl status mariadb.service

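On each of the remaining two nodes, move the Galera cache file out of the way and start MariaDB so that the node joins the cluster started by the first node: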

mv /var/lib/mysql/galera.cache /var/lib/mysql/galera.cache.ori

systemctl start mariadb.service

Please confirm that the service is now running:

systemctl status mariadb.service

If the two remaining nodes now have running MariaDB instances, perform the following on the first node:

Stop the MariaDB service:

systemctl stop mariadb.service

Restore the original cluster address in the MariaDB configuration (/etc/mysql/my.cnf) and comment out the empty one:

wsrep_cluster_address=gcomm://192.168.1.38,192.168.1.39,192.168.1.40
#wsrep_cluster_address="gcomm://"

You can now start MariaDB on this node again, and it will rejoin the cluster formed by the other two nodes:

systemctl start mariadb.service

Please confirm that the service is now running:

systemctl status mariadb.service
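
A running service does not by itself show that the node has rejoined the cluster. As an extra check, the Galera status variables wsrep_cluster_size and wsrep_local_state_comment can be read on any node. The sketch below assumes the `mysql` client can log in without further options (for example as root over the local socket); `wsrep_value` is a hypothetical helper name.

```shell
# Read "name<TAB>value" lines from stdin, as printed by
# mysql -N -B -e "SHOW GLOBAL STATUS ...", and print the value of one variable.
# wsrep_value is a hypothetical helper name used only for illustration.
wsrep_value() {
    # $1 = status variable name
    awk -F'\t' -v key="$1" '$1 == key { print $2; exit }'
}

# Example use on a node (assumes the client can log in without extra options):
# mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | wsrep_value wsrep_cluster_size
# mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | wsrep_value wsrep_local_state_comment
```

For the 3-node cluster in this post, a healthy result is a cluster size of 3 and a local state of Synced on every node.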

Written by Alex Mwotil