How to enable high availability for the IBM Cloud Private cluster monitoring service after initial installation

Jerome Tarte
Published in IBM Cloud
May 23, 2018


When you deploy an IBM Cloud Private cluster, you define its size and topology. However, your needs can evolve over time. IBM Cloud Private supports this evolution by letting you add nodes (worker, proxy, management) dynamically after the installation.

Some of the choices you made at installation time can have a lasting impact, and the topology of the monitoring service is a good example. When you created your cluster, you might have focused on getting the solution running rather than on every high availability aspect. Starting with a single management node can be enough at first: the monitoring service is active, but as the use of the platform grows, you may need to enable high availability for the monitoring service.

If you add a second management node dynamically, you can run into an issue with the storage used by the monitoring service. With a single management node, the monitoring service uses local storage (a directory on the VM hosting the management node). When high availability is needed, you must use a network shared storage provider, such as GlusterFS or vSphere volumes.

Rather than uninstalling your current cluster and reinstalling a new one with the proper storage configuration, you can reconfigure the storage of the monitoring service in place and move from local storage to network shared storage. The steps in the rest of this article show you how to proceed.
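Before you change anything, it can be useful to see how the monitoring service currently stores its data. The following commands are plain kubectl and assume the monitoring components run in the kube-system namespace, which is the default for IBM Cloud Private 2.1.0.x; they list the persistent volumes and claims used by the monitoring components:

kubectl get pv | grep monitoring
kubectl get pvc -n kube-system | grep monitoring

With a single management node, the volumes listed here are the local ones that the following steps replace with dynamically provisioned, network shared volumes.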

This procedure was implemented and tested on IBM Cloud Private version 2.1.0.2.

Prerequisites

You must have a cloud provider configured in the cluster (this article uses vSphere volumes as an example).
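If you are not sure which provisioners are already available in your cluster, you can list the existing storage classes first (standard kubectl; the output depends on your environment):

kubectl get storageclasses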

Reconfiguring storage for monitoring service

On the boot node of your cluster, from the /opt/ibm-cloud-private-2.1.0.2/cluster/ directory:

1. Back up the /opt/ibm-cloud-private-2.1.0.2/cluster/cfc-components/monitoring directory.
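A simple copy is enough for the backup; for example, from the cluster/ directory (the name of the backup directory is arbitrary):

cp -r cfc-components/monitoring cfc-components/monitoring.bak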

2. Define a storage class deployment file, ./misc/storage_class/monitoring-sc.yaml. Here is an example of its content:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: monitoring-icp-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
  datastore: DS-VsphereVol-01

3. Deploy your storage class using this command:

kubectl create -f ./misc/storage_class/monitoring-sc.yaml
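You can check that the storage class was created before going further (standard kubectl; monitoring-icp-sc is the example name used above):

kubectl get storageclass monitoring-icp-sc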

4. Edit the content of ./cfc-components/monitoring/alertmanager-pvc.yaml.

5. Change the name of the storage class to the name defined in step 2.

6. Remove the selector definition at the end of the file. Here is an example of the result:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: monitoring-prometheus
    component: alertmanager
  name: monitoring-prometheus-alertmanager-pvc
  namespace: kube-system
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "1Gi"
  storageClassName: monitoring-icp-sc

7. Save the file.

8. Redo steps 4 to 7 on grafana-pvc.yaml.

9. Redo steps 4 to 7 on prometheus-pvc.yaml.
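Before redeploying, you can quickly confirm that the three PVC files now reference the new storage class (a simple grep, assuming the field is named storageClassName as in the example above):

grep storageClassName ./cfc-components/monitoring/*-pvc.yaml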

10. Deploy the new PVC definition:

kubectl apply --force --overwrite=true -f ./cfc-components/monitoring/alertmanager-pvc.yaml
kubectl apply --force --overwrite=true -f ./cfc-components/monitoring/grafana-pvc.yaml
kubectl apply --force --overwrite=true -f ./cfc-components/monitoring/prometheus-pvc.yaml
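At this point, you can verify that the new claims are bound to dynamically provisioned volumes (standard kubectl; the claims live in the kube-system namespace, as shown in the PVC example above):

kubectl get pvc -n kube-system | grep monitoring

The three claims should show a STATUS of Bound once the vSphere provisioner has created the new volumes.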

11. Delete the old PVs previously used by the monitoring pods (monitoring-alertmanager-pv-X, monitoring-grafana-pv-X, monitoring-prometheus-pv-X).
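The exact volume names depend on your installation, so list them first and then delete them by name (replace <pv-name> with the names returned by the first command):

kubectl get pv | grep monitoring
kubectl delete pv <pv-name>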

12. Reload the Grafana database.

kubectl delete jobs/monitoring-grafana-ds -n kube-system 
kubectl apply -f ./cfc-components/monitoring/grafana-set-ds-job.yaml
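If you want to confirm that the reload job completed before checking the dashboards, you can query it by the job name used above:

kubectl get job monitoring-grafana-ds -n kube-system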

Grafana can now access the Prometheus database, and the dashboards work correctly.

Optional steps:

To keep your configuration up to date (in case of an upgrade, for example), you should also update the config.yaml file:

13. Edit config.yaml and, in the monitoring section, replace the name of the storage class with the one defined in step 2.

monitoring:
  # set storageClass used by monitoring components:
  # "-" means not use any storage
  # "monitoring-storage" means not use any network storage but use local one
  storageClass: monitoring-icp-sc
  pvPath: /opt/ibm/cfc/monitoring

14. Save the config.yaml file.

Verifying the monitoring service’s high availability

After the reconfiguration is done, you can test the high availability behaviour of the monitoring service. The monitoring service runs on one management node at a time. When you stop the management node where the monitoring service is running, IBM Cloud Private moves the monitoring service workload to the second management node. Because the storage is network shared, the new monitoring service pods can access the monitoring service data.
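A simple way to observe the failover is to watch on which node the monitoring pods are scheduled before and after you stop the active management node (again assuming the kube-system namespace used throughout this article):

kubectl get pods -n kube-system -o wide | grep monitoring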

By following the steps in this article, you have enabled high availability for the monitoring service after the initial installation of your IBM Cloud Private cluster.
