The last couple of posts have been about using Apache Ignite on Kubernetes, and this one follows in that vein.
We’ve got our cluster up and running; we’ve got the ability to scale it using StatefulSets; we’ve activated it using a job. How do we check that it’s still up and running?
Kubernetes has a feature called a “liveness probe” that we should be using, so that it can restart the pod if something untoward happens.
I’m sure there are all kinds of clever ways of making it work. We need one that strikes a balance: quick to call and cheap on resources, but still telling us something useful. Checking that the log file is still being written to might be easy, but we’d have to parse it to decide if the node is happy. Similarly, checking that the JVM is up and running doesn’t tell us whether Ignite is able to respond to requests.
Is there a way to use some of Ignite’s built-in functionality to define a simple test? I think so. Let’s try the REST API:
containers:
- name: ignite-node
  image: apacheignite/ignite:2.7.0
  env:
  - name: OPTION_LIBS
    value: ignite-kubernetes,ignite-rest-http
  ...
  livenessProbe:
    httpGet:
      path: /ignite?cmd=version
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
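If you want to replicate what the probe does by hand, or from a monitoring script, here’s a minimal sketch in Python. The hypothetical `ignite_alive` helper is mine, not part of any Ignite client; it assumes the node’s REST module is enabled on port 8080 and that a healthy response is HTTP 200 with a JSON body whose `successStatus` field is zero, which is how Ignite’s REST protocol reports success:

```python
import json
from urllib.request import urlopen


def ignite_alive(host="localhost", port=8080, timeout=2):
    """Return True if the node's REST API answers the version command.

    Mirrors the httpGet liveness probe above: GET /ignite?cmd=version,
    then check for an HTTP 200 and successStatus == 0 in the JSON body.
    """
    url = f"http://{host}:{port}/ignite?cmd=version"
    try:
        with urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = json.load(resp)
            return body.get("successStatus") == 0
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or unparseable body:
        # treat them all as "node not healthy".
        return False
```

Kubernetes itself only looks at the HTTP status code for an `httpGet` probe; the JSON check is a small extra you get when calling the endpoint yourself.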
There are more options you might like to tweak (failureThreshold, for example, defines how many consecutive failures are allowed before the pod is restarted, defaulting to three), but those last six lines are all we need. initialDelaySeconds allows a grace period before the first check is executed, since we don’t want to ping the node while it’s still starting up.
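For reference, here’s what the probe block looks like with those extra knobs spelled out. The timeoutSeconds value is illustrative rather than a recommendation; failureThreshold is shown at its default:

```yaml
livenessProbe:
  httpGet:
    path: /ignite?cmd=version
    port: 8080
  initialDelaySeconds: 30   # grace period while the node starts up
  periodSeconds: 10         # how often the probe runs
  timeoutSeconds: 5         # how long to wait for a response
  failureThreshold: 3       # consecutive failures before a restart (the default)
```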
You could also change the endpoint to do something more sophisticated, but be careful: you want the status of this Ignite node, not that of the whole cluster. Don’t, for example, query for a particular value in a cache, as you won’t know which node it’s stored on!
That’s it for this time, but stay tuned as I’m hoping to make these posts a little more frequent this year.