Automating hadoop admin tasks with Cloudera Manager API
Do you manage a (or several like I do) Cloudera hadoop cluster and have a need to detect failed services and auto restart?
Cloudera provides an incredibly useful rest api for automating many of the tasks you can do through Cloudera Manager. I’ve written many scripts to collect information like Impala query history and save it into a hbase table so that we can analyze it in future.
Lately, I’ve had some Hue servers getting frozen every now and then and users are unable to log on, and restarting Hue will resolve the problem (even though we have Hue load balancer and 4 Hue servers). While I will spend time to investigate the root cause of why Hue server hangs, in the meantime, I need a temporary solution to auto detect hung Hue servers and just restart.
Here’s a script that I wrote in node.js (my github gist here) to poll the health of my Hue servers. I noticed that when the Hue server hangs, this metric “HUE_SERVER_WEB_METRIC_COLLECTION” shows up as unhealthy. My script searches for any Hue server that has that metric as unhealthy, and automatically restarts that particular Hue server and sends an email alert to me that it restarted Hue server.
You can see the json output of the service health here:
Then just use your favorite scripting language to parse and call the appriopriate rest apis to perform an action.