Create a CloudFormation Template for Route53 Health Check, CloudWatch Alarms and SNS

The many configuration options of the Route 53 health check and the CloudWatch alarm can be confusing, especially when you are just starting to experiment with them. The step-by-step guide below, with sample CloudFormation (CF) template snippets, should help clarify how they fit together.

1. Create a Route 53 Health Check

"Route53HealthCheck": {
  "Type": "AWS::Route53::HealthCheck",
  "Properties": {
    "HealthCheckConfig": {
      "Type": "HTTPS",
      "FullyQualifiedDomainName": "xxxx.com.sg",
      "RequestInterval": "30",
      "FailureThreshold": "5"
    },
    "HealthCheckTags": [
      {
        "Key": "Name",
        "Value": "HealthCheck"
      },
      {
        "Key": "Project",
        "Value": "ProjectA"
      }
    ]
  }
}

The health check sends an HTTPS request to xxxx.com.sg every 30 seconds (configured by “RequestInterval”), and each request registers a data point (1: healthy, 0: unhealthy; let’s call these R53 data points for illustration). After 5 (“FailureThreshold”) consecutive failed data points, the health check is shown as failed on the AWS Route 53 console. A common source of confusion is mistaking these data points for the metrics that the CloudWatch alarm monitors.
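
The “FailureThreshold” rule above can be sketched in a few lines of Python. This is only a simplified model of the behaviour described here, not how Route 53 is implemented: the function name first_failure_index is made up for illustration, a single stream of data points is assumed, and recovery back to healthy is not modelled.

```python
def first_failure_index(r53_points, failure_threshold=5):
    """Return the index of the R53 data point at which the health check
    would be shown as failed, or None if it never fails.

    r53_points: list of 1 (healthy) / 0 (unhealthy), one per request.
    """
    streak = 0  # number of consecutive failed data points seen so far
    for i, point in enumerate(r53_points):
        streak = streak + 1 if point == 0 else 0
        if streak >= failure_threshold:
            return i
    return None


# Four failures, one success, then five failures: the single success
# resets the streak, so the check only fails at the last point (index 9).
print(first_failure_index([0, 0, 0, 0, 1, 0, 0, 0, 0, 0]))
```

With “RequestInterval” set to 30 and “FailureThreshold” set to 5, this model needs roughly two and a half minutes of uninterrupted failures before the console shows a failure.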

2. Create a CloudWatch Alarm

"Alarm": {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "AlarmDescription": "Health Check Alarm",
    "AlarmName": "HealthCheckAlarm",
    "Namespace": "AWS/Route53",
    "MetricName": "HealthCheckStatus",
    "Dimensions": [
      {
        "Name": "HealthCheckId",
        "Value": { "Ref": "Route53HealthCheck" }
      }
    ],
    "ComparisonOperator": "LessThanThreshold",
    "Period": "60",
    "EvaluationPeriods": "5",
    "Statistic": "Minimum",
    "Threshold": "1.0",
    "AlarmActions": [
      { "Ref": "AlarmSNSTopic" }
    ]
  }
}

The CloudWatch alarm monitors data points collected every 60 seconds (“Period”). Note that these data points (let’s call them CW data points) are different from those generated by the R53 health check. Each CW data point is the minimum (“Statistic”) of the two consecutive R53 data points that fall within the 60-second period; if it is less than (“ComparisonOperator”) 1.0 (“Threshold”), it is marked as “BREACHING”. For example, if one of the two R53 data points in the period is 1 (OK) and the other is 0 (fail), the resulting CW data point (metric) is min(0, 1) = 0.
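
As a rough illustration of how R53 data points roll up into CW data points, here is a minimal Python sketch. It assumes exactly one R53 data point per “RequestInterval” and neatly aligned periods, which is a simplification; the names cw_data_points and is_breaching are invented for this example.

```python
def cw_data_points(r53_points, request_interval=30, period=60):
    """Group R53 data points into periods and take the minimum of each
    group, mirroring "Statistic": "Minimum"."""
    per_period = period // request_interval  # 2 R53 points per 60s period
    return [
        min(r53_points[i:i + per_period])
        for i in range(0, len(r53_points), per_period)
    ]


def is_breaching(cw_point, threshold=1.0):
    """Mirror "ComparisonOperator": "LessThanThreshold"."""
    return cw_point < threshold


r53 = [1, 1, 1, 0, 0, 0, 1, 1]
cw = cw_data_points(r53)
print(cw)                             # [1, 0, 0, 1]
print([is_breaching(p) for p in cw])  # [False, True, True, False]
```

A single failed R53 data point is enough to make the whole 60-second period breaching, because the minimum of the pair is 0.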

The alarm is triggered when the very first metric (the oldest period in the evaluation window, which spans “EvaluationPeriods” periods) is breaching AND every remaining period in the window is either breaching or missing.
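
The trigger rule can be written down as a small Python check. This is a sketch of the rule exactly as stated in this post, not of CloudWatch's actual evaluation logic; alarm_triggered is a hypothetical helper, and None is used here to stand for a missing period.

```python
def alarm_triggered(window, threshold=1.0):
    """window: CW data points for the evaluation window, oldest first.
    Each entry is a number, or None if the period has no data."""
    def breaching(value):
        return value is not None and value < threshold

    oldest, rest = window[0], window[1:]
    # The oldest period must be breaching; the rest may be breaching or missing.
    return breaching(oldest) and all(v is None or breaching(v) for v in rest)


print(alarm_triggered([0, 0, None, 0, 0]))   # True: oldest breaching, rest breaching or missing
print(alarm_triggered([None, 0, 0, 0, 0]))   # False: oldest period is missing
print(alarm_triggered([0, 0, 1, 0, 0]))      # False: one healthy period in the window
```

With “EvaluationPeriods” set to 5 and “Period” set to 60, the window passed in would hold the five most recent one-minute CW data points.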

3. Create the SNS topic

The last step is straightforward. The alarm is configured to send its notifications to the SNS topic AlarmSNSTopic, which can be created as follows:

"AlarmSNSTopic": {
  "Type": "AWS::SNS::Topic",
  "Properties": {
    "Subscription": [
      {
        "Endpoint": "add valid email address",
        "Protocol": "email"
      },
      {
        "Endpoint": "add valid email address",
        "Protocol": "email"
      }
    ]
  }
}

Hope it helps.

*After creating the alarm, it strangely remained in the INSUFFICIENT_DATA state for a few hours before we realized that this part of the CF template must run in us-east-1 instead of ap-southeast-1, as the health check metrics are only available there. We ended up splitting the CF template into two.