Monitoring AWS Batch Jobs with Rust
At Pixability my team and I recently created a pretty neat AWS Batch-driven system. AWS Batch can handle almost any task seamlessly, and it’s pretty easy to manage with the console. This is great already, but I was curious to see what it would look like to monitor Batch with Rust. Watchrs was inspired by this question and currently provides basic functionality to do so. In this post we will briefly go over how the main components of watchrs were built and how to use them together.
An AWS Batch job can be in one of the following states:
- SUBMITTED
- PENDING
- RUNNABLE
- STARTING
- RUNNING
- SUCCEEDED
- FAILED
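To react to these states in Rust, it helps to have a typed representation of them. The sketch below is a hypothetical JobStatus enum (it is not part of watchrs) that parses the upper-case status strings Batch reports and treats SUCCEEDED and FAILED as terminal states.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical enum covering the seven AWS Batch job states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Submitted,
    Pending,
    Runnable,
    Starting,
    Running,
    Succeeded,
    Failed,
}

impl JobStatus {
    /// All states, in the order a successful job normally moves through them.
    pub const ALL: [JobStatus; 7] = [
        JobStatus::Submitted,
        JobStatus::Pending,
        JobStatus::Runnable,
        JobStatus::Starting,
        JobStatus::Running,
        JobStatus::Succeeded,
        JobStatus::Failed,
    ];

    /// A job in a terminal state will not change state again.
    pub fn is_terminal(self) -> bool {
        matches!(self, JobStatus::Succeeded | JobStatus::Failed)
    }

    /// The upper-case string Batch uses for this state.
    pub fn as_str(self) -> &'static str {
        match self {
            JobStatus::Submitted => "SUBMITTED",
            JobStatus::Pending => "PENDING",
            JobStatus::Runnable => "RUNNABLE",
            JobStatus::Starting => "STARTING",
            JobStatus::Running => "RUNNING",
            JobStatus::Succeeded => "SUCCEEDED",
            JobStatus::Failed => "FAILED",
        }
    }
}

impl FromStr for JobStatus {
    type Err = String;

    /// Parses a status string exactly as Batch reports it.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        JobStatus::ALL
            .iter()
            .copied()
            .find(|status| status.as_str() == s)
            .ok_or_else(|| format!("unknown Batch job status: {}", s))
    }
}

impl fmt::Display for JobStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

fn main() {
    for status in JobStatus::ALL.iter() {
        println!("{} terminal={}", status, status.is_terminal());
    }
    let parsed: JobStatus = "FAILED".parse().expect("valid status");
    println!("parsed {}", parsed);
}
```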
Going forward, we will define monitoring Batch jobs as the ability to track, record, and react to the different states a Batch job can have. The goal, then, is a system that tracks these states and notifies us whenever a certain condition is met. One way to do this is to use CloudWatch Events to track the state changes and SNS for alerting. The general process for this approach breaks down into four parts:
1. Create an SNS topic.
2. Subscribe to that SNS topic with the email address you want alerts sent to.
3. Create a CloudWatch Events rule to watch for job state changes.
4. Set the SNS topic as the target of that rule.
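The four steps map naturally onto a small orchestration function. The sketch below assumes a hypothetical AlertingBackend trait whose method names are invented for illustration (they are not the watchrs API or the AWS SDK API); the RecordingBackend stand-in only records calls, so the ordering can be checked without touching AWS.

```rust
/// Hypothetical abstraction over the SNS and CloudWatch Events calls.
/// Method names are illustrative, not a real library API.
trait AlertingBackend {
    fn create_topic(&mut self, name: &str) -> Result<String, String>;
    fn subscribe_email(&mut self, topic_arn: &str, email: &str) -> Result<String, String>;
    fn put_rule(&mut self, rule_name: &str, event_pattern: &str) -> Result<(), String>;
    fn put_target(&mut self, rule_name: &str, target_arn: &str) -> Result<(), String>;
}

/// Runs the four steps in order and stops at the first error.
/// Returns the topic ARN and subscription ARN so they can be cleaned up later.
fn setup_alerting<B: AlertingBackend>(
    backend: &mut B,
    topic_name: &str,
    email: &str,
    rule_name: &str,
    event_pattern: &str,
) -> Result<(String, String), String> {
    let topic_arn = backend.create_topic(topic_name)?; // 1. create the SNS topic
    let sub_arn = backend.subscribe_email(&topic_arn, email)?; // 2. subscribe an email
    backend.put_rule(rule_name, event_pattern)?; // 3. create the event rule
    backend.put_target(rule_name, &topic_arn)?; // 4. point the rule at the topic
    Ok((topic_arn, sub_arn))
}

/// In-memory stand-in that records each call instead of calling AWS.
#[derive(Default)]
struct RecordingBackend {
    calls: Vec<String>,
}

impl AlertingBackend for RecordingBackend {
    fn create_topic(&mut self, name: &str) -> Result<String, String> {
        self.calls.push(format!("create_topic {}", name));
        // Placeholder ARN using a dummy account ID.
        Ok(format!("arn:aws:sns:us-east-1:123456789012:{}", name))
    }

    fn subscribe_email(&mut self, topic_arn: &str, email: &str) -> Result<String, String> {
        self.calls.push(format!("subscribe_email {} {}", topic_arn, email));
        Ok(format!("{}:subscription-1", topic_arn))
    }

    fn put_rule(&mut self, rule_name: &str, _event_pattern: &str) -> Result<(), String> {
        self.calls.push(format!("put_rule {}", rule_name));
        Ok(())
    }

    fn put_target(&mut self, rule_name: &str, target_arn: &str) -> Result<(), String> {
        self.calls.push(format!("put_target {} {}", rule_name, target_arn));
        Ok(())
    }
}

fn main() {
    let mut backend = RecordingBackend::default();
    let (topic_arn, sub_arn) = setup_alerting(
        &mut backend,
        "batch-alerts",
        "me@example.com",
        "batch-job-failures",
        r#"{"source":["aws.batch"]}"#,
    )
    .expect("the recording backend never fails");
    println!("topic: {}\nsubscription: {}", topic_arn, sub_arn);
    for call in &backend.calls {
        println!("{}", call);
    }
}
```

In a real program the trait would be implemented on top of SNS and CloudWatch Events clients. Keeping the ARNs that setup_alerting returns makes teardown easier later.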
Prerequisites:
- AWS account
- AWS CLI configured
Note: watchrs is still under development and is not production ready at the moment.
Since watchrs monitors AWS Batch jobs, you will want your Batch resources up and running. However, creating the Batch resources is not strictly required to run the code. To set them up using the console, follow this guide. To get started, clone the repo at https://github.com/itsHabib/watchrs.
Subscribing To Topics
The first step in being able to track and alert on our Batch jobs is to create and subscribe to a topic. In watchrs, pub fn subscribe(..) relies on private methods to do this. The topic creation and subscription are pretty standard, except that we add a Policy to the topic attributes. Without this attribute, CloudWatch would not be able to invoke SNS, which means no alerts! Keep in mind that SNS subscriptions need to be confirmed before SNS will deliver to those endpoints.
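Below is a minimal sketch of a function that builds such a policy document as a JSON string. It assumes the standard pattern of granting the events.amazonaws.com service principal permission to call sns:Publish on the topic; the function name and the Sid value are hypothetical, and watchrs may build its policy differently. The resulting string would be supplied as the topic's Policy attribute.

```rust
/// Hypothetical helper: builds an SNS topic access policy that lets
/// CloudWatch Events (service principal events.amazonaws.com) publish
/// to the given topic. The Sid is an arbitrary label for this example.
fn cloudwatch_publish_policy(topic_arn: &str) -> String {
    // `{{` and `}}` are escaped braces in format!; `{}` is the topic ARN.
    format!(
        r#"{{"Version":"2012-10-17","Statement":[{{"Sid":"AllowCloudWatchEvents","Effect":"Allow","Principal":{{"Service":"events.amazonaws.com"}},"Action":"sns:Publish","Resource":"{}"}}]}}"#,
        topic_arn
    )
}

fn main() {
    // Placeholder ARN with a dummy account ID.
    let policy = cloudwatch_publish_policy("arn:aws:sns:us-east-1:123456789012:batch-alerts");
    println!("{}", policy);
}
```

Keep in mind that setting the Policy attribute replaces the topic's existing access policy rather than merging with it.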
Configuring CloudWatch Events
After a topic is set up, the next step is to create an event rule and target. At a high level, the expression in the event rule (the event pattern) is matched against the events that other AWS services emit, and if an event matches, the rule is triggered. These patterns can be large and vary for each service, so I would definitely recommend checking out the docs. In watchrs, the event rule is filtered with the parameters passed in, such as queue_arn. It’s important to be as specific as you can with event patterns to reduce noise in invocations. Without a detail section in the event pattern, the rule would be invoked for every state change of every job, across all seven states. I would recommend using at least a queue ARN and a status when creating these event patterns.
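To illustrate how optional filters narrow a pattern, here is a sketch of a hypothetical batch_event_pattern function (not the watchrs implementation). It always matches the aws.batch source and the Batch Job State Change detail type, and only adds a detail section when a queue ARN or status is supplied. It builds JSON by hand to stay dependency-free, so it assumes the inputs contain no characters that need JSON escaping; a real implementation would more likely use a crate such as serde_json.

```rust
/// Hypothetical builder for a CloudWatch Events pattern matching
/// AWS Batch job state changes. Each Some(..) filter adds a field
/// to the `detail` section; None leaves that field unconstrained.
/// Assumes the inputs need no JSON escaping (true for ARNs and statuses).
fn batch_event_pattern(queue_arn: Option<&str>, status: Option<&str>) -> String {
    let mut detail = Vec::new();
    if let Some(arn) = queue_arn {
        detail.push(format!(r#""jobQueue":["{}"]"#, arn));
    }
    if let Some(s) = status {
        detail.push(format!(r#""status":["{}"]"#, s));
    }

    let mut pattern =
        String::from(r#"{"source":["aws.batch"],"detail-type":["Batch Job State Change"]"#);
    // Without any filters there is no detail section, so every state
    // change of every job would match.
    if !detail.is_empty() {
        pattern.push_str(&format!(r#","detail":{{{}}}"#, detail.join(",")));
    }
    pattern.push('}');
    pattern
}

fn main() {
    // Placeholder queue ARN with a dummy account ID.
    let pattern = batch_event_pattern(
        Some("arn:aws:batch:us-east-1:123456789012:job-queue/my-queue"),
        Some("FAILED"),
    );
    println!("{}", pattern);
}
```

Calling batch_event_pattern(None, None) produces exactly the broad pattern warned about above, which is why passing at least a queue ARN and a status is worthwhile.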
Putting It All Together
Add a main.rs file to the src folder and create a main function that calls into watchrs. Make sure to change the email address and other variables to real values. The true flag enables the event rule, and the remaining Option<T> parameters are used as filters for the event pattern.
The main file runs through the four steps described earlier to create our alerting system. To run the program with logging, run RUST_LOG=watchrs cargo run. Make sure you also head over to your email and confirm the subscription sent out during the subscribe call. The message format SNS uses is not great, but SNS focuses on distributing notifications rather than formatting them.
To remove what watchrs created, add a call to the unsubscribe method using the subscription ARN and topic ARN created earlier. You can also just head over to the AWS console and remove the SNS topic and the CloudWatch event rule.
I hope this post gave you a good idea of one of the ways to monitor Batch jobs. As always, feel free to ask questions, comment, or suggest a next topic.