DevOps Simplified with AWS Systems Manager

Nikhil.VS
Published in hubbleconnected
May 17, 2018 · 5 min read

“It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change.”

— Charles Darwin

Context

Hubble Connected is an IoT company with a cloud platform running on AWS. With different environments spread across regions, our stack is fairly complicated.

We have a dedicated DevOps team to manage the stack, and we have been running the service for years, with over a million users and their (IoT) devices connected to the platform.

Problem

The problem is a very common one, and it is precisely depicted in the image below (thanks to Google for the image).

Meet the two friends — the Software Developer and the Operations (Ops) Engineer. Well, one of them does not really agree with Charles Darwin’s quote!

It’s obvious, really. In a server stack that lives fully in the cloud, where so many components are already scattered across the AWS dashboard, it’s quite a circus for the Ops engineer to keep things stable. On top of that, the developers are all too inspired by the “Change”!

And this is how it usually goes.

There are EC2 instances, which are launched from a custom AMI (Amazon Machine Image). This AMI was created after thoughtful configuration and testing, and finalized after quite an effort.

And then one day, the developer comes and says…

“I want change! I have a better configuration. Those two parameters that we added a month back need some tweaking!”

Of course the Ops engineer is not very happy. But after a healthy debate for half an hour, he is convinced that the change is really needed.

“Anyway, it’s only two parameters…”, the Ops engineer thinks.

“OK, great… let’s change it on the staging first”, the developer says.

And the Ops engineer completes the lengthy process of changing the parameters on all the staging instances.

What lengthy process, you asked? See for yourself, in the following diagram.

Once the Ops engineer is done, the developer is happily busy checking his new configuration.

One hour goes by.

“Hey, it worked!”, says the excited developer. “It’s all set now. This is perfect. Let’s get this tested by QA and then let’s deploy the change on production!”

Obviously, the even-more-unhappy Ops engineer is soon busy repeating the process on the production environment across multiple regions!

The Solution

The search for a solution started with a realization that such changes are in fact inevitable!

Those AMIs, which once upon a time were found as a solution to another problem, have now become the problem themselves.

Once that’s realized, the second step towards the solution is the realization that we need a central place to store the parameters and scripts. We should be able to make a change in one place, and the server instances should be smart enough to pick up only the changes relevant to their own environment.
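Concretely, the central store could look something like the hierarchical layout below, where each instance subscribes only to its own slice. The naming convention and values here are purely illustrative:

```
/staging/api/MAX_CONNECTIONS     = 200
/staging/worker/QUEUE_SIZE       = 50
/production/api/MAX_CONNECTIONS  = 500
```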

We need a configuration. Centralized. Flexible.

Say Hello to AWS Systems Manager!

AWS Systems Manager gives you visibility and control of your infrastructure on AWS. Systems Manager provides a unified user interface so you can view operational data from multiple AWS services and allows you to automate operational tasks across your AWS resources.

And it’s very well integrated with EC2, in the form of the SSM Agent. The agent comes bundled with Amazon Linux, and you can install it separately on other operating systems as well.

And it can help solve the problem defined above. But not alone! It needs some more configuration, of course!

So how do we configure it?

After a few days of experimentation, and after putting various services to work, we finally arrived at a solution, shown in the following diagram.

Although this diagram involves more components than the previous one, it also has more automation built into it!

Let’s go through what this diagram has to say!

  • We configured the Parameter Store to contain the usual key-value configuration we used to hard-code in AMIs.
  • Then we set up a CloudWatch Events rule to trigger whenever the Parameter Store changes.
  • This rule is configured to download a script from an S3 bucket and pass it to Systems Manager’s Run Command.
  • The Systems Manager Agent is configured to update the environment variables that correspond to the changed parameters.
  • The server application and any scripts running on the EC2 instance are configured to work with these environment variables.
  • One point to note here is that the Systems Manager Agent is invoked every time there is a change in the Parameter Store.
  • So we need to build in logic to ignore any parameters that are not related to the EC2 instance on which the agent is running.
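The filtering in the last two bullets can be sketched roughly as below. The /environment/role/KEY parameter-naming convention and the function name are assumptions for illustration, not our exact implementation:

```python
# Sketch: ignore Parameter Store changes that do not belong to this instance.
# Assumes parameter names follow an illustrative "/<environment>/<role>/<KEY>"
# convention, e.g. "/staging/api/MAX_CONNECTIONS".

def relevant_updates(changed_params, environment, role):
    """Return only the key/value pairs this instance should apply."""
    prefix = f"/{environment}/{role}/"
    updates = {}
    for name, value in changed_params.items():
        if name.startswith(prefix):
            key = name[len(prefix):]  # strip the path, keep the env-var name
            updates[key] = value
    return updates

# Example: a staging "api" instance picks up only its own parameter.
changed = {
    "/staging/api/MAX_CONNECTIONS": "200",
    "/staging/worker/QUEUE_SIZE": "50",
    "/production/api/MAX_CONNECTIONS": "500",
}
print(relevant_updates(changed, "staging", "api"))
# -> {'MAX_CONNECTIONS': '200'}
```

In practice the agent-side script would then export each returned pair as an environment variable (and signal or restart the application), but the essential point is the prefix check: everything else in the change event is simply ignored.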

Well, that’s it! Once this setup is done, the Ops engineer is happy, as he no longer spends a lot of time executing the same repeated steps across various stacks and environments.

The Problem and Solution — side by side

Let’s have a look at the steps that an Ops engineer has to execute before and after the solution.

Summary

AWS Systems Manager provides a central and flexible way to automate operational tasks. By putting it to work with the rest of the AWS ecosystem, we have saved a huge amount of the Ops team’s time and effort.

DevOps simplified!
