How to work with AWS Simple Systems Manager on CoreOS

Published in LevOps, Jan 27, 2017

I’ve decided to write blog posts about my daily learnings and recent experiences. I had many excuses not to write, but the very first blocker was English. As a control freak, I never had the confidence to write in English. But no drama. So here I am.

Since I started working with systems that only run in the cloud (approximately the last 5 years), a lot has changed. I’ve come to fully understand the DevOps movement. I’ve changed how I think about being productive. I’ve learned that Theory Total Cost of Constraints Ownership (TCOOC) is not only about the visible numbers, and I’ve learned the importance of visibility and transparency. I believe that I’m happier now than back in the day and get more satisfaction from work.

Long story short, whatever I’m doing or planning, I do it the cloud way.

Is this related to the title? Of course it is.

Whatever you do, whatever you use, when your time comes for the cloud, try to have login-less, immutable, supersonic, disposable servers. Yes, I’m talking about the day you need to run ad-hoc commands on multiple servers. Of course, there are many solutions for that. The very old one is pssh; others are Ansible, Fabric, Capistrano, a bash script, etc. That list can be longer than the Nile.

Most of these tools come at a price: no visibility, dependencies, human error, and time lost setting the tools up. For instance, the very first idea could be “We need to run this through Jenkins. Let’s set up a Jenkins for this, create a repo for the Jenkinsfile, and write config management and provisioner code for deploying it. And backups, of course…” Oh no! That is so wrong. First of all, Jenkins is not your code executor or scheduled job runner. Jenkins is your CD tool. Leave the poor guy to its own job. Second of all, no one needs to over-engineer things and add a weak link to the chain.

As an Ops engineer who works heavily with AWS, I decided to use Amazon EC2 Simple Systems Manager for this purpose.

Simple Systems Manager helps you automate management tasks, apply OS patches, and configure the OS and applications. To do this, AWS provides an agent called amazon-ssm-agent. The agent has a built-in queue for the commands coming from the AWS Systems Manager API, and it fetches them by polling the API.

I’ve started to bake the agent into my AMIs. Now all my instances have the agent by default.

As long as you’re working with Amazon Linux, Ubuntu, RHEL, or CentOS, you have no problem: there are RPM or deb packages for those OSs. What happens if you are working with CoreOS like me? There is nothing for CoreOS.

Luckily, amazon-ssm-agent is written in Go, so you can use the same binary on CoreOS as well. What you need is:
* The binary of the agent
* A systemd configuration
* A proper IAM role when you spin up a new instance from your AMI
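
The IAM role is the item that is easiest to forget, so here is a rough sketch of how it could be created with the AWS CLI. The name `ssm-coreos` is made up for illustration, and the managed policy `AmazonSSMManagedInstanceCore` is an assumption about what “proper” means here; check which policy AWS currently recommends for the agent. The script only prints the commands unless you set `DRY_RUN=0`.

```shell
#!/bin/bash
# Sketch: create an instance profile that lets the SSM agent talk to the API.
# ROLE_NAME and POLICY_ARN are illustrative assumptions, not from the article.
ROLE_NAME="${ROLE_NAME:-ssm-coreos}"
POLICY_ARN="${POLICY_ARN:-arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore}"
DRY_RUN="${DRY_RUN:-1}"

# Trust policy: allow EC2 instances to assume the role.
TRUST_POLICY='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

# Print the command in dry-run mode, run it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%q ' "$@"; echo
    else
        "$@"
    fi
}

create_ssm_profile() {
    run aws iam create-role --role-name "$ROLE_NAME" \
        --assume-role-policy-document "$TRUST_POLICY"
    run aws iam attach-role-policy --role-name "$ROLE_NAME" \
        --policy-arn "$POLICY_ARN"
    run aws iam create-instance-profile --instance-profile-name "$ROLE_NAME"
    run aws iam add-role-to-instance-profile \
        --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME"
}

create_ssm_profile
```

The instance profile is then what you pass when launching an instance from the baked AMI.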

As you may know, on CoreOS you cannot download and extract one of the packages (rpm2cpio, etc.), and there are no tools such as gcc to build the binary on the instance. But you do have Docker, obviously.

I’m using the following script while baking my AMIs:

#!/bin/bash
# vim: et sr sw=4 ts=4 smartindent:
#
# 00030-amazon-ssm-agent.sh
#
# - pulls golang:1.6 image
# - checks out latest tag
# - builds amazon-ssm-agent
# - moves binaries into ~core/bin/ssm/

export WERK_DIR=/home/core/ssm
export BIN_DIR=/home/core/bin/ssm
export CONFIG_DIR=/etc/amazon/ssm

export DOCKER_BUILD_BOX=ssm-build
export DOCKER_GOLANG_TAG=golang:1.6
export DOCKER_WERKSPACE=/workspace/src/github.com/aws/amazon-ssm-agent

git clone https://github.com/aws/amazon-ssm-agent.git $WERK_DIR
pushd $WERK_DIR
    git checkout $(git describe --abbrev=0 --tags)
    docker run --rm --name "$DOCKER_BUILD_BOX" \
        -v "$PWD":"$DOCKER_WERKSPACE" \
        -w "$DOCKER_WERKSPACE" \
        "$DOCKER_GOLANG_TAG" \
        make build-linux

    mkdir -p $BIN_DIR
    mv bin/linux_amd64/* $BIN_DIR/

    mkdir -p $CONFIG_DIR
    mv amazon-ssm-agent.json.template $CONFIG_DIR/amazon-ssm-agent.json
    mv seelog_unix.xml $CONFIG_DIR/seelog.xml
popd

rm -rf $WERK_DIR
( docker rm -f "$DOCKER_BUILD_BOX" || true )
( docker rmi "$DOCKER_GOLANG_TAG" || true )

cat <<EOF > /etc/systemd/system/amazon-ssm-agent.service
[Unit]
Description=amazon-ssm-agent
[Service]
Type=simple
WorkingDirectory=$BIN_DIR
ExecStart=$BIN_DIR/amazon-ssm-agent
KillMode=process
Restart=on-failure
RestartSec=15min
[Install]
WantedBy=network-online.target
EOF

systemctl enable amazon-ssm-agent.service

It’s actually a pretty straightforward script: it pulls down the golang Docker image, builds the agent, and configures it. That’s all.
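
One detail may be worth spelling out: because the `EOF` delimiter of the heredoc is not quoted, `$BIN_DIR` is expanded when the unit file is written, so the file on disk contains the absolute path. Here is a small sketch to try that part without root. The helper `render_unit` and the temporary file are made up for illustration and are not part of the script above.

```shell
#!/bin/bash
# Sketch: render the unit file to an arbitrary path to inspect the result.
# render_unit is a hypothetical helper, not part of the original script.
render_unit() {
    local bin_dir="$1" target="$2"
    # Unquoted EOF: $bin_dir is expanded now, at write time.
    cat <<EOF > "$target"
[Unit]
Description=amazon-ssm-agent
[Service]
Type=simple
WorkingDirectory=$bin_dir
ExecStart=$bin_dir/amazon-ssm-agent
KillMode=process
Restart=on-failure
RestartSec=15min
[Install]
WantedBy=network-online.target
EOF
}

UNIT_PREVIEW="$(mktemp)"
render_unit /home/core/bin/ssm "$UNIT_PREVIEW"
grep '^ExecStart=' "$UNIT_PREVIEW"
# prints: ExecStart=/home/core/bin/ssm/amazon-ssm-agent
```

Keep in mind that `systemctl enable` only arranges for the service to start at boot. If you want to test the agent on the build instance itself, you would also need `systemctl daemon-reload` and `systemctl start amazon-ssm-agent.service`.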

If you want to see it in action, here is an example run.

I’ve spun up a new instance from my recently baked AMI with the correct IAM policy. From my laptop, I ran the following command:

aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --instance-ids "i-00000000000000000" \
  --parameters '{"commands":["docker pull golang:1.7"],"executionTimeout":["3600"],"workingDirectory":["/tmp"]}' \
  --comment "test" \
  --timeout-seconds 600 \
  --region eu-west-1

I saw the command in the EC2 -> Run Command section of the AWS Console.

And the details of the command.

After the status changed from Pending to Success, I could see the output of the command in both the AWS CLI and the AWS Console.

And this is the final result: the Docker image pulled down through AWS EC2 Systems Manager.
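
If you would rather wait for the result in a script than refresh the console, something along these lines could work. It is only a sketch: `wait_for_command` and `is_terminal_status` are names I made up, the 5-second interval is arbitrary, and it assumes your CLI version has `aws ssm get-command-invocation`.

```shell
#!/bin/bash
# Sketch: poll one invocation until it reaches a final status.

# Succeeds when the status is final, fails while the command is still moving.
is_terminal_status() {
    case "$1" in
        Success|Failed|Cancelled|TimedOut) return 0 ;;
        *) return 1 ;;   # e.g. Pending, InProgress, Delayed, Cancelling
    esac
}

# Usage: wait_for_command <command-id> <instance-id> [region]
wait_for_command() {
    local command_id="$1" instance_id="$2" region="${3:-eu-west-1}" status
    while true; do
        status="$(aws ssm get-command-invocation \
            --command-id "$command_id" \
            --instance-id "$instance_id" \
            --region "$region" \
            --query 'Status' --output text)" || return 1
        echo "status: $status"
        is_terminal_status "$status" && break
        sleep 5
    done
    # Exit code tells the caller whether the command succeeded.
    [ "$status" = "Success" ]
}
```

The command ID comes from the `send-command` response; adding `--query 'Command.CommandId' --output text` to the call prints just that. Once the status is final, the same `get-command-invocation` call with `--query 'StandardOutputContent'` shows what the command printed.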

--instance-ids is limited to 50 instance IDs, which adds another step to running a basic command.

Instead of using instance IDs, you can also use targets: --targets is another option for sending commands.

If all your instances are tagged properly, and you can select the instances you want to run a command on by their tags, your life is much easier.
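
One way around that limit, if you really have to address instances by ID, is to send the command in batches. A minimal sketch follows; `chunk_ids` is a hypothetical helper, and the example uses a batch size of 2 only to keep the output short.

```shell
#!/bin/bash
# Sketch: split a list of instance IDs into space-separated batches,
# one batch per output line, so each fits into a single --instance-ids.
# Usage: chunk_ids <batch-size> <id> [<id> ...]
chunk_ids() {
    local size="$1"; shift
    local batch=() id
    for id in "$@"; do
        batch+=("$id")
        if [ "${#batch[@]}" -eq "$size" ]; then
            echo "${batch[*]}"
            batch=()
        fi
    done
    # Print the remainder, if any.
    if [ "${#batch[@]}" -gt 0 ]; then
        echo "${batch[*]}"
    fi
}

chunk_ids 2 i-aaa i-bbb i-ccc
# prints:
#   i-aaa i-bbb
#   i-ccc
```

With a batch size of 50, each output line can be passed, unquoted so that it splits into separate IDs, to one `aws ssm send-command --instance-ids` call.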

aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets \
    "Key=tag:product,Values=cluster1,cluster2" \
    "Key=tag:env,Values=dev" \
  --parameters '{"commands":["ntpdate pool.ntp.org"],"executionTimeout":["3600"],"workingDirectory":["/tmp"]}' \
  --timeout-seconds 600 \
  --region eu-west-1
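
Before sending a command to a tag selection like the one above, it can be reassuring to preview which instances match. The sketch below builds the same selection in both syntaxes, for `send-command --targets` and for `ec2 describe-instances --filters`. The helpers `tag_target` and `tag_filter` are made up for illustration, and I am assuming the two calls treat the selection the same way (values of one key are alternatives, different keys must all match), so verify that against a small test group first.

```shell
#!/bin/bash
# Sketch: build one tag selection for send-command and describe-instances.

# Usage: tag_target <tag-key> <value> [<value> ...]
tag_target() {
    local key="$1"; shift
    local IFS=,            # join the remaining values with commas
    echo "Key=tag:${key},Values=$*"
}

# Usage: tag_filter <tag-key> <value> [<value> ...]
tag_filter() {
    local key="$1"; shift
    local IFS=,
    echo "Name=tag:${key},Values=$*"
}

tag_target product cluster1 cluster2
# prints: Key=tag:product,Values=cluster1,cluster2
tag_filter product cluster1 cluster2
# prints: Name=tag:product,Values=cluster1,cluster2

# Preview the matching instance IDs (needs AWS credentials, so left commented):
# aws ec2 describe-instances \
#   --filters "$(tag_filter product cluster1 cluster2)" "$(tag_filter env dev)" \
#   --query 'Reservations[].Instances[].InstanceId' --output text \
#   --region eu-west-1
```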

Here is another example. Let’s say you have an Auto Scaling group and you want to run an ad-hoc command on all the members of the ASG.

aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets \
    "Key=tag:aws:autoscaling:groupName,Values=consul-admin" \
  --parameters '{"commands":["consul info"],"executionTimeout":["3600"],"workingDirectory":["/tmp"]}' \
  --comment "test" \
  --timeout-seconds 600 \
  --region eu-west-1

You can follow what the agent is doing in /var/log/amazon/ssm/amazon-ssm-agent.log.