Use Python Lambdas to remotely run shell commands on EC2 instances

Nick Miller
8 min read · Jan 9, 2023


I was recently working on a Terraform file for Kubernetes and ran into the limits of Terraform when trying to update a file with the IP addresses of the nodes in the cluster. This step proved challenging because those addresses are assigned only after the resources are created, so there’s no way to insert them into your bootstrap script within your Terraform template in advance.

When researching solutions, I saw that Mitchell Hashimoto himself has explicitly said that Terraform is not for managing runtime.

So that’s that. You should use Consul or something like Ansible when dealing with changes at runtime.

…but then I had a terrible idea: could I make a Lambda do this? After some digging and ruminating, I found a way to alter already-running nodes using AWS’s Python library, boto3, to send shell commands. And because I love Lambdas, I wanted to share that research with you today.

To be clear: you should still use tools like Ansible or Consul to carry out runtime tasks like bootstrapping. What this project does show is the immense flexibility of Python Lambdas and the power they give you when setting up infrastructure. If you really want to, you can use Lambdas for almost anything.

Overview

  • Part 1: Walkthrough of the Python Script
  • Part 2: Deploying the Project with Terraform
  • Part 3: Understanding the Permissions needed for SSM
  • Part 4: Testing the Script on Lambda

Prerequisites

  • Cloud 9 IDE with Admin AWS Credentials — We will create IAM policies, work with Systems Manager, and use Python to interact with AWS via boto3. Cloud 9 also has Terraform pre-installed, making this walkthrough easier.
  • Familiarity with Python and Boto3 — I’m operating with the assumption that I don’t have to explain the loops, lists, or libraries in this walkthrough.
  • Beginner Level Understanding of Terraform — I’ll provide the templates, and you should know how to run them.

Part 1: Walkthrough of the Python Script

The goal of this script is to search through our EC2 instances and run a shell script on the ones that both possess a tag (which we’ll determine in the script) and are in the “running” state.

Let’s walk through this in the IDE first. Feel free to run python3 in the Cloud 9 terminal and follow along in the interactive shell.

At the very top of the script, we’ll define the script to run and the tag:

#Import boto3 library
import boto3

#Define the contents of your shell script
script = """
echo "Hello World!" > /home/ec2-user/helloworld.txt
pwd >> /home/ec2-user/helloworld.txt
"""
#Define the tag possessed by the EC2 instances that we want to execute the script on
tag='Test'

Next, we’re going to work on the code that will eventually end up inside the lambda_handler. First, define both the ec2 and ssm boto3 clients, as we’re going to use them both:

#Define ec2 and ssm clients
ec2_client = boto3.client("ec2", region_name='us-east-1')
ssm_client = boto3.client('ssm')

Second, gather a list of Reservations where there are instances with the tag that we identified earlier:

#Gather the instances with the Name tag defined earlier
filtered_instances = ec2_client.describe_instances(Filters=[{'Name': 'tag:Name', 'Values': [tag]}])

#Reservations in the filtered_instances
reservations = filtered_instances['Reservations']
Note: If you don’t have any instances with the matching tag, the ‘Reservations’ list will be empty.

The describe_instances() call returns a response dictionary describing the matching instances. The Filters parameter restricts the results to instances that carry the tag we identified earlier in the script.

The instances live inside the dictionary key called “Reservations.” A “Reservation” is created automatically when instances are launched and covers the one or more instances from that launch request. We access that “Reservations” list through the reservations variable we defined.
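To make that shape concrete, here is a trimmed, made-up describe_instances response and the ID extraction applied to it (the instance IDs and the number of fields are invented for illustration; real responses carry far more data per instance):

```python
# A trimmed, made-up describe_instances() response; real responses
# carry many more fields per instance (IDs here are invented)
sample_response = {
    'Reservations': [
        {'Instances': [
            {'InstanceId': 'i-0aaa111', 'State': {'Name': 'running'}},
            {'InstanceId': 'i-0bbb222', 'State': {'Name': 'stopped'}},
        ]},
        {'Instances': [
            {'InstanceId': 'i-0ccc333', 'State': {'Name': 'running'}},
        ]},
    ]
}

# Each reservation wraps the instances from one launch request
reservations = sample_response['Reservations']
all_ids = [instance['InstanceId']
           for reservation in reservations
           for instance in reservation['Instances']]
print(all_ids)  # ['i-0aaa111', 'i-0bbb222', 'i-0ccc333']
```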

Third, we’re going to iterate through the reservations and pull out the ‘running’ instances:

#Create an empty list of instances to execute the shell script on
exec_list = []

#Iterate through all the instances within the collected reservations
#Append 'running' instances to exec_list, ignoring 'stopped' and 'terminated' ones
for reservation in reservations:
    for instance in reservation['Instances']:
        print(instance['InstanceId'], " is ", instance['State']['Name'])
        if instance['State']['Name'] == 'running':
            exec_list.append(instance['InstanceId'])
    #Divider between reservations
    print("**************")

For each reservation in the reservations list, the code iterates over the list of instances in the ‘Instances’ field of the reservation dictionary.

For each instance, if the state of the instance is 'running', the instance ID is appended to the exec_list list.

Fourth, we’re going to use the boto3 ssm_client defined earlier to execute the shell script on the EC2 instances on the exec_list:

#Run the shell script on the collected instances
response = ssm_client.send_command(
    DocumentName='AWS-RunShellScript',
    Parameters={'commands': [script]},
    InstanceIds=exec_list
)

#See the commands sent to the target instance IDs
print(response['Command']['Parameters']['commands'])

This step will not work for you yet, because your EC2 instances don’t have the IAM Role permissions that allow SSM to execute commands on them. We’ll set those up in Part 3.
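One thing to keep in mind: send_command is asynchronous, so the response only confirms the command was accepted, not that it succeeded. A sketch of polling for the result with get_command_invocation (a real SSM API call; the retry loop, attempt count, and delay are my own choices, and right after send_command the invocation may not exist yet):

```python
import time

def wait_for_command(ssm_client, command_id, instance_id,
                     attempts=10, delay=2):
    """Poll SSM until the sent command reaches a terminal state."""
    terminal = {'Success', 'Failed', 'Cancelled', 'TimedOut'}
    for _ in range(attempts):
        invocation = ssm_client.get_command_invocation(
            CommandId=command_id,
            InstanceId=instance_id,
        )
        if invocation['Status'] in terminal:
            return invocation['Status']
        time.sleep(delay)
    return 'InProgress'  # gave up before reaching a terminal state

# Usage with the send_command response from above:
# status = wait_for_command(ssm_client,
#                           response['Command']['CommandId'],
#                           exec_list[0])
```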

If I SSH into the instance that I’ve already configured with the correct permissions, we can see the result:

This is the expected result of what we put in the script variable at the top of this Python script:

script = """
echo "Hello World!" > /home/ec2-user/helloworld.txt
pwd >> /home/ec2-user/helloworld.txt
"""

So that’s what the script does. Let’s get this running on your machine.

Part 2: Deploying the Project with Terraform

I went into detail on how to set up a Lambda in Terraform in my article Create a Python Lambda to Save On Your Cloud Bill with Terraform. Check it out if you want to go deeper into setting up Lambda in Terraform; for this post, I’m assuming you’ve already set up a template or two.

We’re going to clone my repo and deploy it using our Cloud 9 CLI:

#Clone and Enter the Repo
git clone https://github.com/nickcmiller/lambda-shell-execution.git
cd lambda-shell-execution

#Initialize Terraform
terraform init

#Apply
terraform apply --auto-approve

Your directory will look like this:

Feel free to poke around if you want to see how the pieces fit together.

If you want to access the EC2 instance through SSH, I’ve created a connection script for you in the form of a Terraform Output:

The test_key_pair.pem file is in the root directory of the Terraform project (you can see it at the bottom of the directory screenshot). As always, don’t forget to chmod 400 the key before using it.

Now let’s discuss how to set up Permissions that allow Lambda to access instances.

Part 3: Understanding the Permissions needed for SSM

To allow our Lambda to deploy a shell script on EC2, we need SSM to register the EC2 agents. To do that, I created a role called ec2-role-for-SSM, which you will see if you click into your Test instance.

This role utilizes one of Amazon’s Managed IAM Policies (AmazonEC2RoleforSSM):

This policy gives Full Access to SSM Messages and List, Read, and Write Access to Systems Manager:

The Role Policy, found under Trust Relationships, allows it to be assumed by EC2 services:
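For reference, a trust policy that lets EC2 assume a role generally follows this standard AWS pattern (a generic example, not copied from the repo):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```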

If we go into the instance’s User Data…

We’ll see that we’re making sure that the EC2 instance has the agent installed, that it is started, and that it is enabled for future reboots:
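On Amazon Linux 2, where the agent ships preinstalled, that user data amounts to something like the following config fragment (a sketch of mine; the repo’s exact script may differ):

```shell
#!/bin/bash
# Make sure the SSM agent is installed (preinstalled on Amazon Linux 2)
sudo yum install -y amazon-ssm-agent
# Start it now and enable it across future reboots
sudo systemctl enable amazon-ssm-agent
sudo systemctl start amazon-ssm-agent
```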

Now, go over to the Lambda console and select our Lambda execute-shell-script:

As you’ll see under Configuration, it has also been given SSM permissions:

Click on the Role name, and you’ll see that this is a custom policy:

Its permissions allow for List and Write Access to EC2 as well as List, Read, and Write Access to SSM:

Back in the Role under Trust relationships, we can see that this role is designated for Lambda:

If you don’t have each of these pieces set up correctly, you’ll likely get an error along the lines of An error occurred (InvalidInstanceId) when calling the SendCommand operation: Instances [[i-0xxxxxxxxxxxxxx]] not in a valid state for account 2xxxxxxxxxx, which isn’t a very constructive error message.
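You can at least make that failure easier to diagnose by catching it and re-raising with a hint; a sketch of my own (not part of the repo), which matches on the error-code text that boto3 puts in the exception message:

```python
def send_script_safely(ssm_client, script, instance_ids):
    """Send a shell script via SSM, translating the common
    InvalidInstanceId failure into a more actionable message."""
    try:
        return ssm_client.send_command(
            DocumentName='AWS-RunShellScript',
            Parameters={'commands': [script]},
            InstanceIds=instance_ids,
        )
    except Exception as err:
        # boto3 raises a ClientError whose message names the error code;
        # InvalidInstanceId usually means the instance never registered
        # with SSM (role, agent, or network problem), not a bad ID
        if 'InvalidInstanceId' in str(err):
            raise RuntimeError(
                f"SSM could not reach {instance_ids}: check that the SSM "
                "agent is running and the instance role grants SSM access"
            ) from err
        raise
```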

If you get stuck, it also helps to go into your EC2 instance, verify that the SSM agent is installed and running, and then check the logs with sudo cat /var/log/amazon/ssm/amazon-ssm-agent.log. If your SSM agent is hibernating, make sure your Roles and Policies are set up correctly.

Part 4: Testing the Script on Lambda

Now let’s test this out!

For my test event, I left the generic JSON as we’re not utilizing Event data for this script:

When I run it, I see that my Test instance is running and the command was sent via SSM:

If we SSH into the test instance, we’ll see it was successful:

That’s an alright demo. But it’s just a text file.

What if we changed the script variable to something like this?

script = """
#!/bin/bash
sudo yum update -y
sudo amazon-linux-extras install nginx1 -y
sudo systemctl enable nginx
sudo systemctl start nginx
"""

This script will now fully bootstrap an NGINX server on an EC2 instance through the use of a Lambda:

Now that’s cool!

Wrap Up

When we combine boto3 and Lambdas, we can make our AWS infrastructure do almost anything we want with little overhead. And as we saw, when we use code to implement our architecture, it becomes trivial to make changes, even at runtime.

I hope this post inspired you to think more about how you can creatively use Lambdas in your environment.

As always, please leave any questions or thoughts in the comments.
