Not on My Watch: Managing Open Internet Ports in AWS Security Groups

Dario Fernando Rodriguez Herrera
Globant, Feb 20, 2024

A few years ago, I didn’t fully understand the importance of security in our applications and environments. However, everything changed when the environments I managed came under attack, and I had to confront the issues and take proactive measures to ensure that such incidents would never happen again. I’ve come to realize that managing this activity becomes significantly more complex with an increasing number of accounts. When dealing with an organization that has hundreds of accounts, the level of complexity amplifies.

What I’ve come to understand over the years is the crucial role security plays and that each one of us contributes significantly to the overall solution. It’s not just the responsibility of security analysts. However, I also recognize that, in many instances, the quickest path is not necessarily the most secure one. We might find ourselves too fatigued to construct a completely secure solution, especially when it’s just for testing purposes. I get it; it can be exhausting.

But what happens when this trial solution becomes a productive one? We all know that the right approach involves addressing every bug or issue before deploying in production. However, reality often presents a different scenario. Time constraints lead us to postpone these crucial tasks, sometimes to the point where we forget they need to be done.

Recognizing this and considering the challenges that cloud engineers face in managing multiple cloud environments, I took action. I created a fully automated solution within AWS to address one of the most common mistakes made by development teams: leaving the security group’s ports open to the Internet. I know that this initiative can significantly enhance the security of our environments, and I’ve decided to share it with you.

Considerations

Before starting, it’s important to consider a few things that determine whether this solution is right for you:

  • It’s designed for numerous accounts. If you are working in a smaller environment with few accounts, implementing this solution might be excessive compared to slightly simpler alternatives.
  • This solution lacks an automated rollback mechanism if executed in the wrong account or with incorrect parameters. It only generates a log, which could assist you in manually restoring changes if necessary.
  • Because this solution is highly invasive, I recommend having a detailed plan before executing it and encouraging teams to resolve the problem independently before resorting to its implementation.
  • As part of this plan, you will need to define the target accounts, regions, and ports that will be affected. Be cautious about this, and consider starting with lower environments as a precaution.

Prerequisites

The following list outlines what we need to know and have implemented in our AWS environment before getting started:

  • AWS Organizations: This service helps us manage our multi-account environment.
  • Config Aggregator: In addition to AWS Organizations, this service gives the Master Account read-only access to AWS Config data from every account in the organization. Creating an aggregator is straightforward, and AWS provides a tutorial on configuring one from the console; make sure it covers all your target accounts and regions. I suggest starting by scanning non-production accounts, such as proof-of-concept and development environments.
  • Some knowledge of SQL queries, Python, and Boto3 can be beneficial. While I’ll furnish you with the query and source code, the ability to explore, evolve, or adapt this solution would be useful.
  • Last but not least, the solution works with services like AWS Lambda, AWS Config, EventBridge, S3, and IAM, so basic knowledge of these services is advantageous.

Implementation

I’ve defined two steps: first, identify the non-compliant security groups so you know which ones will be affected by this solution. You may discover that everything is fine and nothing needs to be done; if not, you will need the second step: once you understand the magnitude of the problem, fix it.

Identify

First and foremost, we need to identify what requires fixing; this is the goal of our solution. While there are various approaches to address this task, I’ll demonstrate what works for me by using AWS Config aggregators powered by AWS Organizations.

What’s our issue? We’re searching for security groups that have TCP rules allowing Internet access (IP ranges = 0.0.0.0/0). To accomplish that, follow these steps:

Note: It’s crucial to understand that the outcome of this identification step is not mandatory for remediation. However, I believe it could be beneficial to assess the current state before taking any corrective actions.

As I mentioned in the prerequisites, make sure you have already configured an Aggregator pointing to all your target accounts and regions.

  1. Go to the AWS console in your Master account and search for the Config service.
  2. In the left pane, select Advanced queries.
  3. Click on the New query button.
Query insecure Security groups — Steps 1–3

4. In Query scope, the list box should show the aggregator you created earlier; select it.

5. Now you have two options:

5.1. Use the Natural language query processor to generate the query:

5.1.1. Copy the following text into the text box: “Security groups that have rules allowing internet access (IP ranges = 0.0.0.0/0)”.

5.1.2. Click on the “Generate” button.

5.1.3. Click on the “Populate to editor” button.

Query insecure Security groups — Steps 4–5.1.3

5.2. Copy and paste the query directly into the query editor:

SELECT
  accountId,
  resourceId,
  configuration
WHERE
  resourceType = 'AWS::EC2::SecurityGroup'
  AND configuration.ipPermissions.ipRanges = '0.0.0.0/0'

6. Click the “Run” button.

Query insecure Security groups — Steps 5.2–6

At this stage, I genuinely hope you find no results. If that’s the case, congratulations: your team is committed to security, and you might not need this post anymore. However, if you do get results, don’t worry; I’ll help your team become a secure one. Download the results so we can track our progress.

It’s crucial to analyze the results and compile a list of ports that are currently ‘open’ and need to be closed. However, there might be cases where having certain ports open to 0.0.0.0/0 is acceptable.
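If you prefer to run this analysis programmatically rather than in the console, the same advanced query can be executed with Boto3 from the master account. This is a minimal sketch, assuming you pass in the name of the aggregator you created earlier:

```python
import json

# AWS Config advanced query; note the /0 suffix on the CIDR,
# which is easy to drop by accident
OPEN_SG_QUERY = """
SELECT
  accountId,
  resourceId,
  configuration
WHERE
  resourceType = 'AWS::EC2::SecurityGroup'
  AND configuration.ipPermissions.ipRanges = '0.0.0.0/0'
"""

def find_open_security_groups(aggregator_name):
    """Run the query against a Config aggregator and return parsed rows."""
    import boto3  # only needed when the query actually runs
    config = boto3.client("config")
    rows, token = [], None
    while True:
        kwargs = {
            "Expression": OPEN_SG_QUERY,
            "ConfigurationAggregatorName": aggregator_name,
        }
        if token:
            kwargs["NextToken"] = token
        resp = config.select_aggregate_resource_config(**kwargs)
        # each result row comes back as a JSON string
        rows.extend(json.loads(r) for r in resp.get("Results", []))
        token = resp.get("NextToken")
        if not token:
            break
    return rows
```

Each returned row carries the account ID, the security group ID, and the full configuration, which is enough to build the list of open ports mentioned above.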

Remediate

Now, perhaps the most crucial step in this post is how to ‘fix the world.’ I assume if you’re still reading, it’s because you obtained some results in the previous step. So, let’s get to work on addressing them.

Firstly, allow me to introduce the architecture of our solution:

Solution architecture

As you can see, I use EventBridge as a trigger for a Lambda function, which performs its magic using an IAM role. This role also allows it to store logs in an S3 bucket located in our master account. It’s important to note that we are operating in a multi-account environment. This implies that the services on the left will be deployed in all our accounts through stack sets, while the ones on the right will be deployed solely in our master account. I’ll provide you with the CloudFormation template and walk you through its details.

First, we have to create the resources in our master account because they will be referenced by the CloudFormation template, which will be deployed in all our target accounts:

  1. Create an S3 bucket where information about the deleted rules will be stored. Save the bucket name; we will use it later.
  2. Create a role:
Create Role — Steps 2.1–2.3

2.1. Sign in to the AWS Management Console and open the IAM console.

2.2. In the navigation pane of the IAM console, choose Roles.

2.3. Then, select Create role.

Create Role — Steps 2.4–2.6

2.4. As Trusted entity type, select AWS service.

2.5. Under “Use case”, choose Lambda.

2.6. Then, select Next.

2.7. In the Add permissions step, simply select Next. We will add the policy as an inline policy afterward.

Create Role — Steps 2.8–2.9

2.8. Specify the role name and your preferred description (optional).

2.9. Click on Create role button.

After creating the role, we need to modify its trusted relationships to allow all organization accounts to assume this role:

  1. Get your organization ID by navigating to the AWS Organizations service and copying it from the left panel:
Getting Organization ID

2. Navigate to the role you recently created and modify the trusted relationship’s policy under the ‘Trusted Relationships’ tab:

Modifying Trusted relationship policy

3. Copy the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "<your organization ID>"
        }
      }
    }
  ]
}

Paste it into the text box, replacing <your organization ID> with your organization’s ID, and then click on the Update Policy button:

Replacing Organization ID in Trusted relationship policy
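With that trust policy in place, the Lambda in each member account can assume this role to reach the bucket in the master account. This is a minimal sketch of that handshake; the function names and session name are my own, and the account ID and role name placeholders match the ones used above:

```python
def role_arn(master_account_id, role_name):
    """Build the ARN of the central logging role in the master account."""
    return f"arn:aws:iam::{master_account_id}:role/{role_name}"

def s3_client_for_master(master_account_id, role_name):
    """Return an S3 client whose credentials come from the assumed role."""
    import boto3  # only needed when this actually runs
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn=role_arn(master_account_id, role_name),
        RoleSessionName="revoke-sg-ingress-logger",
    )["Credentials"]
    return boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```

The `aws:PrincipalOrgID` condition in the trust policy is what lets this call succeed from any account in the organization without listing them one by one.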

4. Now, we need to add a policy that allows the role to write and read files in the bucket. Navigate to the Permissions tab and select the Create inline policy option under the Add Permissions list box:

Creating inline policy

5. Copy the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<your bucket name>",
        "arn:aws:s3:::<your bucket name>/*"
      ]
    }
  ]
}

Select the JSON option in the policy editor and paste it into the text box, replacing <your bucket name> with your bucket, then click on the Update Policy button:

Replacing Bucket name

Now, we have to create our stack set with the services that will delete the non-desired rules in our security groups:

  1. Upload the code that the Lambda will execute to our bucket: create a folder named revoke_sg_ingress in the bucket; this will be the source folder for our Lambda function. Copy the following code and save it as lambda_function.py. Compress it into a zip file named lambda_function.zip, and upload it to the S3 bucket within that folder.
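The original code embed does not survive here, so as a stand-in, this is a minimal sketch of how such a handler could work. The `PORTS` environment variable, the single-port matching, and the omitted log upload are my assumptions, not the author’s implementation:

```python
import os

def rules_to_revoke(permissions, ports):
    """Pick TCP ingress rules open to 0.0.0.0/0 on a flagged port.

    `permissions` is the IpPermissions list from describe_security_groups.
    Only single-port TCP rules are matched, a simplification of my own.
    """
    revoke = []
    for perm in permissions:
        if perm.get("IpProtocol") != "tcp" or perm.get("FromPort") not in ports:
            continue
        open_ranges = [r for r in perm.get("IpRanges", [])
                       if r.get("CidrIp") == "0.0.0.0/0"]
        if open_ranges:
            revoke.append({"IpProtocol": "tcp",
                           "FromPort": perm["FromPort"],
                           "ToPort": perm["ToPort"],
                           "IpRanges": open_ranges})
    return revoke

def lambda_handler(event, context):
    import boto3  # provided by the Lambda runtime
    # PORTS is an environment variable name I've assumed, e.g. "22,3389"
    ports = {int(p) for p in os.environ["PORTS"].split(",")}
    ec2 = boto3.client("ec2")
    deleted = []
    for page in ec2.get_paginator("describe_security_groups").paginate():
        for sg in page["SecurityGroups"]:
            bad = rules_to_revoke(sg.get("IpPermissions", []), ports)
            if bad:
                ec2.revoke_security_group_ingress(GroupId=sg["GroupId"],
                                                  IpPermissions=bad)
                deleted.append({"accountId": sg["OwnerId"],
                                "groupId": sg["GroupId"],
                                "revoked": bad})
    # writing `deleted` to the central S3 bucket via the assumed role
    # is omitted here; that is the log the article describes
    return {"deleted": deleted}
```

Keeping the rule-selection logic in a pure function like `rules_to_revoke` makes it easy to test which rules would be removed before letting the handler touch anything.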

2. Now, we need to create the stack set that will be deployed across all our target accounts. Copy the following code and make some modifications before saving it as a .yml file:

  • Lines 28 and 43: Replace <your master account id> and <your role name> to build your role ARN.
  • Line 40: (optional) Add or remove ports as needed. This list corresponds to the ports whose rules will be removed.
  • Lines 42 and 45: Replace <your bucket name>.
  • Line 46: Replace <your folder name>. This is the folder where your Lambda code is located.
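The full template is not reproduced here, so the line numbers above refer to the author’s file. As a rough, untested skeleton of the resources it describes (logical names, the Python runtime, and the PORTS variable are my assumptions), it might look like:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Revoke internet-open security group rules (skeleton only)

Resources:
  RevokeSgIngressRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: {Service: lambda.amazonaws.com}
            Action: sts:AssumeRole
      # policies granting ec2:DescribeSecurityGroups,
      # ec2:RevokeSecurityGroupIngress, and sts:AssumeRole on
      # arn:aws:iam::<your master account id>:role/<your role name> go here

  RevokeSgIngressFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.12
      Handler: lambda_function.lambda_handler
      Role: !GetAtt RevokeSgIngressRole.Arn
      Code:
        S3Bucket: <your bucket name>
        S3Key: <your folder name>/lambda_function.zip
      Environment:
        Variables:
          PORTS: "22,3389"   # ports to close; adjust as needed

  DailyTrigger:
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: cron(0 0 * * ? *)  # once a day at midnight UTC
      Targets:
        - Arn: !GetAtt RevokeSgIngressFunction.Arn
          Id: RevokeSgIngressTarget

  InvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref RevokeSgIngressFunction
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt DailyTrigger.Arn
```

The `AWS::Lambda::Permission` resource is what allows the EventBridge rule to invoke the function; without it the schedule fires but the Lambda never runs.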

3. Navigate to the CloudFormation service, select StackSet in the left pane, and click on the Create StackSet button:

Creating the StackSet

4. On the Choose template screen, select Upload a template file in the Specify template section, choose the file you recently created, and then click Next:

Choosing StackSet template

5. Provide a StackSet name and optionally a description and press Next.

Providing name to the stackSet

6. On the next screen, only press Next.

7. In the Set deployment options screen, define the Organizational Units (you can select up to 9 OUs) that will be the target of your solution:

Selecting Organizational Units

Also select the regions (at least 1) that will be your solution’s focus. These values depend on how your organization is configured, so be cautious about which Organizational Units you select. I recommend starting with lower environments like PoC accounts or development because this solution doesn’t have an automated rollback. If an issue occurs, you’ll need to check the log file and attempt to restore each rule manually. The rest of the configurations can be left as default. Set those values and press Next:

Selecting regions

8. Review the information, mark the Capabilities checkbox, and then click on the Submit button:

Capabilities check

Now, CloudFormation will start deploying the StackSet across all your target accounts. Once this process finishes, the solution is ready to run. If you check the StackSet template, the last resource created is an EventBridge rule that will trigger the Lambda once a day at midnight. So, all that’s left to do is wait until the next day and enjoy the results.

You will notice that your log file contains all the deleted rules, along with their corresponding security group ID and the account ID to which each one belongs.
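If you want to review those logs programmatically rather than in the console, something like the sketch below could work. The flat JSON layout with `accountId` keys is an assumption of mine based on the fields described above, not the author’s actual log format:

```python
def summarize_log(entries):
    """Count deleted rules per account from parsed log entries."""
    counts = {}
    for entry in entries:
        counts[entry["accountId"]] = counts.get(entry["accountId"], 0) + 1
    return counts

def fetch_log_entries(bucket, key):
    """Download and parse one log object from the central bucket."""
    import json
    import boto3  # only needed when this actually runs
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)
```

A per-account summary like this is handy for the follow-up conversation with each team about which of their rules were removed.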

Final Thoughts

This blog doesn’t aim to be the ultimate guide; it’s simply a starting point that offers tools you can customize to suit your specific needs. I want to remind you of the importance of having controlled environments. This article describes only one of the hundreds of possible security issues that we, as cloud administrators, face on a daily basis. I know this activity can be complex, but we have plenty of tools to deal with it. It’s a matter of becoming aware of it and acting.

I recommend developing a detailed plan to address these issues, starting with empowering the respective teams to independently resolve their challenges. Simultaneously, we should focus on creating tools to help us ensure that those issues are controlled and that they will never happen again.

I’m not a developer, but I’m convinced that nowadays, we have countless ways to create tools to enhance our efficiency and simplify our tasks. Let’s do it.
