Demystifying AWS NACL
Apart from EC2 Security Groups, a crucial distributed firewall component of an AWS VPC is the Network Access Control List, or Network ACL. The idea is dead simple: what Security Groups do for your individual EC2 instances, NACLs do (well, somewhat) for your VPC Subnets.
The AWS NACL documentation opens dramatically with:
A network access control list (ACL) is an optional layer of security for your VPC that acts as a firewall for controlling traffic in and out of one or more subnets.
Well, they also mention that implementing this optional layer brings a more general, subnet-wide security flavor to the architecture:
Network ACLs act as a firewall for associated subnets, controlling both inbound and outbound traffic at the subnet level.
So what does it bring to the table that Security Groups don’t? Besides being a stateless firewall for Subnets, a Network ACL supports explicit DENY rules, so it can block a specific IP address, something Security Groups can’t do because they only support ALLOW rules.
Before we begin, it would be a real bliss if you’re already familiar with the following:
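To make the DENY point concrete, here’s a minimal Python sketch of how a NACL reaches a verdict, following the documented behavior: rules are checked from the lowest rule number up, the first match wins, and anything left over hits the catch-all * rule, which denies. It’s a toy model under stated assumptions (TCP only, IPv4 only, matching on the peer’s CIDR and the destination port); the evaluate function and the addresses 198.51.100.7 and 203.0.113.10 (from the documentation ranges) are hypothetical.

```python
from ipaddress import ip_address, ip_network


def evaluate(rules, peer_ip, dst_port):
    """Rules are (number, port_from, port_to, cidr, action).

    peer_ip is the source for inbound rules and the destination for outbound
    rules. Rules are checked from the lowest number up and the first match
    wins; if nothing matches, the catch-all '*' rule denies the traffic.
    """
    for _number, lo, hi, cidr, action in sorted(rules):
        if lo <= dst_port <= hi and ip_address(peer_ip) in ip_network(cidr):
            return action
    return "DENY"


inbound = [
    (100, 22, 22, "0.0.0.0/0", "ALLOW"),        # SSH from anywhere...
    (90, 22, 22, "198.51.100.7/32", "DENY"),    # ...except this one (hypothetical) address
]

print(evaluate(inbound, "198.51.100.7", 22))   # DENY: rule 90 is checked before rule 100
print(evaluate(inbound, "203.0.113.10", 22))   # ALLOW: falls through to rule 100
print(evaluate(inbound, "203.0.113.10", 80))   # DENY: no rule matches, so '*' applies
```

Rule 90 sits lower in the list but wins because it has the smaller number; a Security Group has no way to express it at all.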
- Launching an EC2 Instance into the desired Subnet, to make it Public/Private
- Using EC2 KeyPairs (.pem) to SSH into EC2 Instances
- SSH Agent Forwarding, to access other Instances without manually carrying the KeyPair
- Setting up a Custom VPC and Route Tables for routing traffic into and out of the VPC
- The concept of Ephemeral Ports (try not to skip this one)
- Using OpenSSH on Linux (that’s what I’ll be using throughout)
- Installing and setting up Apache to serve Web content
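Since ephemeral ports matter for every NACL rule later on, here’s a quick Python sketch that shows one in action: the client never picks its own source port, the operating system assigns a temporary one, and the server’s reply is addressed to that port. On Linux the range the OS draws from is set in /proc/sys/net/ipv4/ip_local_port_range (commonly 32768-60999); other operating systems use other ranges, which is why NACL rules for replies usually open a wide range such as 1024-65535. Everything runs on localhost, so no AWS resources are involved.

```python
import socket

# A stand-in server. The OS picks a free port here only to avoid clashing with
# local services; in this article it would be Port 22 (SSH) or Port 80 (HTTP).
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    server_port = server.getsockname()[1]

    # The client never chooses its source port: the OS assigns an ephemeral one.
    with socket.create_connection(("127.0.0.1", server_port)) as client:
        conn, (peer_ip, peer_port) = server.accept()
        with conn:
            client_port = client.getsockname()[1]
            # The server sees that ephemeral port as the address to reply to.
            print("client's ephemeral port:", client_port)
            print("port the server replies to:", peer_port)
```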
NOTE: You can use PuTTY for Agent Forwarding and KeyPair registration. I’ll be using OpenSSH.
Tiny disclaimer
- I’ve installed Apache on the public EC2 instance and created a simple HTML page, index.html, in /var/www/html with a single line of text: “Website accessed”
- I’ve created a Security Group named ‘allow-all-sg’ that allows all traffic from any IP on any Port. This keeps things simple, since we’re focusing on NACLs and not Security Groups. allow-all-sg will serve as the Security Group of every instance in our architecture.
Setting Up the VPC
- A VPC named koushikVPC (yeah, that’s me :P)
- An Internet Gateway to route traffic to and from the Public Subnet
- A Public Subnet (CIDR: 192.168.0.0/26) in AZ us-east-1a.
- A Private Subnet (CIDR: 192.168.0.192/27) in AZ us-east-1b.
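- A Route Table PrivateRT, associated with the Private Subnet, with a single IPv4 route: Destination 192.168.0.0/24, Target local, Propagated No
- A Route Table PublicRT, associated with the Public Subnet, with two IPv4 routes: Destination 192.168.0.0/24, Target local, Propagated No; and Destination 0.0.0.0/0, Target igw-internet-gateway-id, Propagated No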
- A NACL named StrictNACL, associated with both Subnets, that currently allows no traffic (Denies All)
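A quick sanity check of this addressing plan with Python’s standard ipaddress module: both Subnets should sit inside the VPC’s 192.168.0.0/24, must not overlap, and should contain the instance addresses used below. Note that the /27 containing DBServer’s address 192.168.0.215 has the network address 192.168.0.192; ipaddress rejects “192.168.0.215/27” because its host bits are set. The usable-address count reflects AWS reserving five addresses per subnet (the first four and the last one).

```python
from ipaddress import ip_address, ip_network

vpc = ip_network("192.168.0.0/24")
public_subnet = ip_network("192.168.0.0/26")
private_subnet = ip_network("192.168.0.192/27")

# Both subnets must sit inside the VPC range and must not overlap each other.
assert public_subnet.subnet_of(vpc) and private_subnet.subnet_of(vpc)
assert not public_subnet.overlaps(private_subnet)

# The walkthrough's instance addresses land in the expected subnets.
print(ip_address("192.168.0.42") in public_subnet)     # WebServer -> True
print(ip_address("192.168.0.215") in private_subnet)   # DBServer  -> True

# AWS reserves 5 addresses in every subnet (the first four and the last one).
for name, net in (("public", public_subnet), ("private", private_subnet)):
    print(name, net, "addresses usable by instances:", net.num_addresses - 5)

# A /27 written with host bits set is rejected; strict=False shows its real network.
try:
    ip_network("192.168.0.215/27")
except ValueError as err:
    print("rejected:", err)
print(ip_network("192.168.0.215/27", strict=False))    # 192.168.0.192/27
```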
Setting Up EC2 Instances
- Instance launched into the Public Subnet, named WebServer. It was assigned a Private IP: 192.168.0.42 and a Public IP: 54.89.154.80
- Instance launched into the Private Subnet, named DBServer. It was assigned a Private IP: 192.168.0.215
- Both are linked to an EC2 KeyPair called koushik.pem (yeah, that’s me!)
- Both are attached to Security Group ‘allow-all-sg’
Setting Up Local Machine
- I’m using a Linux Distribution with OpenSSH installed
- I’ve used the ssh-add command to register my KeyPair koushik.pem with the SSH agent for the current session
- I’ve set up OpenSSH agent forwarding
That’s all for the setup part :P Up until now, we have a Security Group that allows everything through and a NACL that allows nothing, which gives us the following:
To test the NACL, we’re going to SSH into our WebServer and run into an inevitable setback: the connection times out.
Yeah, I cheated a little.
ssh <USER>@<IP> -o ConnectTimeout=<SEC>
waits only <SEC> seconds (I used 10) before reporting that the remote server didn’t respond. My connection is fast enough that a reply would arrive well within 10 seconds, so I took the gamble. If you don’t mind waiting, drop the -o ConnectTimeout=<SEC> part and let SSH give up on its own; without ConnectTimeout, OpenSSH falls back to the system’s default TCP connection timeout, which can take a couple of minutes on Linux.
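For the curious, the same bounded wait can be sketched in Python with socket.create_connection, whose timeout argument (in seconds) plays the role of ConnectTimeout. The hypothetical probe helper only attempts the TCP handshake (no SSH). It tells apart a silent drop, which is what a NACL DENY looks like from the client’s side, from an immediate refusal by a reachable host that has nothing listening on the port.

```python
import socket


def probe(host, port, timeout=10.0):
    """Attempt only a TCP handshake, giving up after `timeout` seconds (like ConnectTimeout)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timed out"    # no answer at all: packets were silently dropped
    except ConnectionRefusedError:
        return "refused"      # host answered, but nothing listens on that port
    except OSError as err:
        return f"error: {err}"


# Hypothetical usage against the WebServer's public IP from this article:
# print(probe("54.89.154.80", 22))
```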
So where did things go wrong in the VPC to cause the timeout?
As we can see, no traffic (Inbound or Outbound) is allowed to flow between the NACL and either Subnet. That’s because a newly created custom NACL contains only the default catch-all (*) rules, which DENY ALL inbound and outbound traffic.
How to resolve?
We’re going to add a balanced set of rules that lets desired traffic through the NACL to the Public Subnet while keeping unwanted traffic out. So we add the Inbound Rule:
100 SSH(22) TCP(6) 22 0.0.0.0/0 ALLOW
which allows SSH traffic to flow in through the NACL and connect to Port 22 of an Instance in any of the associated Subnets. And the following Outbound Rule:
100 Custom TCP Rule TCP(6) 1024-65535 0.0.0.0/0 ALLOW
which allows Instances in the associated Subnets to respond to the Ephemeral Ports of the requesting machines.
NOTE: If your machine’s Public IP doesn’t change (mine does, since my ISP assigns it dynamically via DHCP), you can put your Public IP as a /32 CIDR instead of 0.0.0.0/0 to make this more secure.
Now let’s try to SSH into our WebServer again.
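Because a NACL is stateless, the incoming SSH request and the outgoing reply are judged independently, each against its own rule list. The sketch below models that with a tiny first-match evaluator (TCP only, a simplification of the documented behavior); the laptop address 203.0.113.10 and its ephemeral port 51234 are hypothetical stand-ins, while the rules are the two added above.

```python
from ipaddress import ip_address, ip_network


def evaluate(rules, peer_ip, dst_port):
    """Rules are (number, port_from, port_to, cidr, action); lowest matching number wins."""
    for _number, lo, hi, cidr, action in sorted(rules):
        if lo <= dst_port <= hi and ip_address(peer_ip) in ip_network(cidr):
            return action
    return "DENY"  # catch-all '*' rule


LAPTOP_IP, LAPTOP_PORT = "203.0.113.10", 51234   # hypothetical client and ephemeral port


def ssh_round_trip(inbound, outbound):
    """Stateless check: the request and the reply are judged separately."""
    request = evaluate(inbound, LAPTOP_IP, 22)            # laptop:51234 -> WebServer:22
    reply = evaluate(outbound, LAPTOP_IP, LAPTOP_PORT)    # WebServer:22 -> laptop:51234
    return request, reply


print(ssh_round_trip([], []))   # ('DENY', 'DENY'): StrictNACL as created

inbound = [(100, 22, 22, "0.0.0.0/0", "ALLOW")]             # Inbound Rule #100
outbound = [(100, 1024, 65535, "0.0.0.0/0", "ALLOW")]       # Outbound Rule #100
print(ssh_round_trip(inbound, []))        # ('ALLOW', 'DENY'): request gets in, reply never leaves
print(ssh_round_trip(inbound, outbound))  # ('ALLOW', 'ALLOW')
```

The middle case is the one that trips people up coming from Security Groups: a stateful firewall lets the reply out automatically, but a NACL needs its own rule for it.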
And there goes our first triumph. (Champagne, please :P)
That’s all fine. Now let’s browse a Web page (see the first point of the disclaimer) by simply navigating to the Public IP of the WebServer Instance.
That’s not good! But we allowed some traffic, right?
Well, no! As per Inbound Rule #100, you allowed only SSH traffic to flow in, and browsing a web page over plain (unsecured) HTTP requires allowing traffic to Port 80 of the host (in our case, the WebServer). So we add a new Inbound Rule:
200 HTTP(80) TCP(6) 80 0.0.0.0/0 ALLOW
Let’s try again:
Wow! Victory!
Q: But we didn’t add any new Outbound Rule, so... why... ummm???
Yeah, sure. Outbound Rule #100 allows any TCP traffic leaving the associated Subnets for 0.0.0.0/0 (anywhere in the world), as long as it’s headed to an Ephemeral Port (1024-65535). When the browser requests the page, it connects from an ephemeral port on my machine to Port 80 on the WebServer, and Inbound Rule #200 lets that request in. Apache’s reply is addressed back to that same ephemeral port, so Outbound Rule #100 lets it out. No new Outbound Rule needed.
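Here’s that reasoning as a small, self-contained sketch (TCP only; the browser address 203.0.113.10 and ephemeral port 49152 are hypothetical): the browser’s request is checked against the inbound rules on Port 80, and Apache’s reply is checked against the outbound rules on the browser’s ephemeral port, which Outbound Rule #100 already covers.

```python
from ipaddress import ip_address, ip_network


def evaluate(rules, peer_ip, dst_port):
    """Rules are (number, port_from, port_to, cidr, action); lowest matching number wins."""
    for _number, lo, hi, cidr, action in sorted(rules):
        if lo <= dst_port <= hi and ip_address(peer_ip) in ip_network(cidr):
            return action
    return "DENY"  # catch-all '*' rule


BROWSER_IP, BROWSER_PORT = "203.0.113.10", 49152   # hypothetical client

inbound = [(100, 22, 22, "0.0.0.0/0", "ALLOW")]
outbound = [(100, 1024, 65535, "0.0.0.0/0", "ALLOW")]

print(evaluate(inbound, BROWSER_IP, 80))                # DENY: only SSH was allowed in

inbound.append((200, 80, 80, "0.0.0.0/0", "ALLOW"))     # Inbound Rule #200
print(evaluate(inbound, BROWSER_IP, 80))                # ALLOW: the request gets in
print(evaluate(outbound, BROWSER_IP, BROWSER_PORT))     # ALLOW: the reply leaves via Outbound Rule #100
```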
Completing the Architecture
Have a look at the current state of our architecture:
Notice the cut at the end of the line joining the Subnets and the NACL, near the NACL?
That’s our next move!
NOTE: I have SSH agent forwarding enabled, and I’ll proceed assuming the same for you.
Let’s try to SSH into the Private Instance, DBServer. Since DBServer has no Public IP (being in a Private Subnet), we’ll SSH into it from the WebServer rather than directly from our local machine.
Oops! That’s a setback.
Q: I’ve set up the NACL correctly to allow SSH! So why can’t I connect?
That’s exactly because when traffic arrives from behind the NACL (from its internal, Subnet-facing side), the NACL’s Inbound and Outbound surfaces swap roles, which feels natural given where the NACL sits in the VPC. So let’s look at a new, slightly modified view of the NACL:
As we can see, the requesting traffic from the Subnets now lands on the Outbound Surface, which only allows traffic destined for Ephemeral Ports (1024-65535). The SSH request is headed for Port 22, so it gets dropped. Hence, we add the following Outbound Rule:
200 SSH(22) TCP(6) 22 192.168.0.0/24 ALLOW
This allows SSH connection traffic only toward addresses within the VPC (0.0.0.0/0 would make no difference in our architecture, but it’s better to be precise when allowing IPs).
Now let’s try again to SSH into DBServer
:( But what’s wrong now?
Let us take a look at the present state of the architecture once again.
Notice that the Outbound Surface now allows SSH traffic, so the VPC router successfully routes the SSH connection request to Port 22 of DBServer, and DBServer responds to the Ephemeral Port of the requesting machine (the WebServer). The VPC router routes this response, and it lands on the Inbound Surface of the NACL, where it gets discarded because no Inbound Rule allows traffic to Ephemeral Ports from any IP.
So we add the following Inbound Rule:
300 Custom TCP Rule TCP(6) 1024-65535 192.168.0.0/24 ALLOW
This allows traffic destined for the Ephemeral Ports of Instances within the VPC to flow through (again, with the present setup, 0.0.0.0/0 would have made no difference).
And our architecture now looks like:
Now let’s try once more to SSH into DBServer.
Voila! And that’s how we NACL.
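To tie the walkthrough together, here’s a hedged Python model of it. It assumes what the setup implies: one NACL (StrictNACL) associated with both Subnets, TCP only. One useful way to read the “reversed surfaces” picture is that traffic between two Subnets crosses the NACL twice: it’s checked against the Outbound rules as it leaves the source Subnet and against the Inbound rules as it enters the destination Subnet. The evaluate and cross_subnets helpers and the ephemeral port 50000 are hypothetical; the rules are the ones from this article.

```python
from ipaddress import ip_address, ip_network


def evaluate(rules, peer_ip, dst_port):
    """Rules are (number, port_from, port_to, cidr, action); lowest matching number wins."""
    for _number, lo, hi, cidr, action in sorted(rules):
        if lo <= dst_port <= hi and ip_address(peer_ip) in ip_network(cidr):
            return action
    return "DENY"  # catch-all '*' rule


def cross_subnets(inbound, outbound, src_ip, dst_ip, dst_port):
    """Traffic between two subnets sharing one NACL is checked twice."""
    leaves = evaluate(outbound, dst_ip, dst_port)   # exiting the source subnet
    enters = evaluate(inbound, src_ip, dst_port)    # entering the destination subnet
    return leaves == "ALLOW" and enters == "ALLOW"


WEB, DB, EPHEMERAL = "192.168.0.42", "192.168.0.215", 50000   # ephemeral port is hypothetical

inbound = [(100, 22, 22, "0.0.0.0/0", "ALLOW"), (200, 80, 80, "0.0.0.0/0", "ALLOW")]
outbound = [(100, 1024, 65535, "0.0.0.0/0", "ALLOW")]


def ssh_web_to_db():
    request = cross_subnets(inbound, outbound, WEB, DB, 22)         # WEB:50000 -> DB:22
    reply = cross_subnets(inbound, outbound, DB, WEB, EPHEMERAL)    # DB:22 -> WEB:50000
    return request, reply


print(ssh_web_to_db())   # (False, False): the request can't leave the Public Subnet

outbound.append((200, 22, 22, "192.168.0.0/24", "ALLOW"))       # Outbound Rule #200
print(ssh_web_to_db())   # (True, False): the reply is dropped on its way back in

inbound.append((300, 1024, 65535, "192.168.0.0/24", "ALLOW"))   # Inbound Rule #300
print(ssh_web_to_db())   # (True, True)
```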
Loopholes of current setup/architecture
- Anyone can SSH into the Public Instances, because we’ve allowed SSH requests from 0.0.0.0/0 (the whole Internet) into our Public Subnet
- Since StrictNACL is associated with both Subnets, every rule that applies to the Public Subnet also applies to the Private Subnet. To avoid this, it’s safer to use two NACLs, one per Subnet.
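The first loophole comes down to CIDR matching: 0.0.0.0/0 contains every IPv4 address, while a /32 contains exactly one. A small sketch with Python’s ipaddress module, where 203.0.113.10 stands in for a hypothetical fixed home IP and the other two addresses are arbitrary strangers:

```python
from ipaddress import ip_address, ip_network

anyone = ip_network("0.0.0.0/0")          # source of the current SSH Inbound Rule #100
only_me = ip_network("203.0.113.10/32")   # hypothetical tighter source

sources = ["203.0.113.10", "198.51.100.7", "192.0.2.200"]
for source in sources:
    ip = ip_address(source)
    print(source, "| matches 0.0.0.0/0:", ip in anyone, "| matches /32:", ip in only_me)

print(anyone.num_addresses, "addresses vs", only_me.num_addresses)   # 4294967296 addresses vs 1
```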
Conclusion
When the traffic originates on the Internet (or at least outside the concerned VPC), the NACL’s Inbound and Outbound surfaces behave as their names suggest; but when the traffic originates within the concerned VPC, the Outbound and Inbound surfaces reverse their roles.
PS: I’ll be upgrading this architecture by integrating NAT Instances and placing two separate NACLs, then configuring them. So, stay tuned.