IPTables and Docker
In this post I will be talking about the nightmare of all Ops people who have to deal with Docker.
We all know that Docker is awesome.
It makes our lives really easy, but there is one problem: it works with iptables (for those who don’t know, the default firewall on Linux).
Docker creates iptables rules for you, and it becomes really hard to control what goes in and out of your server once you install Docker in production.
The issue
Let’s say you have a container that listens on port 443. You only want to allow traffic from your load balancers, as they handle some of the security for you. Nothing really hard, right?
The naive approach is to create a rule on the default INPUT chain that looks something like this:
iptables -A INPUT -p tcp --dport 443 -s 172.16.0.0/26 -m state --state NEW,ESTABLISHED -j ACCEPT
This rule says: allow new and established inbound traffic from the 172.16.0.0/26 network to port 443 over the tcp protocol.
You put your iptables -A INPUT -j DROP at the end and you are happy because you think it works! So you try from your machine and the port is still open for you. Hmm, weird?
Not that weird. When you don’t specify --net=host, Docker creates network interfaces for the container. Those interfaces have their own IP addresses, usually in the 172.17.0.0/16 network. And most important of all, those addresses are only routable from the host, not from the rest of the network. That’s why you use -p to publish the port: the host listens on it and forwards the traffic to the container.
Forwarding traffic 101:
Each container started with a published port will create a rule looking like this:
iptables -A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 443 -j ACCEPT
This rule covers the published port. It says: accept everything that does not come in from the docker0 interface, goes out through docker0, and is destined for the IP of the container on tcp port 443.
This DOCKER chain is referenced in the FORWARD chain like this: -A FORWARD -o docker0 -j DOCKER. The FORWARD chain is traversed when traffic is passed from one interface to another.
Chain DOCKER (1 references)
pkts bytes target prot opt in out source destination
0 0 ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.2 tcp dpt:443
Well, as you can see, this is not it: no packets have matched that rule so far!
We need to dig deeper. So let’s take a step back and understand how iptables filtering and ordering work. A quick Google search yields the following:
So the INPUT chain is only processed after deciding whether the packet needs to be NAT’ed or not.
It is clear that there is something else in the process.
What I did not know is that there are multiple tables in iptables! The default table is called filter, and it’s the most used one.
But incoming packets go through the PREROUTING chain of the nat table before they reach the filter table!!
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
Dammit, everything addressed to the host is sent to the DOCKER chain in the nat table!!!!
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 443 -j DNAT --to-destination 172.17.0.2:443
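And here we go: the packet’s destination is changed to 172.17.0.2:443, so it is no longer addressed to the host itself. It gets forwarded to the container through the FORWARD chain and never goes through INPUT, so any filtering on INPUT will not work…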
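To see this on your own host, you can list the relevant chains. The sketch below is a dry run by default: the IPTABLES variable and the show_docker_nat function are names made up for this example, and the commands are only printed unless you set IPTABLES=iptables and run it as root.

```shell
#!/bin/sh
# Dry run by default: prints the commands instead of executing them.
# IPTABLES is a hypothetical knob of this sketch; use IPTABLES=iptables as root to run for real.
IPTABLES="${IPTABLES:-echo iptables}"

show_docker_nat() {
  # Rules of the nat table's PREROUTING chain, in iptables-save syntax.
  $IPTABLES -t nat -S PREROUTING
  # Docker's DNAT rules, with packet/byte counters and numeric addresses.
  $IPTABLES -t nat -L DOCKER -n -v
  # For comparison: the DOCKER chain of the filter table.
  $IPTABLES -t filter -L DOCKER -n -v
}

show_docker_nat
```

Comparing the packet counters of the two DOCKER chains is a quick way to check which rules a test connection actually hits.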
How are we going to block the traffic without touching Docker?
Some people have talked about the DOCKER-USER chain, which would do the work, but you kind of have the same problem because of the NAT: that chain sits in the filter table, so packets reach it after they have been NAT’ed. Others suggest disabling Docker’s iptables management and maintaining the rules yourself. This is a really bad idea, as you don’t want to reinvent the logic that Docker already handles for you.
Remember? You only want to protect your server, not mess with the actual workflow. Well, my friends, I have the solution for you.
It comes from a bag of tricks: we need to act in the nat table in order to block stuff.
The idea is quite simple:
- We start by creating a chain called DOCKER-BLOCK: iptables -t nat -N DOCKER-BLOCK
- Then we inject a rule on top of the PREROUTING chain that returns before Docker’s own rule is reached, blocking everything: iptables -t nat -I PREROUTING -m addrtype --dst-type LOCAL -j RETURN
- Then we inject another rule on top; this one jumps everything to DOCKER-BLOCK: iptables -t nat -I PREROUTING -m addrtype --dst-type LOCAL -j DOCKER-BLOCK
At this point the flow is like this:
PREROUTING -> DOCKER-BLOCK -> RETURN -> (the rest is unreachable) DOCKER
So nothing reaches Docker’s DNAT rule: everything is blocked by default!
Now the trick is to add the allow rules one by one:
iptables -t nat -A DOCKER-BLOCK -p tcp -m tcp --dport 443 -m state --state NEW -j DOCKER
Now the workflow for port 443 is:
PREROUTING -> DOCKER-BLOCK -> DOCKER -> (unreachable) RETURN -> (even more unreachable) DOCKER
We successfully worked around Docker by jumping back to its chain only when we allow the connection.
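Put together, the steps above could be sketched as the script below. It is a dry run by default: IPTABLES, LB_NET and docker_block_setup are names invented for this example, and the -s restriction to the load balancer network from the beginning of the post is an addition of this sketch, not part of the rule shown above.

```shell
#!/bin/sh
# Dry run by default: prints the commands instead of executing them.
# IPTABLES and LB_NET are hypothetical knobs of this sketch.
IPTABLES="${IPTABLES:-echo iptables}"
LB_NET="${LB_NET:-172.16.0.0/26}"   # load balancer network used earlier in the post

docker_block_setup() {
  # 1. Create the custom chain in the nat table.
  $IPTABLES -t nat -N DOCKER-BLOCK
  # 2. Insert a RETURN on top of PREROUTING so Docker's own jump is never reached.
  $IPTABLES -t nat -I PREROUTING -m addrtype --dst-type LOCAL -j RETURN
  # 3. Insert the jump to DOCKER-BLOCK above the RETURN.
  $IPTABLES -t nat -I PREROUTING -m addrtype --dst-type LOCAL -j DOCKER-BLOCK
  # 4. Allow list: hand new connections to port 443 back to Docker's chain.
  #    Assumption of this sketch: only sources in LB_NET are allowed.
  $IPTABLES -t nat -A DOCKER-BLOCK -p tcp -m tcp --dport 443 -s "$LB_NET" \
    -m state --state NEW -j DOCKER
}

docker_block_setup
```

Run it as is to review the commands, then with IPTABLES=iptables as root to apply them. Running it twice for real would fail on -N and stack duplicate PREROUTING rules, which is the persistence problem discussed next.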
Some people will ask me: how do you deal with flushing and persistence, Edouard?
Well, my friend, I would tell you that everything is under control. My script works before, during, and after Docker runs.
The idea is to create a shell script where you put your rules.
It is possible to flush only one chain from one table, but it is not possible to restore only one chain from one table. We have to improvise.
The danger here is threefold:
- We don’t want to allow traffic while we are reloading
- We don’t want to interrupt existing connections while we are reloading
- We don’t want to mess with Docker
The code is on GitHub, but I’ll go over it quickly:
- We create the DOCKER-BLOCK chain in case it does not already exist
- We add our two custom rules to the PREROUTING chain; there are 4 rules now
- We delete 2, resetting to only 2 rules (otherwise it adds 2 every time)
- Then we create the DOCKER chain, as we need it to be referenced in case the Docker daemon has not created it yet
- We let all the established connections go through
- Then we flush the DOCKER-BLOCK chain: at this point no new connections can be made, but that’s OK as the application will try to send SYN packets multiple times
- Here we add our custom rules, which should restore the traffic.
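The steps above can be approximated as follows. This is not the script from the gist: it is a minimal sketch that uses iptables -C to detect an existing pair of PREROUTING rules before removing the duplicates, and it leaves out the step about established connections. IPTABLES, MATCH and reload_docker_block are hypothetical names, and by default the commands are only printed.

```shell
#!/bin/sh
# Sketch of an idempotent reload; NOT the author's script.
# Dry run by default: prints the commands instead of executing them.
IPTABLES="${IPTABLES:-echo iptables}"
MATCH="-m addrtype --dst-type LOCAL"   # expanded unquoted on purpose (word splitting)

reload_docker_block() {
  # Create our chain; -N fails harmlessly if it already exists.
  $IPTABLES -t nat -N DOCKER-BLOCK 2>/dev/null || true

  # Remember whether a previous run already installed the pair of rules.
  had_pair=0
  if $IPTABLES -t nat -C PREROUTING $MATCH -j DOCKER-BLOCK 2>/dev/null; then
    had_pair=1
  fi

  # Insert a fresh pair on top first, so the block is never absent...
  $IPTABLES -t nat -I PREROUTING $MATCH -j RETURN
  $IPTABLES -t nat -I PREROUTING $MATCH -j DOCKER-BLOCK
  # ...then remove one copy of each (-D deletes the first match), back to 2 rules.
  if [ "$had_pair" -eq 1 ]; then
    $IPTABLES -t nat -D PREROUTING $MATCH -j RETURN
    $IPTABLES -t nat -D PREROUTING $MATCH -j DOCKER-BLOCK
  fi

  # Make sure the DOCKER chain exists even if the daemon has not started yet.
  $IPTABLES -t nat -N DOCKER 2>/dev/null || true

  # Flush, then re-add the allow list; new connections are refused in between.
  $IPTABLES -t nat -F DOCKER-BLOCK
  $IPTABLES -t nat -A DOCKER-BLOCK -p tcp -m tcp --dport 443 -m state --state NEW -j DOCKER
}

reload_docker_block
```

In dry-run mode the -C check is printed and counts as a success, so the output shows the full sequence of a second run. Inserting before deleting is what keeps the block in place during the reload.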
And here we go! Super clean iptables rules that are always idempotent!!
Last point: this only works for containers that don’t use --net=host. If you are using the host networking stack, you will have to deny the traffic using the usual INPUT chain.
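For the --net=host case, a minimal sketch could reuse the rule from the beginning of the post. IPTABLES, LB_NET and host_network_rules are names invented for this example, and the port-specific DROP (instead of a blanket iptables -A INPUT -j DROP) is a choice of this sketch so that a test run does not cut off other services such as SSH. It is a dry run by default.

```shell
#!/bin/sh
# Dry run by default: prints the commands instead of executing them.
# IPTABLES and LB_NET are hypothetical knobs of this sketch.
IPTABLES="${IPTABLES:-echo iptables}"
LB_NET="${LB_NET:-172.16.0.0/26}"

host_network_rules() {
  # Allow the load balancers to reach port 443 on the host itself.
  $IPTABLES -A INPUT -p tcp --dport 443 -s "$LB_NET" -m state --state NEW,ESTABLISHED -j ACCEPT
  # Drop everything else aimed at that port.
  $IPTABLES -A INPUT -p tcp --dport 443 -j DROP
}

host_network_rules
```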
I hope you enjoyed this exercise. Leave a comment or reach out to me at https://twitter.com/moonbocal if you need anything!
https://gist.github.com/tehmoon/b1c3ae5e9a67d66186361d4728bed799#file-iptables-reload-sh