At Vamp our trial clusters present a harsh security challenge. We want trial users to bring their own containers but we don’t want them to have access to the Vamp infrastructure.
One of the most powerful and yet often ignored security features in Kubernetes is network policies. Whilst they lack the advanced features of modern firewalls, network policies are a powerful tool for building deeply secure environments because they allow you to secure traffic within a cluster.
Policies are often ignored as a security feature because:
- There is a misconception that cluster-edge security is sufficient;
- They require an understanding of how Kubernetes’ networking works; and
- Debugging them can be difficult.
In our environments, we don’t take any aspect of security for granted at any point. In addition to using best practices such as role-based access control (RBAC) and mutual TLS, and tools like HashiCorp Vault to manage cross-cluster secrets and the PKI (public key infrastructure), we also make extensive use of network policies to extend network access control to the traffic within and between nodes.
We use Google Cloud’s Kubernetes Engine (GKE) as our reference platform because if it doesn’t work on GKE, it’s probably not going to work on Amazon’s EKS or Azure’s AKS.
GKE implements network policies using Tigera’s Project Calico. In our experience Calico offers the best behaved network policy implementation. This is important because network policies can be hard to implement and debug which is partly why they are often overlooked. So, you need an implementation that faithfully follows the spec and does what you expect it to.
We admit we’re a little biased at Vamp, we’ve had good experiences with Calico starting a few years ago when the majority of our customers were on DC/OS. Our experience of Calico on Kubernetes has been just as positive.
Containers are Vulnerable
Kubernetes, the operating systems running on the nodes, Docker, the operating systems running in the containers, the software frameworks and the third party libraries used to implement your services all have vulnerabilities. Lots of vulnerabilities.
One of the most important value-adds of using a managed Kubernetes service like GKE is that Google actively patches Kubernetes and the OS used to run the nodes, as well as actively defending against known vulnerabilities. This is not foolproof, and in any case, the vulnerabilities in your containers are beyond the scope of what Google can do.
For example, if you use one of the official Node.js Docker images as your base image, there is a big difference in the number of known vulnerabilities depending on which image you choose.
- The lts image uses the official Debian 9.x "Stretch" Docker image as its base and has 43 components with known vulnerabilities, many of them critical.
- The lts-buster image uses the official Debian 10.x "Buster" Docker image as its base and has 25 components with known vulnerabilities, 14 of them critical.
- The lts-alpine image uses the much slimmer, security-focused Alpine Linux as its base and has no components with known vulnerabilities.
The rule of thumb is that smaller images generally have fewer vulnerable components because they have fewer components in total, but ultimately it’s not about the numbers: you need defence in depth.
Defence in Depth: The First Step
The first step to leveraging network policies to secure internal cluster traffic is to enable the policy enforcement feature when creating a cluster.
To enable network policy enforcement when creating a cluster using the gcloud CLI, simply add the --enable-network-policy flag:
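A minimal sketch of the create command (the cluster name and zone are placeholders):

```
gcloud container clusters create my-cluster \
    --zone europe-west4-a \
    --enable-network-policy
```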
You can also enable it under the network security options using the GKE console.
If you want to enable network policy enforcement for an existing cluster, we recommend using gcloud. You can use the GKE console, but it is a messy process.
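Enabling enforcement on an existing cluster is a two-step process: first the network policy add-on is enabled on the master, then enforcement is enabled on the nodes. A sketch, with the cluster name as a placeholder:

```
# step 1: enable the network policy add-on on the master
gcloud container clusters update my-cluster \
    --update-addons NetworkPolicy=ENABLED

# step 2: enable network policy enforcement on the nodes
gcloud container clusters update my-cluster \
    --enable-network-policy
```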
Note: the second command will trigger GKE to recreate all the node pools in your cluster to ensure they are correctly configured.
Tip: the easiest way to verify that network policies are enabled is either to use kubectl to describe one of your existing Pods, or to use the GKE console to view the Pod’s YAML. Kubernetes doesn’t warn you that network policies are not enabled; it accepts the configuration and silently ignores it. If network policies are enabled, you will see a cni.projectcalico.org annotation in the Pod’s metadata.
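A quick sketch of the check (the Pod name is a placeholder; the annotation is added by Calico, GKE's network policy implementation):

```
kubectl describe pod my-pod | grep cni.projectcalico.org
```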
Still Access All Areas
The Pods in a Kubernetes cluster are not isolated by default, which means that any Pod in any namespace is free to access any other Pod, in any other namespace. Pods are also free to establish connections to practically anywhere on the Internet.
Relying on stopping threats at the cluster edge is like checking festival-goers’ tickets when they arrive and then relying on their “good character” not to storm the backstage areas. It’s going to end in a mess.
Block Everything but DNS
The first thing you want to do is set a default deny-(almost)-all egress policy on your Namespaces. It is harsh but effective. The policy should allow DNS; without it, the network will not function properly.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress  # illustrative name
  namespace: customer1       # apply per tenant Namespace
spec:
  podSelector: {}            # applies to every Pod in the Namespace
  policyTypes:
  - Egress
  egress:
  # allow DNS resolution
  - ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
```
Your Pods will still be able to respond to requests but won’t be able to initiate connections. So, if a bad actor does hijack your containers, it will be that much harder for them to harvest your data or to use your cluster to do things like anonymously buy Facebook advertising.
Tip: BusyBox is a huge help when testing policies.
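For example, assuming a customer1 Namespace with the deny-all egress policy applied, a throwaway BusyBox Pod should no longer be able to reach an external site (names and URL are illustrative):

```
kubectl -n customer1 run busybox --rm -it --restart=Never --image=busybox \
    -- wget -qO- --timeout=5 http://example.com
# with the deny-all egress policy in place, the wget should time out
```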
Make a Hole
Just like with a firewall, you can then allow egress to specific destinations on a case-by-case basis. In this case, we allow the Vamp Release Agent in the customer1 Namespace to connect to Elasticsearch in a Namespace labelled role=core:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-elasticsearch  # illustrative name
  namespace: customer1
spec:
  podSelector:
    matchLabels:
      io.vamp: release-agent
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          role: core
    ports:
    - protocol: TCP
      port: 9200
```

The egress policy allows any Pod with an io.vamp=release-agent label to connect to any Namespace with a role=core label, using TCP port 9200 only.
Using BusyBox, we can now successfully connect to Elasticsearch in the role=core Namespace:
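A sketch of that check, assuming the Elasticsearch Service is named elasticsearch and runs in a Namespace called core:

```
kubectl -n customer1 run busybox --rm -it --restart=Never --image=busybox \
    -- wget -qO- --timeout=5 http://elasticsearch.core:9200
# if the egress policy matches, this returns the Elasticsearch banner JSON
```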
Check Both Ends
Next, for each of the Services running in your cluster, set individual policies to control which Namespaces can access the Pods behind each of those services, using a
namespaceSelector. A common use case for this is when your application has dependencies on services like Elasticsearch or Redis that run in the same cluster as your Services.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-tenants  # illustrative name
  namespace: core                   # the Namespace running Elasticsearch
spec:
  podSelector:
    matchLabels:
      app: elasticsearch
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          role: tenant
    ports:
    - protocol: TCP
      port: 9200
```

Ingress network policies are written from the perspective of the Pods that are being protected. In this case, the policy restricts which Pods can connect to the Elasticsearch Pods in the role=core Namespace. The Elasticsearch Pods are identified as being any Pod with an app=elasticsearch label, and they are only allowed to accept requests from Pods in Namespaces with a role=tenant label. And only on TCP port 9200.
We can test the effect of this policy by creating a new Namespace without the
role=tenant label and running our BusyBox spider:
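A sketch, assuming the same elasticsearch Service in a core Namespace as before (the intruder Namespace name is illustrative):

```
kubectl create namespace intruder
kubectl -n intruder run busybox --rm -it --restart=Never --image=busybox \
    -- wget -qO- --timeout=5 http://elasticsearch.core:9200
# the Namespace lacks the role=tenant label, so the request should time out
```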
It’s a good idea to guard against the corresponding egress policy being accidentally removed or misconfigured by enforcing similar restrictions at both the egress and ingress ends of a connection.
You can further lock down access by defining which Pods within a Namespace have access; this is done by pairing a podSelector with the namespaceSelector. This is also useful when you have multiple Services in a Namespace, some of which need access and some of which don't.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-tenants  # illustrative name
  namespace: core
spec:
  podSelector:
    matchLabels:
      app: elasticsearch
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          role: tenant
      podSelector:         # no leading dash: paired with the
        matchLabels:       # namespaceSelector above, so both must match
          io.vamp: release-agent
    ports:
    - protocol: TCP
      port: 9200
```

The updated policy now restricts the Elasticsearch Pods to only accepting requests on TCP port 9200 from Pods that have an io.vamp=release-agent label and that are running in Namespaces that have a role=tenant label.
One of the reasons people shy away from implementing network policies is that troubleshooting network configuration issues is a daunting challenge to most developers.
Fortunately, troubleshooting network policies became a whole lot easier when GKE introduced the option to enable intranode visibility for a cluster. It sounds dull, but don’t be fooled: exposing your intranode Pod-to-Pod traffic to the GCP networking fabric means you can see logs of the network flows between Pods (via VPC flow logging).
You can enable intranode visibility for an existing cluster using the gcloud CLI:
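A sketch of the update command (the cluster name is a placeholder; on older gcloud releases the flag lived under the beta track):

```
gcloud container clusters update my-cluster \
    --enable-intra-node-visibility
```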
Note: this command will trigger GKE to restart all the masters and all the nodes in your cluster, so it makes sense to enable intranode visibility and network policies in the same command.
Tip: by default the logs are aggregated every 5 seconds. This is great when you are troubleshooting, but for a small cluster it can easily result in 20 GB of logs per day at a cost of 0.50 USD/GB. So, when you’re not actively troubleshooting, it makes sense to increase the aggregation period to 30 seconds or 1 minute. This will reduce your costs by 80–90%.
Filtering the Logs
To simplify checking the logs, you need to know either the source or destination IP address and preferably the destination port number. The port number is useful when you want to filter out DNS (port 53) requests, etc.
In this example, we are interested in requests from BusyBox (pod2–10.4.1.11) to Elasticsearch (10.4.2.12) on port 9200.
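In the Stackdriver logs viewer, that translates into an advanced filter along these lines (the IP addresses are the ones from this example):

```
resource.type="gce_subnetwork"
jsonPayload.connection.src_ip="10.4.1.11"
jsonPayload.connection.dest_ip="10.4.2.12"
jsonPayload.connection.dest_port=9200
```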
Note: the IP address shown in the examples above (10.0.25.32) is the IP address of the elasticsearch Service, not the Pod. Only the Pod overlay network is exposed; requests to the Services are not logged.
You can view the logs using the GKE Stackdriver logs viewer console page or using the
gcloud CLI. The console page is the easier option when diagnosing issues:
- Start by selecting the GCE Subnetwork for your cluster using the left-most dropdown menu and then select just the vpc_flows logs.
- You can then filter by jsonPayload.connection.dest_port if you know it.
Tip: it is important to type in the filter box and use the auto-completion. Pasting into the filter box often results in a text query, like text:jsonPayload.connection.src_ip:10.4.1.11, which won't return any results.
We have deep experience of Project Calico and of Kubernetes network policies on self-managed clusters, but we were wary of using them to enforce layer-3 segmentation on managed services like GKE. The main reason for this was the lack of root access to the nodes. Our experience was that you needed to be able to tap into the various real and overlay networks to be able to diagnose issues. The GKE VPC flow logs changed that.
The most frequent mistake we see occurs when a Service is exposed on a different port to the Container’s target port. Policies operate at the Pod level only. So, if your Service is exposed on port 80 but the corresponding Container uses port 8080, you must use port 8080 in the network policy.
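To illustrate (the names are illustrative), given a Service that maps port 80 to container port 8080, the matching policy rule must name the container port:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - port: 80          # port the Service is exposed on
    targetPort: 8080  # port the Container actually listens on
---
# in the corresponding NetworkPolicy, allow the Container port:
#   ports:
#   - protocol: TCP
#     port: 8080      # NOT port 80
```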
Another common mistake we see occurs when a network policy uses a
podSelector that is subtly different to the
selector used for the corresponding Service. This can lead to ugly situations when releasing a new version of a microservice. For example, when the new Deployment is labelled slightly differently, the Pods for the new version may match the Service's
selector and be added to its load balancer, only to fail because the requests routed to those Pods are blocked by the network policy.
You can find out more about Vamp and our policy-based, data-driven approach to Service Reliability Engineering at vamp.io.