A beginner’s journey to Kubernetes security
Come with me on my career journey as I learn all about Kubernetes security.
If you’re reading this blog then I’m assuming you have some knowledge of Kubernetes. If not, and you’re interested in learning, I both learned a lot from and thoroughly enjoyed this course on Udemy (not affiliated in any way; I just found it extremely engaging and helpful!).
I’d like to say I wasn’t a complete stranger to security before accepting a position at Tigera. I follow the r/scams subreddit and had read The Perfect Weapon, but in hindsight I was totally ignorant of how Kubernetes clusters could be compromised and what security measures could be put in place. After my technical interview task, I was surprised at how easy it is to implement some basic security.
With this blog I’m here to make Kubernetes and security a little more accessible. I wrote it with the intention of helping users who are new to Kubernetes security. Perhaps, like me, you’ve never really considered Kubernetes security before. Perhaps you’ve been asked to secure your cluster and you’re not sure what first steps to take. Or perhaps you’re already familiar with the concepts of zero trust and microsegmentation and you want to know how to protect your cluster from malicious traffic.
In this blog I’ll give you an introduction to Kubernetes networking and security policies, why they are used, policy examples (and what I did wrong/didn’t consider) and how you can easily (and for free) implement them in your own Kubernetes clusters!
Disclaimer: I do work for Tigera so this content will be biased towards Calico, as that’s what I am using.
The Scenario
I was still working at Safe Software as a Technical Support Lead for Cloud and Containers when I was interviewing for Tigera. During this time, I coincidentally had my first security support case come in from a user who was using Open Service Mesh (OSM) in their cluster, and FME Flow (the product I was supporting) was not working correctly.
For context, FME Flow is an enterprise tool that lets you automate ETL tasks. In 2019, the traditional installation was adapted for Docker and Kubernetes, and the different components of FME Flow were containerized. OSM lets you manage, secure, and get out-of-the-box observability features for highly dynamic microservice environments.
This was the first time I had to evaluate FME Flow’s pod-to-pod communication to troubleshoot and reproduce where the communication was breaking down (spoiler alert: OSM had missed a port pool). It was also my first security-related question in about three years of supporting FME Flow on Kubernetes (excluding customer-reported CVEs from image scanning). With my newfound exposure to network policies (thanks, Calico), I undertook the challenge of writing policies to control and test communication between FME Flow pods. And while I say challenge, if you are already comfortable with YAML and can plan a sensible rollout of policies (so you can test iteratively), it is not that difficult.
But first:
What are network and security policies for Kubernetes and containers?
In Kubernetes, all pods are allowed to communicate with each other by default; it’s a very flat network. If one application has vulnerabilities or has been compromised, then everything else in that cluster is at risk. Network policies restrict traffic between pods, so that if one application is compromised, a bad actor can’t move laterally to other pods within the cluster (also known as microsegmentation). And if you’re thinking it’s unlikely that your application will be a target, the concept of zero trust also protects against human error within an organization or social engineering.
Kubernetes environments are designed to be dynamic and ephemeral. How can you write traditional rules to protect your pods or allow traffic when at any moment they may or may not exist? If a pod or node is scaled, terminated or recreated, can you guarantee your rules will always be targeting the correct pod(s) or node(s) if the IP address has changed?
The Kubernetes Network Policy API solves that problem by supporting namespaces, label selectors, CIDRs, a small set of protocols, and ports by name or number. However, Kubernetes doesn’t enforce these policies itself; instead it delegates enforcement to a Container Network Interface plugin, or CNI (What is CNI?).
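For a sense of what that looks like, here’s a minimal sketch of a plain Kubernetes NetworkPolicy. The namespace, names, and labels are hypothetical; it only allows pods labelled app: frontend to reach pods labelled app: backend on TCP port 8080:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend # hypothetical example, not part of FME Flow
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: backend # the pods this policy applies to
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend # only traffic from frontend pods is allowed in
      ports:
        - protocol: TCP
          port: 8080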
Tigera is the creator and maintainer of Calico Open Source, and because this is where I was interviewing (and now work) I chose this as my CNI.
Each CNI can add functionality on top of the Kubernetes defaults. As an example, Calico Network Policy supports additional features, such as applying policies to any kind of Kubernetes endpoint, ingress and/or egress rules, rule actions (allow, deny, log, pass), and richer source and destination match criteria.
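To illustrate one of those extras, here’s a small hypothetical Calico policy that logs connections to a port before denying them, which is something the plain Kubernetes NetworkPolicy API can’t express (the name, namespace, and port are purely illustrative):

apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: log-then-deny-redis # hypothetical example name
  namespace: demo
spec:
  order: 50
  selector: all()
  types:
    - Ingress
  ingress:
    - action: Log # write a log entry for matching packets, then continue evaluating rules
      protocol: TCP
      destination:
        ports: [6379]
    - action: Deny # then deny that same traffic
      protocol: TCP
      destination:
        ports: [6379]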
Creating policies
Knowing which ports to allow, and between which components of FME Flow, was the easy part. For traditional installations there is a documented list of the services and ports FME communicates over, so all I had to do was match them up with the right pods.
Before I show you some of the network policies that I created I want to mention a couple of things.
If you want to do this for yourself, know that when no network policies are applied to a pod, all traffic to and from it is allowed (default allow). As soon as you apply a policy to that pod, all traffic will be denied unless it’s specifically allowed.
This leads into my next point: have a plan for how to test and deploy your policies. If we take FME Flow as an example, which is made up of multiple pods/services that all communicate with each other, you don’t want to start by allowing traffic to your database first and not the UI that the clients will interact with, because you won’t be able to easily test and verify that your policies are working (more on that in a future blog).
FME Flow Example
Having worked with FME for 10 years, I’m intimately familiar with how the FME Flow components work and communicate, so I’ll use it as my example here. This could translate to any Kubernetes application with multiple services.
As mentioned above, as soon as any policy is applied to a pod, traffic is denied unless it’s explicitly allowed. Knowing that, I always check that my application works correctly before applying any kind of policy. To confirm that policies are being applied, I started with a default-deny.yaml policy, denying all ingress and egress traffic in the fmeserver namespace. This stops any communication coming into or out of any pods or endpoints within the fmeserver namespace.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: default-deny-fmeserver
  namespace: fmeserver
spec:
  order: 100
  selector: all()
  types:
    - Ingress
    - Egress
If you apply the above policy in your own environment and try to access any component of FME Flow, you won’t be able to. You have to write policies to allow all of the necessary communication.
I’ll show you the policy that was applied to the queue-fmeserver pod, which is basically a Redis container for the FME Flow job queues:
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: queue-fmeserver
  namespace: fmeserver # NetworkPolicies are namespaced; this is the namespace where the policy will apply
spec:
  order: 0
  selector: statefulset.kubernetes.io/pod-name contains 'queue' # What these rules apply to, in this case the queue pod
  ingress: # inbound traffic
    - action: Allow
      protocol: TCP
      source:
        selector: statefulset.kubernetes.io/pod-name contains 'core' # What source pod(s) are allowed, in this case any traffic from core pods
      destination:
        ports: [6379] # inbound default Redis port
  egress: # outbound traffic
    - action: Deny
One of the benefits of YAML is that it’s easily readable, so it’s quite easy to see what’s going on here (plus I actually remembered to add comments to my policies).
How the policy is constructed:
metadata.name: The policy name is crucial; make sure it’s unique. I can neither confirm nor deny that I’ve accidentally forgotten to change the policy name when copying and pasting, and spent far too long troubleshooting why my policies weren’t working.
metadata.namespace: Make sure the policy applies to the correct namespace.
spec.selector: Make sure the policy is applied to the correct pods by using label selectors. By doing this, if the application scales, this policy will still apply to all of the correct pods (see the example pod labels after this list).
spec.ingress: Here you can see the rules allowing traffic into any queue pods.
spec.egress: Here you can see the rules denying traffic out of any queue pods. In this case the queue/Redis pod shouldn’t be initiating communication with any other pod or service, hence egress is denied.
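To make the selector and label relationship concrete, here’s what the metadata of a queue pod might look like. The statefulset.kubernetes.io/pod-name label is added automatically to StatefulSet pods, which is why the contains 'queue' selector keeps matching even as pods are recreated; the component label and values here are illustrative rather than copied from the real chart.

apiVersion: v1
kind: Pod
metadata:
  name: queue-fmeserver-0 # hypothetical pod created by a queue StatefulSet
  namespace: fmeserver
  labels:
    statefulset.kubernetes.io/pod-name: queue-fmeserver-0 # set automatically on StatefulSet pods
    safe.k8s.fmeserver.component: queue # illustrative component label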
Let’s talk about the FME Engine policy
For another example, here you can see the policy I created for the FME Engine. This is the pod that does all of the data processing. One FME Engine can process one ETL job at a time. However, don’t just copy this policy for your own environment, and I’ll tell you why.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: engine-fmeserver
  namespace: fmeserver # NetworkPolicies are namespaced; this is the namespace where the policy will apply
spec:
  order: 0
  selector: safe.k8s.fmeserver.component contains 'engine' # What these rules apply to, in this case the engine pods (FME Engine containers)
  ingress:
    - action: Allow
      protocol: TCP
      source:
        selector: statefulset.kubernetes.io/pod-name contains 'core'
      destination:
        ports: [7500, 7501, '4500:4800'] # https://docs.safe.com/fme/html/FME-Flow/ReferenceManual/FME-Flow-Ports.htm
  egress:
    - action: Allow
My selector for this policy is set to apply to all engine pods. However, in FME you can create multiple engine deployments with different properties to enable queue control (so a specific number or subset of engines performs certain jobs). I’ve seen FME Flow users set up different engine deployments for different business processes and systems. If you’re security-conscious and serious about zero trust, you will want to create multiple engine policies.
For example, one engine deployment may be configured to only process jobs that read and write data to a file share and a PostGIS database. That engine should only have egress (outbound) access on ports 445 and 5432, rather than a blanket “Allow”. I know this now, but I didn’t realize the importance of egress two months ago. You can also specify destination IP addresses, which may be applicable if you’re connecting to resources or services with a static IP.
Why? If you allow all egress traffic then a bad actor has the potential to exfiltrate your data, communicate with their command and control centre, or continue to traverse your cluster and network looking for more valuable targets. If (in this scenario) someone accesses your engine container they may be able to reach your PostGIS instance or your file share, but they have no way of exfiltrating that data (from that pod) because you’ve denied egress traffic except to known and trusted destinations. They’ll have to find another weak link.
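Here’s a rough sketch of what that tighter engine policy could look like. Treat it as an illustration rather than something to copy: the selectors reuse the labels from the policy above, but the destination IPs are hypothetical placeholders for your file share and PostGIS hosts.

apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: engine-fmeserver-restricted-egress # hypothetical example name
  namespace: fmeserver
spec:
  order: 0
  selector: safe.k8s.fmeserver.component contains 'engine'
  ingress:
    - action: Allow
      protocol: TCP
      source:
        selector: statefulset.kubernetes.io/pod-name contains 'core'
      destination:
        ports: [7500, 7501, '4500:4800']
  egress:
    - action: Allow # SMB to the file share only
      protocol: TCP
      destination:
        nets: ['10.0.1.10/32'] # hypothetical file share IP
        ports: [445]
    - action: Allow # PostgreSQL/PostGIS only
      protocol: TCP
      destination:
        nets: ['10.0.2.20/32'] # hypothetical PostGIS database IP
        ports: [5432]

Because these are the only egress rules that match the engine pods, any other outbound traffic is denied. Depending on how the engine resolves those destinations, you may also need a rule allowing DNS (UDP port 53) to your cluster’s DNS service.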
When creating policies, you want to make sure you understand what your services will need to communicate with and create policies with the right selectors and ingress/egress rules.
Why are policies important?
If you protect your applications deployed on-premises or on VMs with firewalls, why aren’t you protecting your Kubernetes applications too?
Securing the perimeter of your network or applications is a good first line of defense but you cannot rely on it alone. By combining security methods, you improve your security posture and reduce the damage if a container or cluster is compromised.
If you’ve ever played Age of Empires, do you just build a wall around the outside of your base and call it a day? Probably not, unless you lose a lot. You’re likely building defensive units and structures inside your base, protecting valuable assets (monastery? market?), upgrading your castle and towers, and putting archers in them. If you keep getting breached, you’re going to lose resources and units fighting off attackers, which would’ve been better spent on upgrades and attacking.
However, losing at Age of Empires because your base got destroyed is not going to be as disastrous, expensive, or damaging as getting hacked.
Wrap up
If you’re newer to Kubernetes, container security, or both, I hope this was an informative introductory read and that you now feel confident about how to begin securing your own Kubernetes workloads with policies, and why that’s important.
If you want to improve your security posture I encourage you to check out Project Calico functionality, or explore our commercial offerings.
If you want to try this for yourself, check out the Calico installation and best practices lab (this is what I used before my interview).
Stay connected with me on here, X or LinkedIn (or Steam for a game of AoE4?) for updates about Calico and more introductory security content!