AKS with Calico Network Policies
Using Calico Network Policy with Azure Kubernetes Service
Network policies in Kubernetes are essentially firewalls for pods. By default, pods are accessible from anywhere with no protections. If you would like to use network policies, you’ll need to install a network plugin that supports this feature. This article demonstrates how to use network policies with the Calico network plugin on AKS.
Default Kubenet network plugin
The default network plugin for AKS as well as many other Kubernetes implementations is kubenet:
a very basic, simple network plugin, on Linux only. It does not, of itself, implement more advanced features like cross-node networking or network policy (ref)
Though the kubenet network plugin is limited, it affords some security, as pods sit on an overlay network behind a NAT. Thus, a pod cannot be reached from outside the cluster unless the operator (you) configures a service resource (NodePort or LoadBalancer) or an ingress resource.
About the Diagram: With kubenet, traffic headed toward pods running on the same worker node simply goes through the local bridge cbr0 to reach the target pod. If the destination is a pod on another worker node, traffic is sent outside of the node with IP forwarding and, using Azure UDRs (user defined routes), lands on the correct worker node, where it is then sent onward to that node’s local bridge cbr0.
Securing internal cluster traffic
Why do we need such a granular control of network packets between containers? (ref. colleague)
Defining network policies allows you to enable defense in depth when serving a multi-tier application. Essentially, any reason you might restrict access to services running on traditional virtual machines applies equally to services running in containers.
Should a malicious or errant entity breach the top layer, such as a container providing a web service, network policies could restrict what further access or layers can be reached.
Here are some obvious scenarios where network policies could be used:
- restrict access to sensitive or critical backend services running in pods, such as databases, secrets vaults, administration tools, etc.
- restrict traffic between tenants, so that pods serving one customer tenant are not able to communicate with pods serving other customer tenants.
- restrict external traffic from outside the Kubernetes cluster, especially if the network plugin parks the pods on the same subnet as other virtual machines.
- restrict outbound traffic where services should not be talking to unknown services, especially on the public Internet.
Azure network plugin
Azure provides a more robust plugin called Azure CNI that will park the pods on the same subnet as virtual machines. This is useful for other virtual machines to reach the Kubernetes pods without going through a Kubernetes service.
The obvious danger here is that all pods are now exposed to external traffic outside of the Kubernetes cluster, so network policies definitely become vital in this scenario.
NOTE: The Azure CNI network plugin comes with a limited subset of network policy features and depends on underlying Azure network infrastructure. With Calico, you can enjoy the full set of network policy features without dependency on Azure network infrastructure.
Overview of Tutorial
This article covers how to use network policies in a real world scenario with a sensitive and critical service: a highly performant distributed graph database called Dgraph.
The article will run through these tests or steps:
- Deploy distributed graph database Dgraph and Python client pod(s) in separate namespaces
- Deploy a network policy that blocks all traffic except from pods within the same namespace or from pods in a namespace with the required labels
- Add required labels on the namespace containing the client pod to allow access to database services (Dgraph)
Articles in Series
This series shows how to both secure and load balance gRPC and HTTP traffic.
- AKS with Azure Container Registry
- AKS with Calico network policies (this article)
- AKS with Linkerd service mesh
- AKS with Istio service mesh
Previous Article
In a previous article, I documented how to build and push container images to Azure Container Registry. Portions of this article will be reused to demonstrate HTTP and gRPC connectivity when services are blocked.
Requirements
For creation of Azure cloud resources, you will need a subscription that allows you to create resources.
Required tools
These tools are required for this article:
- Azure CLI tool (az): command line tool that interacts with the Azure API.
- Kubernetes client tool (kubectl): command line tool that interacts with the Kubernetes API.
- Helm (helm): command line tool for “templating and sharing Kubernetes manifests” (ref) that are bundled as Helm chart packages.
- helm-diff plugin: allows you to see the changes that will be made with helm or helmfile before applying them.
- Helmfile (helmfile): command line tool that uses a “declarative specification for deploying Helm charts across many environments” (ref).
- Docker (docker): command line tool to build, test, and push Docker images.
Optional Tools
As most of the tools to interact with gRPC or HTTP are included in the Docker image, only a shell is recommended to manage environment variables and run scripts:
- POSIX shell (sh) such as GNU Bash (bash) or Zsh (zsh): the scripts in this guide were tested with either of these shells on macOS and Ubuntu Linux.
Project setup
Below is a file structure that will be used for this article:
~/azure_calico/
├── env.sh
└── examples
├── dgraph
│ ├── helmfile.yaml
│ └── network_policy.yaml
└── pydgraph
├── Dockerfile
├── Makefile
├── helmfile.yaml
├── requirements.txt
├── load_data.py
├── sw.nquads.rdf
└── sw.schema
With either Bash or Zsh, you can create the file structure with the following commands:
mkdir -p ~/azure_calico/examples/{dgraph,pydgraph}
cd ~/azure_calico
touch \
env.sh \
./examples/dgraph/network_policy.yaml \
./examples/{dgraph,pydgraph}/helmfile.yaml \
./examples/pydgraph/{Dockerfile,Makefile,requirements.txt} \
./examples/pydgraph/{load_data.py,sw.schema,sw.nquads.rdf}
Project environment variables
Set up the environment variables below to keep a consistent environment amongst the different tools used in this article. If you are using a POSIX shell, you can save these into a script and source that script whenever needed.
Copy this source script and save as env.sh:
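The original script is embedded as a gist; below is a minimal sketch using the variable names referenced later in this article. The values are placeholders (assumptions), so substitute your own:

```shell
#!/usr/bin/env sh
# env.sh — project environment variables (example values are placeholders)
export AZ_RESOURCE_GROUP="azure-calico-demo"   # hypothetical resource group name
export AZ_LOCATION="westus2"                   # pick your preferred region
export AZ_CLUSTER_NAME="aks-calico-demo"       # hypothetical AKS cluster name
export AZ_ACR_NAME="azcalicodemo"              # ACR names must be globally unique
```

Source this with `source env.sh` before running any of the commands in the sections that follow.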
Provision Azure resources
Both AKS with Azure CNI and Calico network policies and ACR cloud resources can be provisioned with the following steps outlined in the script below:
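The provisioning script itself is embedded as a gist; a minimal sketch of the key commands, assuming the variable names from env.sh and a three-node pool to match the node output shown later, might look like this:

```shell
# Create the resource group and container registry
az group create --name $AZ_RESOURCE_GROUP --location $AZ_LOCATION
az acr create --resource-group $AZ_RESOURCE_GROUP --name $AZ_ACR_NAME --sku Basic

# Create an AKS cluster with the Azure CNI plugin and Calico network policies
az aks create \
  --resource-group $AZ_RESOURCE_GROUP \
  --name $AZ_CLUSTER_NAME \
  --node-count 3 \
  --network-plugin azure \
  --network-policy calico \
  --attach-acr $AZ_ACR_NAME

# Fetch credentials into KUBECONFIG
az aks get-credentials --resource-group $AZ_RESOURCE_GROUP --name $AZ_CLUSTER_NAME
```

The important flags are --network-plugin azure and --network-policy calico, which enable Azure CNI with Calico enforcing the policies.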
Verify AKS and KUBECONFIG
Verify that the AKS cluster was created and that you have a KUBECONFIG that is authorized to access the cluster by running the following:
source env.sh
kubectl get all --all-namespaces
The results should look similar to the following:
NOTE: Recent changes (2021 Aug 01) have moved Calico components into their own namespace.
For the nodes themselves, you can glean information, such as the node name and IP address, with:
JP='{range .items[*]}{@.metadata.name}{"\t"}{@.status.addresses[?(@.type == "InternalIP")].address}{"\n"}{end}'
kubectl get nodes --output jsonpath="$JP"
This will show the worker nodes and their IP addresses on the Azure VNET subnet:
aks-nodepool1-56788426-vmss000000 10.240.0.4
aks-nodepool1-56788426-vmss000001 10.240.0.35
aks-nodepool1-56788426-vmss000002 10.240.0.66
The Dgraph service
Dgraph is a distributed graph database that can be installed with these steps below.
Save the following as examples/dgraph/helmfile.yaml:
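The original manifest is embedded as a gist; below is a minimal sketch, assuming the official Dgraph chart repository at https://charts.dgraph.io. The release name demo and namespace dgraph are chosen to match the resource names shown in the output later in this article:

```yaml
repositories:
  - name: dgraph
    url: https://charts.dgraph.io

releases:
  - name: demo          # yields pods like demo-dgraph-alpha-0
    namespace: dgraph
    chart: dgraph/dgraph
```

The actual gist may pin a chart version and set values (replica counts, storage); treat this as a starting point, not the exact manifest.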
Now run this below to deploy the Dgraph service:
source env.sh
helmfile --file examples/dgraph/helmfile.yaml apply
Once deployed, it will take about 2 minutes for the Dgraph cluster to be ready. You can check it with:
kubectl --namespace dgraph get all
This should show something like the following:
Interestingly, you can peek at the IP addresses allocated to the pods and note that they are on the Azure VNET subnet, the same subnet used by virtual machines and shared with the Kubernetes worker nodes:
JP='{range .items[*]}{@.metadata.name}{"\t"}{@.status.podIP}{"\n"}{end}'
kubectl get pods --output jsonpath="$JP"
In this implementation, this shows:
demo-dgraph-alpha-0 10.240.0.74
demo-dgraph-alpha-1 10.240.0.30
demo-dgraph-alpha-2 10.240.0.49
demo-dgraph-zero-0 10.240.0.24
demo-dgraph-zero-1 10.240.0.38
demo-dgraph-zero-2 10.240.0.81
The pydgraph client
In the previous blog, I documented steps to build and release a pydgraph-client image, and then deploy a container using that image.
Fetch build and deploy scripts
Below is a script you can use to download the gists and populate the files needed to run through these steps.
NOTE: These scripts and further details are covered in the previous article (see AKS with Azure Container Registry).
Build, push, and deploy the pydgraph client
Now that all the required source files are available, build the image:
source env.sh
az acr login --name ${AZ_ACR_NAME}
pushd ~/azure_calico/examples/pydgraph
make build && make push
helmfile apply
popd
After running kubectl get all -n pydgraph-client, this should result in something like the following:
Log into the pydgraph-client container
For the next three tests, you will need to log into the container. This can be done with the following.
PYDGRAPH_POD=$(kubectl get pods \
  --namespace pydgraph-client \
  --output name
)
kubectl exec -ti --namespace pydgraph-client ${PYDGRAPH_POD} -- bash
Test 0 (Baseline): No Network Policy
Conduct a basic check to verify that things are working before running any tests with network policies.
In this sanity check and the subsequent tests, both HTTP (port 8080) and gRPC (port 9080) will be tested.
HTTP check (no network policy)
Log into the pydgraph-client pod and run this command:
curl ${DGRAPH_ALPHA_SERVER}:8080/health | jq
The expected result should be the health status of one of the Dgraph Alpha nodes:
gRPC check (no network policy)
Log into the pydgraph-client pod and run this command:
grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion
The expected results will be the Dgraph server version.
TEST 1: Apply a network policy
In this test, you will add a network policy that denies all traffic unless it comes from pods in the same namespace or from a namespace with the correct labels. The acceptance criterion for this test is that the client will not be able to connect to the Dgraph service.
Adding a network policy
This policy will deny all traffic to the Dgraph Alpha pods, except for traffic from within the same namespace or traffic from pods in namespaces with the labels app=dgraph-client and env=test.
Copy the following and save as examples/dgraph/network_policy.yaml:
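The original policy is embedded as a gist; below is a minimal sketch of such a policy. The pod labels app: dgraph and component: alpha are assumptions about what the Dgraph chart applies (verify with kubectl get pods --namespace dgraph --show-labels), and the policy name is hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dgraph-allow-clients   # hypothetical name
  namespace: dgraph
spec:
  podSelector:
    matchLabels:
      app: dgraph              # assumed chart labels for the Alpha pods
      component: alpha
  policyTypes:
    - Ingress
  ingress:
    - from:
        # allow pods in the same namespace
        - podSelector: {}
        # allow pods in any namespace labeled app=dgraph-client and env=test
        - namespaceSelector:
            matchLabels:
              app: dgraph-client
              env: test
```

Because the policy selects the Alpha pods with a non-empty podSelector, all ingress not matched by the from clauses is denied by default.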
When ready, apply this with:
kubectl --filename ./examples/dgraph/network_policy.yaml apply
HTTP check (network policy applied)
Log into the pydgraph-client pod and run this command:
curl ${DGRAPH_ALPHA_SERVER}:8080/health
In this case, after a long wait (about 5 minutes), the expected result is a timeout:
gRPC check (network policy applied)
Log into the pydgraph-client pod and run this command:
grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion
The expected results for gRPC in about 10 seconds will be:
TEST 2: Allow traffic to the service
Now that we have demonstrated that access is walled off, we can add the appropriate labels to the namespace so that traffic will be permitted.
Allow a client to access Dgraph
kubectl label namespaces pydgraph-client env=test app=dgraph-client
After this command, the new labels will be added:
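You can confirm which labels the namespace now carries with:

```shell
# Show the pydgraph-client namespace along with all of its labels
kubectl get namespace pydgraph-client --show-labels
```

The LABELS column should now include app=dgraph-client and env=test.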
HTTP check (namespace label applied)
Log into the pydgraph-client pod and run this command:
curl ${DGRAPH_ALPHA_SERVER}:8080/health | jq
The expected result is JSON data describing the health of one of the Dgraph Alpha pods.
gRPC check (namespace label applied)
Log into the pydgraph-client pod and run this command:
grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion
The expected result is JSON detailing the Dgraph server version.
Cleanup
This will remove the AKS cluster as well as any provisioned resources from AKS including external volumes created through the Dgraph deployment.
az aks delete \
--resource-group $AZ_RESOURCE_GROUP \
--name $AZ_CLUSTER_NAME
Resources
These are some of the resources I have come across online when researching this article.
Blog source code
- AKS with Azure CNI and Calico Network Policies: https://github.com/darkn3rd/blog_tutorials/tree/master/kubernetes/aks/series_2_network_mgmnt/part_2_calico
Network policy tools
- Online graphical net policy editor: https://editor.cilium.io/
Network policy articles
- Guide to Kubernetes Ingress Network Policies: https://www.openshift.com/blog/guide-to-kubernetes-ingress-network-policies
- Securing Kubernetes Cluster Networking: https://ahmet.im/blog/kubernetes-network-policy/
- Get started with Kubernetes network policy: https://docs.projectcalico.org/security/kubernetes-network-policy
Videos
- Kubernetes networking on Azure: https://youtu.be/JyLtg_SJ1lo
Documentation
- Network Policies: https://kubernetes.io/docs/concepts/services-networking/network-policies/
- Network Plugins: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/
- Container Network Interface: https://github.com/containernetworking/cni
Network plugins supporting network policies
- Calico: https://www.tigera.io/project-calico/
- Azure CNI: https://github.com/Azure/azure-container-networking
- Cilium: https://cilium.io/
- Weave Net: https://www.weave.works/docs/net/latest/overview/
- Antrea: https://antrea.io/
Conclusion
Security is a vital and necessary part of infrastructure, and with the introduction of rich container orchestration platforms, security doesn’t go away: it is still needed at the platform layer (Kubernetes) as well as the infrastructure layer (Azure).
In this tutorial, security is important for the backend distributed graph database Dgraph. Only designated clients and automation that manages operational aspects of Dgraph, such as backups and live loading, should be permitted, while everything else is denied access.
Beyond network policies
Beyond using network policies to restrict access to Kubernetes pods, the traffic between pods should also be secured, which is called encryption in transit.
This article is an important step toward exploring traffic security with service meshes, which can automate configuring mutual TLS for short-lived ephemeral pods.
Thank you for reading my article. I hope this is useful in your Kubernetes journeys, and I wish you the best of success.