AKS with Calico Network Policies

Using Calico network policies with Azure Kubernetes Service

Joaquín Menchaca (智裕)
Published in Geek Culture
10 min read · Jul 27, 2021

Network policies in Kubernetes are essentially firewalls for pods. By default, pods are accessible from anywhere with no protections. If you would like to use network policies, you’ll need to install a network plugin that supports this feature. This article will demonstrate how to use this feature with the Calico network plugin on AKS.

Default Kubenet network plugin

The default network plugin for AKS as well as many other Kubernetes implementations is kubenet:

a very basic, simple network plugin, on Linux only. It does not, of itself, implement more advanced features like cross-node networking or network policy (ref)

Though the kubenet network plugin is limited, it affords some security: pods are placed on an overlay network behind a NAT, so they cannot be reached from outside the cluster unless the operator (you) configures a service resource (NodePort or LoadBalancer) or an ingress resource.

About the Diagram: With kubenet, traffic destined for pods running on the same worker node simply goes through the local bridge cbr0 to reach the target pod. If the destination is a pod on another worker node, traffic is sent outside of the node with IP forwarding and, using Azure UDR (user defined routes), lands on the correct worker node before being sent onward to that node’s local bridge cbr0.

Securing internal cluster traffic

Why do we need such granular control of network packets between containers? (ref. colleague)

Defining network policies allows you to implement defense in depth when serving a multi-tier application. Essentially, any reason you might have for restricting access to services running on traditional virtual machines applies equally to services running in containers.

Should a malicious or errant entity breach the top layer, such as a container providing a web service, network policies could restrict what further access or layers can be reached.

Here are some obvious scenarios where network policies could be used:

  • restrict access to sensitive or critical backend services running in pods, such as databases, secrets vaults, administration tools, etc.
  • restrict traffic between tenants, so that pods serving one customer tenant are not able to communicate with pods serving other customer tenants.
  • restrict external traffic from outside the Kubernetes cluster, especially if the network plugin parks the pods on the same subnet as other virtual machines.
  • restrict outbound traffic where services should not be talking to unknown services, especially on the public Internet (see the sketch after this list).
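As a minimal illustration of the last scenario, a namespace-wide egress lockdown can be expressed with a plain Kubernetes NetworkPolicy. This is a sketch only; the namespace name is a hypothetical placeholder:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: my-tenant    # hypothetical namespace
spec:
  # an empty podSelector selects every pod in the namespace
  podSelector: {}
  policyTypes:
    - Egress
  # no egress rules are listed, so all outbound traffic from these pods is denied

Note that a policy like this also blocks DNS lookups, so in practice you would add an egress rule permitting traffic to the cluster DNS service.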

Azure network plugin

Azure provides a more robust plugin called Azure CNI that will park the pods on the same subnet as virtual machines. This is useful when other virtual machines need to reach the Kubernetes pods without going through a Kubernetes service.

The obvious danger here is that all pods are now exposed to external traffic outside of the Kubernetes cluster, so network policies definitely become vital in this scenario.

NOTE: The Azure CNI network plugin comes with a limited subset of network policy features and depends on underlying Azure network infrastructure. With Calico, you can enjoy the full set of network policy features without dependency on Azure network infrastructure.

Overview of Tutorial

This article covers how to use network policies in a real-world scenario with a sensitive and critical service: a highly performant distributed graph database called Dgraph.

The article will run through these tests or steps:

  1. Deploy distributed graph database Dgraph and Python client pod(s) in separate namespaces
  2. Deploy a network policy that blocks all traffic except traffic from pods within the same namespace or from pods in a namespace with the required labels
  3. Add required labels on the namespace containing the client pod to allow access to database services (Dgraph)

Articles in Series

This series shows how to both secure and load balance gRPC and HTTP traffic.

  1. AKS with Azure Container Registry
  2. AKS with Calico network policies (this article)
  3. AKS with Linkerd service mesh
  4. AKS with Istio service mesh

Previous Article

In a previous article, I documented how to build and push container images to Azure Container Registry. Portions of this article will be reused to demonstrate HTTP and gRPC connectivity when services are blocked.

Requirements

To create Azure cloud resources, you will need an Azure subscription with permissions to create resources.

Required tools

These tools are required for this article:

  • Azure CLI tool (az): command line tool that interacts with Azure API.
  • Kubernetes client tool (kubectl): command line tool that interacts with Kubernetes API
  • Helm (helm): command line tool for “templating and sharing Kubernetes manifests” (ref) that are bundled as Helm chart packages.
  • helm-diff plugin: allows you to see the changes made with helm or helmfile before applying the changes.
  • Helmfile (helmfile): command line tool that uses a “declarative specification for deploying Helm charts across many environments” (ref).
  • Docker (docker): command line tool to build, test, and push docker images.

Optional Tools

As most of the tools to interact with gRPC or HTTP are included in the Docker image, only a shell is recommended to manage environment variables and run scripts:

  • POSIX shell (sh) such as GNU Bash (bash) or Zsh (zsh): the scripts in this guide were tested using either of these shells on macOS and Ubuntu Linux.

Project setup

Below is a file structure that will be used for this article:

~/azure_calico/
├── env.sh
└── examples
    ├── dgraph
    │   ├── helmfile.yaml
    │   └── network_policy.yaml
    └── pydgraph
        ├── Dockerfile
        ├── Makefile
        ├── helmfile.yaml
        ├── requirements.txt
        ├── load_data.py
        ├── sw.nquads.rdf
        └── sw.schema

With either Bash or Zsh, you can create the file structure with the following commands:

mkdir -p ~/azure_calico/examples/{dgraph,pydgraph}
cd ~/azure_calico

touch \
env.sh \
./examples/dgraph/network_policy.yaml \
./examples/{dgraph,pydgraph}/helmfile.yaml \
./examples/pydgraph/{Dockerfile,Makefile,requirements.txt} \
./examples/pydgraph/{load_data.py,sw.schema,sw.nquads.rdf}

Project environment variables

Set up the environment variables below to keep a consistent environment among the different tools used in this article. If you are using a POSIX shell, you can save these into a script and source that script whenever needed.

Copy this source script and save as env.sh:
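The gist with the actual script is embedded in the original post. As a rough sketch, based only on the variable names referenced later in this article, env.sh might look something like this (all names and values are placeholders to adjust for your own subscription):

# env.sh: settings shared by the az, kubectl, helm, and helmfile commands
# NOTE: values below are illustrative placeholders
export AZ_RESOURCE_GROUP="aks-calico-demo"
export AZ_LOCATION="westus2"
export AZ_CLUSTER_NAME="aks-calico-demo"
export AZ_ACR_NAME="akscalicodemo"   # ACR names must be globally unique
export KUBECONFIG="$HOME/.kube/${AZ_CLUSTER_NAME}.yaml"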

Provision Azure resources

Azure Resources

Both the AKS cluster (with Azure CNI and Calico network policies) and the ACR cloud resources can be provisioned with the script below:
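The full provisioning script is embedded in the original post as a gist. A minimal sketch of the equivalent Azure CLI commands, assuming the variables from env.sh above (the node count and ACR SKU are assumptions), could look like this:

source env.sh

# resource group that will hold the ACR and AKS resources
az group create \
  --name $AZ_RESOURCE_GROUP \
  --location $AZ_LOCATION

# container registry for the pydgraph-client image
az acr create \
  --resource-group $AZ_RESOURCE_GROUP \
  --name $AZ_ACR_NAME \
  --sku Basic

# AKS cluster using the Azure CNI network plugin with Calico network policies
az aks create \
  --resource-group $AZ_RESOURCE_GROUP \
  --name $AZ_CLUSTER_NAME \
  --network-plugin azure \
  --network-policy calico \
  --attach-acr $AZ_ACR_NAME \
  --node-count 3 \
  --generate-ssh-keys

# fetch cluster credentials into KUBECONFIG
az aks get-credentials \
  --resource-group $AZ_RESOURCE_GROUP \
  --name $AZ_CLUSTER_NAME \
  --file $KUBECONFIG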

Verify AKS and KUBECONFIG

Verify that the AKS cluster was created and that you have a KUBECONFIG that is authorized to access the cluster by running the following:

source env.sh
kubectl get all --all-namespaces

The results should look similar to the following:

AKS with Calico (before 2021-Aug-01)

NOTE: Recent changes (2021 Aug 01) have moved Calico components into their own namespace.

For the nodes themselves, you can glean information, such as the node name and IP address, with:

JP='{range .items[*]}{@.metadata.name}{"\t"}{@.status.addresses[?(@.type == "InternalIP")].address}{"\n"}{end}'
kubectl get nodes --output jsonpath="$JP"

This will show the worker nodes and their IP addresses on the Azure VNET subnet:

aks-nodepool1-56788426-vmss000000   10.240.0.4
aks-nodepool1-56788426-vmss000001   10.240.0.35
aks-nodepool1-56788426-vmss000002   10.240.0.66

The Dgraph service

Dgraph is a distributed graph database that can be installed with the steps below.

Save the following as examples/dgraph/helmfile.yaml:
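The actual helmfile is embedded in the original post as a gist. A minimal sketch, assuming the public Dgraph chart repository and matching the release name (demo) and namespace (dgraph) seen in the output below, might look like:

repositories:
  # public Dgraph Helm chart repository
  - name: dgraph
    url: https://charts.dgraph.io

releases:
  # the release name "demo" yields the demo-dgraph-alpha-* and
  # demo-dgraph-zero-* pods shown later in this section
  - name: demo
    namespace: dgraph
    chart: dgraph/dgraph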

Now run this below to deploy the Dgraph service:

source env.sh
helmfile --file examples/dgraph/helmfile.yaml apply

When completed, it will take about 2 minutes for the Dgraph cluster to be ready. You can check it with:

kubectl --namespace dgraph get all

This should show something like the following:

Dgraph deployment

Interestingly, you can peek at the IP addresses allocated to the pods and note that they are on the Azure VNET subnet, the same subnet used by virtual machines and shared with the Kubernetes worker nodes:

JP='{range .items[*]}{@.metadata.name}{"\t"}{@.status.podIP}{"\n"}{end}'
kubectl --namespace dgraph get pods --output jsonpath="$JP"

In this implementation, this shows:

demo-dgraph-alpha-0   10.240.0.74
demo-dgraph-alpha-1   10.240.0.30
demo-dgraph-alpha-2   10.240.0.49
demo-dgraph-zero-0    10.240.0.24
demo-dgraph-zero-1    10.240.0.38
demo-dgraph-zero-2    10.240.0.81

The pydgraph client

In the previous blog, I documented steps to build and release a pydgraph-client image, and then deploy a container using that image.

Fetch build and deploy scripts

Below is a script you can use to download the gists and populate the files needed to run through these steps.

NOTE: These scripts and further details are covered in the previous article (see AKS with Azure Container Registry).

Build, push, and deploy the pydgraph client

Now that all the required source files are available, build, push, and deploy the image:

source env.sh
az acr login --name ${AZ_ACR_NAME}
pushd ~/azure_calico/examples/pydgraph
make build && make push
helmfile apply
popd

After running kubectl get all -n pydgraph-client, this should result in something like the following:

pydgraph-client deployment

Log into the pydgraph-client container

For the next three tests, you will need to log into the container. This can be done with the following commands:

PYDGRAPH_POD=$(kubectl get pods \
--namespace pydgraph-client \
--output name
)
kubectl exec -ti --namespace pydgraph-client ${PYDGRAPH_POD} -- bash

Test 0 (Baseline): No Network Policy

Conduct a basic check to verify that things are working before running any tests with network policies.

In this sanity check and the subsequent tests, both HTTP (port 8080) and gRPC (port 9080) will be tested.

No Network Policy

HTTP check (no network policy)

Log into the pydgraph-client pod and run this command:

curl ${DGRAPH_ALPHA_SERVER}:8080/health | jq

The expected result is the health status of one of the Dgraph Alpha nodes:

/health (HTTP)

gRPC check (no network policy)

Log into the pydgraph-client pod and run this command:

grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion

The expected result will be the Dgraph server version.

api.Dgraph/CheckVersion (gRPC)

TEST 1: Apply a network policy

In this test, you will add a network policy that denies all traffic unless it comes from a namespace with the correct labels. The acceptance criterion for this test is that the client will not be able to connect to the Dgraph service.

Network Policy added

Adding a network policy

This policy will deny all traffic to the Dgraph Alpha pods, except for traffic from within the same namespace or traffic from pods in namespaces with labels of app=dgraph-client and env=test.

Dgraph Network Policy (made with https://editor.cilium.io/)

Copy the following and save it as examples/dgraph/network_policy.yaml:
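The manifest itself is embedded in the original post as a gist. A sketch implementing the rules described above would be along these lines; the app: dgraph pod label is an assumption about how the Dgraph Helm chart labels its pods, so verify the actual labels with kubectl get pods --namespace dgraph --show-labels before relying on it:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dgraph-allow-clients
  namespace: dgraph
spec:
  # assumed label on the Dgraph pods created by the Helm chart
  podSelector:
    matchLabels:
      app: dgraph
  policyTypes:
    - Ingress
  ingress:
    - from:
        # allow traffic from any pod in the same (dgraph) namespace
        - podSelector: {}
        # allow traffic from pods in namespaces labeled as Dgraph test clients
        - namespaceSelector:
            matchLabels:
              app: dgraph-client
              env: test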

When ready, apply this with:

kubectl --filename ./examples/dgraph/network_policy.yaml apply

HTTP check (network policy applied)

Log into the pydgraph-client pod, and run this command:

curl ${DGRAPH_ALPHA_SERVER}:8080/health

In this case, after a long wait (about 5 minutes), the expected result will be a timeout:

gRPC check (network policy applied)

Log into the pydgraph-client pod and run this command:

grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion

The expected result for gRPC, after about 10 seconds, will be:

api.Dgraph/CheckVersion (gRPC)

TEST 2: Allow traffic to the service

Now that we have demonstrated that connectivity is walled off, we can add the appropriate labels to the namespace so that traffic will be permitted.

Label app=dgraph-client added

Allow a client to access Dgraph

kubectl label namespaces pydgraph-client env=test app=dgraph-client

After running this command, the new labels will be added:

View of namespace labels (Lens tool https://k8slens.dev/)
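If you are not using Lens, you can also verify the labels from the command line:

kubectl get namespace pydgraph-client --show-labels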

HTTP check (namespace label applied)

Log into the pydgraph-client pod, and run this command:

curl ${DGRAPH_ALPHA_SERVER}:8080/health | jq

The expected result is JSON data describing the health of one of the Dgraph Alpha pods.

/health (HTTP)

gRPC check (namespace label applied)

Log into the pydgraph-client pod and run this command:

grpcurl -plaintext -proto api.proto \
${DGRAPH_ALPHA_SERVER}:9080 \
api.Dgraph/CheckVersion

The expected result is JSON detailing the Dgraph server version.

api.Dgraph/CheckVersion (gRPC)

Cleanup

This will remove the AKS cluster as well as any resources provisioned through AKS, including external volumes created through the Dgraph deployment.

az aks delete \
--resource-group $AZ_RESOURCE_GROUP \
--name $AZ_CLUSTER_NAME

Resources

These are some of the resources I came across online while researching this article.

Blog source code

Network policy tools

Network policy articles

Videos

Documentation

Network plugins supporting network policies

Conclusion

Security is a vital and necessary part of infrastructure, and with the introduction of rich container orchestration platforms, security doesn’t go away: it is still needed at the platform layer (Kubernetes) as well as the infrastructure layer (Azure).

In this tutorial, security is important for the backend distributed graph database Dgraph. Only designated clients and automation that manages operational aspects of Dgraph, such as backups and live loading, should be permitted, while everything else is denied access.

Beyond network policies

Beyond using network policies to restrict access to Kubernetes pods, the traffic between pods should also be secured, which is called encryption in transit.

This article is an important step toward exploring traffic security with service meshes, which can automate configuring mutual TLS for short-lived, ephemeral pods.

Thank you for reading my article. I hope this is useful in your Kubernetes journeys, and I wish you the best of success.
