Installing Cilium Service Mesh with the Kubernetes Control Plane externally (illumos)

Tony Norlin
22 min read · Jun 4, 2023

In this part we will install the components and quickly check out some of the features in the Cilium Service Mesh to get a glimpse of the current state.

Hubble UI, as it can look with an external Control Plane, two clusters and a vm connected to each other — Cluster Mesh

My employer (Conoa) runs a weekly meetup channel where we (the tech people) have an opportunity to talk about whatever interests us in the cloud native space.

I was asked if I could arrange a talk during the last Thursday of May about something that had caught my interest, and I agreed to do it if I could choose to talk about Cilium (for those that know me, it was perhaps no big surprise, as I happen to like this project and the folks behind it).

I had almost a week to prepare a demo about the Service Mesh (in Swedish), mostly based on https://github.com/isovalent/cilium-grafana-observability-demo and on what I’ve seen in the awesome labs available at https://isovalent.com/resource-library/labs/ — unfortunately there is not as much fresh code on GitHub as I would like.

Unfortunately, with the limited time I only showed a subset of what I initially had in mind, but at least I managed to put together a demo and the Demo Gods were quite kind to me:

The demo preparations were done in my ordinary environment, which happens to run an external Control Plane (in my port of Kubernetes to illumos), placed in another network segment (VLAN) than the Data Plane.

While I had planned to run my demos in that environment, I felt it might be counter-intuitive to demonstrate an environment that lacks components most(?) Kubernetes users would expect to be there:

$ kubectl get pod -n kube-system        
NAME READY STATUS RESTARTS AGE
cilium-bdvbd 1/1 Running 0 21h
cilium-m6dgf 1/1 Running 0 21h
cilium-operator-78ff8866bf-k98zq 1/1 Running 0 21h
cilium-operator-78ff8866bf-tjvhx 1/1 Running 0 21h
coredns-5f47698cfc-5nkr7 1/1 Running 0 26h
coredns-5f47698cfc-qhnl9 1/1 Running 0 26h
hubble-relay-5447546447-sqcsm 1/1 Running 0 21h
hubble-ui-694cf76f4c-m4fg4 2/2 Running 0 21h

With that in mind, it still wouldn’t stop me from at least preparing and testing out the concepts in my “managed” Kubernetes solution; if it works there, it works almost anywhere?

The steps involved will still be the same, except for the BGP configuration (which will be environment specific anyway) where I try to steer traffic more directly between the network zones (VLANs), without passing through other network zones.

The output in Hubble, however, will be a bit different as it will display more components.

Creating a demo environment

Prerequisites

  • Working control plane (see my earlier articles on how to set it up; the concepts still hold. Where this article installs components differently, just skip those parts in the old articles).
$ kubectl version --output=yaml
clientVersion:
  buildDate: "2023-04-14T18:51:06Z"
  compiler: gc
  gitCommit: dc6760b58d10b77ce10082dcfbdb4c4c9f3d61df
  gitTreeState: clean
  gitVersion: v1.27.1-1+dc6760b58d10b7
  goVersion: go1.20.2
  major: "1"
  minor: 27+
  platform: illumos/amd64
kustomizeVersion: v5.0.1
serverVersion:
  buildDate: "2023-04-14T18:52:09Z"
  compiler: gc
  gitCommit: dc6760b58d10b77ce10082dcfbdb4c4c9f3d61df
  gitTreeState: clean
  gitVersion: v1.27.1-1+dc6760b58d10b7
  goVersion: go1.20.2
  major: "1"
  minor: 27+
  platform: illumos/amd64
  • Worker nodes with Linux (I’ve opted for Ubuntu 22.04 LTS) with Kubernetes v1.27.x and CRI (I’ve chosen CRI-O) set up.
  • BGP
  • Prometheus Operator CRD in order to utilize Service Monitors (to facilitate scraping)
  • Gateway API spec (v0.5.1) CRD

Working control plane — and worker nodes

These are the concepts (friend links, no paywall) for getting the control plane up and running, as well as for bringing the worker nodes up.

BGP

This configuration is in no way “production ready” as it has no security, but for demonstration purposes it shows how BGP can be implemented and enabled in the cluster.

To explain how the configuration ended up in this way, some background is needed:

Getting the control plane to talk with the internal services has posed some challenges:

#1: I looked into how to integrate with the VXLAN, but I saw no easy way to implement it.
#2: Next, I had the admission controllers listening on host ports and patched the webhooks to use a url pointing to an external load balancer in front of each worker node instead of the ordinary Service, which by the way worked rather well on various projects. I had Longhorn running stable. Then came Longhorn v1.3+, which relied on even more webhooks, and I realised that it wouldn’t be realistic to maintain.
#3: VTEP, still VXLAN, but it would at least be a defined state. However good, it would still be limited to a single worker node (if I’ve understood the concept correctly).
#4: Current state. I simply announce the ClusterIP range towards the kube-apiserver and route it back to the workers. What’s not so good is that each individual worker node needs to be defined.

This is the configuration in its current state:

K8S_CLUSTERIP_CIDR=
K8S_ROUTER_ASN=
K8S_LB_CIDR=
WORKER1_NODE_IP=
WORKER2_NODE_IP=
WORKER3_NODE_IP=
WORKER_NODE_CIDR=
LOCAL_ROUTER_ASN=
LOCAL_ROUTER_ID=
LOCAL_ROUTER_NAME=
UPSTREAM_ROUTER_ASN=
UPSTREAM_ROUTER_ID=

cat << EOF > /etc/frr/frr.conf
frr version 7.5
frr defaults traditional
hostname ${LOCAL_ROUTER_NAME}
log syslog
no ipv6 forwarding
service integrated-vtysh-config
!
ip route ${K8S_CLUSTERIP_CIDR} ${WORKER1_NODE_IP}
ip route ${K8S_CLUSTERIP_CIDR} ${WORKER2_NODE_IP}
ip route ${K8S_CLUSTERIP_CIDR} ${WORKER3_NODE_IP}
!
router bgp ${LOCAL_ROUTER_ASN}
bgp router-id ${LOCAL_ROUTER_ID}
bgp log-neighbor-changes
bgp bestpath as-path multipath-relax
neighbor ${UPSTREAM_ROUTER_ID} remote-as ${UPSTREAM_ROUTER_ASN}
neighbor K8S peer-group
neighbor K8S remote-as ${K8S_ROUTER_ASN}
neighbor K8S capability extended-nexthop
neighbor K8S update-source ${LOCAL_ROUTER_ID}
neighbor ${WORKER_NODE_CIDR} peer-group K8S
bgp listen range ${WORKER_NODE_CIDR} peer-group K8S
!
address-family ipv4 unicast
redistribute connected
neighbor K8S route-map IMPORT in
neighbor K8S route-map EXPORT out
network ${K8S_LB_CIDR}
network ${K8S_CLUSTERIP_CIDR}
network ${WORKER_NODE_CIDR}
neighbor ${UPSTREAM_ROUTER_ID} soft-reconfiguration inbound
neighbor ${UPSTREAM_ROUTER_ID} route-map ALLOW-ALL in
neighbor ${UPSTREAM_ROUTER_ID} route-map ALLOW-ALL out
neighbor K8S route-map ALLOW-ALL in
neighbor K8S route-map ALLOW-ALL out
exit-address-family
!
route-map ALLOW-ALL permit 100
!
line vty
!
EOF

After the basic definitions, we simply create static routes to the worker nodes. Then (as I have several BGP peers in my home infrastructure), I declare BGP peers both upstream (my router) and downstream (the Kubernetes cluster). Lastly, among the (non-existent) filters, I define which networks I choose to announce further.
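To sanity check the FRR side once the workers have peered, vtysh can show the BGP sessions and the routes learned from the cluster. Something along these lines (run on the FRR router) should do:

$ vtysh -c "show bgp summary"
$ vtysh -c "show ip route bgp"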

Prometheus Operator CRD

As we will deploy Cilium with resources of kind ServiceMonitor, which define what the Prometheus Operator should scrape, we should create those CRDs beforehand.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm template kube-prometheus prometheus-community/kube-prometheus-stack --include-crds \
| yq 'select(.kind == "CustomResourceDefinition") * {"metadata": {"annotations": {"meta.helm.sh/release-name": "kube-prometheus", "meta.helm.sh/release-namespace": "monitoring"}}}' \
| kubectl create -f -
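A quick way to confirm that the CRDs landed is to list them; the ServiceMonitor definition is the one we care about here:

$ kubectl get crd | grep monitoring.coreos.com
$ kubectl get crd servicemonitors.monitoring.coreos.com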

Gateway API CRD v0.5.1

It does not appear strictly necessary to install these before Cilium (or rather, before we deploy a Gateway or HTTPRoute), but there is no point in delaying what will be needed anyway.

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.5.1/config/crd/standard/gateway.networking.k8s.io_gatewayclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.5.1/config/crd/standard/gateway.networking.k8s.io_gateways.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.5.1/config/crd/standard/gateway.networking.k8s.io_httproutes.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.5.1/config/crd/experimental/gateway.networking.k8s.io_referencegrants.yaml
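The same kind of check can be done for the Gateway API resources before moving on:

$ kubectl get crd | grep gateway.networking.k8s.io
$ kubectl api-resources --api-group=gateway.networking.k8s.io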

Installing Cilium v1.13.x

We are deploying Cilium with a multitude of options set.
- Enable the GoBGP based BGP Control Plane.
- Enable Cilium Ingress Controller.
- Enable Cilium Gateway API.
- Define the Cluster CIDR.
- Install Prometheus Operator ServiceMonitor.
- Install Hubble HTTP L7 Dashboard with a defined set of metrics enabled (including policy visualization for Cilium Network Policies).
- Enable “strict” mode (kube-proxy free installation)
- The observant will see bpf.lbExternalClusterIP. While an anti-pattern, it is a life saver for me that Cilium provides this possibility, as it enables me to keep the Control Plane segmented outside of the Data Plane.

CLUSTERCIDR=10.0.0.0/16
K8SAPISERVER=10.12.13.14

helm install cilium cilium/cilium --version 1.13.3 \
--namespace kube-system \
--set bgpControlPlane.enabled=true \
--set ingressController.enabled=true \
--set bpf.lbExternalClusterIP=true \
--set gatewayAPI.enabled=true \
--set bpf.masquerade=true \
--set cluster.id=1 \
--set cluster.name=democluster1 \
--set ipam.mode=kubernetes \
--set ipv4NativeRoutingCIDR=${CLUSTERCIDR} \
--set k8sServiceHost=${K8SAPISERVER} \
--set k8sServicePort=6443 \
--set kubeProxyReplacement=strict \
--set tunnel=vxlan \
--set operator.prometheus.enabled=true \
--set operator.prometheus.serviceMonitor.enabled=true \
--set prometheus.enabled=true \
--set prometheus.serviceMonitor.enabled=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.metrics.dashboards.enabled=true \
--set hubble.metrics.dashboards.namespace=monitoring \
--set hubble.metrics.dashboards.annotations.grafana_folder=Hubble \
--set hubble.metrics.enableOpenMetrics=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow:sourceContext=workload-name|reserved-identity;destinationContext=workload-name|reserved-identity,port-distribution,icmp,kafka:labelsContext=source_namespace\,source_workload\,destination_namespace\,destination_workload\,traffic_direction;sourceContext=workload-name|reserved-identity;destinationContext=workload-name|reserved-identity,policy:sourceContext=app|workload-name|pod|reserved-identity;destinationContext=app|workload-name|pod|dns|reserved-identity;labelsContext=source_namespace\,destination_namespace,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}" \
--set hubble.enabled=true \
--set hubble.metrics.serviceMonitor.enabled=true
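Once the helm release has settled, it is worth verifying that the agents are healthy and that the kube-proxy replacement really is active. If you have the cilium CLI installed, it and the agent’s own status command can confirm it (the exec below assumes the default cilium DaemonSet name in kube-system):

$ cilium status --wait
$ kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement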

Install ExternalDNS

ExternalDNS should support the Gateway API by now, but I initially had no success getting --source=gateway-httproute to work properly; it turned out to be due to missing RBAC permissions (the namespaces resource was missing, but that should be in place in v0.13.5).

The configuration is really out of scope here (as it depends on your DNS hosting), but I’ll show a configuration that has been working for me with RFC2136.

helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/

Then define the values that are appropriate for your environment.

PROVIDER=rfc2136
DNSHOST=10.53.0.2 # Your internal DNS resolver
DNSZONE=your.domain.com # The zone that DNS manages
TSIGSECRET=<TSIG Secret at the DNS server, check named.conf>
TSIGALGO=hmac-sha256 # TSIG algorithm chosen at DNS server
TSIGKEY=externaldns # The TSIG name chosen at DNS server
DOMAINFILTER=your.domain.com # Which sub domains the ExternalDNS handles

$ cat <<EOF | helm upgrade --install -n external-dns external-dns \
  external-dns/external-dns --create-namespace -f -
---
serviceAccount:
  create: true

rbac:
  create: true

securityContext:
  runAsNonRoot: true
  runAsUser: 65534
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]

sources:
  - service
  - ingress
  - gateway-httproute

registry: txt
txtOwnerId: "k8s"
txtPrefix: "external-dns-"

domainFilters:
  - ${DOMAINFILTER}

provider: ${PROVIDER}

deploymentStrategy:
  type: Recreate

extraArgs:
  - --rfc2136-host=${DNSHOST}
  - --rfc2136-port=53
  - --rfc2136-zone=${DNSZONE}
  - --rfc2136-tsig-secret=${TSIGSECRET}
  - --rfc2136-tsig-secret-alg=${TSIGALGO}
  - --rfc2136-tsig-keyname=${TSIGKEY}
  - --rfc2136-tsig-axfr
EOF
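To see whether ExternalDNS picked up the sources and manages to push records over RFC2136, the controller logs are the first place to look (the deployment name below assumes the release name external-dns used above):

$ kubectl -n external-dns logs deploy/external-dns --tail=20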

Installation of Cert Manager

Cert Manager is out of scope, as it depends on how your domain is hosted, but to facilitate TLS certificates in the cluster it is handy to handle them with Cert Manager.

Basically the installation looks like this (and then an Issuer/ClusterIssuer needs to be set up). The extra arguments are needed for the Gateway API (as it is still considered “experimental”) and for a split DNS setup (if you, like me, have internal and external DNS servers serving the clients).

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set "extraArgs={\
--feature-gates=ExperimentalGatewayAPISupport=true,\
--dns01-recursive-nameservers-only,\
--dns01-recursive-nameservers=8.8.8.8:53\,1.1.1.1:53}" \
--set installCRDs=true \
--set webhook.hostNetwork=true

https://cert-manager.io/docs/installation/helm/#steps

Choose an ACME provider of your choice (DNS-01 is preferable, as the cluster won’t need to be externally exposed) https://cert-manager.io/docs/configuration/acme/dns01/#supported-dns01-providers and set up the challenge method.

A ClusterIssuer can look like this:

TSIGNAME= # name of the TSIG key
VALIDTSIGKEY= # a valid TSIG key to the DNS server
VALIDEMAIL= # a valid e-mail address for the ACME registration
CLUSTERISSUER=acme-prod # a name to refer to your ClusterIssuer
SOADNS= # the IP of the primary DNS
TSIGALGO=HMACSHA512 # Choose something strong here

cat <<EOF | kubectl apply -f -
---
apiVersion: v1
data:
  ${TSIGNAME}: ${VALIDTSIGKEY}
kind: Secret
metadata:
  name: ${TSIGNAME}
  namespace: cert-manager
type: Opaque
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: ${CLUSTERISSUER}
spec:
  acme:
    email: ${VALIDEMAIL}
    preferredChain: ""
    privateKeySecretRef:
      name: certmanager-keyref
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - dns01:
        rfc2136:
          nameserver: ${SOADNS}
          tsigAlgorithm: ${TSIGALGO}
          tsigKeyName: ${TSIGNAME}
          tsigSecretSecretRef:
            key: ${TSIGNAME}
            name: ${TSIGNAME}
EOF
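Before handing out certificates, it is wise to check that the issuer actually registered with the ACME endpoint; the Ready condition tells us whether the account setup succeeded:

$ kubectl get clusterissuer ${CLUSTERISSUER}
$ kubectl describe clusterissuer ${CLUSTERISSUER} | grep -A4 Conditions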

Deploy Istio Bookinfo sample application

The Bookinfo sample application created by the Istio project is great to test out the abilities of the Gateway API, let’s deploy it:

kubectl apply -f \
https://raw.githubusercontent.com/istio/istio/\
release-1.13/samples/bookinfo/platform/kube/bookinfo.yaml
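After a minute or so the application pods should be up. A quick look at the pods and the Services (details and productpage are the ones we will route to later) confirms it:

$ kubectl get pods
$ kubectl get svc details productpage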

Configure Cilium Gateway API

In order to deploy the Gateway, we need to define an IP pool for the Service LoadBalancer and declare how to route/announce the IPs through the network.

The concept of the BGP Control Plane, together with LB IPAM, makes it possible to have different routes on different nodes. I’m longing for native multi-homing in Cilium, but this could serve as a kind of (limited) alternative for routing workloads at the node level. By the way, I’ve seen some activity around multi-homing in the project, so I feel optimistic that good things will happen there in the future.

The node selector is not really necessary in this specific environment (as the control plane is external and not affected), but it is needed in my demo environment where control plane and data plane live together: the BGP configuration below will only be applied to nodes with the label bgp=worker:

# Values from the FRR declaration above
K8S_ROUTER_ASN= # The AS number defined for the Kubernetes cluster
LOCAL_ROUTER_ASN= # The AS number defined for the router
LOCAL_ROUTER_HOSTCIDR= # The /32 CIDR of the router

cat <<EOF | kubectl apply -f -
---
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: ippool
spec:
  cidrs:
  - cidr: 10.245.12.0/24
  disabled: false
---
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: rack0
spec:
  nodeSelector:
    matchLabels:
      bgp: worker
  virtualRouters:
  - exportPodCIDR: true
    localASN: ${K8S_ROUTER_ASN}
    neighbors:
    - peerASN: ${LOCAL_ROUTER_ASN}
      peerAddress: ${LOCAL_ROUTER_HOSTCIDR}
    serviceSelector:
      matchExpressions:
      - key: somekey
        operator: NotIn
        values:
        - never-used-value
EOF
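To verify that the pool was accepted and that the peering policy resulted in sessions towards the router, the Cilium resources can be inspected; a reasonably new cilium CLI should also be able to show the BGP peers per node:

$ kubectl get ciliumloadbalancerippools.cilium.io
$ kubectl describe ciliumbgppeeringpolicies.cilium.io rack0
$ cilium bgp peers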

Test out TLS terminated HTTPRoutes

To test out the functionality of TLS terminated Gateway HTTPRoute resources this stanza can be used:

HTTPROUTEDOMAIN1=bookinfo.c1demo.ploio.net
HTTPROUTEDOMAIN2=hipstershop.c1demo.ploio.net
CLUSTERISSUER=acme-prod #Defined when choosing a provider for cert-manager


$ cat <<EOF | kubectl apply -f -
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: tls-gateway
  annotations:
    cert-manager.io/cluster-issuer: ${CLUSTERISSUER}
spec:
  gatewayClassName: cilium
  listeners:
  - name: https-1
    protocol: HTTPS
    port: 443
    hostname: "${HTTPROUTEDOMAIN1}"
    tls:
      certificateRefs:
      - kind: Secret
        name: demo-cert
  - name: https-2
    protocol: HTTPS
    port: 443
    hostname: "${HTTPROUTEDOMAIN2}"
    tls:
      certificateRefs:
      - kind: Secret
        name: demo-cert
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: https-app-route-1
spec:
  parentRefs:
  - name: tls-gateway
  hostnames:
  - "${HTTPROUTEDOMAIN1}"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /details
    backendRefs:
    - name: details
      port: 9080
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: https-app-route-2
spec:
  parentRefs:
  - name: tls-gateway
  hostnames:
  - "${HTTPROUTEDOMAIN2}"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: productpage
      port: 9080
EOF

With this deployed, there is one Gateway and two HTTPRoutes in place:

$ kubectl get gateway                                               
NAME CLASS ADDRESS READY AGE
tls-gateway cilium 10.245.12.100 True 38m

$ kubectl get httproutes.gateway.networking.k8s.io
NAME HOSTNAMES AGE
https-app-route-1 ["bookinfo.c1demo.ploio.net"] 37m
https-app-route-2 ["hipstershop.c1demo.ploio.net"] 37m

$ kubectl get certificates.cert-manager.io
NAME READY SECRET AGE
demo-cert True demo-cert 29m

The https-app-route-2 (“hipstershop”) will take us straight to / of the “productpage” Service on port 9080:

The https-app-route-2

The https-app-route-1 (“bookinfo”), on the other hand, will only route /details (and below) and direct the HTTP traffic to the details Service on port 9080, hence a 404 on a request to /:

$ curl -i https://bookinfo.c1demo.ploio.net/
HTTP/1.1 404 Not Found
date: Sat, 03 Jun 2023 14:01:43 GMT
server: envoy
content-length: 0

A request to /details will take us to the right path:

$ curl -i https://bookinfo.c1demo.ploio.net/details
HTTP/1.1 400 Bad Request
content-type: application/json
server: envoy
date: Sat, 03 Jun 2023 14:03:39 GMT
content-length: 45
x-envoy-upstream-service-time: 2

{"error":"please provide numeric product id"}%

Result when a valid path is fetched:

Output from details Service, through a TLS terminated HTTPRoute

Deploy HTTP Gateway with HTTPRoute

In the same fashion, we will deploy a demo application to explore the functionality of modifying headers and load balancing. We’ll do this with a simple HTTP Gateway and HTTPRoute resource (no TLS termination and no hostname directive, accepting traffic directly to the IP):

---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  annotations:
  name: my-gateway
  namespace: default
spec:
  gatewayClassName: cilium
  listeners:
  - allowedRoutes:
      namespaces:
        from: Same
    name: web-gw
    port: 80
    protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: http-app-1
  namespace: default
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: my-gateway
    namespace: default
  rules:
  - filters:
    - requestHeaderModifier:
        add:
        - name: some-header
          value: This is fantastic
        set:
        - name: user-agent
          value: Mozilla/5.0 (compatible; Konqueror/3.5; SunOS) KHTML/3.5.0 (like Gecko)
      type: RequestHeaderModifier
    matches:
    - path:
        type: PathPrefix
        value: /
  - backendRefs:
    - group: ""
      kind: Service
      name: echo-1
      port: 8080
      weight: 100
    - group: ""
      kind: Service
      name: echo-2
      port: 8080
      weight: 0
    matches:
    - path:
        type: PathPrefix
        value: /
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: echo-1
  name: echo-1
  namespace: default
spec:
  ports:
  - name: high
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: echo-1
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: echo-2
  name: echo-2
  namespace: default
spec:
  ports:
  - name: high
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: echo-2
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: echo-1
  name: echo-1
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: echo-1
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: echo-1
    spec:
      containers:
      - env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: gcr.io/kubernetes-e2e-test-images/echoserver:2.2
        imagePullPolicy: IfNotPresent
        name: echo-1
        ports:
        - containerPort: 8080
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: echo-2
  name: echo-2
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: echo-2
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: echo-2
    spec:
      containers:
      - env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: gcr.io/kubernetes-e2e-test-images/echoserver:2.2
        imagePullPolicy: IfNotPresent
        name: echo-2
        ports:
        - containerPort: 8080
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

The result is a Gateway resource and HTTPRoute (without a hostname):

$ kubectl get httproutes.gateway.networking.k8s.io http-app-1 
NAME HOSTNAMES AGE
http-app-1 41h

$ kubectl get gateway my-gateway
NAME CLASS ADDRESS READY AGE
my-gateway cilium 10.245.12.48 True 41h

If we inspect the spec of the deployed HTTPRoute, we can see that the weight is 100 on echo-1 and 0 on echo-2, which means that all traffic will be sent to echo-1. At the same time, the request headers will be modified:

kubectl get httproutes.gateway.networking.k8s.io http-app-1 -o yaml | yq .spec
parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: my-gateway
    namespace: default
rules:
  - filters:
      - requestHeaderModifier:
          add:
            - name: some-header
              value: This is fantastic
          set:
            - name: user-agent
              value: Mozilla/5.0 (compatible; Konqueror/3.5; SunOS) KHTML/3.5.0 (like Gecko)
        type: RequestHeaderModifier
    matches:
      - path:
          type: PathPrefix
          value: /
  - backendRefs:
      - group: ""
        kind: Service
        name: echo-1
        port: 8080
        weight: 100
      - group: ""
        kind: Service
        name: echo-2
        port: 8080
        weight: 0
    matches:
      - path:
          type: PathPrefix
          value: /

This is the corresponding output, where the request headers are changed, and the response comes from echo-1:

$ curl 10.245.12.48                                                            


Hostname: echo-1-78b66687b5-wzhbb

Pod Information:
node name: worker2
pod name: echo-1-78b66687b5-wzhbb
pod namespace: default
pod IP: 10.0.1.230

Server values:
server_version=nginx: 1.12.2 - lua: 10010

Request Information:
client_address=10.0.0.108
method=GET
real path=/
query=
request_version=1.1
request_scheme=http
request_uri=http://10.245.12.48:8080/

Request Headers:
accept=*/*
host=10.245.12.48
some-header=This is fantastic
user-agent=Mozilla/5.0 (compatible; Konqueror/3.5; SunOS) KHTML/3.5.0 (like Gecko)
x-forwarded-proto=http
x-request-id=c66720e5-d45b-40c5-943a-6377ffb4454c

Request Body:
-no body in request-

If we scale up the number of requests, the pattern should be visible: everything is sent to echo-1:

$ :> gwapi.out && for i in {1..100}; do curl 10.248.8.169 >> gwapi.out &>/dev/null; done

$ grep -c Hostname gwapi.out
100

$ grep -c "Hostname: echo-1" gwapi.out
100

$ grep -c "Hostname: echo-2" gwapi.out
0

If we change the weights to 50 on each, the load should be spread out rather equally:

$ kubectl patch --type merge httproutes.gateway.networking.k8s.io http-app-1 -p '
{
"spec": {
"rules": [
{
"backendRefs": [
{
"name": "echo-1",
"port": 8080,
"weight": 50
},
{
"name": "echo-2",
"port": 8080,
"weight": 50
}
]
}
]
}
}'
httproute.gateway.networking.k8s.io/http-app-1 patched

$ :> gwapi.out && for i in {1..100}; do curl 10.248.8.169 >> gwapi.out &>/dev/null; done

$ grep -c Hostname gwapi.out
100

$ grep -c "Hostname: echo-1" gwapi.out
52

$ grep -c "Hostname: echo-2" gwapi.out
48

And finally, patching echo-2 to take all of the load is reflected when we curl the resource:

$ kubectl patch --type merge httproutes.gateway.networking.k8s.io http-app-1 -p '
{
"spec": {
"rules": [
{
"backendRefs": [
{
"name": "echo-1",
"port": 8080,
"weight": 0
},
{
"name": "echo-2",
"port": 8080,
"weight": 100
}
]
}
]
}
}'
httproute.gateway.networking.k8s.io/http-app-1 patched

$ :> gwapi.out && for i in {1..100}; do curl 10.248.8.169 >> gwapi.out &>/dev/null; done

$ grep -c "Hostname: echo-1" gwapi.out
0

$ grep -c "Hostname: echo-2" gwapi.out
100

Install OpenTelemetry Operator and OTel Collector

For this, we will install the operator and collector with values from the Isovalent Cilium Grafana Observability Demo repo.

First we install the OpenTelemetry Operator:

$ cat <<EOF | helm upgrade opentelemetry-operator \
  open-telemetry/opentelemetry-operator \
  --install --namespace opentelemetry-operator \
  --create-namespace --version 0.15.0 -f -
---
admissionWebhooks:
  create: false

manager:
  serviceMonitor:
    enabled: true
  env:
    ENABLE_WEBHOOKS: "false"
EOF

Install the OpenTelemetry Collector

cat <<EOF | kubectl apply -n opentelemetry-operator -f -
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: daemonset
  hostNetwork: true
  #image: otel/opentelemetry-collector-contrib:0.60.0
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.60.0
  config: |
    receivers:
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_http:
            endpoint: 0.0.0.0:14268
          thrift_compact:
            endpoint: 0.0.0.0:6831
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    processors:
      batch: {}
      memory_limiter:
        check_interval: 5s
        limit_mib: 409
        spike_limit_mib: 128

    exporters:
      logging:
        loglevel: info
      otlp:
        endpoint: tempo.tempo.svc.cluster.local:4317
        tls:
          insecure: true

    service:
      telemetry:
        logs:
          level: info
          encoding: console
      pipelines:
        traces:
          receivers:
            - otlp
            - jaeger
          processors:
            - memory_limiter
            - batch
          exporters:
            - logging
            - otlp
EOF

With this in place we should be able to collect traces.
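A quick check that the operator reconciled the collector and that the DaemonSet pods came up (if I recall the operator’s naming correctly, the workload gets a -collector suffix appended to the resource name):

$ kubectl -n opentelemetry-operator get opentelemetrycollectors
$ kubectl -n opentelemetry-operator get daemonset,pods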

Deploy Grafana Tempo

We will use Grafana Tempo as the backend for the OpenTelemetry traces, so we can have a look at the Hubble HTTP L7 traces.

$ cat << EOF > tempo-values.yaml
---
fullnameOverride: tempo

tempo:
  searchEnabled: true
EOF

$ helm upgrade tempo grafana/tempo --install \
  --namespace tempo --create-namespace \
  --version 0.16.2 -f tempo-values.yaml

Deploy Kube Prometheus Stack

The Prometheus Operator will be installed with the kube-prometheus-stack helm chart, with some configuration for exemplar handling, dashboards and datasources:

GRAFANAFQDN=grafana.c1demo.ploio.net
GRAFANAPW=password # set something sensible here

cat <<EOF > prometheus-values.yaml
---
# nameOverride: prometheus-k8s
fullnameOverride: prometheus-k8s

prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    probeSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false

    enableRemoteWriteReceiver: true
    enableFeatures:
      - exemplar-storage
    externalLabels:
      cluster: kind
  ingress:
    enabled: false
    ingressClassName: cilium

defaultRules:
  rules:
    kubeProxy: false

alertmanager:
  ingress:
    enabled: false
    ingressClassName: cilium

kubeApiServer:
  tlsConfig:
    serverName: kubernetes
    insecureSkipVerify: true

grafana:
  enabled: true
  image:
    tag: 9.2.0

  serviceMonitor:
    enabled: true

  grafana.ini:
    server:
      domain: ${GRAFANAFQDN}
      root_url: "%(protocol)s://%(domain)s"
    feature_toggles:
      enable: 'tempoApmTable tempoBackendSearch'

  ingress:
    enabled: true
    ingressClassName: cilium
    hosts:
      - ${GRAFANAFQDN}

  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: 'cilium'
          orgId: 1
          folder: 'cilium'
          type: file
          disableDeletion: false
          editable: true
          options:
            path: /var/lib/grafana/dashboards/cilium

  dashboards:
    cilium:
      hubble:
        gnetId: 16613
        revision: 1
        datasource: Prometheus
      cilium-agent:
        gnetId: 16611
        revision: 1
        datasource: Prometheus
      cilium-operator:
        gnetId: 16612
        revision: 1
        datasource: Prometheus
      cilium-policies:
        gnetId: 18015
        revision: 4
        datasource:
          - name: DS_PROMETHEUS
            value: prometheus

  persistence:
    enabled: false

  adminUser: admin
  adminPassword: ${GRAFANAPW}

  sidecar:
    skipTlsVerify: true
    dashboards:
      folderAnnotation: grafana_folder
      provider:
        foldersFromFilesStructure: true
    datasources:
      exemplarTraceIdDestinations:
        datasourceUid: tempo
        traceIdLabelName: traceID

  additionalDataSources:
    - name: Tempo
      type: tempo
      uid: tempo
      url: http://tempo.tempo:3100
      access: proxy
      jsonData:
        httpMethod: GET
        tracesToMetrics:
          datasourceUid: 'prometheus'
          tags: [{ key: 'service.name', value: 'service' }, { key: 'job' }]
          queries:
            - name: 'Sample query'
              query: 'sum(rate(tempo_spanmetrics_latency_bucket{\$__tags}[5m]))'
        serviceMap:
          datasourceUid: 'prometheus'
        search:
          hide: false
        nodeGraph:
          enabled: true
EOF

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm upgrade kube-prometheus prometheus-community/kube-prometheus-stack --install --namespace monitoring --create-namespace --version 46.5.0 --values prometheus-values.yaml
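It takes a little while for everything in the monitoring namespace to settle. Once it does, Grafana should answer on the ingress hostname defined above, or through a port-forward if DNS hasn’t caught up yet (the Service name below is what I would expect from the release name kube-prometheus; adjust if yours differs):

$ kubectl -n monitoring get pods
$ kubectl -n monitoring port-forward svc/kube-prometheus-grafana 3000:80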

Cilium Policy Verdicts Dashboard

Also, to visualise Cilium Network Policies, we installed the Policy Verdicts dashboard. This dashboard can really help us to tune the network policies to a zero trust level.

In my demo I deployed it manually, as I had trouble getting it to work from the Grafana dashboard marketplace (for some reason it was unavailable at the time), but normally, as in the helm values above, it would be installed at the same time as the others.
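For reference, a manual deploy can be done by feeding the dashboard JSON (Grafana.com ID 18015, the same one as in the helm values above) to the Grafana sidecar through a ConfigMap. This is a rough sketch that assumes the sidecar’s default grafana_dashboard label and that the JSON has been downloaded as policy-verdicts.json:

$ kubectl -n monitoring create configmap cilium-policy-verdicts \
    --from-file=policy-verdicts.json
$ kubectl -n monitoring label configmap cilium-policy-verdicts grafana_dashboard=1
$ kubectl -n monitoring annotate configmap cilium-policy-verdicts grafana_folder=cilium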

Deploy OpenEBS for Persistent Storage

Some of the demo components really insisted on a PVC, so I checked out OpenEBS and it turned out to be rather sleek (let’s see in time if it’s as good as it looks):

$ kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml

$ kubectl patch storageclass openebs-hostpath -p '
{"metadata":
{"annotations":
{"storageclass.kubernetes.io/is-default-class":"true"}
}
}
'
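A quick check that the StorageClass is in place and now marked as the default:

$ kubectl get storageclass
$ kubectl -n openebs get pods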

The star of this show — “tenants app”

Either git clone the original demo app directly from Isovalent GH to play with some of the values, or clone my fork for some immediate action:

$ helm repo add minio https://operator.min.io
$ helm repo add strimzi https://strimzi.io/charts
$ helm repo add elastic https://helm.elastic.co
$ git clone https://github.com/tnorlin/cilium-grafana-observability-demo.git
$ cd cilium-grafana-observability-demo
$ helm dep build ./helm/jobs-app
$ helm upgrade jobs-app ./helm/jobs-app \
--install \
--wait \
--create-namespace \
--namespace tenant-jobs \
-f helm/jobs-app-values.yaml
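While the chart settles, watching the namespace gives a feel for all the moving parts:

$ kubectl -n tenant-jobs get pods -w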

The components should have stabilised after a couple of minutes and in Hubble a view similar to this should be visible:

The sample “tenants-app” deployed, as shown in Hubble UI.

To show Hubble, we can deploy a Cilium Ingress:

CLUSTERISSUER=acme-prod #Defined when choosing a provider for cert-manager
HUBBLEFQDN=

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: ${CLUSTERISSUER}
    external-dns.alpha.kubernetes.io/hostname: ${HUBBLEFQDN}
  name: hubble-ingress
  namespace: kube-system
spec:
  ingressClassName: cilium
  rules:
  - host: ${HUBBLEFQDN}
    http:
      paths:
      - backend:
          service:
            name: hubble-ui
            port:
              number: 80
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - ${HUBBLEFQDN}
    secretName: hubbleui-tls-cert
EOF

Grafana

In Grafana we should have a bunch of dashboards, but one dashboard is of particular interest, the Hubble L7 HTTP dashboard, which shows HTTP metrics along with exemplars (the green “squares” in the bottom graph) from Grafana Tempo:

Grafana Hubble L7 HTTP Metrics dashboard.

Hovering over an exemplar would show something like this:

Grafana dashboard with Prometheus (Hubble) and Tempo as sources.

Example of a trace that went wrong:

Grafana Tempo with a broken request.

Hubble Policy Verdicts

The Cilium Policy Verdicts dashboard shows output from Hubble policy verdicts. This dashboard can be very helpful in the process of getting the network policies into good shape, by catching the traffic that doesn’t hit a defined rule:

Cilium Policy Verdicts dashboard, green.

The policy verdicts can also be shown with the Hubble CLI tool:

hubble observe --type policy-verdict -n tenant-jobs --last 5              
Jun 3 17:34:08.849: tenant-jobs/strimzi-cluster-operator-6d4865c4d6-pnxwt:54352 (ID:109299) -> tenant-jobs/jobs-app-kafka-0:9091 (ID:126417) policy-verdict:L3-L4 INGRESS ALLOWED (TCP Flags: SYN)
Jun 3 17:34:08.910: tenant-jobs/strimzi-cluster-operator-6d4865c4d6-pnxwt:54354 (ID:109299) -> tenant-jobs/jobs-app-kafka-0:9091 (ID:126417) policy-verdict:L3-L4 INGRESS ALLOWED (TCP Flags: SYN)
Jun 3 17:34:09.067: tenant-jobs/strimzi-cluster-operator-6d4865c4d6-pnxwt:54356 (ID:109299) -> tenant-jobs/jobs-app-kafka-0:9091 (ID:126417) policy-verdict:L3-L4 INGRESS ALLOWED (TCP Flags: SYN)
Jun 3 17:34:09.100: tenant-jobs/strimzi-cluster-operator-6d4865c4d6-pnxwt:54358 (ID:109299) -> tenant-jobs/jobs-app-kafka-0:9091 (ID:126417) policy-verdict:L3-L4 INGRESS ALLOWED (TCP Flags: SYN)
Jun 3 17:35:10.079: tenant-jobs/jobs-app-entity-operator-6c69b669b6-gz7l8:56702 (ID:69972) -> tenant-jobs/jobs-app-kafka-0:9091 (ID:126417) policy-verdict:L3-L4 INGRESS ALLOWED (TCP Flags: SYN)
Jun 3 17:35:44.034: tenant-jobs/strimzi-cluster-operator-6d4865c4d6-pnxwt:59206 (ID:109299) -> 10.20.14.20:6443 (ID:16777217) policy-verdict:all EGRESS ALLOWED (TCP Flags: SYN)
Jun 3 17:35:49.889: 10.0.1.248:59578 (host) -> tenant-jobs/jobs-app-entity-operator-6c69b669b6-gz7l8:8080 (ID:69972) policy-verdict:L4-Only INGRESS ALLOWED (TCP Flags: SYN)
Jun 3 17:35:49.889: 10.0.1.248:55716 (host) -> tenant-jobs/jobs-app-entity-operator-6c69b669b6-gz7l8:8081 (ID:69972) policy-verdict:L4-Only INGRESS ALLOWED (TCP Flags: SYN)
Jun 3 17:35:49.889: 10.0.248:55714 (host) -> tenant-jobs/jobs-app-entity-operator-6c69b669b6-gz7l8:8081 (ID:69972) policy-verdict:L4-Only INGRESS ALLOWED (TCP Flags: SYN)
Jun 3 17:35:49.889: 10.0.1.248:59572 (host) -> tenant-jobs/jobs-app-entity-operator-6c69b669b6-gz7l8:8080 (ID:69972) policy-verdict:L4-Only INGRESS ALLOWED (TCP Flags: SYN)
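The same tool can of course filter on the verdicts we actually care about when tightening the policies, for instance only showing denied traffic and following it live:

hubble observe --type policy-verdict --verdict DROPPED -n tenant-jobs --follow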

And that was about all I had time to demo during my slot (we have a limited amount of time, ~35 minutes, for presentation, demo and Q&A).

Cilium Mesh (almost there)

Wait, there’s more to it. I ran out of time during the demo, but my preparations went a bit further: I had installed another (ordinary) Kubernetes cluster with Cluster Mesh enabled and also connected an external workload (a VM) to the cluster.

Output from the VM running Docker:

root@c1demovm1:~# cilium status
KVStore: Ok etcd: 1/1 connected, lease-ID=7c02888260c64b1d, lock lease-ID=7c02888260c64b1f, has-quorum=true: https://clustermesh-apiserver.cilium.io:2379 - 3.5.4 (Leader)
Kubernetes: Disabled
Host firewall: Disabled
CNI Chaining: none
CNI Config file: CNI configuration file management disabled
Cilium: Ok 1.13.3 (v1.13.3-36cb0eed)
NodeMonitor: Listening for events on 4 CPUs with 64x4096 of shared memory
Cilium health daemon: Ok
IPAM: IPv4: 1/2 allocated from 10.190.1.0/30, IPv6: 1/4294967294 allocated from f00d::a14:0:0:0/96
IPv6 BIG TCP: Disabled
BandwidthManager: Disabled
Host Routing: Legacy
Masquerading: IPTables [IPv4: Enabled, IPv6: Enabled]
Controller Status: 17/17 healthy
Proxy Status: OK, ip 10.190.1.2, 0 redirects active on ports 10000-20000
Global Identity Range: min 256, max 65535
Hubble: Disabled
Encryption: Disabled
Cluster health: Probe disabled

In one of the clusters, the VM is visible:

$ kubectl get ciliumnode                                      
NAME CILIUMINTERNALIP INTERNALIP AGE
c1demovm1 10.190.1.2 10.20.21.20 103s
worker1 10.0.0.114 172.22.5.31 2d4h
worker3 10.0.1.248 172.22.5.33 2d4h

The VM can do DNS lookups in the cluster:

root@c1demovm1:~# nslookup -norecurse hubble-ui.kube-system.svc.cluster.local
Server: 10.192.0.10
Address: 10.192.0.10#53

Name: hubble-ui.kube-system.svc.cluster.local
Address: 10.195.247.135

Also, the VM can connect to resources in the cluster:

curl echo-1.default.svc.cluster.local:8080


Hostname: echo-1-78b66687b5-wzhbb

Pod Information:
node name: worker3
pod name: echo-1-78b66687b5-wzhbb
pod namespace: default
pod IP: 10.0.1.230

Server values:
server_version=nginx: 1.12.2 - lua: 10010

Request Information:
client_address=10.190.1.2
method=GET
real path=/
query=
request_version=1.1
request_scheme=http
request_uri=http://echo-1.default.svc.cluster.local:8080/

Request Headers:
accept=*/*
host=echo-1.default.svc.cluster.local:8080
user-agent=curl/7.81.0

Request Body:
-no body in request-

During KubeCon EU, Liz Rice showed us an amazing teaser about the upcoming Cilium Mesh, and this little piece (screenshot) captured my interest.

This functionality is not in mainline Cilium (at least not from what I’ve seen), but hopefully it is something that will land in the next release of Cilium OSS (v1.14) — cilium endpoint add --name= --labels= --ip=. Without the possibility to declare new endpoints, I’ve yet to find a good way to integrate the VM into the cluster (but the other way around seems to work well).

Watch Liz Rice’s inspiring session here on YouTube.

Well, that’s about it for this time. In the next part I plan to dive a little deeper into some of these areas, as there is more to them, especially when it comes to Policy Verdicts and Cluster Mesh.

What did you think about this article? Did I get anything wrong or bad? Spelling mistakes? Will you try out some of the functionality yourself?

Please react|comment|share if you liked the article or else found it useful. I hope it will inspire you to test out at least some of the features.

I had hoped to build an inspiring bare metal cluster with the Turing Pi v2 and a couple of Raspberry Pi CM4s, hopefully with some kind of node auto scaler — but you’ll have to bear with my modest demo environment, as we’ll have to wait until Q4 for the CM4 to restock…


Tony Norlin

Homelab tinkerer. ❤ OpenSource. illumos, Linux, kubernetes, networking and security. Recently eBPF. Connect with me: www.linkedin.com/in/tonynorlin