Build a managed Kubernetes cluster from scratch — part 4

Tony Norlin
14 min read · Aug 11, 2022


Implementing a first stage of Service Mesh

Although the figure is missing the Ingress Resource and BGP Router, we’ll soon reach this state.

In its current state, after finishing part 3, the cluster should be able to handle basic workloads and be reachable externally through NodePort type Services and internally through ClusterIP type Services, all with a hard separation between the Control Plane and the Data Plane. The steps here are not illumos specific, as from now on we will mostly touch the Worker Plane components.

Beware! A deep dive into security enforcement and restrictions is out of scope for this text. While a highly secure cluster is an obvious desire, there is no magic bullet that adapts to all environments: there is a balance between usability, functionality and security, and it adds a level of complexity where too strict settings could end up in a non-performant cluster, or even worse, a broken and non-functional state. Someone might even be tricked into believing that the cluster described here is unbreakable and in a finished state, so I'll leave those parts as homework for the reader. Here are, however, some recommendations on features to implement in order to reach a hygienic state:

  • Enable audit logs in the API Server, at least at Metadata level, and ship them out of the instance — without logs there is not much to investigate.
  • All pods should be deployed as immutable and stateless — data writes should only be allowed on persistent storage and configurations in mounted ConfigMaps/Secrets.
  • Pods should be running non-privileged — add the necessary capabilities in securityContext or through an init container.
  • Avoid running as root if possible — containers on the same host share the same kernel; separation is by namespaces and cgroups, which is a weaker isolation than, for instance, virtual machines.
  • Enable Pod Security Standards on all namespaces at baseline level.
  • Implement RBAC — this is non-negotiable.
  • Enforce with Network Policies — luckily, Cilium has full support at various levels with the NetworkPolicy, CiliumNetworkPolicy and CiliumClusterwideNetworkPolicy resources, and it doesn't stop at Layer 4: they also have Layer 7 capabilities. Cilium is even nice enough to provide a Network Policy Editor that makes creation of policies easier to understand (see the examples right after this list).
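
As a small illustration of the last two points, here is a sketch of how a namespace can be opted into the baseline Pod Security Standard and how a minimal CiliumNetworkPolicy could restrict ingress to a workload. The namespace, labels and port are placeholders of my own; adapt them to your workloads:

$ kubectl label namespace demo pod-security.kubernetes.io/enforce=baseline
$ cat << EOF | kubectl apply -f -
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-web-from-frontend
  namespace: demo
spec:
  endpointSelector:
    matchLabels:
      app: web
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
EOF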

There are of course many more steps to be taken, such as moving the contents of Secrets to an external Secrets Manager, enabling AppArmor, implementing an Admission Controller such as OPA, and Tetragon as an enforcement and observability tool. I intend to write about some of these steps in the future.

Extending functionality

As the core components of the cluster are now in place, it is time to develop the features of the cluster, such as adding LoadBalancer type Services, cert-manager to handle Let's Encrypt/ZeroSSL certificates, ExternalDNS to provide resolvable Fully Qualified Domain Names, and Longhorn as persistent storage.

Implement BGP routing

The next logical step in our cluster is to implement the LoadBalancer type Service resource. We will re-install Cilium with that feature integrated, and one way to do that is by installing a BGP router in the network and announcing the new network (AS) to all the resources in the local network.

As the implementation will depend on how the local network (outside of Kubernetes) is designed, the exact configuration is out of scope for this text. Adapt as needed, but if there already happens to be a BGP router in the network, use that one and just point it at the worker nodes and the ASN the worker nodes will announce.

Anyway, here is a sample configuration that I used for this environment (although a bit adapted, as I have a couple of extra firewalls in my home infrastructure and my demo Kubernetes cluster's dedicated BGP router talks with my central BGP router), and it suits the FRR package. This configuration contains no filtering or restrictions; as a minimum you would want to restrict traffic on TCP/179 to only be allowed between the BGP routing daemon and the worker nodes:

cp /etc/frr/frr.conf{,.-orig}
cat << EOF > /etc/frr/frr.conf
!
frr defaults traditional
hostname frr.medium.site
log syslog
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 64520
no bgp ebgp-requires-policy
bgp log-neighbor-changes
bgp router-id 10.200.0.62
neighbor 10.200.0.1 remote-as 64521
neighbor 10.200.0.1 update-source frr0
neighbor 10.200.0.1 description infraworker1
neighbor 10.200.0.2 remote-as 64521
neighbor 10.200.0.2 update-source frr0
neighbor 10.200.0.2 description infraworker2
neighbor 10.200.0.3 update-source frr0
neighbor 10.200.0.3 remote-as 64521
neighbor 10.200.0.3 description infraworker3
!
address-family ipv4 unicast
network 10.200.0.0/24
network 10.254.254.0/24

neighbor 10.200.0.1 activate
neighbor 10.200.0.2 activate
neighbor 10.200.0.3 activate
no neighbor 10.200.0.1 send-community
no neighbor 10.200.0.2 send-community
no neighbor 10.200.0.3 send-community
neighbor 10.200.0.1 next-hop-self
neighbor 10.200.0.2 next-hop-self
neighbor 10.200.0.3 next-hop-self
exit-address-family
!
!
route-map ALLOW-ALL permit 100
!
line vty
!
end
EOF

At this point there will not be any activity as the BGP handshake will not happen until we enable it in the Kubernetes cluster.
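
If you want a sanity check on the FRR side already at this point, the BGP summary should list the three worker neighbors, but in an Idle or Active state since nothing is answering on their side yet (exact output depends on your FRR version):

$ vtysh -c 'show bgp summary'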

Upgrading the Cluster

We will upgrade the cluster to the latest (currently v1.24.3) version of Kubernetes, but first let's upgrade Cilium, just to verify the functionality from a known cluster state before changing the cluster components.

Cilium upgrade from 1.11.x to 1.12.x

Previously we installed Cilium just to have a basic functionality of Kubernetes, but now we will extend the functionality with BGP (and Ingress, through the newly announced Service Mesh feature built in).

A quick note on Helm in illumos

I did not describe it in previous steps, but in the event that the reader wants to run the helm command inside illumos, the binary must first be built. Preferably this should be done in a zone dedicated to just building packages, so pick your favourite branded zone with a recent Go (v1.18.x) environment, run the following, and copy the resulting binary into the PATH of the client zone where kubectl is already copied:

$ git clone https://github.com/helm/helm.git
$ cd helm
$ gmake all WHAT=cmd/helm

Re-implementing Cilium

As mentioned, we will redesign Cilium, and in order to start from a known state we will re-install Cilium (but there are upgrade instructions for those that want the challenge). Let's have a look at what is installed first:

$ helm list -n kube-system -o yaml
- app_version: 1.11.5
  chart: cilium-1.11.5
  name: cilium
  namespace: kube-system
  revision: "2"
  status: deployed
  updated: 2022-05-18 18:53:03.090385674 +0000 UTC

Uninstall the old version and verify that all the old pieces are gone:

$ helm uninstall -n kube-system cilium

Output:

release "cilium" uninstalled

$ kubectl  -n kube-system get pod

Output:

NAME READY STATUS RESTARTS AGE
coredns-6cd56d4df4-4slzg 1/1 Running 2 84d
coredns-6cd56d4df4-njhln 1/1 Running 2 84d

Create a BGP configuration that suits the BGP router definition and the current environment:

ROUTERASN=64520
ROUTERADDRESS=10.200.0.62
CLUSTERASN=64522
CLUSTERLBCIDR=10.254.254.0/24
$ cat << EOF > /var/tmp/configMap-bgp-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: bgp-config
  namespace: kube-system
data:
  config.yaml: |
    peers:
      - peer-address: $ROUTERADDRESS
        peer-asn: $ROUTERASN
        my-asn: $CLUSTERASN
    address-pools:
      - name: default
        protocol: bgp
        addresses:
          - $CLUSTERLBCIDR
EOF
$ kubectl apply -f /var/tmp/configMap-bgp-config.yaml
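
Since the heredoc above is unquoted, the shell substitutes the variables before the manifest is written; a quick way to confirm that the values landed as intended is to read the ConfigMap back:

$ kubectl -n kube-system get configmap bgp-config -o yaml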

Install Cilium version 1.12 with BGP announcements and in the kube-proxy free BPF mode:

KUBE_APISERVER=10.100.0.1
KUBE_APIPORT=6443
$ helm install cilium cilium/cilium --version 1.12.0 \
--namespace kube-system \
--set bgp.enabled=true \
--set bgp.announce.loadbalancerIP=true \
--set bgp.announce.podCIDR=true \
--set ingressController.enabled=true \
--set kubeProxyReplacement=strict \
--set k8sServiceHost=${KUBE_APISERVER} \
--set k8sServicePort=${KUBE_APIPORT}
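
To double-check that all the --set flags were picked up by the release, the user-supplied values can be read back from Helm:

$ helm get values cilium -n kube-system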

In the future I want to try out XDP mode for the LoadBalancer, but I don't currently have enough physical NICs to spare for the worker nodes, and passthrough of physical interfaces to bhyve guests is out of scope in this text (I'll have to write about pptadm, the PPT administration utility) as I wanted to focus more on Kubernetes in this part.

Verify that the components are up and running; this could take a couple of minutes:

$ kubectl  -n kube-system get pod
NAME READY STATUS RESTARTS AGE
cilium-29xtb 1/1 Running 0 26s
cilium-djp8l 1/1 Running 0 26s
cilium-nkzzb 0/1 Init:3/4 0 3s
cilium-operator-6c688c8774-5rnjg 1/1 Running 0 29s
cilium-operator-6c688c8774-mj5rr 1/1 Running 0 29s
coredns-6cd56d4df4-4slzg 1/1 Running 2 84d
coredns-6cd56d4df4-njhln 1/1 Running 2 84d

Also, verify with Cilium agent that everything seems healthy:

$ kubectl -n kube-system exec -it cilium-29xtb -- cilium status 
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
KVStore: Ok Disabled
Kubernetes: Ok 1.24+ (v1.24.0-2+906e9d86543c71) [illumos/amd64]
Kubernetes APIs: ["cilium/v2::CiliumClusterwideEnvoyConfig", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumEnvoyConfig", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Secrets", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: Strict [worker3 10.200.0.3]
Host firewall: Disabled
CNI Chaining: none
Cilium: Ok 1.12.0 (v1.12.0-9447cd1)
NodeMonitor: Listening for events on 4 CPUs with 64x4096 of shared memory
Cilium health daemon: Ok
IPAM: IPv4: 3/254 allocated from 10.0.2.0/24,
BandwidthManager: Disabled
Host Routing: Legacy
Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status: 25/25 healthy
Proxy Status: OK, ip 10.0.2.54, 0 redirects active on ports 10000-20000
Global Identity Range: min 256, max 65535
Hubble: Ok Current/Max Flows: 4095/4095 (100.00%), Flows/s: 2.25 Metrics: Disabled
Encryption: Disabled
Cluster health: 3/3 reachable (2022-08-11T16:56:49Z)

Deploy a sample service

As a simple test that the BGP announcements work, we can do an end-to-end test by deploying an nginx web server and exposing it with a LoadBalancer address:

$ kubectl create deployment sample --image=nginx --port 80
deployment.apps/sample created
$ kubectl expose deployment sample --name sample-service --port 80 --type LoadBalancer
service/sample-service exposed
$ kubectl get deployment,svc
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/sample 1/1 1 1 81s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 85d
service/sample-service LoadBalancer 10.96.36.166 10.254.254.2 80:31145/TCP 5s

Verify that the routing works from a browser:

Browsing the LoadBalancer IP from Firefox
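
The same check can be done from a shell on any machine that has received the BGP route; a plain curl against the EXTERNAL-IP above should return the nginx default page:

$ curl -I http://10.254.254.2/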

In case of issues, check that the firewall lets TCP/179 (BGP) through, do packet captures, check netstat and so on. Be aware though that ping won't work against the LoadBalancer address; ICMP is not implemented in MetalLB, which is the component enabling the BGP announcements here.
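
On the BGP router side, two checks I find useful are watching for BGP traffic on the wire and confirming that the LoadBalancer prefix was actually learned (interface name as in the FRR configuration above):

$ tcpdump -ni frr0 tcp port 179
$ vtysh -c 'show ip route bgp'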

If everything worked as planned, delete the deployment to maintain a clean state:

$ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
sample 1/1 1 1 6m
$ kubectl delete deploy sample
deployment.apps "sample" deleted
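
The LoadBalancer Service created with kubectl expose is a separate object, so for a completely clean state it can be removed as well:

$ kubectl delete svc sample-service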

Now that we have a functional cluster with the latest (as of this writing) version of Cilium, let's begin upgrading Kubernetes.

Upgrade of Kubernetes

The official Kubernetes documentation at https://kubernetes.io/docs/tasks/administer-cluster/cluster-upgrade/#manual-deployments states the following order:

You should manually update the control plane following this sequence:

1. etcd (all instances)
2. kube-apiserver (all control plane hosts)
3. kube-controller-manager
4. kube-scheduler
5. cloud controller manager, if you use one

Upgrade of etcd cluster

Either clone/fork my copy of etcd v3.5.4 on my Github source repository or grab the latest binary I’ve uploaded here.

Log in to one of the etcd nodes and prepare to take a snapshot of the current state by first printing out the health and member state:

$ /opt/local/bin/etcdctl --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key --cacert /etc/kubernetes/pki/etcd/ca.crt  --endpoints 10.100.0.11:2379,10.100.0.12:2379,10.100.0.13:2379 endpoint health
10.100.0.12:2379 is healthy: successfully committed proposal: took = 13.426699ms
10.100.0.11:2379 is healthy: successfully committed proposal: took = 16.184386ms
10.100.0.13:2379 is healthy: successfully committed proposal: took = 16.258694ms
$ for i in 10.100.0.1{1..3}; do echo "$i: "; /opt/local/bin/etcdctl --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints "$i":2379 endpoint status -w fields | egrep 'MemberID|Leader'; done
10.100.0.11:
"MemberID" : 1316552200610363290
"Leader" : 18358807038244497468
10.100.0.12:
"MemberID" : 18358807038244497468
"Leader" : 18358807038244497468
10.100.0.13:
"MemberID" : 9412444414135415669
"Leader" : 18358807038244497468

Also, verify that the SMF services are of good health, which means that svcs -xv should output nothing.

From the etcd upgrade notes:

etcd leader is guaranteed to have the latest application data, thus fetch snapshot from leader:

Make a note of which node reports the same value for MemberID and Leader, as that node is the current leader, and take the snapshot from that node:

/opt/local/bin/etcdctl --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key --cacert /etc/kubernetes/pki/etcd/ca.crt  --endpoints 10.100.0.12:2379 snapshot save backup.db
{"level":"info","ts":"2022-08-11T18:16:20.786Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"backup.db.part"}
{"level":"info","ts":"2022-08-11T18:16:20.796Z","logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2022-08-11T18:16:20.797Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"10.100.0.12:2379"}
{"level":"info","ts":"2022-08-11T18:16:20.869Z","logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2022-08-11T18:16:20.872Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"10.100.0.12:2379","size":"8.1 MB","took":"now"}
{"level":"info","ts":"2022-08-11T18:16:20.872Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"backup.db"}
Snapshot saved at backup.db
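
Optionally, the integrity of the snapshot file can be checked before moving on. The snapshot status subcommand is deprecated in etcd 3.5 (in favour of etcdutl) but still works:

$ /opt/local/bin/etcdctl snapshot status backup.db -w table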

Replace the binary and restart the service:

$ cp /var/tmp/etcd-3.5.4 /opt/local/bin
$ pfexec svcadm restart etcd
$ svcs -xv
$ svcs svc:/application/etcd:default
STATE STIME FMRI
online 18:25:45 svc:/application/etcd:default

Verify that the end points still report healthy:

$ /opt/local/bin/etcdctl --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key --cacert /etc/kubernetes/pki/etcd/ca.crt  --endpoints 10.100.0.11:2379,10.100.0.12:2379,10.100.0.13:2379 endpoint health
10.100.0.13:2379 is healthy: successfully committed proposal: took = 11.03615ms
10.100.0.11:2379 is healthy: successfully committed proposal: took = 13.157837ms
10.100.0.12:2379 is healthy: successfully committed proposal: took = 14.420243ms

Check the logs for errors.

Also, verify that the binary started is etcd v3.5.4:

grep 'starting etcd' /var/log/etcd.log 
{"level":"info","ts":"2022-08-11T18:29:49.342Z","caller":"etcdserver/server.go:842","msg":"starting etcd server","local-member-id":"fec795bb6e81283c","local-server-version":"3.5.4","cluster-id":"7fc54aaec361e224","cluster-version":"3.5"}

Repeat this procedure on the other two etcd nodes.

Upgrade of the Kubernetes components

I've compiled binaries for v1.24.3 of Kubernetes here or, if preferred, clone/fork my branch of the source code here and compile it.

In the following order, replace the binaries of kube-apiserver, kube-controller-manager and kube-scheduler and restart each of them in turn. Check the SMF services beforehand and note if there are any failed services (there shouldn't be any), restart the service after replacing its binary, and check the log files for error messages.

A sample flow: copy each of kube-apiserver, kube-controller-manager and kube-scheduler into /var/tmp/ with -v1.24.3 as a suffix on the corresponding zone, then on each zone make a backup of the old binary, replace it and check the logs:

$ tail -f /var/log/kube-apiserver.log &
$ cp /opt/local/bin/kube-apiserver /var/tmp/kube-apiserver-v1.24.0
$ cp /var/tmp/kube-apiserver-v1.24.3 /opt/local/bin/kube-apiserver

If the log starts to show that the service is starting up (due to "inotify"?), just verify in the log that things start successfully; otherwise, restart the service and verify the logs:

$ pfexec svcadm restart application/apiserver
$ svcs -a |grep apiserver
online* 18:49:10 svc:/application/apiserver:default
$ svcs -a |grep apiserver
online 18:50:00 svc:/application/apiserver:default
$ cat /var/log/kube-apiserver.log
[..]
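
Once all three control plane zones have been upgraded, the API server should report the new version; a quick check from the client zone is to query the version endpoint, where gitVersion should now read v1.24.3 (the build suffix will differ in your environment):

$ kubectl get --raw /version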

Finally, replace kubectl with the v1.24.3 binary and proceed with the upgrade of the worker nodes. As the worker nodes were installed with the package manager, replacing the binaries there is a simple task.

First, drain the node:

$ kubectl drain worker2 --ignore-daemonsets
node/worker2 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/cilium-djp8l
evicting pod kube-system/coredns-6cd56d4df4-njhln
evicting pod kube-system/cilium-operator-6c688c8774-mj5rr
pod/cilium-operator-6c688c8774-mj5rr evicted
pod/coredns-6cd56d4df4-njhln evicted
node/worker2 drained

Upgrade the packages:

$ sudo apt-get install kubelet=1.24.3-00
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following held packages will be changed:
kubelet
The following packages will be upgraded:
kubelet
1 upgraded, 0 newly installed, 0 to remove and 89 not upgraded.
Need to get 19.2 MB of archives.
After this operation, 328 kB disk space will be freed.
Do you want to continue? [Y/n] y
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubelet amd64 1.24.3-00 [19.2 MB]
Fetched 19.2 MB in 1s (16.0 MB/s)
(Reading database ... 94723 files and directories currently installed.)
Preparing to unpack .../kubelet_1.24.3-00_amd64.deb ...
Unpacking kubelet (1.24.3-00) over (1.24.0-00) ...
Setting up kubelet (1.24.3-00) ...
Scanning processes...
Scanning linux images...

Running kernel seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.
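
Depending on the distribution and package configuration, the kubelet service may or may not be restarted automatically by the upgrade; if kubectl get nodes keeps reporting the old version for this node, a manual restart of the service should sort it out:

$ sudo systemctl restart kubelet
$ systemctl status kubelet --no-pager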

Uncordon the node and check its state:

$ kubectl uncordon worker2
node/worker2 uncordoned
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
worker1 Ready <none> 85d v1.24.0
worker2 Ready <none> 85d v1.24.3
worker3 Ready <none> 85d v1.24.0

Repeat the procedure with the other nodes.

Verify that all nodes report running v1.24.3, then restart both the cilium-operator and the cilium agents:

$ kubectl get nodes 
NAME STATUS ROLES AGE VERSION
worker1 Ready <none> 85d v1.24.3
worker2 Ready <none> 85d v1.24.3
worker3 Ready <none> 85d v1.24.3
$ kubectl -n kube-system rollout restart deployment cilium-operator
$ kubectl -n kube-system rollout restart ds cilium

Wait a minute for the pods to come back into a healthy state:

$ kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
cilium-9mr6f 1/1 Running 0 2m38s
cilium-dv5vf 1/1 Running 0 2m9s
cilium-n6j55 1/1 Running 0 2m38s
cilium-operator-6cfb9c4654-p5vv9 1/1 Running 0 2m1s
cilium-operator-6cfb9c4654-qsn58 1/1 Running 0 2m1s
coredns-6cd56d4df4-47j82 1/1 Running 0 30m
coredns-6cd56d4df4-gx9zw 1/1 Running 0 27m

Verify inside the cilium agent that the health is restored:

$ kubectl -n kube-system exec -it cilium-9mr6f -- cilium status 
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
KVStore: Ok Disabled
Kubernetes: Ok 1.24+ (v1.24.3-illumos) [illumos/amd64]
Kubernetes APIs: ["cilium/v2::CiliumClusterwideEnvoyConfig", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumEnvoyConfig", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Secrets", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: Strict [worker2 10.200.0.2]
Host firewall: Disabled
CNI Chaining: none
Cilium: Ok 1.12.0 (v1.12.0-9447cd1)
NodeMonitor: Listening for events on 4 CPUs with 64x4096 of shared memory
Cilium health daemon: Ok
IPAM: IPv4: 6/254 allocated from 10.0.0.0/24,
BandwidthManager: Disabled
Host Routing: Legacy
Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status: 40/40 healthy
Proxy Status: OK, ip 10.0.0.64, 0 redirects active on ports 10000-20000
Global Identity Range: min 256, max 65535
Hubble: Ok Current/Max Flows: 554/4095 (13.53%), Flows/s: 3.11 Metrics: Disabled
Encryption: Disabled
Cluster health: 3/3 reachable (2022-08-11T19:56:28Z)

Extending the cluster beyond the core components

Install Hubble (again)

With v1.12.0 it is a simple task to get Hubble up and running again; just issue a helm upgrade:

helm upgrade cilium cilium/cilium --version 1.12.0 \
--namespace kube-system \
--reuse-values \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true
Release "cilium" has been upgraded. Happy Helming!
NAME: cilium
LAST DEPLOYED: Thu Aug 11 20:07:04 2022
NAMESPACE: kube-system
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
You have successfully installed Cilium with Hubble Relay and Hubble UI.

Your release version is 1.12.0.

For any further help, visit https://docs.cilium.io/en/v1.12/gettinghelp

Create an Ingress

Let's test Hubble with the new Ingress resource! First you'll need either to create an A record in the local DNS server for the FQDN of the Ingress object, or to add an entry in /etc/hosts (remember to reload any local caching daemon; an example hosts entry is shown after the Ingress has received its address below), and then create the resource:

$ cat << EOF > /var/tmp/hubble-ui-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
  labels:
    k8s-app: hubble-ui
  name: hubble-ui
  namespace: kube-system
spec:
  ingressClassName: cilium
  rules:
  - host: hubble.medium.site
    http:
      paths:
      - backend:
          service:
            name: hubble-ui
            port:
              number: 80
        path: /
        pathType: Prefix
EOF
$ kubectl create -f /var/tmp/hubble-ui-ingress.yaml
ingress.networking.k8s.io/hubble-ui created

Verify that the Ingress object has received an IP from the declared BGP CIDR:

$ kubectl  -n kube-system get svc,ingress
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cilium-ingress-hubble-ui LoadBalancer 10.98.147.56 10.254.254.3 80:31338/TCP 2m40s
service/hubble-peer ClusterIP 10.106.121.144 <none> 443/TCP 7h25m
service/hubble-relay ClusterIP 10.98.59.96 <none> 80/TCP 8m15s
service/hubble-ui ClusterIP 10.102.234.124 <none> 80/TCP 8m15s
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 85d

NAME CLASS HOSTS ADDRESS PORTS AGE
ingress.networking.k8s.io/hubble-ui cilium hubble.medium.site 10.254.254.3 80 4m22s
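
If you went the /etc/hosts route rather than a real DNS record, an entry pointing the host name at the address above (added as root, or via pfexec/sudo, on the machine running the browser) is enough; remember to restart or flush any local caching daemon afterwards:

echo "10.254.254.3 hubble.medium.site" >> /etc/hosts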

If everything goes as planned, the browser should now reach the declared address:

Hubble Ingress, as viewed from Firefox

In its current state, it is only possible to reach the Ingress over HTTP, as we have not created a TLS certificate. In the next part I intend to describe one method of implementing cert-manager in order to facilitate TLS with Let's Encrypt certificates in a rather effortless way.

But that's it for this part. I will write about it in the next part, as that involves a couple more steps, such as installing an external load balancer and preparing the Helm chart (as we are running the cluster in a non-standard way, by splitting up the Control Plane and the Worker Plane, the Helm chart needs to be adapted accordingly).

I welcome comments if I have misspelled anything or if there are any factual or otherwise obvious errors. Comments or "claps" indicating whether this is of interest are of course highly appreciated.

Until next time..
