Build a managed Kubernetes cluster from scratch — part 5

Will this finally be the last part of this series?

Tony Norlin
Aug 14, 2022

In the previous part we built a cluster that handles Ingress resources. Although it would be possible (with some manual labour) to generate a certificate and declare a valid DNS record that points to the Ingress, it makes sense to have them handled automatically by a function such as Cert Manager.

A figure of the different components that are in separate network segments.

External Requirements

  • A DNS server/provider that can handle Let's Encrypt challenges
  • An external load balancer
  • A DNS server that handles internal resolving of cluster resources

Implementing an external Load Balancer

Some deployments (such as Cert Manager) deploy Admission Controllers to extend Control Plane functionality outside of the API Server by intercepting API calls with webhooks (HTTP callbacks).

The API call is sent from the client as a request to the API Server, which in turn needs to route the call (back) to the worker node where the deployed Admission Controller (webhook) is running. The request is encrypted with TLS and sent to the defined resource, normally the defined Service within the selected Namespace. As we are running the Data Plane outside of the Control Plane, we need to help the API Server reach the deployment by putting a load balancer in front of the worker nodes and pointing the webhook configuration at a DNS name instead.
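For reference, the difference shows up in the webhook's clientConfig; a minimal sketch of the two forms (the names and path below are placeholders, not taken from any specific chart):

# clientConfig of a (Validating|Mutating)WebhookConfiguration entry.
# The default "service" form resolves to a cluster Service, which an external
# kube-apiserver cannot reach -- the "url" form points it at the load balancer
# in front of the worker nodes instead.
webhooks:
- name: example.webhook.local            # placeholder
  clientConfig:
    # service:                           # in-cluster form (default)
    #   name: example-webhook-service    # placeholder
    #   namespace: example-namespace     # placeholder
    #   path: /validate
    #   port: 443
    url: https://extlbvip.medium.site/validate   # FQDN of the external load balancer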

The choice of load balancer is a matter of taste, as we are not relying on any advanced features; if one already exists in the network, it can be used instead. I chose to have a dedicated load balancer (on a separate VLAN, for control) for this cluster, as getting HAProxy up and running in a pkgsrc branded zone is very resource friendly:

$ pkgin install haproxy-2.X.Y
$ USERNAME=$(openssl rand -hex 16)
$ PASSWORD=$(openssl rand -base64 16)
$ cat << EOF > /opt/local/etc/haproxy.cfg
defaults
    maxconn 20000
    mode tcp
    option dontlognull

listen stats
    bind *:8443
    mode http
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth $USERNAME:$PASSWORD
    stats admin if TRUE

frontend certmanager-webhook
    bind *:443
    mode tcp
    option tcplog
    default_backend certmanager-wh-backend

backend certmanager-wh-backend
    mode tcp
    option ssl-hello-chk
    server worker1 10.200.0.1:10256 check inter 1s verify none
    server worker2 10.200.0.2:10256 check inter 1s verify none
    server worker3 10.200.0.3:10256 check inter 1s verify none

frontend otel-webhook
    bind *:444
    mode tcp
    option tcplog
    default_backend otel-wh-backend

backend otel-wh-backend
    mode tcp
    option ssl-hello-chk
    server worker1 10.200.0.1:10258 check inter 1s verify none
    server worker2 10.200.0.2:10258 check inter 1s verify none
    server worker3 10.200.0.3:10258 check inter 1s verify none
EOF
$ pfexec svcadm enable -r svc:/pkgsrc/haproxy:default

The health checks in the load balancer will fail because nothing is listening on ports 10256 and 10258 on the worker nodes yet; this is expected.
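To confirm that HAProxy itself is up, the stats endpoint defined above can be queried (the backends will show DOWN until the webhooks exist; <haproxy-ip> is a placeholder for the zone's address):

# quick sanity check from a host that can reach the HAProxy zone
$ curl -su "${USERNAME}:${PASSWORD}" -o /dev/null -w '%{http_code}\n' http://<haproxy-ip>:8443/stats
200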

Preparing CoreDNS

The current CoreDNS configuration has served us well for requests within the cluster, but as we now grow the functionality with external verifiers we also need to be able to resolve remote destinations, and we do that by adding one line: forward . /etc/resolv.conf .

This enables CoreDNS to forward queries upstream. We will apply it to the deployment and reload the configuration (beware of ending any line with a trailing blank space (⌴), as it confuses the parser and breaks the formatting):

$ cat <<EOF | kubectl apply -f -
---
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
EOF
$ kubectl -n kube-system rollout restart deployment coredns
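A quick way to verify that upstream forwarding works is to resolve an external name from a throwaway pod (the busybox image and the name being looked up are arbitrary choices here):

$ kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup letsencrypt.org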

Install Cert Manager

The method chosen here to install Cert Manager is Helm. Add their repository and issue an update command, as described in the official instructions (the update seems unnecessary the first time the repository is added, as it is fetched up to date):

$ helm repo add jetstack https://charts.jetstack.io
$ helm repo update

Keep in mind that the URL used must be resolvable by, at a minimum, the kube-apiserver (it is probably wise to have it as a local DNS entry), and it should point to the external load balancer (HAProxy) that we implemented previously:


VERSION=v1.9.1
EXTLB=extlbvip.medium.site # A FQDN pointing to the HAProxy
PORT=10256 # The same port number as defined above in the LB
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --version ${VERSION} \
  --set installCRDs=true \
  --set webhook.url.host=${EXTLB} \
  --set webhook.securePort=${PORT} \
  --set webhook.hostNetwork=true >> helm-cert-manager-host

For now, ignore messages like http: TLS handshake error from <load balancer IP>:57491: tls: client offered only unsupported versions: []
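Before moving on, it is worth checking that the chart came up and that the webhook configuration really points at the external URL rather than a cluster Service (cert-manager names its validating webhook configuration cert-manager-webhook in my install; verify the name with kubectl get validatingwebhookconfiguration if it differs):

$ kubectl -n cert-manager get pods
$ kubectl get validatingwebhookconfiguration cert-manager-webhook \
    -o jsonpath='{.webhooks[*].clientConfig.url}{"\n"}'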

Let's Encrypt Certificates

Let's Encrypt DNS Challenge

There are two ways to complete the certificate request challenge: HTTP01 and DNS01. In my opinion, DNS01 is preferable security-wise, as HTTP01 needs to expose an HTTP daemon for the challenge to complete (and no, there is no "known" IP address that the challenge is initiated from). One might argue that a separate WAN link could be exposed for that purpose only; true, but with little benefit. I prefer the DNS01 challenge, and that is what will be described here.

As for the DNS challenge, the method to actually solve it depends on the provider that manages the domain name's DNS records or, in case you serve the domain name yourself, on the DNS software. This decides the actual solver, and the options are described here. In my experience the webhook option is subpar (an emergency solution for providers that fail to offer a good API), and it requires extra steps to get working with an external Control Plane.

Either way, the domain name you order certificates for needs to be in your possession and delegated to wherever you create the challenge records for Let's Encrypt.

I have tested the CloudFlare and RFC2136 methods and they are very similar to each other.

Cloudflare example (described in detail here) from my earlier design:

EMAILADDRESS=<email@address.here>
APIKEY=<Global API Key> # from Profile > API Tokens > API Keys > Global API Key > View
$ kubectl create secret generic cloudflare-api-key-secret --from-literal=api-key=${APIKEY}
$ cat <<EOF | kubectl apply -f -
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  annotations:
  name: cloudflare-issuer-production
spec:
  acme:
    email: ${EMAILADDRESS}
    preferredChain: ""
    privateKeySecretRef:
      name: cloudflare-issuer-production-account-key
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            key: api-key
            name: cloudflare-api-key-secret
          email: ${EMAILADDRESS}
EOF

Cloudflare worked seamlessly, but I decided that I wanted more control, and as I already host a primary DNS it felt natural to choose RFC2136 instead (the DNS-server side of the setup is sketched below).
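Before the ClusterIssuer can use RFC2136, the DNS server needs a TSIG key that is allowed to update the challenge records. A minimal BIND 9 sketch of what that could look like; the key name, algorithm and zone match the variables used below, but paths and policy should be adjusted to your own named.conf:

# on the primary DNS server: generate a key and append it to the config
# (the generated "secret" value is what goes into TSIGKEY below)
$ tsig-keygen -a hmac-sha512 certmanager >> /etc/named.conf

# in the zone definition, allow that key to manage TXT records in the zone
zone "ploio.net" {
    type master;
    file "ploio.net.db";
    update-policy {
        grant certmanager zonesub TXT;
    };
};

With the key and update policy in place, register the key as a Secret and create the ClusterIssuer: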

EMAILADDRESS=<email@address.here> # To register at Let's Encrypt
NAMESERVER=10.53.0.1 # Enter the IP of the primary DNS
TSIGALGO=HMACSHA512 # Enter the algorithm chosen
TSIGNAME=certmanager # Enter the key name chosen
TSIGKEY=<enter the secret key generated> # Hint: look in named.conf
$ kubectl create secret generic tsig-certmanager --from-literal=${TSIGNAME}=${TSIGKEY}
$ cat <<EOF | kubectl apply -f -
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  generation: 1
  name: prod-certificate-clusterissuer
spec:
  acme:
    email: ${EMAILADDRESS}
    preferredChain: ""
    privateKeySecretRef:
      name: prod-certificate-clusterissuer-keyref
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - dns01:
        rfc2136:
          nameserver: ${NAMESERVER}
          tsigAlgorithm: ${TSIGALGO}
          tsigKeyName: ${TSIGNAME}
          tsigSecretSecretRef:
            key: ${TSIGNAME}
            name: tsig-certmanager
EOF

Output:

clusterissuer.cert-manager.io/prod-certificate-clusterissuer created

Verify that the ClusterIssuer was registered OK with the ACME server (see the Status section of the output):

$ kubectl describe clusterissuer
Name:         prod-certificate-clusterissuer
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  cert-manager.io/v1
Kind:         ClusterIssuer
Metadata:
  Creation Timestamp:  2022-08-12T22:18:22Z
  Generation:          1
[...]
    Manager:         cert-manager-clusterissuers
    Operation:       Update
    Subresource:     status
    Time:            2022-08-12T22:18:23Z
  Resource Version:  1366968
  UID:               18209f74-cbd5-4789-bba5-3e0a5d4d2c30
Spec:
  Acme:
    Email:            [.. email address here ..]
    Preferred Chain:
    Private Key Secret Ref:
      Name:  prod-certificate-clusterissuer-keyref
    Server:  https://acme-v02.api.letsencrypt.org/directory
    Solvers:
      dns01:
        rfc2136:
          Nameserver:      10.53.0.1
          Tsig Algorithm:  HMACSHA512
          Tsig Key Name:   certmanager
          Tsig Secret Secret Ref:
            Key:   certmanager
            Name:  tsig-certmanager
Status:
  Acme:
    Last Registered Email:  [.. email address here ..]
    Uri:                    https://acme-v02.api.letsencrypt.org/acme/acct/674725597
  Conditions:
    Last Transition Time:  2022-08-12T22:18:23Z
    Message:               The ACME account was registered with the ACME server
    Observed Generation:   1
    Reason:                ACMEAccountRegistered
    Status:                True
    Type:                  Ready
Events:  <none>

Now, if everything goes as intended, the environment is ready to create a valid certificate (make sure that the kube-apiserver can reach the DNS server to create the challenge TXT record). A sample certificate:

DNSNAME=medium-site.ploio.net # The FQDN to order the certificate for
SECRET=medium-site-tls-cert # Name of Secret to put certificate in
$ cat <<EOF | kubectl apply -f -
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ${DNSNAME}
spec:
  commonName: ${DNSNAME}
  dnsNames:
  - ${DNSNAME}
  issuerRef:
    name: prod-certificate-clusterissuer
    kind: ClusterIssuer
  secretName: ${SECRET}
EOF

Output:

certificate.cert-manager.io/medium-site.ploio.net created

There are various ways to inspect the progress:

$ kubectl get certificaterequest
NAME APPROVED DENIED READY ISSUER REQUESTOR AGE
medium-site.ploio.net-rmlgh True False prod-certificate-clusterissuer system:serviceaccount:cert-manager:cert-manager 4s
$ kubectl describe certificaterequest medium-site.ploio.net-rmlgh
Name:         medium-site.ploio.net-rmlgh
Namespace:    default
Labels:       <none>
Annotations:  cert-manager.io/certificate-name: medium-site.ploio.net
              cert-manager.io/certificate-revision: 1
              cert-manager.io/private-key-secret-name: medium-site.ploio.net-9npg2
API Version:  cert-manager.io/v1
Kind:         CertificateRequest
Metadata:
  Creation Timestamp:  2022-08-13T09:31:38Z
  Generate Name:       medium-site.ploio.net-
  Generation:          1
[ ... ]
Status:
  Conditions:
    Last Transition Time:  2022-08-13T09:31:38Z
    Message:               Certificate request has been approved by cert-manager.io
    Reason:                cert-manager.io
    Status:                True
    Type:                  Approved
    Last Transition Time:  2022-08-13T09:31:38Z
    Message:               Waiting on certificate issuance from order default/medium-site.ploio.net-rmlgh-3621970007: "pending"
    Reason:                Pending
    Status:                False
    Type:                  Ready
Events:
  Type    Reason           Age  From                                          Message
  ----    ------           ---  ----                                          -------
  Normal  cert-manager.io  11s  cert-manager-certificaterequests-approver     Certificate request has been approved by cert-manager.io
  Normal  OrderCreated     11s  cert-manager-certificaterequests-issuer-acme  Created Order resource default/medium-site.ploio.net-rmlgh-3621970007
  Normal  OrderPending     11s  cert-manager-certificaterequests-issuer-acme  Waiting on certificate issuance from order default/medium-site.ploio.net-rmlgh-3621970007: ""
$ kubectl get challenge
NAME STATE DOMAIN AGE
medium-site.ploio.net-rmlgh-3621970007-1348735329 pending medium-site.ploio.net 24s

$ kubectl describe challenge medium-site.ploio.net-rmlgh-3621970007-1348735329
Name:         medium-site.ploio.net-rmlgh-3621970007-1348735329
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  acme.cert-manager.io/v1
Kind:         Challenge
Metadata:
  Creation Timestamp:  2022-08-13T09:31:39Z
  Finalizers:
    finalizer.acme.cert-manager.io
  Generation:  1
[...]
Status:
  Presented:   true
  Processing:  true
  Reason:      Waiting for DNS-01 challenge propagation: DNS record for "medium-site.ploio.net" not yet propagated
  State:       pending
Events:
  Type    Reason     Age  From                     Message
  ----    ------     ---  ----                     -------
  Normal  Started    32s  cert-manager-challenges  Challenge scheduled for processing
  Normal  Presented  31s  cert-manager-challenges  Presented challenge using DNS-01 challenge mechanism

$ kubectl describe certificate medium-site.ploio.net
Name:         medium-site.ploio.net
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  cert-manager.io/v1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2022-08-13T09:31:37Z
  Generation:          1
[...]
Status:
  Conditions:
    Last Transition Time:  2022-08-13T09:33:49Z
    Message:               Certificate is up to date and has not expired
    Observed Generation:   1
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2022-11-11T08:33:47Z
  Not Before:              2022-08-13T08:33:48Z
  Renewal Time:            2022-10-12T08:33:47Z
  Revision:                1
Events:
  Type    Reason     Age  From                                       Message
  ----    ------     ---  ----                                       -------
  Normal  Issuing    17m  cert-manager-certificates-trigger          Issuing certificate as Secret does not exist
  Normal  Generated  17m  cert-manager-certificates-key-manager      Stored new private key in temporary Secret resource "medium-site.ploio.net-9npg2"
  Normal  Requested  17m  cert-manager-certificates-request-manager  Created new CertificateRequest resource "medium-site.ploio.net-rmlgh"
  Normal  Issuing    15m  cert-manager-certificates-issuing          The certificate has been successfully issued

During the challenge it should also be possible to view the challenge in the public DNS responsible for the domain:

_acme-challenge.medium-site.ploio.net    TXT    60   -   "[.. challenge here ..]"
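This can be confirmed from outside the cluster as well, by querying the name server used by the issuer directly (dig assumed available; substitute your own domain):

$ dig +short TXT _acme-challenge.medium-site.ploio.net @10.53.0.1
"[.. challenge here ..]"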

As the certificate was issued correctly and has served its purpose, it can now be removed.
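Note that deleting the Certificate resource does not remove the Secret holding the issued key material by default, so clean that up separately if it is no longer wanted:

$ kubectl delete certificate medium-site.ploio.net
$ kubectl delete secret medium-site-tls-cert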

Handle internal DNS records — meet ExternalDNS

In order to let Kubernetes create DNS records that are automatically resolvable by all clients, there is a deployment called ExternalDNS that handles this part, so let's install that one as well with Helm:

helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/

After you've added the repository you can install the chart.

PROVIDER=rfc2136
DNSHOST=10.53.0.2 # Your internal DNS resolver
DNSZONE=ploio.net # The zone that DNS manages
TSIGSECRET=<TSIG Secret at the DNS server, check named.conf>
TSIGALGO=hmac-sha256 # TSIG algorithm chosen at DNS server
TSIGKEY=externaldns # The TSIG name chosen at DNS server
DOMAINFILTER=ploio.net # Which sub domains the ExternalDNS handles
$ cat <<EOF | helm upgrade --install -n external-dns external-dns \
external-dns/external-dns --create-namespace -f -
---
serviceAccount:
  create: true

rbac:
  create: true

securityContext:
  runAsNonRoot: true
  runAsUser: 65534
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]

sources:
  - service
  - ingress

registry: txt
txtOwnerId: "k8s"
txtPrefix: "external-dns-"

domainFilters:
  - ${DOMAINFILTER}

provider: ${PROVIDER}

deploymentStrategy:
  type: Recreate

extraArgs:
  - --rfc2136-host=${DNSHOST}
  - --rfc2136-port=53
  - --rfc2136-zone=${DNSZONE}
  - --rfc2136-tsig-secret=${TSIGSECRET}
  - --rfc2136-tsig-secret-alg=${TSIGALGO}
  - --rfc2136-tsig-keyname=${TSIGKEY}
  - --rfc2136-tsig-axfr
EOF

Check in the external-dns namespace that the pod is running, and also check the logs for any issues communicating with the internal DNS.
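For example (the deployment name external-dns is the chart default here; adjust if the release was named differently):

$ kubectl -n external-dns get pods
$ kubectl -n external-dns logs deploy/external-dns --tail=20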

Verification of Cert Manager and ExternalDNS

With the pieces in place, we can patch the Hubble Ingress resource and test everything out. First, patch the Ingress to enable TLS and the desired DNS entry:

INGRESSURL=hubble-medium-site.ploio.net
$ cat <<EOF |kubectl patch -n kube-system ingress hubble-ui \
--patch-file=/dev/stdin
{
  "spec": {
    "tls": [
      {
        "hosts": [
          "${INGRESSURL}"
        ],
        "secretName": "hubble-ingress-tls"
      }
    ]
  }
}
EOF

Output:

ingress.networking.k8s.io/hubble-ui patched

Then annotate (or edit, if preferred) the Ingress with the ClusterIssuer that we created:

$ kubectl -n kube-system annotate ingress hubble-ui \
cert-manager.io/cluster-issuer=prod-certificate-clusterissuer

Output:

ingress.networking.k8s.io/hubble-ui annotated

Similar to what was described earlier, this triggers the CertificateRequest, the Challenge and so on; finally, a Certificate should be issued.
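The ingress-shim creates a Certificate named after the secretName in the Ingress, so the progress can be followed as before; once issued, it should report Ready, along the lines of:

$ kubectl -n kube-system get certificate hubble-ingress-tls
NAME                 READY   SECRET               AGE
hubble-ingress-tls   True    hubble-ingress-tls   3m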

Now, to have the Load Balancer IP of the Ingress resource automatically created in the internal DNS, make another annotation like this:

$ kubectl -n kube-system annotate ingress \
hubble-ui external-dns.alpha.kubernetes.io/hostname=hubble-medium-site.ploio.net

Output:

ingress.networking.k8s.io/hubble-ui annotated

The logs at the external-dns pod should show something similar to this:

[..]
time="2022-08-13T15:18:02Z" level=info msg="Adding RR: hubble-medium-site.ploio.net 0 A 10.254.254.3"
time="2022-08-13T15:18:02Z" level=info msg="Adding RR: external-dns-hubble-medium-site.ploio.net 0 TXT \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=ingress/kube-system/hubble-ui\""
time="2022-08-13T15:18:02Z" level=info msg="Adding RR: external-dns-a-hubble.medium-site.ploio.net 0 TXT \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=ingress/kube-system/hubble-ui\""
time="2022-08-13T15:19:02Z" level=info msg="All records are already up to date"
time="2022-08-13T15:20:02Z" level=info msg="All records are already up to date"

Hubble UI, viewed through the integrated Ingress controller (envoy).

Deploy Hubble adaptor for OpenTelemetry

With the Cilium v1.12 release, tracing with OpenTelemetry was also announced. According to the instructions found here, we should apply a kustomize base definition with kubectl; this requires that the git command is in the PATH:

$ kubectl apply -k github.com/cilium/kustomize-bases/jaeger

Output:

namespace/jaeger created
customresourcedefinition.apiextensions.k8s.io/jaegers.jaegertracing.io created
serviceaccount/jaeger-operator created
role.rbac.authorization.k8s.io/jaeger-operator created
clusterrole.rbac.authorization.k8s.io/jaeger-operator created
rolebinding.rbac.authorization.k8s.io/jaeger-operator created
clusterrolebinding.rbac.authorization.k8s.io/jaeger-operator created
deployment.apps/jaeger-operator created

$ cat <<EOF | kubectl apply -f -
---
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger-default
  namespace: jaeger
spec:
  strategy: allInOne
  storage:
    type: memory
    options:
      memory:
        max-traces: 100000
  ingress:
    enabled: false
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
EOF

Output:

jaeger.jaegertracing.io/jaeger-default created
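The operator should shortly spin up the all-in-one instance; a quick check (resource names derive from the Jaeger instance name jaeger-default):

$ kubectl -n jaeger get pods,svc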

kubectl apply -k github.com/cilium/kustomize-bases/opentelemetry

Output:

namespace/opentelemetry-operator-system created
customresourcedefinition.apiextensions.k8s.io/opentelemetrycollectors.opentelemetry.io created
serviceaccount/opentelemetry-operator-controller-manager created
role.rbac.authorization.k8s.io/opentelemetry-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/opentelemetry-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/opentelemetry-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/opentelemetry-operator-proxy-rolebinding created
service/opentelemetry-operator-controller-manager-metrics-service created
service/opentelemetry-operator-webhook-service created
deployment.apps/opentelemetry-operator-controller-manager created
certificate.cert-manager.io/opentelemetry-operator-serving-cert created
issuer.cert-manager.io/opentelemetry-operator-selfsigned-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/opentelemetry-operator-mutating-webhook-configuration created
validatingwebhookconfiguration.admissionregistration.k8s.io/opentelemetry-operator-validating-webhook-configuration created

As can be seen here, two Admission Controller webhook configurations were created, and we need to change both of them from the service scheme to the url scheme, from:

[..]
      service:
        name: opentelemetry-operator-webhook-service
        namespace: opentelemetry-operator-system
        path: /validate-opentelemetry-io-v1alpha1-opentelemetrycollector
        port: 443
[..]

to the following stanza, where EXTLB is the same URL as previously declared for the other webhook (the external load balancer/HAProxy):


[..]
      url: https://${EXTLB}:444/validate-opentelemetry-io-v1alpha1-opentelemetrycollector
[..]

Do this on both the validatingwebhookconfiguration and the mutatingwebhookconfiguration; there should be two occurrences in each webhook configuration. The reason for this adaptation is the same as earlier: as our kube-apiserver is external from the Data Plane's point of view, it is necessary to publish an address that is known to the Control Plane.
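This can be done with kubectl edit, or scripted with a patch; a sketch for the first entry of the validating configuration (it assumes the entry at index 0 uses the validate path shown above, so check the ordering in your configuration first, and repeat for the second entry and for the mutating configuration with its corresponding paths):

EXTLB=extlbvip.medium.site
kubectl patch validatingwebhookconfiguration \
  opentelemetry-operator-validating-webhook-configuration --type='json' \
  -p="[{\"op\":\"remove\",\"path\":\"/webhooks/0/clientConfig/service\"},
       {\"op\":\"add\",\"path\":\"/webhooks/0/clientConfig/url\",\"value\":\"https://${EXTLB}:444/validate-opentelemetry-io-v1alpha1-opentelemetrycollector\"}]"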

Edit the certificate

$ kubectl -n opentelemetry-operator-system edit certificate opentelemetry-operator-serving-cert

In dnsNames, add the FQDN of the HAProxy to have the certificate regenerated; alternatively, patch it:

EXTLB=extlbvip.medium.site # A FQDN pointing to the HAProxy
$ kubectl -n opentelemetry-operator-system patch certificate opentelemetry-operator-serving-cert \
    --type='json' -p="[{\"op\": \"add\", \"path\": \"/spec/dnsNames/-\", \"value\": \"${EXTLB}\"}]"

Patch the deployment to listen on the hostport 10258 (as defined in HAProxy), then restart the deployment to read in the new configuration:

$ kubectl -n opentelemetry-operator-system patch deployments opentelemetry-operator-controller-manager --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/ports/0", "value": {"containerPort":9443,"hostPort":10258,"name":"webhook-server","protocol":"TCP"}}]'

Output:

deployment.apps/opentelemetry-operator-controller-manager patched
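Once the pod listens on the host port, the serving certificate can be inspected through the load balancer; after the dnsNames change above, the Subject Alternative Names should include the EXTLB FQDN (openssl is used here just for a quick look):

$ echo | openssl s_client -connect ${EXTLB}:444 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'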

$ cat <<EOF | kubectl apply -f -
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol-hubble
  namespace: kube-system
spec:
  mode: daemonset
  image: ghcr.io/cilium/hubble-otel/otelcol:v0.1.1
  env:
    # set NODE_IP environment variable using downwards API
    - name: NODE_IP
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
  volumes:
    # this example connects to the Hubble socket of the Cilium agent
    # using host port and TLS
    - name: hubble-tls
      projected:
        defaultMode: 256
        sources:
          - secret:
              name: hubble-relay-client-certs
              items:
                - key: tls.crt
                  path: client.crt
                - key: tls.key
                  path: client.key
                - key: ca.crt
                  path: ca.crt
    # it's possible to use the UNIX socket also, for which
    # the following volume will be needed
    # - name: cilium-run
    #   hostPath:
    #     path: /var/run/cilium
    #     type: Directory
  volumeMounts:
    # - name: cilium-run
    #   mountPath: /var/run/cilium
    - name: hubble-tls
      mountPath: /var/run/hubble-tls
      readOnly: true
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:55690
      hubble:
        # NODE_IP is substituted by the collector at runtime;
        # the '\' prefix is required only in order for this config to be
        # inlined in the guide and make it easy to paste, i.e. to avoid
        # the shell substituting it
        endpoint: \${NODE_IP}:4244 # unix:///var/run/cilium/hubble.sock
        buffer_size: 100
        include_flow_types:
          # this sets an L7 flow filter; removing this section will
          # disable filtering and result in all types of flows being turned
          # into spans;
          # other type filters can be set, the names are the same as what's
          # used in 'hubble observe -t <type>'
          traces: ["l7"]
        tls:
          insecure_skip_verify: true
          ca_file: /var/run/hubble-tls/ca.crt
          cert_file: /var/run/hubble-tls/client.crt
          key_file: /var/run/hubble-tls/client.key
    processors:
      batch:
        timeout: 30s
        send_batch_size: 100

    exporters:
      jaeger:
        endpoint: jaeger-default-collector.jaeger.svc.cluster.local:14250
        tls:
          insecure: true

    service:
      telemetry:
        logs:
          level: info
      pipelines:
        traces:
          receivers: [hubble, otlp]
          processors: [batch]
          exporters: [jaeger]
EOF

Output:

opentelemetrycollector.opentelemetry.io/otelcol-hubble created

Check that the pod is running:

$ kubectl get pod -n kube-system -l app.kubernetes.io/name=otelcol-hubble-collector

Check the logs to verify that data is being collected:

kubectl logs -n kube-system -l app.kubernetes.io/name=otelcol-hubble-collector

Output:

2022-08-13T21:38:39.834Z info service/telemetry.go:92 Setting up own telemetry...
2022-08-13T21:38:39.837Z info service/telemetry.go:116 Serving Prometheus metrics {"address": ":8888", "level": "basic", "service.instance.id": "28a7926c-15e8-45e3-80af-2ab251c06269", "service.version": "latest"}
2022-08-13T21:38:39.837Z info service/collector.go:230 Starting otelcol-hubble... {"Version": "0.1.0", "NumCPU": 4}
[..]
2022-08-13T21:38:39.481Z info service/collector.go:132 Everything is ready. Begin running and processing data.
2022-08-13T21:38:39.489Z info v3@v3.2103.1/logger.go:46 All 0 tables opened in 0s {"kind": "receiver", "name": "hubble"}
2022-08-13T21:38:39.490Z info v3@v3.2103.1/logger.go:46 Discard stats nextEmptySlot: 0 {"kind": "receiver", "name": "hubble"}
2022-08-13T21:38:39.490Z info v3@v3.2103.1/logger.go:46 Set nextTxnTs to 0 {"kind": "receiver", "name": "hubble"}
2022-08-13T21:38:40.479Z info jaegerexporter@v0.38.0/exporter.go:186 State of the connection with the Jaeger Collector backend {"kind": "exporter", "name": "jaeger", "state": "READY"}

To test this out, we could either port-forward to the Jaeger interface (a sketch follows) or play with the Ingress again.
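A port-forward sketch (the query Service name comes from the Jaeger instance created above, and the UI then answers on http://localhost:16686):

$ kubectl -n jaeger port-forward svc/jaeger-default-query 16686:16686

For the Ingress route instead: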

INGRESSURL=jaeger-medium.ploio.net
$ cat <<EOF | kubectl apply -f -
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: prod-certificate-clusterissuer
    external-dns.alpha.kubernetes.io/hostname: ${INGRESSURL}
  labels:
    app: jaeger
  name: jaeger-ui
  namespace: jaeger
spec:
  ingressClassName: cilium
  rules:
  - host: ${INGRESSURL}
    http:
      paths:
      - backend:
          service:
            name: jaeger-default-query
            port:
              number: 16686
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - ${INGRESSURL}
    secretName: jaeger-medium-ingress-tls
EOF

Output:

ingress.networking.k8s.io/jaeger-ui created

Wait a while, as ExternalDNS is rather slow to react to changes, and the Cert Manager certificate creation can take about 2–5 minutes depending on the implementation; check the logs if it takes an unusual amount of time. When ready, browse to the URL with a browser of your choice:

The Jaeger UI displayed through the Ingress.

From here on, the instructions at the Cilium GH repository should be valid. As I don't want to produce text for nothing, I leave that as a task for the reader; the result should be similar to this (I just copy/pasted the instructions from the GH repository unmodified):

Jaeger UI, when some load has been generated.

Well, that's it for this time. I wasn't really planning to write another "long" (?) article, but I believe that all the steps here were necessary to keep together as one whole instruction, in order not to lose the context.

And no, this is not the last thing written about this cluster, as I have lots of ideas for my home lab. I intend to describe how to add more components such as Longhorn and Multus and to explore further into the Cilium space, but at this point the cluster should be rather functional and the concept as such is described to a great extent.

Do you appreciate these writings? Did you find any errors? Are the writings at about the right level, or too shallow/simple? Many of the steps could certainly be automated or converted to a script, but then there would not be anything to write about or explain.

Connect with me on LinkedIn or comment/react below.


Tony Norlin

Homelab tinkerer. ❤ OpenSource. illumos, Linux, kubernetes, networking and security. Recently eBPF. Connect with me: www.linkedin.com/in/tonynorlin