Build a managed Kubernetes cluster from scratch — part 5

Will this finally be the last part of this series?

Tony Norlin
Aug 14, 2022

In the previous part we built a cluster that handles Ingress resources. Although it would be possible (with some manual labour) to generate a certificate and declare a valid DNS record that points to the Ingress, it makes sense to have them handled automatically by a function such as Cert Manager.

A figure of the different components that are in separate network segments.

External Requirements

  • A DNS server/provider that can handle Let's Encrypt challenges
  • An external load balancer
  • A DNS server that handles internal resolving of cluster resources

Implementing an external Load Balancer

Some deployments (such as Cert Manager) deploy Admission Controllers to extend Control Plane functionality outside of the API Server by intercepting API calls with webhooks (HTTP callbacks).

The API call is sent from the client as a request to the API Server, which in turn needs to route the call (back) to the worker node where the deployed Admission Controller (webhook) is running. The request is encrypted with TLS and sent to the defined resource, normally the defined Service within the selected Namespace. As we are running the Data Plane outside of the Control Plane, we need to help the API Server reach the deployment by putting a load balancer in front of the worker nodes and pointing the webhook configuration at a DNS name instead.
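For reference, the difference shows up in the webhook's clientConfig; a minimal sketch of the two forms (the names and path below are placeholders, not taken from any specific chart):

# clientConfig of a (Validating|Mutating)WebhookConfiguration entry.
# The default "service" form resolves to a cluster Service, which an external
# kube-apiserver cannot reach -- the "url" form points it at the load balancer
# in front of the worker nodes instead.
webhooks:
- name: example.webhook.local            # placeholder
  clientConfig:
    # service:                           # in-cluster form (default)
    #   name: example-webhook-service    # placeholder
    #   namespace: example-namespace     # placeholder
    #   path: /validate
    #   port: 443
    url: https://extlbvip.medium.site/validate   # FQDN of the external load balancer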

The choice of load balancer is a matter of taste, as we are not relying on any advanced features; if one already exists in the network, it can be used instead. I chose to have a dedicated load balancer (on a separate VLAN, for control) for this cluster, as getting HAProxy up and running in a pkgsrc branded zone is very resource friendly:

$ pkgin install haproxy-2.X.Y
$ USERNAME=$(openssl rand -hex 16)
$ PASSWORD=$(openssl rand -base64 16)
$ cat << EOF > /opt/local/etc/haproxy.cfg
defaults
    maxconn 20000
    mode tcp
    option dontlognull

listen stats
    bind *:8443
    mode http
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth $USERNAME:$PASSWORD
    stats admin if TRUE

frontend certmanager-webhook
    bind *:443
    mode tcp
    option tcplog
    default_backend certmanager-wh-backend

backend certmanager-wh-backend
    mode tcp
    option ssl-hello-chk
    server worker1 10.200.0.1:10256 check inter 1s verify none
    server worker2 10.200.0.2:10256 check inter 1s verify none
    server worker3 10.200.0.3:10256 check inter 1s verify none

frontend otel-webhook
    bind *:444
    mode tcp
    option tcplog
    default_backend otel-wh-backend

backend otel-wh-backend
    mode tcp
    option ssl-hello-chk
    server worker1 10.200.0.1:10258 check inter 1s verify none
    server worker2 10.200.0.2:10258 check inter 1s verify none
    server worker3 10.200.0.3:10258 check inter 1s verify none
EOF
$ pfexec svcadm enable -r svc:/pkgsrc/haproxy:default

The health checks in the load balancer will fail because nothing is listening on ports 10256 and 10258 on the worker nodes yet; this is expected.
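To confirm that HAProxy itself is up, the stats endpoint defined above can be queried (the backends will show DOWN until the webhooks exist; <haproxy-ip> is a placeholder for the zone's address):

# quick sanity check from a host that can reach the HAProxy zone
$ curl -su "${USERNAME}:${PASSWORD}" -o /dev/null -w '%{http_code}\n' http://<haproxy-ip>:8443/stats
200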

Preparing CoreDNS

The current CoreDNS configuration has served us well for requests within the cluster, but as we now grow the functionality with external verifiers we also need to be able to resolve remote destinations, and we do that by adding one line: forward . /etc/resolv.conf .

This enables CoreDNS to forward queries upstream. We will apply it to the deployment and reload the configuration (beware of ending any line with a trailing blank space (⌴), as it confuses the parser and breaks the formatting):

$ cat <<EOF | kubectl apply -f -
---
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
EOF
$ kubectl -n kube-system rollout restart deployment coredns
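A quick way to verify that upstream forwarding works is to resolve an external name from a throwaway pod (the busybox image and the name being looked up are arbitrary choices here):

$ kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup letsencrypt.org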

Install Cert Manager

The method chosen here to install Cert Manager is Helm. Add their repository and issue an update command, as described in the official instructions (the update seems unnecessary the first time the repository is added, as it is fetched up to date):

$ helm repo add jetstack https://charts.jetstack.io
$ helm repo update

Keep in mind that the URL used must be resolvable by, at a minimum, the kube-apiserver (it is probably wise to have it as a local DNS entry), and it should point to the external load balancer (HAProxy) that we implemented previously:


VERSION=v1.9.1
EXTLB=extlbvip.medium.site # A FQDN pointing to the HAProxy
PORT=10256 # The same port number as defined above in the LB
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --version ${VERSION} \
  --set installCRDs=true \
  --set webhook.url.host=${EXTLB} \
  --set webhook.securePort=${PORT} \
  --set webhook.hostNetwork=true >> helm-cert-manager-host

For now, ignore messages like http: TLS handshake error from <load balancer IP>:57491: tls: client offered only unsupported versions: []
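Before moving on, it is worth checking that the chart came up and that the webhook configuration really points at the external URL rather than a cluster Service (cert-manager names its validating webhook configuration cert-manager-webhook in my install; verify the name with kubectl get validatingwebhookconfiguration if it differs):

$ kubectl -n cert-manager get pods
$ kubectl get validatingwebhookconfiguration cert-manager-webhook \
    -o jsonpath='{.webhooks[*].clientConfig.url}{"\n"}'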

Let's Encrypt Certificates

Let's Encrypt DNS Challenge

There are two ways to complete the certificate request challenge: HTTP01 and DNS01. In my opinion, DNS01 is preferable security-wise, as HTTP01 needs to expose an HTTP daemon for the challenge to complete (and no, there is no "known" IP address that the challenge is initiated from). One might argue that a separate WAN link could be exposed for that purpose only; true, but with little benefit. I prefer the DNS01 challenge, and that is what will be described here.

As for the DNS challenge, the method to actually solve it depends on the provider that manages the domain name's DNS records or, in case you serve the domain name yourself, on the DNS software. This decides the actual solver, and the options are described here. In my experience the webhook option is subpar (an emergency solution for providers that fail to offer a good API), and it requires extra steps to get working with an external Control Plane.

Either way, the domain name you order certificates for needs to be in your possession and delegated to wherever you create the challenge records for Let's Encrypt.

I have tested the CloudFlare and RFC2136 methods and they are very similar to each other.

Cloudflare example (described in detail here) from my earlier design:

EMAILADDRESS=<email@address.here>
APIKEY=<Global API Key> # from Profile > API Tokens > API Keys > Global API Key > View
$ kubectl create secret generic cloudflare-api-key-secret --from-literal=api-key=${APIKEY}
$ cat <<EOF | kubectl apply -f -
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  annotations:
  name: cloudflare-issuer-production
spec:
  acme:
    email: ${EMAILADDRESS}
    preferredChain: ""
    privateKeySecretRef:
      name: cloudflare-issuer-production-account-key
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            key: api-key
            name: cloudflare-api-key-secret
          email: ${EMAILADDRESS}
EOF

Cloudflare worked seamlessly, but I decided that I wanted more control, and as I already host a primary DNS it felt natural to choose RFC2136 instead (the DNS-server side of the setup is sketched below).
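Before the ClusterIssuer can use RFC2136, the DNS server needs a TSIG key that is allowed to update the challenge records. A minimal BIND 9 sketch of what that could look like; the key name, algorithm and zone match the variables used below, but paths and policy should be adjusted to your own named.conf:

# on the primary DNS server: generate a key and append it to the config
# (the generated "secret" value is what goes into TSIGKEY below)
$ tsig-keygen -a hmac-sha512 certmanager >> /etc/named.conf

# in the zone definition, allow that key to manage TXT records in the zone
zone "ploio.net" {
    type master;
    file "ploio.net.db";
    update-policy {
        grant certmanager zonesub TXT;
    };
};

With the key and update policy in place, register the key as a Secret and create the ClusterIssuer: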

EMAILADDRESS=<email@address.here> # To register at Let's Encrypt
NAMESERVER=10.53.0.1 # Enter the IP of the primary DNS
TSIGALGO=HMACSHA512 # Enter the algorithm chosen
TSIGNAME=certmanager # Enter the key name chosen
TSIGKEY=<enter the secret key generated> # Hint: look in named.conf
$ kubectl create secret generic tsig-certmanager --from-literal=${TSIGNAME}=${TSIGKEY}
$ cat <<EOF | kubectl apply -f -
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  generation: 1
  name: prod-certificate-clusterissuer
spec:
  acme:
    email: ${EMAILADDRESS}
    preferredChain: ""
    privateKeySecretRef:
      name: prod-certificate-clusterissuer-keyref
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - dns01:
        rfc2136:
          nameserver: ${NAMESERVER}
          tsigAlgorithm: ${TSIGALGO}
          tsigKeyName: ${TSIGNAME}
          tsigSecretSecretRef:
            key: ${TSIGNAME}
            name: tsig-certmanager
EOF

Output:

clusterissuer.cert-manager.io/prod-certificate-clusterissuer created

Verify that the ClusterIssuer was registered OK with the ACME server (see the Status section of the output):

$ kubectl describe clusterissuer
Name:         prod-certificate-clusterissuer
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  cert-manager.io/v1
Kind:         ClusterIssuer
Metadata:
  Creation Timestamp:  2022-08-12T22:18:22Z
  Generation:          1
[...]
    Manager:         cert-manager-clusterissuers
    Operation:       Update
    Subresource:     status
    Time:            2022-08-12T22:18:23Z
  Resource Version:  1366968
  UID:               18209f74-cbd5-4789-bba5-3e0a5d4d2c30
Spec:
  Acme:
    Email:            [.. email address here ..]
    Preferred Chain:
    Private Key Secret Ref:
      Name:  prod-certificate-clusterissuer-keyref
    Server:  https://acme-v02.api.letsencrypt.org/directory
    Solvers:
      dns01:
        rfc2136:
          Nameserver:      10.53.0.1
          Tsig Algorithm:  HMACSHA512
          Tsig Key Name:   certmanager
          Tsig Secret Secret Ref:
            Key:   certmanager
            Name:  tsig-certmanager
Status:
  Acme:
    Last Registered Email:  [.. email address here ..]
    Uri:                    https://acme-v02.api.letsencrypt.org/acme/acct/674725597
  Conditions:
    Last Transition Time:  2022-08-12T22:18:23Z
    Message:               The ACME account was registered with the ACME server
    Observed Generation:   1
    Reason:                ACMEAccountRegistered
    Status:                True
    Type:                  Ready
Events:  <none>

Now, if everything goes as intended, the environment is ready to create a valid certificate (make sure that the kube-apiserver can reach the DNS server to create the challenge TXT record). A sample certificate:

DNSNAME=medium-site.ploio.net # The FQDN to order the certificate for
SECRET=medium-site-tls-cert # Name of Secret to put certificate in
$ cat <<EOF | kubectl apply -f -
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ${DNSNAME}
spec:
  commonName: ${DNSNAME}
  dnsNames:
  - ${DNSNAME}
  issuerRef:
    name: prod-certificate-clusterissuer
    kind: ClusterIssuer
  secretName: ${SECRET}
EOF

Output:

certificate.cert-manager.io/medium-site.ploio.net created

There are various ways to inspect the progress:

$ kubectl get certificaterequest
NAME APPROVED DENIED READY ISSUER REQUESTOR AGE
medium-site.ploio.net-rmlgh True False prod-certificate-clusterissuer system:serviceaccount:cert-manager:cert-manager 4s
$ kubectl describe certificaterequest medium-site.ploio.net-rmlgh
Name:         medium-site.ploio.net-rmlgh
Namespace:    default
Labels:       <none>
Annotations:  cert-manager.io/certificate-name: medium-site.ploio.net
              cert-manager.io/certificate-revision: 1
              cert-manager.io/private-key-secret-name: medium-site.ploio.net-9npg2
API Version:  cert-manager.io/v1
Kind:         CertificateRequest
Metadata:
  Creation Timestamp:  2022-08-13T09:31:38Z
  Generate Name:       medium-site.ploio.net-
  Generation:          1
[ ... ]
Status:
  Conditions:
    Last Transition Time:  2022-08-13T09:31:38Z
    Message:               Certificate request has been approved by cert-manager.io
    Reason:                cert-manager.io
    Status:                True
    Type:                  Approved
    Last Transition Time:  2022-08-13T09:31:38Z
    Message:               Waiting on certificate issuance from order default/medium-site.ploio.net-rmlgh-3621970007: "pending"
    Reason:                Pending
    Status:                False
    Type:                  Ready
Events:
  Type    Reason           Age  From                                          Message
  ----    ------           ---  ----                                          -------
  Normal  cert-manager.io  11s  cert-manager-certificaterequests-approver     Certificate request has been approved by cert-manager.io
  Normal  OrderCreated     11s  cert-manager-certificaterequests-issuer-acme  Created Order resource default/medium-site.ploio.net-rmlgh-3621970007
  Normal  OrderPending     11s  cert-manager-certificaterequests-issuer-acme  Waiting on certificate issuance from order default/medium-site.ploio.net-rmlgh-3621970007: ""
$ kubectl get challenge
NAME STATE DOMAIN AGE
medium-site.ploio.net-rmlgh-3621970007-1348735329 pending medium-site.ploio.net 24s

$ kubectl describe challenge medium-site.ploio.net-rmlgh-3621970007-1348735329
Name:         medium-site.ploio.net-rmlgh-3621970007-1348735329
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  acme.cert-manager.io/v1
Kind:         Challenge
Metadata:
  Creation Timestamp:  2022-08-13T09:31:39Z
  Finalizers:
    finalizer.acme.cert-manager.io
  Generation:  1
[...]
Status:
  Presented:   true
  Processing:  true
  Reason:      Waiting for DNS-01 challenge propagation: DNS record for "medium-site.ploio.net" not yet propagated
  State:       pending
Events:
  Type    Reason     Age  From                     Message
  ----    ------     ---  ----                     -------
  Normal  Started    32s  cert-manager-challenges  Challenge scheduled for processing
  Normal  Presented  31s  cert-manager-challenges  Presented challenge using DNS-01 challenge mechanism

$ kubectl describe certificate medium-site.ploio.net
Name:         medium-site.ploio.net
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  cert-manager.io/v1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2022-08-13T09:31:37Z
  Generation:          1
[...]
Status:
  Conditions:
    Last Transition Time:  2022-08-13T09:33:49Z
    Message:               Certificate is up to date and has not expired
    Observed Generation:   1
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2022-11-11T08:33:47Z
  Not Before:              2022-08-13T08:33:48Z
  Renewal Time:            2022-10-12T08:33:47Z
  Revision:                1
Events:
  Type    Reason     Age  From                                       Message
  ----    ------     ---  ----                                       -------
  Normal  Issuing    17m  cert-manager-certificates-trigger          Issuing certificate as Secret does not exist
  Normal  Generated  17m  cert-manager-certificates-key-manager      Stored new private key in temporary Secret resource "medium-site.ploio.net-9npg2"
  Normal  Requested  17m  cert-manager-certificates-request-manager  Created new CertificateRequest resource "medium-site.ploio.net-rmlgh"
  Normal  Issuing    15m  cert-manager-certificates-issuing          The certificate has been successfully issued

During the challenge it should also be possible to view the challenge in the public DNS responsible for the domain:

_acme-challenge.medium-site.ploio.net    TXT    60   -   "[.. challenge here ..]"
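This can be confirmed from outside the cluster as well, by querying the name server used by the issuer directly (dig assumed available; substitute your own domain):

$ dig +short TXT _acme-challenge.medium-site.ploio.net @10.53.0.1
"[.. challenge here ..]"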

As the certificate was issued correctly and has served its purpose, it can now be removed.
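Note that deleting the Certificate resource does not remove the Secret holding the issued key material by default, so clean that up separately if it is no longer wanted:

$ kubectl delete certificate medium-site.ploio.net
$ kubectl delete secret medium-site-tls-cert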

Handle internal DNS records — meet ExternalDNS

In order to let Kubernetes create DNS records that are automatically resolvable by all clients, there is a deployment called ExternalDNS that handles this part, so let's install that one as well with Helm:

helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/

After you've added the repository you can install the chart.

PROVIDER=rfc2136
DNSHOST=10.53.0.2 # Your internal DNS resolver
DNSZONE=ploio.net # The zone that DNS manages
TSIGSECRET=<TSIG Secret at the DNS server, check named.conf>
TSIGALGO=hmac-sha256 # TSIG algorithm chosen at DNS server
TSIGKEY=externaldns # The TSIG name chosen at DNS server
DOMAINFILTER=ploio.net # Which sub domains the ExternalDNS handles
$ cat <<EOF | helm upgrade --install -n external-dns external-dns \
external-dns/external-dns --create-namespace -f -
---
serviceAccount:
  create: true

rbac:
  create: true

securityContext:
  runAsNonRoot: true
  runAsUser: 65534
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]

sources:
  - service
  - ingress

registry: txt
txtOwnerId: "k8s"
txtPrefix: "external-dns-"

domainFilters:
  - ${DOMAINFILTER}

provider: ${PROVIDER}

deploymentStrategy:
  type: Recreate

extraArgs:
  - --rfc2136-host=${DNSHOST}
  - --rfc2136-port=53
  - --rfc2136-zone=${DNSZONE}
  - --rfc2136-tsig-secret=${TSIGSECRET}
  - --rfc2136-tsig-secret-alg=${TSIGALGO}
  - --rfc2136-tsig-keyname=${TSIGKEY}
  - --rfc2136-tsig-axfr
EOF

Check in the external-dns namespace that the pod is running, and also check the logs for any issues communicating with the internal DNS.
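For example (the deployment name external-dns is the chart default here; adjust if the release was named differently):

$ kubectl -n external-dns get pods
$ kubectl -n external-dns logs deploy/external-dns --tail=20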

Verification of Cert Manager and ExternalDNS

With the pieces in place, we can patch the Hubble Ingress resource and test everything out. First, patch the Ingress to enable TLS and the desired DNS entry:

INGRESSURL=hubble-medium-site.ploio.net
$ cat <<EOF |kubectl patch -n kube-system ingress hubble-ui \
--patch-file=/dev/stdin
{
  "spec": {
    "tls": [
      {
        "hosts": [
          "${INGRESSURL}"
        ],
        "secretName": "hubble-ingress-tls"
      }
    ]
  }
}
EOF

Output:

ingress.networking.k8s.io/hubble-ui patched

Then annotate (or edit, if preferred) the Ingress with the ClusterIssuer that we created:

$ kubectl -n kube-system annotate ingress hubble-ui \
cert-manager.io/cluster-issuer=prod-certificate-clusterissuer

Output:

ingress.networking.k8s.io/hubble-ui annotated

Similar to what was described earlier, this triggers the CertificateRequest, the Challenge and so on; finally, a Certificate should be issued.
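The ingress-shim creates a Certificate named after the secretName in the Ingress, so the progress can be followed as before; once issued, it should report Ready, along the lines of:

$ kubectl -n kube-system get certificate hubble-ingress-tls
NAME                 READY   SECRET               AGE
hubble-ingress-tls   True    hubble-ingress-tls   3m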

Now, to have the Load Balancer IP of the Ingress resource automatically created in the internal DNS, make another annotation like this:

$ kubectl -n kube-system annotate ingress \
hubble-ui external-dns.alpha.kubernetes.io/hostname=hubble-medium-site.ploio.net

Output:

ingress.networking.k8s.io/hubble-ui annotated

The logs at the external-dns pod should show something similar to this:

[..]
time="2022-08-13T15:18:02Z" level=info msg="Adding RR: hubble-medium-site.ploio.net 0 A 10.254.254.3"
time="2022-08-13T15:18:02Z" level=info msg="Adding RR: external-dns-hubble-medium-site.ploio.net 0 TXT \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=ingress/kube-system/hubble-ui\""
time="2022-08-13T15:18:02Z" level=info msg="Adding RR: external-dns-a-hubble.medium-site.ploio.net 0 TXT \"heritage=external-dns,external-dns/owner=k8s,external-dns/resource=ingress/kube-system/hubble-ui\""
time="2022-08-13T15:19:02Z" level=info msg="All records are already up to date"
time="2022-08-13T15:20:02Z" level=info msg="All records are already up to date"

Hubble UI, viewed through the integrated Ingress controller (envoy).

Deploy Hubble adaptor for OpenTelemetry

With the Cilium v1.12 release, tracing with OpenTelemetry was also announced. According to the instructions found here, we should apply a kustomize base definition with kubectl; this requires that the git command is in the PATH:

$ kubectl apply -k github.com/cilium/kustomize-bases/jaeger

Output:

namespace/jaeger created
customresourcedefinition.apiextensions.k8s.io/jaegers.jaegertracing.io created
serviceaccount/jaeger-operator created
role.rbac.authorization.k8s.io/jaeger-operator created
clusterrole.rbac.authorization.k8s.io/jaeger-operator created
rolebinding.rbac.authorization.k8s.io/jaeger-operator created
clusterrolebinding.rbac.authorization.k8s.io/jaeger-operator created
deployment.apps/jaeger-operator created

$ cat <<EOF | kubectl apply -f -
---
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger-default
  namespace: jaeger
spec:
  strategy: allInOne
  storage:
    type: memory
    options:
      memory:
        max-traces: 100000
  ingress:
    enabled: false
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
EOF

Output:

jaeger.jaegertracing.io/jaeger-default created
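The operator should shortly spin up the all-in-one instance; a quick check (resource names derive from the Jaeger instance name jaeger-default):

$ kubectl -n jaeger get pods,svc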

kubectl apply -k github.com/cilium/kustomize-bases/opentelemetry

Output:

namespace/opentelemetry-operator-system created
customresourcedefinition.apiextensions.k8s.io/opentelemetrycollectors.opentelemetry.io created
serviceaccount/opentelemetry-operator-controller-manager created
role.rbac.authorization.k8s.io/opentelemetry-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/opentelemetry-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/opentelemetry-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/opentelemetry-operator-proxy-rolebinding created
service/opentelemetry-operator-controller-manager-metrics-service created
service/opentelemetry-operator-webhook-service created
deployment.apps/opentelemetry-operator-controller-manager created
certificate.cert-manager.io/opentelemetry-operator-serving-cert created
issuer.cert-manager.io/opentelemetry-operator-selfsigned-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/opentelemetry-operator-mutating-webhook-configuration created
validatingwebhookconfiguration.admissionregistration.k8s.io/opentelemetry-operator-validating-webhook-configuration created

As can be seen here, two Admission Controller webhook configurations were created, and we need to change both of them from the service scheme to the url scheme, from:

[..]
      service:
        name: opentelemetry-operator-webhook-service
        namespace: opentelemetry-operator-system
        path: /validate-opentelemetry-io-v1alpha1-opentelemetrycollector
        port: 443
[..]

to the following stanza, where EXTLB is the same URL as previously declared for the other webhook (the external load balancer/HAProxy):


[..]
      url: https://${EXTLB}:444/validate-opentelemetry-io-v1alpha1-opentelemetrycollector
[..]

Do this on both the validatingwebhookconfiguration and the mutatingwebhookconfiguration; there should be two occurrences in each webhook configuration. The reason for this adaptation is the same as earlier: as our kube-apiserver is external from the Data Plane's point of view, it is necessary to publish an address that is known to the Control Plane.
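This can be done with kubectl edit, or scripted with a patch; a sketch for the first entry of the validating configuration (it assumes the entry at index 0 uses the validate path shown above, so check the ordering in your configuration first, and repeat for the second entry and for the mutating configuration with its corresponding paths):

EXTLB=extlbvip.medium.site
kubectl patch validatingwebhookconfiguration \
  opentelemetry-operator-validating-webhook-configuration --type='json' \
  -p="[{\"op\":\"remove\",\"path\":\"/webhooks/0/clientConfig/service\"},
       {\"op\":\"add\",\"path\":\"/webhooks/0/clientConfig/url\",\"value\":\"https://${EXTLB}:444/validate-opentelemetry-io-v1alpha1-opentelemetrycollector\"}]"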

Edit the certificate

$ kubectl -n opentelemetry-operator-system edit certificate opentelemetry-operator-serving-cert

In dnsNames, add the FQDN of the HAProxy to have the certificate regenerated; alternatively, patch it:

EXTLB=extlbvip.medium.site # A FQDN pointing to the HAProxy
$ kubectl -n opentelemetry-operator-system patch certificate opentelemetry-operator-serving-cert \
    --type='json' -p="[{\"op\": \"add\", \"path\": \"/spec/dnsNames/-\", \"value\": \"${EXTLB}\"}]"

Patch the deployment to listen on the hostport 10258 (as defined in HAProxy), then restart the deployment to read in the new configuration:

$ kubectl -n opentelemetry-operator-system patch deployments opentelemetry-operator-controller-manager --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/ports/0", "value": {"containerPort":9443,"hostPort":10258,"name":"webhook-server","protocol":"TCP"}}]'

Output:

deployment.apps/opentelemetry-operator-controller-manager patched
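Once the pod listens on the host port, the serving certificate can be inspected through the load balancer; after the dnsNames change above, the Subject Alternative Names should include the EXTLB FQDN (openssl is used here just for a quick look):

$ echo | openssl s_client -connect ${EXTLB}:444 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'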

$ cat <<EOF | kubectl apply -f -
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol-hubble
  namespace: kube-system
spec:
  mode: daemonset
  image: ghcr.io/cilium/hubble-otel/otelcol:v0.1.1
  env:
    # set NODE_IP environment variable using downwards API
    - name: NODE_IP
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
  volumes:
    # this example connects to the Hubble socket of the Cilium agent
    # using host port and TLS
    - name: hubble-tls
      projected:
        defaultMode: 256
        sources:
          - secret:
              name: hubble-relay-client-certs
              items:
                - key: tls.crt
                  path: client.crt
                - key: tls.key
                  path: client.key
                - key: ca.crt
                  path: ca.crt
    # it's possible to use the UNIX socket also, for which
    # the following volume will be needed
    # - name: cilium-run
    #   hostPath:
    #     path: /var/run/cilium
    #     type: Directory
  volumeMounts:
    # - name: cilium-run
    #   mountPath: /var/run/cilium
    - name: hubble-tls
      mountPath: /var/run/hubble-tls
      readOnly: true
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:55690
      hubble:
        # NODE_IP is substituted by the collector at runtime;
        # the '\' prefix is required only in order for this config to be
        # inlined in the guide and make it easy to paste, i.e. to avoid
        # the shell substituting it
        endpoint: \${NODE_IP}:4244 # unix:///var/run/cilium/hubble.sock
        buffer_size: 100
        include_flow_types:
          # this sets an L7 flow filter; removing this section will
          # disable filtering and result in all types of flows being turned
          # into spans;
          # other type filters can be set, the names are the same as what's
          # used in 'hubble observe -t <type>'
          traces: ["l7"]
        tls:
          insecure_skip_verify: true
          ca_file: /var/run/hubble-tls/ca.crt
          cert_file: /var/run/hubble-tls/client.crt
          key_file: /var/run/hubble-tls/client.key
    processors:
      batch:
        timeout: 30s
        send_batch_size: 100

    exporters:
      jaeger:
        endpoint: jaeger-default-collector.jaeger.svc.cluster.local:14250
        tls:
          insecure: true

    service:
      telemetry:
        logs:
          level: info
      pipelines:
        traces:
          receivers: [hubble, otlp]
          processors: [batch]
          exporters: [jaeger]
EOF

Output:

opentelemetrycollector.opentelemetry.io/otelcol-hubble created

Check that the pod is running:

$ kubectl get pod -n kube-system -l app.kubernetes.io/name=otelcol-hubble-collector

Check the logs to verify that data is being collected:

kubectl logs -n kube-system -l app.kubernetes.io/name=otelcol-hubble-collector

Output:

2022-08-13T21:38:39.834Z info service/telemetry.go:92 Setting up own telemetry...
2022-08-13T21:38:39.837Z info service/telemetry.go:116 Serving Prometheus metrics {"address": ":8888", "level": "basic", "service.instance.id": "28a7926c-15e8-45e3-80af-2ab251c06269", "service.version": "latest"}
2022-08-13T21:38:39.837Z info service/collector.go:230 Starting otelcol-hubble... {"Version": "0.1.0", "NumCPU": 4}
[..]
2022-08-13T21:38:39.481Z info service/collector.go:132 Everything is ready. Begin running and processing data.
2022-08-13T21:38:39.489Z info v3@v3.2103.1/logger.go:46 All 0 tables opened in 0s {"kind": "receiver", "name": "hubble"}
2022-08-13T21:38:39.490Z info v3@v3.2103.1/logger.go:46 Discard stats nextEmptySlot: 0 {"kind": "receiver", "name": "hubble"}
2022-08-13T21:38:39.490Z info v3@v3.2103.1/logger.go:46 Set nextTxnTs to 0 {"kind": "receiver", "name": "hubble"}
2022-08-13T21:38:40.479Z info jaegerexporter@v0.38.0/exporter.go:186 State of the connection with the Jaeger Collector backend {"kind": "exporter", "name": "jaeger", "state": "READY"}

To test this out, we could either port-forward to the Jaeger interface (a sketch follows) or play with the Ingress again.
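A port-forward sketch (the query Service name comes from the Jaeger instance created above, and the UI then answers on http://localhost:16686):

$ kubectl -n jaeger port-forward svc/jaeger-default-query 16686:16686

For the Ingress route instead: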

INGRESSURL=jaeger-medium.ploio.net
$ cat <<EOF | kubectl apply -f -
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: prod-certificate-clusterissuer
    external-dns.alpha.kubernetes.io/hostname: ${INGRESSURL}
  labels:
    app: jaeger
  name: jaeger-ui
  namespace: jaeger
spec:
  ingressClassName: cilium
  rules:
  - host: ${INGRESSURL}
    http:
      paths:
      - backend:
          service:
            name: jaeger-default-query
            port:
              number: 16686
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - ${INGRESSURL}
    secretName: jaeger-medium-ingress-tls
EOF

Output:

ingress.networking.k8s.io/jaeger-ui created

Wait a while, as ExternalDNS is rather slow to react to changes, and the Cert Manager certificate creation can take about 2–5 minutes depending on the implementation; check the logs if it takes an unusual amount of time. When ready, browse to the URL with a browser of your choice:

The Jaeger UI displayed through the Ingress.

From here on, the instructions at the Cilium GH repository should be valid. As I don't want to produce text for nothing, I leave that as a task for the reader; the result should be similar to this (I just copy/pasted the instructions from the GH repository unmodified):

Jaeger UI, when some load has been generated.

Well, that's it for this time. I wasn't really planning to write another "long" (?) article, but I believe that all the steps here were necessary to keep together as one whole instruction, in order not to lose the context.

And no, this is not the last thing written about this cluster, as I have lots of ideas for my home lab. I intend to describe how to add more components such as Longhorn and Multus and to explore further into the Cilium space, but at this point the cluster should be rather functional and the concept as such is described to a great extent.

Do you appreciate these writings? Did you find any errors? Are the writings at about the right level, or too shallow/simple? Many of the steps could certainly be automated or converted to a script, but then there would not be anything to write about or explain.

Connect with me on LinkedIn or comment/react below.


Tony Norlin

Homelab tinkerer. ❤ OpenSource. illumos, Linux, kubernetes, networking and security. Recently eBPF. Connect with me: www.linkedin.com/in/tonynorlin