Applying the ingress-gce controller to a self-managed K8s cluster on hybrid cloud

Frank Chung
DeepQ Research Engineering Blog
7 min read · Nov 17, 2023

Introduction

In the general case we run Kubernetes on GKE (Google Kubernetes Engine) and deploy an Ingress rule to automatically create the load balancer resources. However, if the cluster is self-managed to support a multi-cloud or hybrid cloud/on-prem scenario, we could consider the ingress-nginx controller instead, but it clearly does not scale as well as a cloud load balancer.

Fortunately, Google open-sources ingress-gce, the controller that implements Ingress for GKE, and we can leverage it to do load balancing for nodes located in GCE (Google Compute Engine). However, the ingress-gce documentation is lacking and the repository is poorly maintained, so, to my knowledge, this tutorial is the first step-by-step guide on the internet for applying ingress-gce to a self-managed k8s cluster that contains nodes located on GCE.

Architecture

Architecture for Ingress Controller

Let’s say there is a self-managed cluster whose control-plane node is hosted on a GCE VM, with a worker node joined from another network (a Cloud VPN tunnel connects the Google VPC and the local LAN). Since the Ingress controller on GCP is only natively supported on GKE, we need to manually deploy the ingress-gce controller to enable the Cloud Load Balancer.

Load Balancer Flow

The goal is to establish a load balancer whose backend service is an instance group containing the GCE nodes in the cluster (on-prem nodes cannot be routed to by the GCP load balancer). Ingress traffic is received by a NodePort service and routed to the corresponding pods in the cluster.

Prerequisite

Nodes OS: Ubuntu 20.04 LTS

K8s Version: v1.26.0

Ingress-gce Version: v1.25.1

Build OS: Mac OS 10.14.6

Build gcloud SDK: 418.0.0

Setup Procedure

1. Check GLBC Version

Check the table below to know which version of GLBC (Google Load Balancer Controller) you need to build. Since the table is outdated, I simply use the latest GLBC version, v1.25.1, as of 2023/11.

* 1.12.7-gke.16+ -> 1.5.2
* 1.13.7-gke.5+ -> 1.6.0
* 1.14.10-gke.31+ -> 1.6.2
* 1.14.10-gke.42+ -> 1.6.4
* 1.15.4-gke.21+ -> 1.7.2
* 1.15.9-gke.22+ -> 1.7.3
* 1.15.11-gke.15+ -> 1.7.4
* 1.15.12-gke.3+ -> 1.7.5
* 1.16.8-gke.3+ -> 1.9.1
* 1.16.8-gke.12+ -> 1.9.2
* 1.16.9-gke.2+ -> 1.9.3
* 1.16.10-gke.6+ -> 1.9.7
* 1.17.6-gke.11+ -> 1.9.7
* 1.18.4-gke.1201+ -> 1.9.7
* 1.16.13-gke.400+ -> 1.9.8
* 1.17.9-gke.600+ -> 1.9.8
* 1.18.6-gke.500+ -> 1.9.8
* 1.18.6-gke.4800+ -> 1.9.9
* 1.18.10-gke.1500+ -> 1.10.8
* 1.18.10-gke.2300+ -> 1.10.9
* 1.18.12-gke.1200+ -> 1.10.13
* 1.18.18-gke.1200+ -> 1.10.15
* 1.18.19-gke.1400+ -> 1.11.1
* 1.18.20-gke.5100+ -> 1.11.5
* 1.19.14-gke.1900 -> 1.11.5
* 1.20.10-gke.301 -> 1.11.5
* 1.21.3-gke.210 -> 1.13.4
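
If you are not sure which row applies to your cluster, a quick sanity check of the server version helps; the exact GLBC tag you pick is still your call:

# Print client and server versions; the server version is what the table maps from.
kubectl version

# Kubelet version of each node, for reference.
kubectl get nodes -o wide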

2. Build the GLBC

Although there are pre-built Docker images such as registry.k8s.io/ingress-gce-glbc-amd64:v1.8.0, the more recent versions are not maintained, so I still suggest building the image ourselves:

# Download the desired version of ingress-gce
VERSION_TAG=v1.25.1
git clone https://github.com/kubernetes/ingress-gce.git
cd ingress-gce
git checkout ${VERSION_TAG} -b ${VERSION_TAG}

# Manually modify the build script to avoid a VCS stamping error.
vim build/build.sh
+export GOFLAGS=-buildvcs=false # add this line
export CGO_ENABLED=0
export GOARCH="${ARCH}"
export GOOS="${OS}"

# Manually modify the makefile to avoid an error from the deprecated gcloud docker command.
vim build/rules.mk
# replace this line ...
-@gcloud docker -- push $$(head -n 1 $<) $(VERBOSE_OUTPUT)
# ... with the two lines below.
+gcloud auth configure-docker
+docker push $$(head -n 1 $<) $(VERBOSE_OUTPUT)

# make and push the built docker image to your desired docker registry.
REGISTRY="gcr.io/${PROJECT_ID}" make only-push-glbc

# now keep the tag of your built docker image
TAG="gcr.io/your-project-id/ingress-gce-glbc-amd64:v1.25.1-dirty"

3. Edit the gce.conf

Before deploying the ingress-gce controller, we need to fill in the configuration to tell the controller where to create the load balancer.

cd ingress-gce/docs/deploy/gke/non-gcp

# edit the config file.
vim gce.conf
token-url = nil
# remove this line; the feature is GA now, so the alpha API endpoint is no longer needed.
-api-endpoint = https://www.googleapis.com/compute/alpha/
# The project your cluster registered to
project-id = [PROJECT]
# The network your cluster will be peering with
network-name = [NETWORK]
# Closest GCP zone to your cluster
local-zone = [ZONE]
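
For reference, a filled-in gce.conf would look roughly like this; the project, network, and zone values below are placeholders for illustration, not the real ones from my setup:

# example gce.conf (illustrative values)
token-url = nil
project-id = my-gcp-project
network-name = my-vpc-network
local-zone = asia-east1-b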

4. Edit the k8s yamls

Since the documentation is outdated, we need to manually modify rbac.yaml and glbc.yaml for the controller to run normally. The modifications are as follows:

cd ingress-gce/docs/deploy/gke/non-gcp

# edit the rbac to add more definitions
vim rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:controller:glbc
rules:
# add the rules below to this ClusterRole
- apiGroups: [""]
  resources: ["services/status"]
  verbs: ["patch"]
- apiGroups: ["extensions", "networking.k8s.io"]
  resources: ["ingresses/status"]
  verbs: ["patch"]
- apiGroups: ["networking.gke.io"]
  resources: ["frontendconfigs"]
  verbs: ["get", "list", "watch", "update", "create", "patch"]
- apiGroups: ["networking.gke.io"]
  resources: ["servicenetworkendpointgroups", "gcpingressparams"]
  verbs: ["get", "list", "watch", "update", "create", "patch", "delete"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingressclasses"]
  verbs: ["get", "list", "watch", "update", "create", "patch"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["*"]
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"]
  verbs: ["get", "list", "watch"]

# edit glbc.yaml to use the GLBC image you built in step 2.
vim glbc.yaml
containers:
# replace the default image ...
- image: registry.k8s.io/ingress-gce-glbc-amd64:v1.8.0
# ... with the image you built.
- image: gcr.io/your-project-id/ingress-gce-glbc-amd64:v1.25.1-dirty
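
A client-side dry run is a cheap way to catch YAML mistakes introduced while editing, before anything is actually deployed (optional check):

# Validate the edited manifests without creating any resources.
kubectl apply --dry-run=client -f rbac.yaml -f glbc.yaml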

5. Create GCP service account

Now follow the official tutorial to create a GCP service account and store its key as a Kubernetes secret.

# create a service account
gcloud iam service-accounts create glbc-service-account \
--display-name "Service Account for GLBC" --project $PROJECT

# binding compute.admin role to the service account
gcloud projects add-iam-policy-binding $PROJECT \
--member serviceAccount:glbc-service-account@${PROJECT}.iam.gserviceaccount.com \
--role roles/compute.admin

# Create key for glbc-service-account.
gcloud iam service-accounts keys create key.json --iam-account \
glbc-service-account@${PROJECT}.iam.gserviceaccount.com

# Store the key as a secret in k8s. The secret will be mounted as a volume in
# glbc.yaml.
kubectl create secret generic glbc-gcp-key --from-file=key.json -n kube-system

rm key.json
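
You can quickly verify that the secret is in place before deploying the controller:

# The glbc.yaml deployment mounts this secret, so it must exist in kube-system.
kubectl get secret glbc-gcp-key -n kube-system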

6. Label the nodes

Since GLBC tries to find each node in the cluster and assign it to an ig (instance group) or NEG (Network Endpoint Group) backend, we need to manually label the GCE and non-GCE nodes:

# remove the exclude label from GCE nodes (the control-plane node is excluded by default)
kubectl label nodes ${GCE_NODE} node.kubernetes.io/exclude-from-external-load-balancers-
# add the exclude label to non-GCE nodes.
kubectl label nodes ${NON_GCE_NODE} node.kubernetes.io/exclude-from-external-load-balancers=true
# add zone labels to all nodes (including on-prem nodes)
kubectl label nodes ${NODE_NAME} topology.kubernetes.io/zone=${ZONE}
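
To double-check the result, list the nodes with both label columns shown; this is only a verification step:

# Show the exclusion and zone labels for every node.
kubectl get nodes \
  -L node.kubernetes.io/exclude-from-external-load-balancers \
  -L topology.kubernetes.io/zone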

7. Deploy the controller to k8s

# Grant permission to current GCP user to create new k8s ClusterRoles.
kubectl create clusterrolebinding one-binding-to-rule-them-all \
--clusterrole=cluster-admin \
--user=$(gcloud config list --project $PROJECT --format 'value(core.account)' 2>/dev/null)

kubectl create -f rbac.yaml

# put the gce.conf to configmap
kubectl create configmap gce-config --from-file=gce.conf -n kube-system

# apply the k8s yamls.
kubectl create -f default-http-backend.yaml
kubectl create -f glbc.yaml

8. Check the status of deployments

# check controller and default-backend are running.
kubectl get pod -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
l7-default-backend-f5cdf9d5b-rhmdw   1/1     Running   0          3d19h
l7-lb-controller-8959cc478-dbmx9     1/1     Running   0          39h

# check the default backend service is serving 80 with NodePort
kubectl get svc -n kube-system
NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
default-http-backend   NodePort   10.98.201.232   <none>        80:12821/TCP   3d19h

9. Reserve the static IP

The Ingress rule needs to reference the name of a static IP, so we create it before applying the rule.

gcloud compute addresses create lb-ip --global
gcloud compute addresses describe lb-ip --global
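
If you only need the address itself (for example to curl it in step 11), you can extract it directly; the format expression below is just one way to do it:

# Print only the reserved IP address.
gcloud compute addresses describe lb-ip --global --format='value(address)'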

10. Deploy our application and Ingress rules

Now we can try to deploy our web application to test the ingress.

# nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app.kubernetes.io/name: proxy
spec:
  containers:
  - name: nginx
    image: nginx:stable
    ports:
    - containerPort: 80
      name: http-web-svc
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: proxy
  ports:
  - protocol: TCP
    port: 80
    targetPort: http-web-svc
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  annotations:
    ingressclass.kubernetes.io/is-default-class: "true"
  name: gce
spec:
  controller: k8s.io/ingress-gce
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "lb-ip"
spec:
  defaultBackend:
    service:
      name: nginx-service
      port:
        number: 80

Now we can apply the above YAMLs:

kubectl apply -f nginx.yaml
kubectl apply -f ingress.yaml

11. Check the status of Ingress

kubectl get ing
NAME        CLASS   HOSTS   ADDRESS   PORTS     AGE
myingress   None                      80, 443   33h

kubectl describe ing myingress
Events:
  Type    Reason  Age                    From                     Message
  ----    ------  ----                   ----                     -------
  Normal  Sync    2m58s (x207 over 33h)  loadbalancer-controller  Scheduled for sync

# if something looks abnormal, check the controller log
kubectl logs l7-lb-controller-8959cc478-dbmx9 -n kube-system

# try to curl our nginx application
curl http://${static_ip}:80/

Results and Evaluation

After the step-by-step setup, the HTTP(S) load balancer is created about 3~4 minutes after the Ingress rule is applied, and we can see its network topology:

Network Topology of Load Balancer

We can see that the on-prem node is not routed to by the load balancer, which is expected; traffic still enters through the load balancer, is passed to the NodePort on the GCE nodes, and is finally routed to our application.
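
If you want to inspect what the controller created on the GCP side, the usual gcloud listing commands work; the resource names are generated by the controller and will differ per cluster:

# Load balancer pieces created for the Ingress.
gcloud compute forwarding-rules list
gcloud compute url-maps list
gcloud compute backend-services list
gcloud compute instance-groups list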

However, using an ig (instance group) as the ingress backend is not very efficient. I tried to use NEG (Network Endpoint Group) mode for the load balancer backend, but without luck: NEG is only supported in VPC-native clusters, which seem to be supported only on GKE. If someone knows how to use NEG with a self-managed cluster, please leave a comment.
