Applying the ingress-gce controller to a self-managed K8s cluster on hybrid cloud

Frank Chung
DeepQ Research Engineering Blog
7 min read · Nov 17, 2023

Introduction

In the general case we run Kubernetes on GKE (Google Kubernetes Engine) and deploy an Ingress rule to automatically create the load balancer resources. However, if the cluster is self-managed to support a multi-cloud or hybrid cloud/on-prem scenario, we could consider the ingress-nginx controller instead, but it clearly does not scale as well as a cloud load balancer.

Fortunately, Google open-sources ingress-gce, the controller that implements Ingress for GKE, and we can leverage it to do load balancing for nodes located in GCE (Google Compute Engine). However, the ingress-gce documentation is lacking and the repository is poorly maintained, so, to my knowledge, this tutorial is the first step-by-step guide on the internet for applying ingress-gce to a self-managed k8s cluster that contains nodes located on GCE.

Architecture

Architecture for Ingress Controller

Let’s say there is a self-managed cluster whose control-plane node is hosted on a GCE VM, with a worker node joined from another network (a Cloud VPN tunnel connects the Google VPC and the local LAN). Since the Ingress controller on GCP is only natively supported on GKE, we need to manually deploy the ingress-gce controller to enable the Cloud Load Balancer.

Load Balancer Flow

The goal is to establish a load balancer whose backend service is an instance group containing the GCE nodes in the cluster (on-prem nodes cannot be routed to by the GCP load balancer). Ingress traffic is received by a NodePort service and routed to the corresponding pods in the cluster.

Prerequisite

Nodes OS: Ubuntu 20.04 LTS

K8s Version: v1.26.0

Ingress-gce Version: v1.25.1

Build OS: Mac OS 10.14.6

Build gcloud SDK: 418.0.0

Setup Procedure

1. Check GLBC Version

Check the table below to know which version of GLBC (Google Load Balancer Controller) you need to build. Since the table is outdated, I simply use the latest GLBC version, v1.25.1, as of 2023/11.

* 1.12.7-gke.16+ -> 1.5.2
* 1.13.7-gke.5+ -> 1.6.0
* 1.14.10-gke.31+ -> 1.6.2
* 1.14.10-gke.42+ -> 1.6.4
* 1.15.4-gke.21+ -> 1.7.2
* 1.15.9-gke.22+ -> 1.7.3
* 1.15.11-gke.15+ -> 1.7.4
* 1.15.12-gke.3+ -> 1.7.5
* 1.16.8-gke.3+ -> 1.9.1
* 1.16.8-gke.12+ -> 1.9.2
* 1.16.9-gke.2+ -> 1.9.3
* 1.16.10-gke.6+ -> 1.9.7
* 1.17.6-gke.11+ -> 1.9.7
* 1.18.4-gke.1201+ -> 1.9.7
* 1.16.13-gke.400+ -> 1.9.8
* 1.17.9-gke.600+ -> 1.9.8
* 1.18.6-gke.500+ -> 1.9.8
* 1.18.6-gke.4800+ -> 1.9.9
* 1.18.10-gke.1500+ -> 1.10.8
* 1.18.10-gke.2300+ -> 1.10.9
* 1.18.12-gke.1200+ -> 1.10.13
* 1.18.18-gke.1200+ -> 1.10.15
* 1.18.19-gke.1400+ -> 1.11.1
* 1.18.20-gke.5100+ -> 1.11.5
* 1.19.14-gke.1900 -> 1.11.5
* 1.20.10-gke.301 -> 1.11.5
* 1.21.3-gke.210 -> 1.13.4
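
If you are not sure which row applies to your cluster, a quick sanity check of the server version helps; the exact GLBC tag you pick is still your call:

# Print client and server versions; the server version is what the table maps from.
kubectl version

# Kubelet version of each node, for reference.
kubectl get nodes -o wide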

2. Build the GLBC

Although there are pre-built Docker images such as registry.k8s.io/ingress-gce-glbc-amd64:v1.8.0, the more recent versions are not maintained, so I still suggest building the image ourselves:

# Download the desired version of ingress-gce
VERSION_TAG=v1.25.1
git clone https://github.com/kubernetes/ingress-gce.git
cd ingress-gce
git checkout ${VERSION_TAG} -b ${VERSION_TAG}

# Manually modify the build script to avoid a VCS stamping error.
vim build/build.sh
+export GOFLAGS=-buildvcs=false # add this line
export CGO_ENABLED=0
export GOARCH="${ARCH}"
export GOOS="${OS}"

# Manually modify the makefile to avoid an error from the deprecated gcloud docker command.
vim build/rules.mk
# replace this line ...
-@gcloud docker -- push $$(head -n 1 $<) $(VERBOSE_OUTPUT)
# ... with the two lines below.
+gcloud auth configure-docker
+docker push $$(head -n 1 $<) $(VERBOSE_OUTPUT)

# make and push the built docker image to your desired docker registry.
REGISTRY="gcr.io/${PROJECT_ID}" make only-push-glbc

# now keep the tag of your built docker image
TAG="gcr.io/your-project-id/ingress-gce-glbc-amd64:v1.25.1-dirty"

3. Edit the gce.conf

Before deploying the ingress-gce controller, we need to fill in the configuration to tell the controller where to create the load balancer.

cd ingress-gce/docs/deploy/gke/non-gcp

# edit the config file.
vim gce.conf
token-url = nil
# remove this line; the feature is GA now, so the alpha API endpoint is no longer needed.
-api-endpoint = https://www.googleapis.com/compute/alpha/
# The project your cluster registered to
project-id = [PROJECT]
# The network your cluster will be peering with
network-name = [NETWORK]
# Closest GCP zone to your cluster
local-zone = [ZONE]
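
For reference, a filled-in gce.conf would look roughly like this; the project, network, and zone values below are placeholders for illustration, not the real ones from my setup:

# example gce.conf (illustrative values)
token-url = nil
project-id = my-gcp-project
network-name = my-vpc-network
local-zone = asia-east1-b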

4. Edit the k8s yamls

Since the documentation is outdated, we need to manually modify rbac.yaml and glbc.yaml for the controller to run normally. The modifications are as follows:

cd ingress-gce/docs/deploy/gke/non-gcp

# edit the rbac to add more definitions
vim rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:controller:glbc
rules:
# add the rules below to this ClusterRole
- apiGroups: [""]
  resources: ["services/status"]
  verbs: ["patch"]
- apiGroups: ["extensions", "networking.k8s.io"]
  resources: ["ingresses/status"]
  verbs: ["patch"]
- apiGroups: ["networking.gke.io"]
  resources: ["frontendconfigs"]
  verbs: ["get", "list", "watch", "update", "create", "patch"]
- apiGroups: ["networking.gke.io"]
  resources: ["servicenetworkendpointgroups", "gcpingressparams"]
  verbs: ["get", "list", "watch", "update", "create", "patch", "delete"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingressclasses"]
  verbs: ["get", "list", "watch", "update", "create", "patch"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["*"]
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"]
  verbs: ["get", "list", "watch"]

# edit glbc.yaml to use the GLBC image you built in step 2.
vim glbc.yaml
containers:
# replace the default image ...
- image: registry.k8s.io/ingress-gce-glbc-amd64:v1.8.0
# ... with the image you built.
- image: gcr.io/your-project-id/ingress-gce-glbc-amd64:v1.25.1-dirty
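
A client-side dry run is a cheap way to catch YAML mistakes introduced while editing, before anything is actually deployed (optional check):

# Validate the edited manifests without creating any resources.
kubectl apply --dry-run=client -f rbac.yaml -f glbc.yaml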

5. Create GCP service account

Now follow the official tutorial to create a GCP service account and store its key as a Kubernetes secret.

# create a service account
gcloud iam service-accounts create glbc-service-account \
--display-name "Service Account for GLBC" --project $PROJECT

# binding compute.admin role to the service account
gcloud projects add-iam-policy-binding $PROJECT \
--member serviceAccount:glbc-service-account@${PROJECT}.iam.gserviceaccount.com \
--role roles/compute.admin

# Create key for glbc-service-account.
gcloud iam service-accounts keys create key.json --iam-account \
glbc-service-account@${PROJECT}.iam.gserviceaccount.com

# Store the key as a secret in k8s. The secret will be mounted as a volume in
# glbc.yaml.
kubectl create secret generic glbc-gcp-key --from-file=key.json -n kube-system

rm key.json
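
You can quickly verify that the secret is in place before deploying the controller:

# The glbc.yaml deployment mounts this secret, so it must exist in kube-system.
kubectl get secret glbc-gcp-key -n kube-system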

6. Label the nodes

Since GLBC tries to find each node in the cluster and assign it to an ig (instance group) or NEG (Network Endpoint Group) backend, we need to manually label the GCE and non-GCE nodes:

# remove the exclude label from GCE nodes (the control-plane node is excluded by default)
kubectl label nodes ${GCE_NODE} node.kubernetes.io/exclude-from-external-load-balancers-
# add the exclude label to non-GCE nodes.
kubectl label nodes ${NON_GCE_NODE} node.kubernetes.io/exclude-from-external-load-balancers=true
# add zone labels to all nodes (including on-prem nodes)
kubectl label nodes ${NODE_NAME} topology.kubernetes.io/zone=${ZONE}
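
To double-check the result, list the nodes with both label columns shown; this is only a verification step:

# Show the exclusion and zone labels for every node.
kubectl get nodes \
  -L node.kubernetes.io/exclude-from-external-load-balancers \
  -L topology.kubernetes.io/zone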

7. Deploy the controller to k8s

# Grant permission to current GCP user to create new k8s ClusterRoles.
kubectl create clusterrolebinding one-binding-to-rule-them-all \
--clusterrole=cluster-admin \
--user=$(gcloud config list --project $PROJECT --format 'value(core.account)' 2>/dev/null)

kubectl create -f rbac.yaml

# put the gce.conf to configmap
kubectl create configmap gce-config --from-file=gce.conf -n kube-system

# apply the k8s yamls.
kubectl create -f default-http-backend.yaml
kubectl create -f glbc.yaml

8. Check the status of deployments

# check controller and default-backend are running.
kubectl get pod -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
l7-default-backend-f5cdf9d5b-rhmdw   1/1     Running   0          3d19h
l7-lb-controller-8959cc478-dbmx9     1/1     Running   0          39h

# check the default backend service is serving 80 with NodePort
kubectl get svc -n kube-system
NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
default-http-backend   NodePort   10.98.201.232   <none>        80:12821/TCP   3d19h

9. Reserve the static IP

The Ingress rule needs to reference the name of a static IP, so we create it before applying the rule.

gcloud compute addresses create lb-ip --global
gcloud compute addresses describe lb-ip --global
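
If you only need the address itself (for example to curl it in step 11), you can extract it directly; the format expression below is just one way to do it:

# Print only the reserved IP address.
gcloud compute addresses describe lb-ip --global --format='value(address)'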

10. Deploy our application and Ingress rules

Now we can try to deploy our web application to test the ingress.

# nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app.kubernetes.io/name: proxy
spec:
  containers:
  - name: nginx
    image: nginx:stable
    ports:
    - containerPort: 80
      name: http-web-svc
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: proxy
  ports:
  - protocol: TCP
    port: 80
    targetPort: http-web-svc
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  annotations:
    ingressclass.kubernetes.io/is-default-class: "true"
  name: gce
spec:
  controller: k8s.io/ingress-gce
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "lb-ip"
spec:
  defaultBackend:
    service:
      name: nginx-service
      port:
        number: 80

Now we can apply the above YAMLs:

kubectl apply -f nginx.yaml
kubectl apply -f ingress.yaml

11. Check the status of Ingress

kubectl get ing
NAME        CLASS   HOSTS   ADDRESS   PORTS     AGE
myingress   None                      80, 443   33h

kubectl describe ing myingress
Events:
  Type    Reason  Age                    From                     Message
  ----    ------  ----                   ----                     -------
  Normal  Sync    2m58s (x207 over 33h)  loadbalancer-controller  Scheduled for sync

# if something looks abnormal, check the controller log
kubectl logs l7-lb-controller-8959cc478-dbmx9 -n kube-system

# try to curl our nginx application
curl http://${static_ip}:80/

Results and Evaluation

After the step-by-step setup, the HTTP(S) load balancer is created about 3~4 minutes after the Ingress rule is applied, and we can see its network topology:

Network Topology of Load Balancer

We can see that the on-prem node is not routed to by the load balancer, which is expected; traffic still enters through the load balancer, is passed to the NodePort on the GCE nodes, and is finally routed to our application.
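
If you want to inspect what the controller created on the GCP side, the usual gcloud listing commands work; the resource names are generated by the controller and will differ per cluster:

# Load balancer pieces created for the Ingress.
gcloud compute forwarding-rules list
gcloud compute url-maps list
gcloud compute backend-services list
gcloud compute instance-groups list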

However, using an ig (instance group) as the ingress backend is not very efficient. I tried to use NEG (Network Endpoint Group) mode for the load balancer backend, but without luck: NEG is only supported in VPC-native clusters, which seem to be supported only on GKE. If someone knows how to use NEG with a self-managed cluster, please leave a comment.
