How Kubernetes Load Balancer Services Are Implemented in GKE (And How To Delete Them)
As of Dec 2017, for reasons that may well be on my side, I keep finding dangling TCP load balancers after deleting a GKE cluster. That is, the Kubernetes cluster is long gone, but the LBs remain, with no valid target to forward traffic to. I wanted to get rid of them, so I had to learn how to properly find and query them. These are my notes on this, in the hope that they help somebody in the future.
Kubernetes LoadBalancer Services with external IP addresses are implemented simply as HTTP/TCP load balancers that forward traffic to the GKE cluster's node instances.
To clean up, we first need to find the LBs we should check. I did not know this until I dug into it, but apparently there is no gcloud compute loadbalancers command. Instead, you list the forwarding rules:
$ gcloud compute forwarding-rules list --format json
Note that I used --format json because I needed to parse and inspect the entries in a script. You don't have to do this if you are just doing it by hand.
This will give you a list of forwarding rules, along with some metadata about them. One of the things to look for is the description field, which will contain the string "kubernetes.io/service-name/$your-service-name". This field is only populated if the LB was created for a Kubernetes Service. Since we are only interested in these, we grep for this string and filter out the forwarding rules that do not contain it.
Next, you need to find the target-pool to check. Its value is listed under the target field, but you should also note the region that the target pool was created in; it is encoded in that field as a URL. Use your favorite tool to extract it: sed, grep, perl, whatever.
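For example, with sed (the URL below is a hypothetical placeholder for what the target field actually contains):

```shell
# Hypothetical target URL, as found in a forwarding rule's "target" field.
target="https://www.googleapis.com/compute/v1/projects/my-project/regions/us-central1/targetPools/a1b2c3"

# The region sits between /regions/ and the next slash;
# the target pool name is the last path component.
region=$(printf '%s\n' "$target" | sed -E 's|.*/regions/([^/]+)/.*|\1|')
pool=$(basename "$target")

echo "$region"   # us-central1
echo "$pool"     # a1b2c3
```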
Next, find out details about this target pool by issuing this command:
$ gcloud compute target-pools describe $target-pool --region $region --format json
You need to provide the --region parameter, or the command will prompt you for it, and that's annoying. This is why I instructed you to extract it from the URL earlier.
The target pool's instances field lists the GCE VM instances associated with the pool. Note again that these are URLs; use your favorite tool to extract the instance names from them.
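The same sed trick works here; the instance URL below is again a hypothetical placeholder:

```shell
# Hypothetical entry from the target pool's "instances" field.
instance_url="https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-a/instances/gke-node-1"

# Zone sits between /zones/ and the next slash; the instance
# name is the last path component.
zone=$(printf '%s\n' "$instance_url" | sed -E 's|.*/zones/([^/]+)/.*|\1|')
name=$(basename "$instance_url")
# These feed straight into the next step:
#   gcloud compute instances describe "$name" --zone "$zone" --format json
```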
Now check for the instance’s existence by issuing the following command:
$ gcloud compute instances describe $name --zone $zone --format json
If this command fails for every name in the instances field, you most likely have a dangling TCP LB. You can delete it by issuing the following TWO commands:
$ gcloud compute forwarding-rules delete $forwarding-rule-name --region $region
$ gcloud compute target-pools delete $target-pool --region $region
The order of these operations matters, as GCP generally does not allow you to delete a resource that is still in use by another live resource.
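Putting the check and the deletes together, here is a dry-run sketch. The gcloud commands are the ones from above; the variable values are hypothetical placeholders, and the deletes are echoed rather than executed so you can eyeball them first:

```shell
# Hypothetical values; in a real script these come from parsing the
# forwarding-rules and target-pools output as shown earlier.
rule="a1b2c3"; pool="a1b2c3"; region="us-central1"
zone="us-central1-a"; names="gke-node-1 gke-node-2"

dangling=true
for name in $names; do
  # If any backing instance still exists, the LB is not dangling.
  if gcloud compute instances describe "$name" --zone "$zone" >/dev/null 2>&1; then
    dangling=false
    break
  fi
done

if [ "$dangling" = true ]; then
  # Dry run: print the deletes in the required order
  # (forwarding rule first, then target pool).
  echo gcloud compute forwarding-rules delete "$rule" --region "$region"
  echo gcloud compute target-pools delete "$pool" --region "$region"
fi
```

Drop the echo (and perhaps add --quiet to skip the confirmation prompts) once you trust the output.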
And now your TCP LB should be gone.
Finally, this is the script I wrote to automate this https://gist.github.com/lestrrat/2407c11947fbe2cd3c8770c959aa06d3
I should also mention that the same thing (dangling LBs) happens to the HTTP(S) LBs created by Kubernetes Ingress resources. I just didn't add the feature to delete those to my script, because I had already deleted them by hand right before deciding to write it :)
The best thing that could happen is for this issue to go away, but for now, at least I don’t have to investigate how to do this again, so I’m happy.