How to use GCP LoadBalancer with WebSockets on Kubernetes using Services, Ingresses, and Backend Config

Published in johnjjung

4 min read · Oct 11, 2019

So, we recently ran into some interesting issues while using GCP load balancers on Kubernetes to serve an API endpoint that uses WebSockets.

The Problem: GCP load balancers are optimized for regular HTTP calls and aren’t configured for WebSockets out of the box. By default they apply a 30-second timeout, which closes long-lived connections.

You’ll see the connection close and reopen over and over, perhaps every few seconds, depending on your client-server WebSocket ping settings.

This post walks you through setting up a load balancer and an Ingress, and configuring them so your WebSocket connections stop timing out. We’ll assume you already have a pod running your WebSocket server and a client trying to connect to it.

First, if you don’t already have a Service exposed through a load balancer, let’s create one.

apiVersion: v1
kind: Service
metadata:
  name: feathers
  labels:
    app: feathers
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: feathers
  type: LoadBalancer

When you run kubectl get svc, you should see something like this:

feathers              LoadBalancer   10.24.0.9      w.x.y.z    80:30xxx/TCP     19m

Here w.x.y.z is a real external IP address. The load balancer listens on the Service port (80) and forwards traffic to node port 30xxx on your nodes, so try a curl call against http://w.x.y.z/.

While GCP allocates an external IP and provisions the load balancer, the EXTERNAL-IP column shows <pending> for a while.

This lets you reach your service, but you probably want to serve it over HTTPS, so put an Ingress in front of it.

Make sure you have an SSL Secret

apiVersion: v1
data:
  tls.crt: <base64 of your cert + cert chain>
  tls.key: <base64 of your key file>
kind: Secret
metadata:
  name: your-cool-ssl-secret
  namespace: default
type: kubernetes.io/tls

This creates a Secret holding your SSL certificate and key, which your Ingress references by name.
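If you’re not sure how to produce the base64 values, here’s a minimal Python sketch using only the standard library. tls_secret_manifest is a hypothetical helper, not part of any Kubernetes client; it assumes you already have your certificate (with its chain appended) and your private key as PEM bytes.

```python
import base64
from typing import Any, Dict


def tls_secret_manifest(name: str, cert_pem: bytes, key_pem: bytes,
                        namespace: str = "default") -> Dict[str, Any]:
    """Build a kubernetes.io/tls Secret like the one above, as a dict.

    cert_pem: your certificate followed by its chain, PEM-encoded.
    key_pem: the matching private key, PEM-encoded.
    """
    def b64(data: bytes) -> str:
        # Secret "data" values must be base64-encoded strings.
        return base64.b64encode(data).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/tls",
        "data": {"tls.crt": b64(cert_pem), "tls.key": b64(key_pem)},
    }
```

You could write the result out with json.dump and pass it to kubectl apply -f, since kubectl reads JSON manifests as well as YAML. If the PEM files are already on disk, kubectl create secret tls your-cool-ssl-secret --cert=<cert file> --key=<key file> does the same encoding for you.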

Next, create the Ingress:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: feathers
spec:
  tls:
  - secretName: your-cool-ssl-secret
  rules:
  - host: your-domain-name.cool.ai
    http: 
      paths:
      - backend:
          serviceName: feathers
          servicePort: 80

Important points:

serviceName should match the name of your LoadBalancer Service (feathers here).
servicePort should match the port your Service exposes (80 here).
Your service should return a 200 response at its web root. That is, if you curl http://feathers:80/ from another debug pod, it should return a 200; otherwise the load balancer will mark your backend unhealthy. It’s painful that you can’t easily edit the health checks, but that’s for another post.
As with the LoadBalancer Service, the Ingress’s external IP takes a bit of time to appear. If it hasn’t appeared after about 3 minutes, check the previous point (the health check).
x.x.x.x, shown in the kubectl get ingress output below, is your Ingress’s external IP address; this is where your DNS should now point.

feathers              your-domain-name.cool.ai          x.x.x.x   80, 443   19h
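The health-check point above is the one that usually bites. Your real WebSocket server will use its own framework, but as an illustration, here’s a minimal Python standard-library sketch of what the load balancer’s default health check expects: a plain 200 on GET /. HealthHandler and serve_in_background are hypothetical names, and port 80 is assumed to match the targetPort in the Service.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET / with 200 OK, which is what the health check probes."""

    def do_GET(self) -> None:
        if self.path != "/":
            self.send_error(404)
            return
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Silence per-request logging; health checks fire constantly.
        pass


def serve_in_background(port: int = 80) -> HTTPServer:
    """Start the server on a daemon thread; call shutdown() to stop it."""
    server = HTTPServer(("", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The point is only that / must answer 200 without authentication or redirects; anything else (a 404, a 301 to a login page) makes the backend look unhealthy.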

Now that your Ingress is exposed and your DNS points your-domain-name.cool.ai at x.x.x.x, check that it’s working:

dig your-domain-name.cool.ai

Make sure the answer contains x.x.x.x. This can take several minutes, depending on your DNS TTL settings.
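If you’d rather script this check (say, in a deploy pipeline), here’s a minimal Python sketch using the standard library’s resolver. resolves_to is a hypothetical helper, and it only looks at IPv4 (A record) answers, which is what an Ingress external IP normally is.

```python
import socket


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """True if any IPv4 address for hostname equals expected_ip."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        # The name doesn't resolve (yet), e.g. the record hasn't propagated.
        return False
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return any(info[4][0] == expected_ip for info in infos)
```

You would call it as resolves_to("your-domain-name.cool.ai", "x.x.x.x") with your real domain and IP. Like dig, it only reflects what your local resolver currently sees, so cached answers can lag behind the TTL.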

The Tricky Part with WebSockets

Great, you now have an HTTPS endpoint in front of your service, but your WebSockets keep resetting.

GCP’s documentation says WebSockets are supported by default, but the instructions for making them work well are spread across several pages.

It’s a two-step process:

Create a BackendConfig
Associate your Service (and therefore the load balancer backend the Ingress creates for it) with that BackendConfig

So the first thing to do is create a BackendConfig. This is where you raise your timeouts; by default, the backend timeout is 30 seconds and connection draining is 0 seconds.

apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: feathers-backendconfig
spec:
  timeoutSec: 1800
  connectionDraining:
    drainingTimeoutSec: 1800

The other question is what the heck are these timeouts for?

drainingTimeoutSec

Time, in seconds, to wait for existing connections to drain when a backend is removed (for example, during a rollout). Default is 0 seconds.

timeoutSec

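For longer-lived connections to the backend service from the load balancer, configure a timeout setting longer than the 30-second default. For WebSocket traffic, GCP treats this value as the maximum time a connection can stay open, whether it’s idle or not.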

For WebSockets you don’t want it that short, or your connections will disconnect and reconnect every 30 seconds. Set it to something reasonable for your application; a good mark is usually your average session length. (Hint: Amplitude and Mixpanel use a 30-minute session window by default, so if a user comes back within 30 minutes, it counts as the same session.) So 30 minutes (the 1800 seconds used above) sounds pretty good, but again, it depends on your application.
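To put numbers on that trade-off, here’s a small Python sketch that counts how many times one user session gets cut. forced_reconnects is a hypothetical helper; it assumes the backend timeout caps each WebSocket connection’s total lifetime (as described above for timeoutSec) and that the client reconnects immediately after each cut.

```python
import math


def forced_reconnects(session_seconds: float, timeout_sec: float) -> int:
    """Forced disconnects the load balancer causes during one session."""
    if timeout_sec <= 0:
        raise ValueError("timeout_sec must be positive")
    # A session of length S needs ceil(S / T) connections, and every
    # connection except the last one ends in a forced disconnect.
    return max(math.ceil(session_seconds / timeout_sec) - 1, 0)


session = 30 * 60  # a 30-minute session
print(forced_reconnects(session, 30))    # default timeoutSec: 59
print(forced_reconnects(session, 1800))  # timeoutSec from the BackendConfig: 0
```

With the 30-second default, a 30-minute session sees 59 reconnects; with timeoutSec: 1800 it sees none, and only sessions longer than 30 minutes get cut.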

Another note: now that your server-side timeouts are high, your clients should have reliable timeouts of their own, and not overly long ones, so they detect a dead connection quickly instead of waiting on the load balancer.
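Many WebSocket client libraries expose this directly (for example, the Python websockets library’s ping_interval and ping_timeout arguments). If yours doesn’t, the idea is simple enough to sketch. PongWatchdog is a hypothetical helper, not from any library: the client pings periodically, records each pong, and treats the connection as dead when no pong has arrived within pong_timeout seconds. The clock is injectable so the logic can be tested without sleeping.

```python
import time
from typing import Callable


class PongWatchdog:
    """Client-side dead-connection detector based on ping/pong timing."""

    def __init__(self, pong_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.pong_timeout = pong_timeout
        self._clock = clock
        # Treat the moment of connecting as the last sign of life.
        self._last_pong = clock()

    def pong_received(self) -> None:
        """Call whenever a pong (or any server message) arrives."""
        self._last_pong = self._clock()

    def is_dead(self) -> bool:
        """True once the server has been silent for longer than pong_timeout."""
        return self._clock() - self._last_pong > self.pong_timeout
```

When is_dead() returns True, the client should close the socket and reconnect on its own, rather than holding a half-open connection for up to the 30-minute server-side timeout.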

Now that you’ve created your BackendConfig, apply it to your Service by adding this annotation to your original Service file:

---
apiVersion: v1
kind: Service
metadata:
  annotations:
    beta.cloud.google.com/backend-config: '{"ports": {"80":"feathers-backendconfig"}}'
  name: feathers
  labels:
    app: feathers
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: feathers
  type: LoadBalancer

Note: feathers-backendconfig must match the name of the BackendConfig you just created, and the port key ("80") must match the Service port.
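Because the annotation value is a JSON string embedded in YAML, it’s easy to get the quoting or the port key wrong. Here’s a small Python sketch that builds the value and checks the port against the Service’s ports. backend_config_annotation is a hypothetical helper, and the annotation key is the beta one used above.

```python
import json
from typing import Dict, Iterable


def backend_config_annotation(port_to_config: Dict[int, str],
                              service_ports: Iterable[int]) -> Dict[str, str]:
    """Build the beta.cloud.google.com/backend-config annotation.

    port_to_config maps a Service port to the BackendConfig name that
    should apply to it; every port must be one the Service exposes.
    """
    missing = set(port_to_config) - set(service_ports)
    if missing:
        raise ValueError(f"not ports of the Service: {sorted(missing)}")
    # JSON object keys are strings, hence str(port): {"80": "..."}.
    ports = {str(port): name for port, name in port_to_config.items()}
    return {"beta.cloud.google.com/backend-config": json.dumps({"ports": ports})}


print(backend_config_annotation({80: "feathers-backendconfig"}, service_ports=[80]))
```

For this Service the generated value is JSON-equivalent to the annotation above, while a typo such as port 8080 fails loudly instead of silently leaving the default timeouts in place.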

You can apply this using:

kubectl apply -f service.yml

Once you’ve applied the BackendConfig and updated the Service with the annotation, your WebSockets should be working!


Written by John Jung