10 Tips when Building New Kubernetes Cluster on GCP, Part 2

Douglas Liu · Published in Sohoffice · 7 min read · May 15, 2019

This article dates from 2019; some of the tips are no longer valid today.

This second part of the tips focuses more on problem solving. I ran into every one of these issues, and each took me some time to figure out. Hopefully you can build on my experience and provision faster.


6. Be careful when deleting services

When you’re planning your Kubernetes elements, you may think it’s a good idea to put deployments and their related services together in one file. However, I’d advise you to separate them. It’s not about creating and updating, but about deletion.

In GKE, the most time consuming operations are those related to the Load Balancer. Configuring an Ingress can take even longer than provisioning a new cluster; I guess the network inside GCP is complex enough that configuring a load balancer simply requires that amount of time. Since an Ingress connects to your services, not your pods, deleting a service risks confusing the Ingress. If you split services and deployments into separate config files, you can safely delete your applications (deployments) without touching the services, leaving the Ingress intact. The same may apply to service updates that change a port, but deleting a service is definitely the one thing to avoid.
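A minimal sketch of the split, with placeholder names (foo, foo-service and the image are mine, not anything special):

service.yml, applied once and rarely touched afterwards:

apiVersion: v1
kind: Service
metadata:
  name: foo-service
spec:
  type: NodePort
  selector:
    app: foo
  ports:
  - port: 80
    targetPort: 8080

deployment.yml, safe to delete and re-apply at any time:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: foo
  template:
    metadata:
      labels:
        app: foo
    spec:
      containers:
      - name: foo
        image: gcr.io/my-project/foo:1.0   # placeholder image
        ports:
        - containerPort: 8080

With this layout, kubectl delete -f deployment.yml followed by kubectl apply -f deployment.yml recreates the pods while the Service, and therefore the Ingress backend, stays put.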

If you run into a load balancer (Ingress) problem, I’d suggest visiting the console page and going into your load balancer. Look very carefully at the backends, e.g. k8s-be-30130-abcdefg12345678. The trick is to match the port in the backend name against the node ports of your services listed via kubectl get services.

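For illustration, the output to match against looks something like this (the service name and ports are hypothetical):

$ kubectl get services
NAME          TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
foo-service   NodePort   10.3.240.10   <none>        80:30130/TCP   2d

The node port 30130 is the number embedded in the backend name k8s-be-30130-abcdefg12345678, so that backend maps to foo-service.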

If you find the Ingress referring to backend services that no longer exist, you’ll have to rebuild the Ingress (and wait another 20 minutes, sorry about that). I tried monkey patching it, but that didn’t seem to work.

Wondering why this starts at tip #6? Tips 1 through 5 are in Part 1.

7. You can use StatefulSet with an assigned volume

We have all read the documentation on StatefulSet: it’s extremely suitable for services that require persistence, things like databases, redis and so on. The official documentation advises using volumeClaimTemplate, but there are times when specifying the volume directly is easier, especially for small instances.

Here is what StatefulSet + volumeClaimTemplate does:

When you create an application called app and scale it to 3 replicas, 3 instances named app-0, app-1 and app-2 are created from the container spec. VolumeClaims are created based on the template and attached to the 3 instances. If you stop the application and start it again, the StatefulSet remembers the correct bindings between app-0 and vc-0, app-1 and vc-1, and so on, so everything comes back in order. Read the documentation on the Kubernetes site for more details.
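A minimal sketch, assuming a redis image and placeholder names (app, data); the claim template below produces PVCs named data-app-0 through data-app-2:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app
spec:
  serviceName: app
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: redis:5          # placeholder workload
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi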

However, you should take the following points into consideration before you start using volumeClaimTemplate.

  1. Will your service scale to multiple instances? Do you really need Kubernetes to remember the bindings for you?
  2. When you delete the application, the bindings are permanently lost. I couldn’t find a way to bind the PVCs and PVs back to a newly created application instance. (Any advice?)
  3. You can actually specify the GCE disk directly.

For example, I don’t intend to scale my Solr server in the foreseeable future, so I chose to manage the binding on my own. It’s easier to manage, and you’re free to delete and rebuild the application at any time.

Create the app.yml with a pod spec (the fragment under spec.template.spec) like the below:

containers:
- name: foo
  volumeMounts:
  - name: foo-vol
    mountPath: /data
volumes:
- name: foo-vol
  gcePersistentDisk:
    pdName: foo-disk
    fsType: ext4

Before provisioning the app, run the command to create the disk first.

gcloud compute disks create foo-disk --size=10GB

(10 GB is the minimum size for a standard GCE persistent disk.)

Think carefully before you choose between the two strategies. The general rule of thumb:

If you know your application will not have multiple instances, you can specify the volume directly. Otherwise, use volumeClaimTemplate and try not to delete the application.

If you use volumeClaimTemplate, I think it is wise to write down the StatefulSet → PVC → PV bindings, in case you ever need them some day.
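The bindings are easy to capture: kubectl get pvc shows which PV each claim is bound to (the output shape below is illustrative):

$ kubectl get pvc
NAME         STATUS   VOLUME         CAPACITY   ACCESS MODES   AGE
data-app-0   Bound    pvc-1a2b3c4d   1Gi        RWO            30d

kubectl describe pv then reveals the underlying GCE disk of each volume.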

Why StatefulSet over Deployment

A StatefulSet without volumeClaimTemplate looks quite similar to a Deployment. Why should we use it instead of a Deployment?

It actually has one very important advantage over a Deployment (with the default update strategy):

StatefulSet will terminate the old application before starting the new instance.

Being single instance, downtime is inevitable when updating, but this behavior of StatefulSet keeps the downtime to a minimum. Using a Deployment, on the other hand, will get you trapped, because by default it launches the new instance before removing the old one. Think about it: if your app is running on node 1 and Kubernetes starts the new instance on node 2, your disk will have a hard time moving over, since a GCE persistent disk can only be attached read-write to one node at a time.

Again, this tip is only useful when you do not plan to scale your application beyond one instance.

8. Be careful with labels and selectors

Labels can be useful when managing Kubernetes elements, but they can also be very dangerous if we don’t use them correctly.

Labels are arbitrary key-value pairs; neither the key nor the value has any special meaning to the Kubernetes engine.

Take the below labels for example:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prod-redis
  labels:
    environment: prod
    app: redis

I was once convinced that app was a special label specifically pointing to this application. After a few rounds of refactoring, I lost track of the selector…

apiVersion: v1
kind: Service
metadata:
  name: prod-redis-service
  labels:
    app: redis
spec:
  selector:
    app: redis    # not specific enough: it also matches staging pods

My application wasn’t working properly: the redis pub/sub sometimes worked, sometimes not. That was a terrible period of time, until I started running MONITOR on my redis and realized not all commands were arriving.


I guess you know what happened: my service selector wasn’t specific enough.

# expected
prod app → prod redis service → prod redis pod

# actual
prod app → prod redis service → prod redis pod
                              \→ staging redis pod
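The fix is to make the selector match the full label set; a sketch based on the example above:

apiVersion: v1
kind: Service
metadata:
  name: prod-redis-service
spec:
  selector:
    app: redis
    environment: prod    # without this line, staging pods are selected too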

The usual NodePort type of service also behaves like a simple load balancer that works across multiple endpoints. Use kubectl get endpoints to view the list, and you will see the relationship between services, endpoints and pods.
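For example, output like the below (the IPs are made up) would immediately reveal the problem: two endpoints behind a service that should front a single pod:

$ kubectl get endpoints prod-redis-service
NAME                 ENDPOINTS                       AGE
prod-redis-service   10.0.1.5:6379,10.0.2.8:6379     3d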


All in all, don’t just copy and paste labels. Understand that they are nothing more than what they are, labels, and use them wisely. The tip may sound very stupid, but it actually cost me a few days. I hope you never run into this, but in case you do, remember to run kubectl get endpoints.

9. Only use the error logging level when you should

The reason is simple: GCP has a very powerful logging tool, Stackdriver. Stackdriver has an Error Reporting section that includes only errors. You can be notified whenever Stackdriver picks up a new error, or you can use the Cloud Console to investigate them. I don’t know about you, but I certainly don’t want to be bothered by trivial exceptions.

I’d suggest you review your exception logging: use the warning level if the exception does not require immediate attention, and reserve the error level for those that are more time sensitive.
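As far as I know, when an app on GKE logs JSON lines to stdout, the Stackdriver agent maps the severity field onto the log entry’s severity, so the level you pick directly controls how noisy your error views are. A sketch (messages are made up):

{"severity": "WARNING", "message": "redis connection dropped, retrying"}
{"severity": "ERROR", "message": "payment callback failed permanently"}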

As a side note, I use emptyDir for my logging directories; the logs can be found in Stackdriver anyway. For example:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: app              # hypothetical name; required for a valid container entry
        volumeMounts:
        - name: log-vol
          mountPath: /data/logs
      volumes:
      - name: log-vol
        emptyDir: {}

10. Use GCR for private images

This tip is not only for Kubernetes users, but for anyone who builds private docker images. Docker Hub provides one free private repo, but anything more than that costs at least $7/month. As a result, I once struggled to set up a private docker registry myself: generating certificates, building a trust network via VPN, that sort of thing. In total it cost me at least a day, and I had to clean up its storage on a regular basis.

All of this can be avoided by using Google Container Registry (GCR). Well… or AWS, Azure :) Just don’t build it yourself.

Judging from my Billing > Transactions so far, I believe using GCR alone costs me far less than $1 a month. Totally affordable.
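Pushing to GCR needs no registry setup at all. With the Cloud SDK installed, it boils down to the following (my-project and foo are placeholders):

# one-time: let docker authenticate against gcr.io
gcloud auth configure-docker

# tag and push the image
docker tag foo:latest gcr.io/my-project/foo:1.0
docker push gcr.io/my-project/foo:1.0

GKE nodes in the same project can then pull the image with the default node service account, no imagePullSecrets needed.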

I do have a few other tips I wanted to list, such as session affinity and cron jobs… but Kubernetes is such a big topic that you can never cover everything in a few articles. I just hope you will like Kubernetes and these articles as much as I did.

Contact Me

LinkedIn, Github or Facebook

Did you learn something new? If so, please hit the clap 👏 button below so more people can see this.

