The haters said it couldn’t be done…hi haters!
This post, along with being perhaps the nerdiest thing I’ve ever done, is meant to pass on the domain knowledge I acquired while trying to achieve what I thought was a very simple task: use Google’s Cloud Deployment Manager to create and manage a Regional GKE Cluster. Now, you may be asking, “Why a Regional GKE Cluster as opposed to my trusty zonal GKE Cluster?” Well, I won’t really get into that, because Google already did. Suffice it to say, putting all my GKE eggs in one zonal basket doesn’t work for my use case.
So given the benefits of Regional Clusters, and how cool Deployment Manager seems, I was more than a bit disappointed to find that the only example I could find on the internet comes not from Google’s example repo or documentation, but from a humble comment on Stack Overflow. Shoutout to user shamma for guiding the way here. Allow me to piggyback on this and point you the way. For those more well versed in perusing Google’s API documentation, this helpful link probably isn’t news to you. It was a hard-to-find godsend for me in this endeavor.
Time for some code:
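The original embed is not reproduced here, so the following is a minimal sketch of a cluster.jinja rather than the exact file. The `gcp-types/container-v1beta1:projects.locations.clusters` type and the `parent` path format are the parts that matter. The `region` and `initialNodeCount` property names are assumptions made for illustration. It is laid out so that the type sits on line 2 and `parent` on line 5.

```jinja
resources:
- type: gcp-types/container-v1beta1:projects.locations.clusters
  name: {{ env["name"] }}
  properties:
    parent: projects/{{ env["project"] }}/locations/{{ properties["region"] }}
    cluster:
      name: {{ env["name"] }}
      # Size of the default pool in each zone; the property name is an assumption.
      initialNodeCount: {{ properties["initialNodeCount"] }}
```

Passing a region such as `us-central1` in `parent`, instead of a zone, is what makes the cluster regional.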
So this is a Jinja file. I know, I know, Python is the preferred way, but I don’t like it. Sorry to offend. The most important parts of this are line 2, which gives the correct type, and line 5, which is the preferred way to specify the project and region. A peek at my gke_cluster.yaml would probably help explain some things.
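A gke_cluster.yaml in this style might look roughly like the sketch below. The resource names, the region, the property names, the service account email, and the `node_pool.jinja` filename are all hypothetical placeholders, not the original values.

```yaml
imports:
- path: cluster.jinja
- path: node_pool.jinja   # hypothetical filename for the node pool template

resources:
- name: my-regional-cluster          # hypothetical cluster name
  type: cluster.jinja
  properties:
    region: us-central1              # a region, not a zone
    initialNodeCount: 1
- name: my-node-pool                 # hypothetical node pool name
  type: node_pool.jinja
  properties:
    region: us-central1
    cluster: my-regional-cluster     # must match the cluster resource name above
    nodeCount: 1
    machineType: n1-standard-2
    serviceAccount: gke-nodes@my-project.iam.gserviceaccount.com  # hypothetical
```

Assuming the files sit side by side, something like `gcloud deployment-manager deployments create my-deployment --config gke_cluster.yaml` would then create the deployment.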
One of the nice things about using Jinja is that I can re-use this across multiple projects, but I haven’t gotten quite there yet.
Now, you can use the default pool and get everything you want out of cluster.jinja, if that’s your thing, but I like specifying my node pools explicitly. Here’s what that looks like.
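As a rough sketch of an explicit node pool template (again, not the original file, so the line numbers will not match any references to it), assuming hypothetical `region`, `cluster`, `nodeCount`, `machineType`, and `serviceAccount` properties:

```jinja
resources:
- type: gcp-types/container-v1beta1:projects.locations.clusters.nodePools
  name: {{ env["name"] }}
  properties:
    parent: projects/{{ env["project"] }}/locations/{{ properties["region"] }}/clusters/{{ properties["cluster"] }}
    nodePool:
      name: {{ env["name"] }}
      initialNodeCount: {{ properties["nodeCount"] }}
      config:
        machineType: {{ properties["machineType"] }}
        # Custom least-privileged service account plus the cloud-platform scope.
        serviceAccount: {{ properties["serviceAccount"] }}
        oauthScopes:
        - https://www.googleapis.com/auth/cloud-platform
  metadata:
    # Explicit dependency: wait for the cluster resource before creating the pool.
    dependsOn:
    - {{ properties["cluster"] }}
```

The `dependsOn` entry assumes the cluster resource in the same deployment is named after the `cluster` property. The service account and scope lines are the ones that are easy to forget.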
Okay, line 23 took me a while to figure out. It’s probably a well-known fact that Deployment Manager creates resources asynchronously, while the GKE cluster API is synchronous. This led to a lot of head banging, until I ultimately found the fix in a Knative git repo somewhere. Apologies for not remembering and citing that source, but kudos to whoever already knows this.

Another big gotcha, for me at least, is how OAuth scopes and service accounts work within GKE. If you leave out lines 20–22 as I initially did, you will spend a good chunk of time wondering why you can’t pull from GCR or do any of the things you’ve permissioned your least-privileged service account to do. Those OAuth scopes are the answer. When you omit those lines, you only get the monitoring and logging OAuth scopes, whereas Google’s documentation indicates that using the cloud-platform OAuth scope with a custom service account is actually the best way to do this.
Hopefully this helps someone else out there. If you have questions, or better insight into how this works, feel free to drop a comment.