Using Ansible’s GCP Library to Provision a Kubernetes Cluster in Google Cloud
Double-take
A few years ago during a job interview, someone asked me the difference between Ansible and Terraform. I confidently told them that Ansible is used for configuration management, and Terraform is used for infrastructure provisioning. For example, you could use Terraform to create a VM instance in the cloud, and use Ansible to install apps and configure the VM.
Imagine my surprise when, this week, I discovered that Ansible has a library of modules for provisioning resources for various Cloud providers!
Maybe this isn’t new to you, but for me this was a HUGE SHOCK! I was so shocked that at first I didn’t believe what I was reading. So I decided to do some digging. It turns out that Ansible has been at this since at least 2018 (Ansible 2.6), when it partnered with Google to create Google Cloud Platform modules. Well then…
My shock was then replaced by a burning desire to try this thing out for myself. So I decided to create a little test project to use Ansible to create a Kubernetes cluster in Google Cloud.
Ready? Let’s do this!
Creating a GKE Cluster Using Ansible
Prerequisites
This tutorial assumes that:
- You have an existing Google Cloud project
- You have an understanding of Docker, and have it installed on your local machine
- You’ve created a Service Account in Google Cloud
- You’ve created a Google Kubernetes Engine (GKE) cluster before
- You’ve used Ansible before, and are familiar with Ansible roles
The last time I used Ansible was sometime in early 2020, so it took me a bit of time to re-acclimate myself with it. That said, I was really surprised by how smoothly things went overall!
In order to make this easier, I’ve gone ahead and created a self-contained, containerized environment for running this tutorial.
1- Clone the tutorial repo
Let’s begin by cloning the tutorial repo:
git clone git@github.com:d0-labs/ansible-gke.git
2- Set up your environment variables
Now that we’ve cloned the project, let’s open up setup.sh. Replace the values in <…> with your own values.
<gcp_project_name>: This is the name of your Google Cloud project. If you’re wondering what your project name is, use this command:
gcloud projects list
Sample output:
PROJECT_ID          NAME              PROJECT_NUMBER
aardvark-project    aardvark-project  112233445566
Use the value returned in the NAME column.
<service_account_name>: The name of the service account for your Google Cloud project.
<service_account_private_key_json>: The fully-qualified name of the JSON service account private key stored on your local machine. For example, /home/myuser/my-sa.json if your file is located in the /home/myuser folder, or my-sa.json if it’s in your current working directory.
Note: This JSON private key is generated upon creation of the Service Account, so be sure to store it somewhere safe (and not in version control). Per Google’s docs on Service Account Keys, “After you download the key file, you cannot download it again.”
Next, run the following script:
./setup.sh
This script uses the Linux envsubst command to substitute the values you set above into ansible/playbook.template.yml, creating ansible/playbook.yml. Similarly, it substitutes them into docker/startup.template.sh, creating docker/startup.sh.
3- Build the Dockerfile
The Dockerfile in the docker folder of the repo has everything you need to run this tutorial. It installs:
- The gcloud CLI
- The kubectl CLI
- Ansible
- The Ansible GCP modules
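The repo’s actual Dockerfile may differ in its details, but a minimal sketch of an image with those four tools could look something like this (the base image and install commands here are illustrative):

```dockerfile
# Sketch only; the repo's Dockerfile is the source of truth.
FROM google/cloud-sdk:slim          # ships with the gcloud CLI

# kubectl (Google's apt repo is preconfigured in this base image)
RUN apt-get update && apt-get install -y kubectl

# Ansible, plus the google.cloud collection that provides the GCP modules
# (the collection needs the requests and google-auth Python libraries)
RUN pip3 install ansible requests google-auth \
    && ansible-galaxy collection install google.cloud

WORKDIR /workdir
```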
Let’s build the Dockerfile:
docker build -t docker-ansible:1.0.0 docker
Remember — the Dockerfile is located in the docker folder of our project, which is why the build context is set to docker.
4- Take a look at the playbook
The playbook.yml file (located in the ansible directory) is our Ansible playbook. It doesn’t do much, other than set up some Google Cloud variables and call a role named gke-test. The role is where the real magic happens.
There’s nothing to do here specifically. I just wanted you to look at the file.
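As a rough sketch (the variable names here are illustrative, not copied verbatim from the repo), the playbook is shaped something like this, with the placeholders filled in by setup.sh:

```yaml
# ansible/playbook.yml (sketch)
- name: Create a GKE cluster
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    gcp_project: <gcp_project_name>
    gcp_auth_kind: serviceaccount
    gcp_cred_file: <service_account_private_key_json>
  roles:
    - gke-test
```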
5- Take a look at gke-test/tasks/main.yml
The main.yml file, located in the ansible/gke-test/tasks directory, is where the cluster is created and configured.
This is the task file for our gke-test role. It does three things:
- Creates and configures our Kubernetes cluster (via the google.cloud.gcp_container_cluster module from the google.cloud collection that we installed in Step 3)
- Creates and configures a node pool for the Kubernetes cluster (via the google.cloud.gcp_container_node_pool module from the same collection)
- Adds the newly-created cluster to our local kubeconfig file
The cluster is created in the first task, and then the cluster info is passed to the node pool creation task to create the node pool in the cluster.
Note: Both tasks are required for creating a Kubernetes cluster in Google Cloud. If you exclude the node pool task, once your cluster is created, it will show up as having 0 nodes, which means that it’s pretty much useless.
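To make that flow concrete, here’s a sketch of what the two provisioning tasks look like (task names and parameter values are illustrative; see the repo for the real file):

```yaml
# ansible/gke-test/tasks/main.yml (sketch)
- name: Create a GKE cluster
  google.cloud.gcp_container_cluster:
    name: ansible-gke-test
    initial_cluster_version: "1.19"
    initial_node_count: 3
    location: us-central1-a
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_auth_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
    state: "{{ cluster_state }}"   # present to create, absent to delete
  register: cluster                # the cluster info feeds the next task

- name: Create a node pool for the GKE cluster
  google.cloud.gcp_container_node_pool:
    name: ansible-gke-test-pool
    initial_node_count: 3
    cluster: "{{ cluster }}"
    location: us-central1-a
    config:
      machine_type: e2-medium
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_auth_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
    state: "{{ cluster_state }}"
```

A third task then adds the new cluster to your local kubeconfig (for example, by shelling out to gcloud container clusters get-credentials).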
If you’ve used Terraform or Pulumi before, the parameters in the above tasks may look familiar. These parameters are used to configure your cluster. For example:
- initial_cluster_version (cluster task) sets the version of the GKE cluster to 1.19.x
- initial_node_count (cluster task) is the number of nodes to create when creating the cluster
- config.machine_type (node pool task) sets the type of VM to use for our cluster nodes
For more info on what these parameters mean, check out the google.cloud.gcp_container_cluster module documentation and the google.cloud.gcp_container_node_pool module documentation.
Note also the state parameter on lines 30 and 60. If state: present, then Ansible will attempt to create the cluster and node pool. If state: absent, then Ansible deletes the cluster and node pool when you run the playbook. We’re parameterizing it in this task. More on that below.
6- Create the cluster
Now we’re ready to create our cluster! To create it, let’s spin up our container instance:
docker run -it --rm \
-v $(pwd)/ansible:/workdir/ansible \
docker-ansible:1.0.0 /bin/bash
You should see a prompt that looks something like this:
root@57236f71a22c:/workdir#
At the prompt, run the command below from the root folder of your project:
./startup.sh && ansible-playbook -vv --extra-vars cluster_state=present ansible/playbook.yml
This command will set your project and authentication (via service account) in Google cloud, and then will run the Ansible playbook.
Note that we’re setting the command-line variable cluster_state to present. This fills in the state parameter on lines 30 and 60 of the task file from Step 5, telling Ansible to create the cluster and the node pool.
This may take a few minutes. Mine took about 10 minutes using the setup above.
You’ll notice that it’ll appear to be stuck on the “Create a GKE cluster” task for a while. Don’t panic! Ansible is creating your GKE cluster. For a little peace of mind, you can check things out in your Google Cloud Console by going to Kubernetes Engine > Clusters in the left-hand menu. Take note of the lovely spinning circle next to your cluster’s name: this is Google telling you that your cluster is being created!
Similarly, once the cluster is created, Ansible will appear to be stuck on the node pool task. Again, you can go to the Google Cloud console to see what’s going on, by clicking on your cluster and then selecting the Nodes tab.
Once the cluster and node pool are created, the playbook finishes with a summary of the resources it created.
Note: If you use -vvv instead of -vv, you get more verbose output, and the cluster creation summary block is formatted as JSON.
7- Connect to the cluster
If all goes well, you should now have a brand-spanking-new Kubernetes cluster. Let’s do a quick spot check. First, let’s make sure that the cluster is in our kubeconfig:
kubectl config get-contexts
The output should include a context for our new cluster. Yup. There’s our cluster!
Let’s also run a quick command against our cluster to check the nodes:
kubectl get nodes
The output should list the cluster’s nodes. And let’s peek into our namespaces:
kubectl get ns
There you go! We’ve got ourselves a GKE cluster!
8- Delete the cluster
To delete the cluster, simply re-run the playbook, setting the cluster_state extra-var to absent:
./startup.sh && ansible-playbook -vv --extra-vars cluster_state=absent ansible/playbook.yml
This fills in the state parameter on lines 30 and 60 of the task file from Step 5, telling Ansible to delete the cluster and node pool.
Again, it’ll take a few minutes to run the task. When all is said and done, the playbook output will confirm that the cluster and node pool were deleted.
Thoughts on Ansible for Cloud Provisioning
Honestly, I am shocked by how easy it was to create a Kubernetes cluster in Google Cloud using Ansible’s GCP library. When I embarked on this little experiment, I fully expected it to take a chunk of my afternoon.
Personally speaking, there were only two time-consuming things:
1- Remembering how to use Ansible
I hadn’t used Ansible in a while, so I had to remember a bunch of syntax, and I had to remind myself how roles worked.
2- Sparse documentation
The documentation for the Ansible cloud libraries is rather sparse and somewhat vague, so I had to do a bit of digging around to make sure I was using the libraries correctly. Honestly, Ansible’s own docs turned out to be pretty decent, but I think I can say this because I’ve messed around with other Cloud infrastructure provisioning tools like Terraform and Pulumi, so I wasn’t going in completely blind.
When I tried to search for solutions on creating Cloud infrastructure using Ansible outside of Ansible’s own docs, I found a whole lotta nothing. Most posts out there use Ansible to provision Cloud infrastructure the hard way, resorting to making calls to the target Cloud provider’s CLI. The modules are waaaay nicer, and are fully declarative.
Other thoughts
After creating the cluster initially, I decided to try a little experiment. I changed the number of initial cluster nodes from 3 to 4, and re-ran the playbook with --extra-vars cluster_state=present. If I had done this in Terraform or Pulumi, those tools would have seen this as a modification of state, and would’ve proceeded to delete and recreate the cluster. This didn’t happen with Ansible.
Instead, when I re-ran the playbook after making the parameter change, Ansible told me that there were no changes to the cluster.
For you Terraform and Pulumi folks, this may be a glaring issue. Not for me. I’m actually glad that it does that. Infrastructure creation/modification should not be taken for granted. We must always be vigilant of our infrastructure. If we make changes to it, we should explicitly destroy and recreate it. We must treat it as ephemeral, so that we have full confidence in the repeatability of infrastructure creation.
Conclusion
There are two big lessons from this Ansible experiment:
- Technology is always evolving, so it’s always good to revisit old tech that you touched years or months earlier to see where it’s at — it may surprise you!
- Ansible is actually a good, easy-to-use tool for provisioning Cloud infrastructure. Plus, it’s declarative, which gets bonus points in my book!
And now, I shall reward you with Susie the rat being cuddled by my other half!
Peace, love, and code.