Orchestrating GCE Instances with Ansible

Here at Vimeo, I’ve been trying to find the sweet spot between designing immutable infrastructure and retaining the ability to identify single resources within a homogeneous group. I’ve recently adopted Ansible into my toolkit for just that. This post is going to focus on using Ansible as an orchestration tool, leaving the configuration-management aspects for another day. You can quickly install Ansible via pip:

pip install ansible

Why Ansible?

I’m a big fan of immutable infrastructure and infrastructure as code. These relatively new practices have the potential to ease administration, increase autonomy, and shrink large configuration-management code bases. Those are all nice features of the paradigm, but I’ve found that dealing with systems under load, or developing new systems, occasionally requires falling back on more traditional models of operation.

A good example of falling back to treating our servers as “pets” is when we need to quickly enumerate process metrics from a host within a scaling group. (While a system is in development, we might not have detailed monitoring, and may have to use the command line.) Another is quickly pushing a change to the number of worker processes a master process spawns and watching the system load before pushing out to all hosts: canary testing, if you will.
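Once the setup below is in place, that kind of spot check becomes a one-liner. As a hypothetical preview (the tag and process name here are made up):

# Count worker processes on every host tagged "worker"
❯ ansible -i ~/git/ansible/inventory tag_worker -a 'pgrep -c -f worker'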

So, let’s get started

I’ll focus on GCE, but the concepts are the same with AWS; you’ll have to fill in the gaps between the clouds. To start using Ansible to orchestrate instances in GCE, we’ll need Ansible’s dynamic inventory feature, which lets Ansible run a specified script to query GCE. When Ansible queries GCE, it creates an in-memory database of the instances within our GCE project (or VPC, if you’re in AWS). This feature is documented in Ansible’s GCE platform guide.

Now, those instructions might be a little confusing: they tell you to use several files that provide the same information multiple times. I’m gonna give you simplified instructions. You should have an “ansible” service account created in your GCE project, and have that account’s JSON credentials downloaded to your workstation. If you don’t know how to do that, check out Google’s documentation on service accounts.

Once you’ve downloaded the “ansible” service account’s JSON file to your workstation, we need to do the following:

Install apache-libcloud

The Python package apache-libcloud is a prerequisite, so it needs to be installed first. I’d install this into your global system’s Python distribution.

pip install apache-libcloud
pip list # Confirm apache-libcloud is present

Create directory structure

Create a directory structure for Ansible. You can pick any root directory you like (but for my examples, I’ll use ~/git/ansible). Within the Ansible directory, create a folder called “inventory”.

mkdir -p ~/git/ansible/inventory

Obtain the necessary files from the Ansible repo

Now we need to clone the Ansible repository, solely to obtain two files: ansible/contrib/inventory/{gce.py,gce.ini}. We’ll copy gce.py into ~/git/ansible/inventory and gce.ini into ~/git/ansible.

git clone https://github.com/ansible/ansible
cp ansible/contrib/inventory/gce.py ~/git/ansible/inventory/
cp ansible/contrib/inventory/gce.ini ~/git/ansible/

Configure gce.ini

After copying the files into place, we need to populate gce.ini with the correct values. Fill out the following fields:

gce_service_account_email_address = # Service account email, found in the ansible JSON credentials file
gce_service_account_pem_file_path = # Path to the ansible service account's JSON credentials file
gce_project_id = # Your GCE project ID

Export gce.ini environment variable

Now we need to set an environment variable that tells the gce.py script where its ini file is located. I put the following in my .bashrc/.zshrc:

export GCE_INI_PATH=~/git/ansible/gce.ini

Confirm #! points to the right Python interpreter

One last thing to check is that the #! (hashbang) directive in gce.py points to the correct Python interpreter: it must be the interpreter into which your pip command installed apache-libcloud. (If not, it will fail.) The easiest way to find yours is to run which:

❯ which python
  /usr/bin/python

With this information, update the first line of ~/git/ansible/inventory/gce.py to #!/usr/bin/python (in my example).
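If you’d rather script the change, something like this should work (GNU sed shown; on macOS, use sed -i '' instead):

# Rewrite the hashbang line of gce.py in place
sed -i '1s|^#!.*|#!/usr/bin/python|' ~/git/ansible/inventory/gce.py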

Make sure everything works

So, after all this is taken care of, we can start using Ansible to look at our GCE inventory. To test that the correct pieces are in the correct places, run the following command:

~/git/ansible/inventory/gce.py --list

You should see a large list of machines and their metadata dumped to your terminal. If you run into Python dependency issues, make sure you’ve pip-installed the complaining package, and make sure the interpreter in the #! line of gce.py is correct.

Hopefully at this point, everything’s working for you. Now Ansible’s able to query our GCE inventory. For those not familiar with Ansible, let’s explain what’s actually happening here.

What’s actually happening here

Ansible works with inventory files. An inventory file tells Ansible where to find the target machines you’d like to perform some action on. In a typical use case, you’d edit this inventory file yourself, adding hosts and grouping them in intelligent ways. However, Ansible can also evaluate a script to form its inventory, and that’s what we’re doing here: we instruct Ansible to look at our inventory folder, which contains gce.py. Ansible executes gce.py and creates an in-memory database of the instances running within GCE.
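For contrast, a hand-maintained static inventory might look like the following (the hostnames and group names are made up):

[web]
web-1.example.com
web-2.example.com

[workers]
worker-1.example.com

A dynamic inventory script simply produces the same host-and-group structure as JSON, fresh on every run.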

Target a host, groups of hosts, and an instance group

The workflow that works best for me is to associate groups of servers with tags, then have Ansible look at the inventory, pick the machines carrying a specified tag, and treat those machines as targets. I’m going to create the following GCE instances in order to demonstrate.

  • instance-1 with tags [ example, one ]
  • instance-2 with tags [ example, two ]

PS: for now, the tag names are arbitrary; they’re here simply for instruction.
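If you’d like to follow along and you use the gcloud CLI, creating these two instances might look something like this (the zone here is an assumption; pick your own):

❯ gcloud compute instances create instance-1 --tags example,one --zone us-central1-b
❯ gcloud compute instances create instance-2 --tags example,two --zone us-central1-b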

Let’s run the following command and notice the output:

❯ ansible -i ~/git/ansible/inventory tag_one -m ping
instance-1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

Great! We were able to point Ansible at our inventory and specify the <host pattern> argument as a tag within GCE (gce.py exposes every GCE tag as a group named tag_<tag>, which is why we targeted tag_one). You can probably see where this is going: we can create arbitrary groups of machine targets by placing them under the same tags. Let’s see what happens when we use the ping module against the tag example:

❯ ansible -i ~/git/ansible/inventory tag_example -m ping
instance-2 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
instance-1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

We just performed an action on two machines within our GCE project, simply by referencing their tag. This really shines when we have instance groups (autoscaling groups, in AWS terms). In my GCE project, I currently have an instance group named player-sentry-prod-worker-processor-1-0. I can target every machine within this group with the following command:

❯ ansible -i ~/git/ansible/inventory player-sentry-prod-worker-processor-1-0 -m ping
player-sentry-prod-worker-processor-1-0-7364 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
player-sentry-prod-worker-processor-1-0-iku7 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
player-sentry-prod-worker-processor-1-0-w2l6 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
player-sentry-prod-worker-processor-1-0-8fty | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
player-sentry-prod-worker-processor-1-0-j483 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

Run commands across multiple instances

Let’s run an arbitrary command across all these nodes at once.

❯ ansible -i ~/git/ansible/inventory player-sentry-prod-worker-processor-1-0 -a 'date'
player-sentry-prod-worker-processor-1-0-7364 | SUCCESS | rc=0 >>
  Tue Dec 20 06:53:29 UTC 2016
player-sentry-prod-worker-processor-1-0-w2l6 | SUCCESS | rc=0 >>
  Tue Dec 20 06:53:29 UTC 2016
player-sentry-prod-worker-processor-1-0-iku7 | SUCCESS | rc=0 >>
  Tue Dec 20 06:53:29 UTC 2016
player-sentry-prod-worker-processor-1-0-j483 | SUCCESS | rc=0 >>
  Tue Dec 20 06:53:29 UTC 2016
player-sentry-prod-worker-processor-1-0-8fty | SUCCESS | rc=0 >>
  Tue Dec 20 06:53:29 UTC 2016

As you can see, this can be a pretty powerful orchestration tool with very little supporting infrastructure. No servers, no agents; just tooling around SSH and GCE to make our lives easier.
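Because it all rides on SSH, the standard Ansible connection flags apply whenever your local defaults don’t match the remote hosts (the username and key path below are examples):

❯ ansible -i ~/git/ansible/inventory tag_example -m ping -u deploy --private-key ~/.ssh/gce_key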

Create runbooks for groups of instances

Now, since Ansible is a full-fledged configuration-management solution, we can start to create playbooks that act as runbooks for our servers. An example use case is quickly pushing a new systemd configuration file to a set of machines based on a tag (or an instance group, if you’d like). It’d look something like this:

❯ ls ~/git/ansible/plays/sentry/sentry-web/
push_config.yml sentry-web.service
❯ cat ~/git/ansible/plays/sentry/sentry-web/push_config.yml
---
- hosts: tag_sentry-web
  tasks:
    - name: Upload systemd service to host
      copy:
        src: sentry-web.service
        dest: /etc/systemd/system/
        owner: root
        group: root
        mode: '0644'
      become: true
    - name: Restart systemd service
      systemd:
        state: restarted
        daemon_reload: yes
        name: sentry-web
      become: true
❯ ansible-playbook -i ~/git/ansible/inventory ~/git/ansible/plays/sentry/sentry-web/push_config.yml

I won’t go into too much detail about playbooks, because the documentation covers them well. But to summarize: our playbook targets the tag sentry-web and defines two tasks to run. The first uploads our new systemd service configuration; the second reloads the systemd daemon and restarts the sentry-web service.
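After a run like this, I like to confirm that the service actually came back up. A quick ad-hoc check might look like this (using the same tag and service name as the playbook above):

❯ ansible -i ~/git/ansible/inventory tag_sentry-web -a 'systemctl is-active sentry-web' --become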

A couple other things as I wrap this up.

1. Take some time to view the output of gce.py --list. Any top-level JSON key you see can be used as a host pattern to target machines. There are some really helpful items in there, such as regions and zones. You can always parse this output with a JSON parser (such as jq) to obtain a base list of targetable items. Every instance’s own hostname can also be used as a target.
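For example, to see every targetable group name at a glance:

❯ ~/git/ansible/inventory/gce.py --list | jq 'keys'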

2. Ansible has a nice command-line flag for just listing hosts. I often use this when I need to actually SSH into a machine within an instance group.

❯ ansible -i ~/git/ansible/inventory tag_example --list-hosts
hosts (2):
instance-1
instance-2

3. The --check flag is very helpful. It’s a dry-run mode that shows you what Ansible would do, without actually making changes.
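For example, to preview the playbook from earlier without touching anything:

❯ ansible-playbook -i ~/git/ansible/inventory ~/git/ansible/plays/sentry/sentry-web/push_config.yml --check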

4. Experiment with some of these patterns:

Complex matching like the following is possible:

# Return all instances with tag example but not tag one
❯ ansible -i ~/git/ansible/inventory 'tag_example:!tag_one' --list-hosts
hosts (1):
instance-2

# Return all instances with tag example but not hostname instance-1
❯ ansible -i ~/git/ansible/inventory 'tag_example:!instance-1' --list-hosts
hosts (1):
instance-2

I hope this gets the gears turning on how you can quickly create runbooks, list information, and orchestrate your cloud instances with Ansible. To me, this is the sweet spot between immutable infrastructure principles and traditional methods of caring for systems. The workflow really reminds me of the quick and easy usage of Fabric, but with better tooling around a dynamic cloud inventory. Let me know what you think, and enjoy!