Database/client relation in Ansible

best practice candidate

George Shuklin
OpsOps
Jun 16, 2020


Every Ansible project runs into a very deep problem: the relation between an application and its database. We need to configure the database and the users the application uses to access it; we need the application to get that information written into its configuration file; and, moreover, we need a monitoring service supervising it all. We want to:

  1. Keep the database username/password in a single place.
  2. Have these settings applied to the database (create the user and the database), to the application (create the configuration file), and to monitoring, possibly with different credentials.
  3. Support scenarios with no ‘app’ or no ‘monitoring’ automatically. If we want to install only the database, that’s fine, and we don’t want to fight the playbook over missing hosts in the [app] group.
  4. Handle the database group the same way, but only if the application’s playbook wants it. Maybe our application can survive without a database, and we can simply skip the ‘create config’ stage when there is no database.
  5. Have a staging environment that is more or less similar to production. That means we want to be able to generate a new inventory with a minimal number of variables per host and still keep the interconnections between groups. Every additional variable or group in the inventory is a minus.

An ideal inventory should look like this:
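A minimal sketch of what that could look like in YAML inventory format. The host names (foo for the database, bar for the application), the addresses, and the password are placeholders, and putting database_pass on the application host is one assumption among several possible:

```yaml
# Sketch only: placeholder hosts, addresses and password.
all:
  children:
    database:
      hosts:
        foo:
          ansible_host: 192.0.2.10
    app:
      hosts:
        bar:
          ansible_host: 192.0.2.20
          database_pass: change-me   # the single copy of the secret
```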

Replace ansible_host and database_pass with the desired values, and you are good to go.

(I’m running a bit ahead of the solution here; maybe your best solution would put database_pass in a different place.) For simplicity, the examples use only database_pass, but in reality there is much more: database_username or database_usernames, database_name, database_permissions, etc.

Old approaches

These are things I’ve tried in the past, and I’m not happy with the results.

repeat yourself and be verbose

You write all the needed information into each host.
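For example, with placeholder hosts and values (database_address is a hypothetical variable name for the database IP the client needs):

```yaml
# Sketch of the "repeat yourself" layout: the same secret is written twice.
all:
  children:
    database:
      hosts:
        foo:
          ansible_host: 192.0.2.10
          database_pass: change-me      # copy #1
    app:
      hosts:
        bar:
          ansible_host: 192.0.2.20
          database_address: 192.0.2.10  # DB address, duplicated by hand
          database_pass: change-me      # copy #2, must be kept in sync
```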

Playbooks are simple (you have variables, you write/use them), but your inventory is a mess, and there is a very high chance of database_pass ending up with different values on different hosts.

use metagroups
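A sketch of this layout, using the db_settings metagroup named below to share variables between the database and app groups (hosts and values are placeholders):

```yaml
# Sketch: db_settings exists only to hand shared variables to its children.
all:
  children:
    database:
      hosts:
        foo:
          ansible_host: 192.0.2.10
    app:
      hosts:
        bar:
          ansible_host: 192.0.2.20
    db_settings:
      children:
        database:
        app:
      vars:
        database_pass: change-me   # one copy, visible to both groups
```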

There are two problems here.

  1. You need a lot of internal playbook logic in the inventory. What is ‘db_settings’, and why should it have children?
  2. You need to find database_host for the client, which makes everything hard to write.

The last one is a real problem. Where are you going to get database_host from?
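The usual answer is a lookup along these lines, assuming the group is named database and the variable is set in the app group's group_vars (a hypothetical location). Each of the questions that follow is a way this one line can break:

```yaml
# group_vars/app.yml (hypothetical location)
# Takes the first host of the 'database' group and reads its gathered facts.
database_host: "{{ hostvars[groups['database'][0]]['ansible_default_ipv4']['address'] }}"
```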

And what if there is no ansible_default_ipv4 because the host has no default gateway?

Or what if facts haven’t been gathered for the database host yet (because of --limit for ansible-playbook)? Or, if the database group has two hosts, why did you choose the one at index zero?

And finally, what if groups.database is empty? You need to write very specific code to handle this, and that’s hard.

I could name a few more variants of these (rejected) solutions: you can use db_settings as the hosts target of a play to join the groups together, or you can have a special play that performs the calculations and stores the results in localhost's hostvars.

All these approaches are really, really inelegant and hard. The more code you write, the harder it becomes to maintain.

Here is a piece of code I’m refactoring right now:

How would you rate this code? … Given that I need to add support for the ‘no feedback_db in inventory’ case, my opinion of it is pretty low (even though I wrote it three years ago).

Proposed solution

I really hope this will work and will become the best practice.

It looks almost like the examples before, with one small difference: foo is the inventory_hostname of the database server, not its IP or FQDN.
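A sketch of such an inventory with placeholder addresses and password. database_host holds the inventory name foo, so the client can later read everything else about that server from hostvars:

```yaml
# Sketch only: placeholder hosts, addresses and password.
all:
  children:
    database:
      hosts:
        foo:
          ansible_host: 192.0.2.10
    app:
      hosts:
        bar:
          ansible_host: 192.0.2.20
          database_host: foo         # link to an inventory host, not an address
          database_pass: change-me
```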

The real magic happens in the playbook:
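What follows is a sketch of such a playbook, not the original one. It assumes PostgreSQL and the community.postgresql collection (the text does not name a database engine), hypothetical variables database_username and database_name, and a hypothetical template app.conf.j2. A real database role would also initialize and start the service.

```yaml
# Sketch: assumes PostgreSQL and the community.postgresql collection.
- name: Install database servers (nothing client-specific here)
  hosts: database
  become: true
  tasks:
    - name: Install the PostgreSQL server package (name varies by distro)
      ansible.builtin.package:
        name: postgresql
        state: present

- name: Configure clients and the databases they own
  hosts: app
  become: true
  tasks:
    - name: Client-side database setup, skipped when there is no database
      when:
        - database_host is defined
        - hostvars[database_host].ansible_default_ipv4 is defined
      block:
        - name: Create the client's user on its own database server
          community.postgresql.postgresql_user:
            name: "{{ database_username }}"
            password: "{{ database_pass }}"
          delegate_to: "{{ database_host }}"
          become_user: postgres

        - name: Create the client's database, owned by that user
          community.postgresql.postgresql_db:
            name: "{{ database_name }}"
            owner: "{{ database_username }}"
          delegate_to: "{{ database_host }}"
          become_user: postgres

        - name: Write the app config; only the IP comes from the server
          ansible.builtin.template:
            src: app.conf.j2
            dest: /etc/app/app.conf
            mode: "0640"
          vars:
            database_ip: "{{ hostvars[database_host].ansible_default_ipv4.address }}"
```

The database play runs first, so facts for the database hosts are gathered before the client play reads hostvars[database_host]. If database_host is unset or points to a host without facts, the whole block is skipped instead of failing.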

It’s slightly simplified to pack everything into one playbook without showing the actual roles.

The key idea here is that we store all client-related database settings (except for the database IP) in the client’s variables. They may come from host_vars, group_vars, or any other source. The client knows everything about the credentials (and the database name). Moreover, the same client knows which server to use as its database server. The client is the source of ground truth for its own database, and the database server obeys.

This ‘obeying’ is implemented by delegating the user configuration (database creation, etc.) from the client to the database server.

If an environment has no client, there is no database to create (but we may still install the database server, as it may be needed by another client with a different set of credentials).

Because we have an explicit link to the host, there is no ambiguity about which host in the group to use. You can have 10 clients and 7 database servers, and each client will use the one it points to (database_host), with no mix-ups due to reordering in the inventory, and so on.

Each set of credentials is stored with minimal scope. If needed, you can keep it in host_vars, and there will be no variable-name clashes with other hosts. (Just imagine you have a database and a few clients, and each client needs its own password.)
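For instance, two clients sharing the database server foo could each carry their own credentials in host_vars. The client names, files, and values below are hypothetical; the two files are shown as one YAML stream separated by a document marker:

```yaml
# host_vars/bar.yml (hypothetical client "bar")
database_host: foo
database_username: bar
database_name: bar
database_pass: bar-secret
---
# host_vars/baz.yml (hypothetical client "baz"): same names, own values
database_host: foo
database_username: baz
database_name: baz
database_pass: baz-secret
```

The variable names are identical on every client, so the same playbook code serves all of them, while each value lives only on the host that owns it.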

The pattern I propose is very similar to the existing best practice of “VM delegates to hypervisor”: the VM defines itself, and the task that creates it runs on the hypervisor through delegation.

A side note on ‘when’: a client may require a database, and then the playbook should fail if there is none. In that case a ‘when’ would silently skip the work, so it’s better to have a proper assert with that: hostvars[database_host].ansible_default_ipv4 is defined.
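A sketch of that assert, placed before the database-related tasks of a client that cannot work without a database (the message text is illustrative):

```yaml
# Fail early instead of skipping: this client cannot live without a database.
- name: Check that this client's database server is known and has facts
  ansible.builtin.assert:
    that:
      - database_host is defined
      - hostvars[database_host].ansible_default_ipv4 is defined
    fail_msg: "{{ inventory_hostname }} needs a database, but database_host is missing or has no facts"
```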

…But it’s even better than it looks. Remember monitoring?

Monitoring

Monitoring wants to see all databases, and it may have its own set of credentials for that.

inventory:
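A sketch extending a placeholder inventory with a monitoring group; monitoring_db_user and monitoring_db_pass are hypothetical variable names for monitoring's own credentials:

```yaml
# Sketch only: placeholder hosts, addresses and credentials.
all:
  children:
    database:
      hosts:
        foo:
          ansible_host: 192.0.2.10
    app:
      hosts:
        bar:
          ansible_host: 192.0.2.20
          database_host: foo
          database_pass: change-me
    monitoring:
      hosts:
        mon1:
          ansible_host: 192.0.2.30
          monitoring_db_user: monitor
          monitoring_db_pass: change-me-too
```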

playbook:
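A sketch of the monitoring play, assuming PostgreSQL, the community.postgresql collection, and a hypothetical template db_checks.conf.j2. The monitoring host delegates creation of its own user to every database server and collects their addresses from hostvars:

```yaml
# Sketch: assumes PostgreSQL and the community.postgresql collection.
- name: Give monitoring access to every database server
  hosts: monitoring
  become: true
  tasks:
    - name: Create the monitoring user on each database server
      community.postgresql.postgresql_user:
        name: "{{ monitoring_db_user }}"
        password: "{{ monitoring_db_pass }}"
      delegate_to: "{{ item }}"
      become_user: postgres
      # A missing or empty 'database' group simply means zero iterations.
      loop: "{{ groups['database'] | default([]) }}"

    - name: Write the list of database servers to watch
      ansible.builtin.template:
        src: db_checks.conf.j2
        dest: /etc/monitoring/db_checks.conf
        mode: "0640"
      vars:
        # Assumes facts for every database host were gathered earlier in the run.
        monitored_db_ips: "{{ groups['database'] | default([]) | map('extract', hostvars, ['ansible_default_ipv4', 'address']) | list }}"
```

If the database group is empty or absent, the loop runs zero times and the address list is empty, so a database-less environment needs no special handling.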

You see? Full visibility without additional issues.

There is a lot of delegation and hostvars here, but all of it follows simple patterns you can easily reconstruct. Each variable has a well-known provenance and a known way to handle the case when it is missing.

Downsides

It’s really hard to criticize your own fresh invention. Nevertheless, there are a few things to note:

  • An additional variable is needed in the inventory. It could be set in group_vars at the playbook level, but that would certainly obscure variable provenance, and I strongly oppose doing it without good cause.
  • There is a lot of hostvars traversal. Normally I’m suspicious of any code with a massive amount of ‘hostvars’. But there is a good rule I discussed earlier: all hostvars access should happen at, or be controlled from, the playbook level. No sudden traversal inside tasks or templates in roles is allowed.
  • Delegation complicates control flow. As with delegation in the VM management lifecycle, it’s a complication of known size and scope. You can manage it and keep it from becoming a spaghetti bowl.
  • In real code, database (password) management should not be done with plain tasks in the playbook; it should be written as an import_role of the database role, delegated to the database host.
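A sketch of that form, assuming a hypothetical role named database with a tasks file client.yml holding the user and database tasks. Keywords set on a static import_role, such as delegate_to and when, are applied to the imported tasks:

```yaml
# Sketch: the 'database' role and its tasks/client.yml are hypothetical.
- name: Create the client's user and database through the database role
  ansible.builtin.import_role:
    name: database
    tasks_from: client
  delegate_to: "{{ database_host }}"
  when:
    - database_host is defined
    - hostvars[database_host].ansible_default_ipv4 is defined
```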

Every import_role adds complexity. This is a downside.

Normally I’d say ‘keep playbooks as simple as you can’, but I know that the ‘dumb’ approach does not work in multi-environment setups, so this amount of complication is justified.

Afterword

I haven’t implemented this pattern yet. It’s my next goal, and I’ll update this article with fresh refactoring experience as soon as I’m done.


I work at Servers.com, most of my stories are about Ansible, Ceph, Python, OpenStack and Linux. My hobby is Rust.