Automated centralized instance management

Fabio Ciavarella
Storm Reply
Feb 22, 2024

Introduction

Managing mutable machines is an important but complex task: promoting upgrades through environments, applying continuous updates across different systems, and handling rollbacks all require a structured approach.
However, keeping machines up to date and fully patched is critical in both cloud and on-premises environments in order to have secure and stable systems.

This article describes an approach to configuring Ansible Tower on AWS to centralize the management of instances inside the Landing Zone of an entire AWS Organization.

Ansible Tower

Ansible Tower is an enterprise-level extension of the popular open-source automation tool Ansible. It offers a web-based GUI to easily manage Ansible inventories, launch playbooks or commands on a single host or a group of hosts, check the status and logs of jobs, and so on.
It also provides a Role-Based Access Control (RBAC) system to grant different teams and users fine-grained permissions to perform specific tasks.
Finally, it exposes RESTful APIs to interact with its various functions programmatically.
Ansible Tower is the licensed version of the open-source project AWX, offering enterprise support from Red Hat along with additional, more advanced features.
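For example, launching a Job Template through the REST API takes a single authenticated POST. Here is a minimal sketch of that call; the host, token and template ID are placeholder values, not real ones:

```python
# Minimal sketch: launch a Tower/AWX Job Template via the REST API.
# TOWER_HOST, TOKEN and JOB_TEMPLATE_ID are hypothetical placeholders.
import json
import urllib.request

TOWER_HOST = "https://tower.example.com"  # hypothetical Tower URL
TOKEN = "REPLACE_WITH_OAUTH2_TOKEN"       # an OAuth2 personal access token
JOB_TEMPLATE_ID = 42                      # hypothetical Job Template ID

req = urllib.request.Request(
    f"{TOWER_HOST}/api/v2/job_templates/{JOB_TEMPLATE_ID}/launch/",
    data=b"{}",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    job = json.load(resp)  # the API returns the newly created job resource
print(f"Launched job {job['id']} with status '{job['status']}'")
```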

The Challenge

One of the key challenges of creating a centralized hub to manage all the machines inside a whole AWS Organization is enforcing segregation and security. Each user and team should be able to view and access only the machines of specific AWS accounts and environments. This is extremely important for multiple reasons:

  1. to avoid accidental changes to hosts in different environments
  2. to prevent users from accessing data in environments they are not authorized to access
  3. to reduce the blast radius in case a user account is compromised
  4. in general, as a best practice, to grant only the minimum necessary permissions

This segregation structure needs to be created both on AWS and on Ansible Tower: through IAM users and roles with the appropriate permissions for the former, and by leveraging Organizations, Teams, Users and the RBAC permission system for the latter, thus doubling the effort required to maintain it.

Developers and maintainers should also be able to run Playbooks and commands against specific, meaningful groups and subgroups of machines.
Maintaining such a structure, keeping environments secure and segregated on one side while organizing hosts into the right groups and inventories on the other, would require extensive work if done manually and would nullify the ease of use that systems such as Ansible Tower provide.

Moreover, in the fast-changing cloud world, where entire fleets of machines can be created, stopped or changed in a matter of minutes, Inventories often need to stay up to date with what is currently deployed. In many cases, Playbooks must run on hosts as soon as they spin up, for example to:

  1. apply critical security patches or enforce company security guidelines
  2. perform hardening tasks on Bastion Hosts
  3. configure software or applications in vanilla or templated images
  4. add the machine to an Active Directory Domain

Our solution takes all of these challenges into account and tackles them, as described in the following paragraphs.

The Solution

The following paragraphs describe how our solution addresses each of the aforementioned challenges, in particular:

  • how we architected the Inventory system using both Ansible Tower resources and a carefully organized tagging system, providing structure, flexibility and ease of use;
  • how we kept these Inventories up to date with the current deployment status in each of the managed AWS Accounts;
  • how we enforced segregation and security by design, both on AWS and on Ansible Tower.

High-level diagram showing the resources used both on AWX and AWS.

In our solution, Ansible Tower is deployed on an EKS cluster to leverage the scalability and manageability of a Kubernetes deployment. The deployment itself is done using the open-source AWX Operator and Helm charts.

Tag-Based Inventory System

The Inventory Project template used to generate Inventories.

In both Ansible and Ansible Tower, there's the concept of "Inventories": logical collections of hosts (in our case, EC2 instances) against which both Ansible Playbooks and simple commands can be run. Inventories can be subdivided into groups and subgroups to narrow or specify the scope of a Playbook. The exact configuration can vary depending on needs; for example, it's possible to create Groups of EC2 instances that share the same Region, Availability Zone, workload, machine type, OS type, and so on.

In our solution, Inventories are created automatically for each Organization, sourced from an Ansible Tower project that's generated from a single template and hosted inside a Git repository. This way, Inventories are truly dynamic and can potentially collect all EC2 instances in each account, based on the inventory configuration.

It’s also possible to filter out machines based on region, size and other specifications if needed.

Sourcing the Inventory from a project also makes it possible to organize all the hosts properly inside each Inventory. By tagging EC2 instances following an established naming convention, it's possible to create groups and subgroups inside the Inventory itself and map EC2 instances to them accordingly. The hierarchy is completely configurable and can grow arbitrarily deep. Using the example above, all EC2 instances hosting the company website could be tagged with "Workload:company_website", constituting a single Group inside the Inventory. Then, by adding the tag "WorkloadType", it's possible to divide it into the following subgroups:

  • Frontend
  • Backend
  • Bastion host

Leveraging this system, we can run different Playbooks on different groups automatically; for example, we can install Nginx and Django on the "Frontend" machines, while installing Java and the application JAR files on the "Backend" ones.

Furthermore, for the Frontend group, we can create smaller subgroups collecting machines from a single AZ; this way, we can easily patch machines belonging to different AZs separately, avoiding possible issues and any downtime during the update.

This entire structure is created simply by tagging the EC2 instances! The sketch below shows the idea in code.
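In practice, Tower's aws_ec2 inventory plugin with keyed_groups can build such groups declaratively from tags; as a language-agnostic illustration of the same mapping, here is a minimal Python dynamic-inventory sketch. The region, tag names and group-naming scheme are assumptions taken from the example above, not our exact production configuration:

```python
#!/usr/bin/env python3
# Minimal dynamic-inventory sketch: builds Ansible groups and subgroups
# from the "Workload" and "WorkloadType" tags of the example above, plus
# an Availability Zone level so that AZs can be patched separately.
# Region, tag names and naming scheme are assumptions for illustration.
import json
import boto3

def add_host(inventory, group, host):
    inventory.setdefault(group, {"hosts": [], "children": []})
    inventory[group]["hosts"].append(host)

def add_child(inventory, parent, child):
    inventory.setdefault(parent, {"hosts": [], "children": []})
    if child not in inventory[parent]["children"]:
        inventory[parent]["children"].append(child)

def main():
    ec2 = boto3.client("ec2", region_name="eu-west-1")  # assumed region
    inventory = {"_meta": {"hostvars": {}}}

    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    for page in pages:
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                host = instance.get("PrivateIpAddress")
                tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
                workload = tags.get("Workload")
                if not host or not workload:
                    continue  # skip untagged or unreachable machines

                if "WorkloadType" in tags:
                    # e.g. group "company_website", subgroup
                    # "company_website_frontend", AZ-level subgroup
                    # "company_website_frontend_eu_west_1a"
                    subtype = tags["WorkloadType"].lower().replace(" ", "_")
                    child = f"{workload}_{subtype}"
                    az = instance["Placement"]["AvailabilityZone"]
                    az_group = f"{child}_{az.replace('-', '_')}"
                    add_child(inventory, workload, child)
                    add_child(inventory, child, az_group)
                    add_host(inventory, az_group, host)
                else:
                    add_host(inventory, workload, host)

    print(json.dumps(inventory))

if __name__ == "__main__":
    main()
```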

Keeping Inventories Updated

How Inventory updates are triggered by events on EC2 Instances

To keep Inventories up to date, we implemented a simple architecture using Amazon EventBridge and an AWS Lambda function. Inside each managed account, EventBridge rules trigger the Lambda function whenever an EC2 instance is created, started, stopped or terminated. The Lambda function then makes an HTTP request to the Ansible Tower APIs to refresh the Inventory of that specific account.
Additionally, the refresh can easily trigger workflows or jobs inside Ansible Tower and, depending on the configuration, can be used for hardening, patching or configuration purposes on the machine that has just been created.
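A minimal sketch of such a Lambda function follows. The Tower host, token and the mapping from AWS account ID to inventory-source ID are hypothetical values passed in via environment variables; the inventory-source update endpoint is part of the Tower/AWX REST API:

```python
# Minimal sketch of the Lambda triggered by EC2 state-change events.
# TOWER_HOST, TOWER_TOKEN and INVENTORY_SOURCE_MAP are hypothetical
# values supplied through environment variables.
import json
import os
import urllib.request

TOWER_HOST = os.environ["TOWER_HOST"]    # e.g. https://tower.example.com
TOWER_TOKEN = os.environ["TOWER_TOKEN"]  # OAuth2 token for the API user
# Maps AWS account IDs to Tower inventory-source IDs, e.g. {"111122223333": 7}
SOURCE_MAP = json.loads(os.environ["INVENTORY_SOURCE_MAP"])

def handler(event, context):
    account_id = event["account"]     # set by EventBridge on every event
    state = event["detail"]["state"]  # e.g. "running", "terminated"
    source_id = SOURCE_MAP[account_id]

    # POSTing to /update/ launches a sync of that inventory source
    req = urllib.request.Request(
        f"{TOWER_HOST}/api/v2/inventory_sources/{source_id}/update/",
        data=b"{}",
        headers={
            "Authorization": f"Bearer {TOWER_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(f"Instance state '{state}' in account {account_id}: "
          f"launched inventory update {body.get('id')}")
```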

Enforcing Segregation

To keep segregation in place, we extensively used Organizations: logical collections of resources inside Ansible Tower that work similarly to accounts in AWS. Users, and even admins, within one Organization cannot view or access resources belonging to another Organization.

In our solution, every AWS account has its own “child” Organization in Ansible Tower and each Organization has a single inventory that collects the machines of the entire account. Each inventory uses a specific IAM role, granting it the necessary permissions to describe and list instances inside the AWS account.
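To illustrate the credential flow, the sketch below shows how a client running in the root Organization's context would assume such a per-account role and list instances with the temporary credentials. The account ID, role name and region are hypothetical:

```python
# Sketch of the cross-account credential flow used by an inventory source:
# assume the per-account inventory role, then describe EC2 instances with
# the temporary credentials. Account ID, role name and region are made up.
import boto3

sts = boto3.client("sts")  # runs as the root Org's IAM user
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/TowerInventoryReadOnly",
    RoleSessionName="tower-inventory-sync",
)["Credentials"]

ec2 = boto3.client(
    "ec2",
    region_name="eu-west-1",  # assumed region
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
reservations = ec2.describe_instances()["Reservations"]
print(sum(len(r["Instances"]) for r in reservations), "instances visible")
```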

Inventories are created with the following goals:

  1. Users should not be able to alter inventories they do not manage
  2. Users should not be able to run Playbooks against machines they do not manage

To enforce these conditions, we established "root" Organizations, one for production and one for non-production environments. Each root Org has its own IAM User that can assume the inventory IAM roles pertaining to its scope.
The sole purpose of these root Orgs is to host the inventory projects for all Inventories in all the other child Organizations. Consequently, by design, these projects, and thus the Inventories derived from them, cannot be altered in any way without super-admin permissions on the whole Ansible Tower platform, satisfying condition 1.
To enforce condition 2, no user inside a child Organization has permission to create other Inventories in it. This ensures that no one, not even admins within the Organization, can access, view or modify instances outside of its scope.

Moreover, Ansible Tower can easily be configured to use authentication protocols such as LDAP and SAML, allowing seamless integration with AWS SSO. Each user is mapped to their own Organization and Team, obtaining the appropriate scope and permissions.
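For instance, Tower's SAML settings include organization and team maps that place authenticated users into the right Organization automatically. A hedged example of the organization map, shown as the Python-style dictionary the setting accepts, with hypothetical Organization and user names:

```python
# Hedged example of Tower's SAML organization map; the Organization name
# and user emails are hypothetical. Matching users are placed into (and
# removed from) the Organization automatically on login.
SOCIAL_AUTH_SAML_ORGANIZATION_MAP = {
    "account-prod-website": {
        "users": ["webteam@example.com"],        # regular members
        "admins": ["webteam-leads@example.com"], # Organization admins
        "remove_users": True,   # drop users who no longer match
        "remove_admins": True,
    },
}
```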

Conclusion

In conclusion, we showed how it's possible to configure Ansible Tower to automatically manage all machines within an entire AWS Organization. The platform gives developers, system administrators and cloud administrators the speed and flexibility that modern cloud infrastructures increasingly demand.

From here, the possibilities are endless, offering you the opportunity to harness the full power of automation.
