Vulnerability Management — Golden AMI Pipeline

Bitan Mallick
syngenta-digitalblog
6 min read · Jul 3, 2024

In today’s cloud-native environment, reliable and secure infrastructure is crucial. For organizations using Amazon Web Services (AWS), one critical component of this infrastructure is the Amazon Machine Image (AMI), which serves as the foundation for all Elastic Compute Cloud (EC2) instances.

An AMI provides the information required to launch an EC2 instance. Building and managing AMIs manually is time-consuming and error-prone. A reliable, consistent process for creating and managing AMIs is important for the stability and security of our infrastructure, and it is also needed to meet security compliance requirements such as SOC 2.

To streamline this process, at Syngenta Digital AgTech we use AWS Image Builder, AWS Lambda, EventBridge, Parameter Store, and Terraform to create, test, distribute, and manage the lifecycle of our golden AMIs. This gives us consistent, secure, and up-to-date AMIs across regions and accounts.

Overview of the Workflow

Our workflow is designed to automate the creation, testing, and distribution of Golden AMIs using AWS services managed with Terraform.

Here is how it works step-by-step:

1. Event-Driven Triggering with EventBridge

We use Amazon EventBridge to trigger the workflow on a predefined schedule, so our AMIs are rebuilt at regular intervals. Each scheduled trigger starts the workflow by invoking a Lambda function.

2. Parameter Store Update

The Lambda function triggered by EventBridge updates AWS Systems Manager Parameter Store with the latest base AMI ID for a given AMI family. Parameter Store acts as a central repository for configuration data, making it easy to manage and update AMI configurations programmatically.
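
As a rough illustration of this step, the function below picks the newest available image from an EC2 `describe_images` response and writes its ID to Parameter Store only when the value has changed. It is a minimal sketch, not our production Lambda: the function names, the parameter name, and the owner and name filters are hypothetical, and `ec2` and `ssm` are assumed to be boto3 clients passed in by the caller.

```python
from typing import Optional


def pick_latest_ami(images: list[dict]) -> Optional[str]:
    """Return the ImageId of the most recently created available image."""
    available = [img for img in images if img.get("State") == "available"]
    if not available:
        return None
    # CreationDate is an ISO 8601 UTC string in a fixed format,
    # so comparing the strings orders the images by time.
    newest = max(available, key=lambda img: img["CreationDate"])
    return newest["ImageId"]


def update_base_ami(ec2, ssm, parameter_name: str,
                    owners: list[str], name_pattern: str) -> bool:
    """Store the latest base AMI ID; return True only if the value changed."""
    response = ec2.describe_images(
        Owners=owners,
        Filters=[{"Name": "name", "Values": [name_pattern]}],
    )
    latest = pick_latest_ami(response["Images"])
    if latest is None:
        return False

    try:
        current = ssm.get_parameter(Name=parameter_name)["Parameter"]["Value"]
    except ssm.exceptions.ParameterNotFound:
        current = None  # first run for this AMI family

    if current == latest:
        return False  # nothing new, so do not touch the parameter

    ssm.put_parameter(Name=parameter_name, Value=latest,
                      Type="String", Overwrite=True)
    return True
```

Skipping the write when the ID is unchanged keeps an unchanged base image from producing a parameter update for the next step to react to.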

3. Image Builder Pipeline Automation

Another Lambda function monitors changes in Parameter Store. When it detects a change related to an Image Builder pipeline (e.g., a new base AMI ID), it starts the AWS Image Builder pipeline for that specific configuration. The pipeline defines all the steps necessary to build the AMI, including installation of software packages, configurations, and security updates.
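
One way such a handler might look is sketched below. It assumes the Lambda receives the EventBridge "Parameter Store Change" event, and that a mapping from parameter name to pipeline ARN is available. That mapping, the function name, and the example values are hypothetical. `imagebuilder` is assumed to be a boto3 Image Builder client.

```python
import uuid
from typing import Optional


def handle_parameter_change(event: dict,
                            pipelines_by_parameter: dict[str, str],
                            imagebuilder) -> Optional[str]:
    """Start the pipeline mapped to the changed parameter, if there is one.

    Returns the ARN of the pipeline that was started, or None when the
    event is not relevant.
    """
    if event.get("detail-type") != "Parameter Store Change":
        return None

    detail = event.get("detail", {})
    # Ignore deletions and label changes; only new values matter here.
    if detail.get("operation") not in ("Create", "Update"):
        return None

    pipeline_arn = pipelines_by_parameter.get(detail.get("name"))
    if pipeline_arn is None:
        return None  # parameter is not tied to an image pipeline

    imagebuilder.start_image_pipeline_execution(
        imagePipelineArn=pipeline_arn,
        clientToken=str(uuid.uuid4()),  # idempotency token for this request
    )
    return pipeline_arn
```

Keeping the mapping as plain data means a new AMI family only needs a new entry rather than new handler code.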

4. Custom Testing Phase with Qualys

Once the AMI is built, it undergoes a testing phase using Qualys to scan for vulnerabilities. This custom test phase ensures that the AMI meets our security standards before it can be deployed/distributed.
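
The pass/fail decision at the end of such a scan can be expressed as a simple severity gate. The sketch below is illustrative only: the `Finding` record and the threshold of 4 are assumptions rather than our actual policy, and fetching results from Qualys is deliberately left out. It relies only on the fact that Qualys rates findings on a severity scale from 1 to 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical, simplified view of one vulnerability finding."""
    qid: int       # Qualys vulnerability identifier
    title: str
    severity: int  # Qualys severity level, 1 (lowest) to 5 (highest)


def blocking_findings(findings: list[Finding],
                      min_blocking_severity: int = 4) -> list[Finding]:
    """Return the findings severe enough to fail the test phase."""
    return [f for f in findings if f.severity >= min_blocking_severity]


def scan_passes(findings: list[Finding],
                min_blocking_severity: int = 4) -> bool:
    """An image passes when no finding reaches the blocking severity."""
    return not blocking_findings(findings, min_blocking_severity)
```

Returning the blocking findings separately, rather than only a boolean, makes it easier to report exactly what stopped an image from being distributed.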

5. Ad-hoc Patching with Image Builder

In cases where vulnerabilities are identified during the testing phase, we employ an ad hoc Image Builder component. This component patches the AMI swiftly, ensuring that only secure AMIs are deployed across our infrastructure.

6. Automated Distribution and Tagging

Upon successful testing, the AMI is automatically distributed to multiple AWS regions/accounts using AWS Image Builder distribution settings. Terraform manages the infrastructure as code, ensuring consistent tagging and naming.
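
To make the shape of the distribution settings concrete, the helper below builds the `distributions` list accepted by the Image Builder `CreateDistributionConfiguration` API. We define this in Terraform; Python is used here only for illustration, and the helper name, AMI family, regions, account IDs, and tags are all hypothetical.

```python
def build_distributions(ami_family: str,
                        regions: list[str],
                        target_account_ids: list[str],
                        tags: dict[str, str]) -> list[dict]:
    """Build one distribution entry per region with consistent names and tags."""
    # Image Builder replaces the buildDate placeholder at distribution time,
    # which keeps the AMI name unique for every build.
    ami_name = ami_family + "-{{ imagebuilder:buildDate }}"
    return [
        {
            "region": region,
            "amiDistributionConfiguration": {
                "name": ami_name,
                "amiTags": dict(tags),
                "targetAccountIds": list(target_account_ids),
            },
        }
        for region in regions
    ]
```

Generating every regional entry from the same inputs is what keeps naming and tagging identical wherever the AMI ends up.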

7. AMI Lifecycle Policies

We have tag-based policies configured to automatically deprecate and eventually disable obsolete images based on their age. This helps in managing the AMI lifecycle efficiently.
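
The age-based decision behind such a policy can be sketched as follows. The 90-day and 180-day thresholds and the function name are assumptions chosen for illustration, not our configured values. In practice the resulting action would map onto EC2 operations such as enabling deprecation on an image or disabling it.

```python
from datetime import datetime, timedelta, timezone


def lifecycle_action(created_at: datetime,
                     now: datetime,
                     deprecate_after: timedelta = timedelta(days=90),
                     disable_after: timedelta = timedelta(days=180)) -> str:
    """Classify an image as 'keep', 'deprecate', or 'disable' by its age."""
    age = now - created_at
    if age >= disable_after:
        return "disable"
    if age >= deprecate_after:
        return "deprecate"
    return "keep"


if __name__ == "__main__":
    now = datetime(2024, 7, 3, tzinfo=timezone.utc)
    created = datetime(2024, 3, 1, tzinfo=timezone.utc)
    print(lifecycle_action(created, now))
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the rule easy to test against fixed dates.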

Easy Pipeline Creation

Creating a new golden image pipeline only requires running a Backstage template. The template abstracts away the complexity, making pipeline creation fast and straightforward.

Repository Structure

The image below shows how the golden AMI pipelines are structured in GitHub, which acts as the source for creating and managing the pipelines.

  • The scripts folder contains scripts used by common components (e.g., install Qualys agent, patch, update Linux) and lambda functions.
  • The pipelines folder contains one pipeline per AMI family; each creates and manages all the infrastructure needed for its image pipeline.

Advantages of Our Approach

  • Automation and Efficiency: By automating the AMI creation and distribution process, we reduce manual effort and minimize human errors. This leads to faster delivery times and more reliable infrastructure provisioning.
  • Consistency: Terraform’s infrastructure as code (IaC) paradigm ensures that our AMI configurations are consistent and manageable.
  • Security and Compliance: Integration with Qualys for vulnerability scanning ensures that our AMIs are rigorously tested for security vulnerabilities before distribution. This proactive approach enhances our overall security posture.
  • Scalability: The use of AWS services like Image Builder and Lambda allows us to scale our AMI creation and distribution process effortlessly as our infrastructure grows.
  • Lifecycle Management: Automatically deprecating and disabling obsolete images helps in managing the lifecycle of AMIs effectively, saving storage costs and reducing clutter.

Manageability with Terraform

Terraform plays a crucial role in managing our AWS infrastructure for AMI creation and distribution:

  • Declarative Configuration: With Terraform, we define our infrastructure requirements in code, making it easy to understand, version control, and reproduce environments.
  • Resource Management: Terraform manages resources such as AWS Lambda functions, EventBridge rules, Image Builder pipelines, and IAM (Identity and Access Management) permissions consistently across our infrastructure.
  • Integration and Extensibility: Backstage templates enable easy addition of new Image Builder pipelines for different OS families. This modular approach ensures that our infrastructure remains flexible and adaptable to future requirements.

Caveats

Below are some caveats of the workflow and things to consider:

Dependency on External Tools

  • Issue: The testing phase relies on an external tool (Qualys) for vulnerability scanning, which could become a bottleneck if the tool is slow or experiences downtime.
  • Impact: Potential delays in the AMI build process.
  • Mitigation: Consider implementing redundancy by integrating additional scanning tools or retry mechanisms.
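
A retry mechanism of the kind mentioned in the mitigation could look like the sketch below, which retries a failing call with exponential backoff. The function name and default values are hypothetical, and `operation` stands in for whatever call talks to the scanner; no real Qualys API is shown.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T],
                      attempts: int = 4,
                      base_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying on any exception with exponential backoff.

    The wait doubles after each failure (base_delay, 2x, 4x, ...). The
    last failure is re-raised so the caller can fail the build.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)

    raise AssertionError("unreachable")
```

A real integration would likely narrow the `except` clause to transient errors such as timeouts, so that genuine scan failures are not retried.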

Manual Intervention for Vulnerability Resolution

  • Issue: If the pipeline fails due to vulnerabilities detected by Qualys, manual intervention is needed to investigate and resolve these issues.
  • Impact: Increases the time and effort required to maintain the pipeline.
  • Mitigation: Establish a dedicated team to handle these issues quickly and efficiently and automate as much of the remediation process as possible.

Frequency of AMI Publishing

  • Issue: Frequent publishing of golden AMIs requires teams to move to compliant AMIs regularly.
  • Impact: Can waste teams’ time on repeated migrations if not managed properly.
  • Mitigation: Balance the frequency of AMI publishing to ensure teams have enough time to migrate to new images without causing disruptions. Implement a clear schedule and communicate it to all teams.

Complexity in Management

  • Issue: Managing many AMIs across multiple regions and accounts can become complex.
  • Impact: Can lead to mismanagement or overlooked security updates.
  • Mitigation: Use tagging strategies, share AMIs, and put proper lifecycle policies in place.

Storage Costs

  • Issue: Storing multiple versions of AMIs can lead to increased storage costs.
  • Impact: Higher operational costs.
  • Mitigation: Implement lifecycle policies to automatically deprecate and delete old, unused AMIs. Monitor storage costs and set up alerts for thresholds.

Security Concerns

  • Issue: Outdated AMIs can contain vulnerabilities, and improper access controls can lead to security risks.
  • Impact: Increased security risks.
  • Mitigation: Regularly scan AMIs for vulnerabilities. Implement strict IAM policies to control AMI usage.

Conclusion

This automated workflow, built on AWS Image Builder, Lambda, EventBridge, Qualys, and Terraform, exemplifies modern DevOps practices. By integrating these technologies, we achieve greater operational efficiency, security, and scalability while maintaining consistency across our AWS infrastructure. This approach not only streamlines our AMI management process but also helps keep our infrastructure compliant.

By adopting this workflow, we manage the AMI lifecycle effectively, reduce operational overhead, and focus on delivering value to our customers.
