Using Terraform to Run CloudQuery as a Kubernetes CronJob: A Guide

Garrett Sheppard
4 min read · May 9, 2023

CloudQuery is a powerful tool that syncs data about your cloud infrastructure into a database, so you can query it with SQL for monitoring, governance, cost analysis, and security policies. Today, we are going to explain how to run CloudQuery on a schedule as a Kubernetes CronJob, deployed with Terraform.

Before we start, note that this guide assumes you already have a Kubernetes cluster and a PostgreSQL server that pods in that cluster can reach. If you haven’t set these up yet, please do so before proceeding.
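Because the cluster already exists, Terraform only needs to be told how to reach it. A minimal providers.tf might look like the sketch below. It assumes you authenticate through a local kubeconfig; the context name `my-cluster` is hypothetical, and the version constraint is an assumption rather than a tested minimum.

```hcl
# providers.tf (sketch): point Terraform at the existing cluster.
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.20" # assumption: any recent 2.x release
    }
  }
}

provider "kubernetes" {
  # Read credentials from a local kubeconfig; the context name is hypothetical.
  config_path    = "~/.kube/config"
  config_context = "my-cluster"
}
```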

Getting Started: Understanding CloudQuery

Before deploying anything, make sure you understand what you’re working with. We highly recommend reading the CloudQuery documentation to get a firm grasp of how CloudQuery operates.

In this guide, we use the AWS source plugin together with the AWS pricing source plugin and the PostgreSQL destination plugin. Source plugins fetch resources from a provider’s APIs and expose them as tables; destination plugins write those tables to a store such as PostgreSQL, where you can query them with SQL. Each source plugin offers many tables. Here we use an asterisk (*) to select all of them; if you only need specific tables, list them by name in the source configuration instead.
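For example, a source spec limited to EC2 instances and S3 buckets could be written as a Terraform local and rendered to YAML with `yamlencode()`. This is a sketch: the field names follow the CloudQuery source spec format as we understand it, the local and output names are hypothetical, and `vX.Y.Z` is a placeholder for a real plugin version.

```hcl
# Sketch: an AWS source spec that syncs two named tables instead of "*".
locals {
  cq_source_aws_limited = {
    kind = "source"
    spec = {
      name         = "aws"
      path         = "cloudquery/aws"
      version      = "vX.Y.Z" # placeholder: pin a real plugin version
      tables       = ["aws_ec2_instances", "aws_s3_buckets"]
      destinations = ["postgresql"]
    }
  }
}

# Shows the YAML that would go into the CloudQuery config file.
output "cq_source_aws_limited_yaml" {
  value = yamlencode(local.cq_source_aws_limited)
}
```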

Defining Terraform Variables

First, let’s define our variables in vars.tf:

variable "cq_aws_version" {
  description = "Version of the CloudQuery AWS plugin"
  type        = string
}

variable "cq_awspricing_version" {
  description = "Version of the CloudQuery AWS pricing plugin"
  type        = string
}

variable "cq_postgres_version" {
  description = "Version of the CloudQuery PostgreSQL plugin"
  type        = string
}

variable "db_password" {
  description = "Password for the PostgreSQL database"
  type        = string
  sensitive   = true
}

And provide the respective values in terraform.tfvars:

cq_aws_version        = "x.x.x"
cq_awspricing_version = "x.x.x"
cq_postgres_version   = "x.x.x"
db_password           = "your_db_password"

Replace x.x.x with the respective plugin versions and your_db_password with your actual PostgreSQL password.
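Optionally, a `validation` block can catch a mistyped version at plan time. The sketch below is a drop-in replacement for the `cq_aws_version` declaration in vars.tf; it assumes the values in terraform.tfvars are bare `MAJOR.MINOR.PATCH` strings like the `x.x.x` placeholders, and the same block can be copied to the other two version variables.

```hcl
# Sketch: reject anything that is not a bare MAJOR.MINOR.PATCH version.
variable "cq_aws_version" {
  description = "Version of the CloudQuery AWS plugin"
  type        = string

  validation {
    condition     = can(regex("^[0-9]+\\.[0-9]+\\.[0-9]+$", var.cq_aws_version))
    error_message = "The cq_aws_version value must look like 1.2.3 (no leading v)."
  }
}
```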

Deploying CloudQuery with Terraform

Now, let’s deploy CloudQuery with the Terraform code below. Save it as main.tf, in the same directory as the cq.config.tpl template that holds your CloudQuery configuration:

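resource "kubernetes_namespace_v1" "cq-namespace" {
  metadata {
    name = "cloudquery"
  }
}

resource "kubernetes_config_map_v1" "cq-config" {
  metadata {
    name      = "cq-config"
    namespace = kubernetes_namespace_v1.cq-namespace.metadata[0].name
  }

  data = {
    # Render cq.config.tpl into the CloudQuery config file.
    "config.yaml" = templatefile("${path.module}/cq.config.tpl", {
      cq_aws_version        = var.cq_aws_version
      cq_awspricing_version = var.cq_awspricing_version
      cq_postgres_version   = var.cq_postgres_version
      db_password           = var.db_password
    })
  }
}

# kubernetes_cron_job_v1 targets batch/v1, the CronJob API that
# Kubernetes 1.25+ still serves (batch/v1beta1 was removed).
resource "kubernetes_cron_job_v1" "cq-cronjob" {
  metadata {
    name      = "cq-cronjob"
    namespace = kubernetes_namespace_v1.cq-namespace.metadata[0].name
  }

  spec {
    schedule           = "0 * * * *" # at minute 0 of every hour
    concurrency_policy = "Forbid"    # don't start a sync while the last one still runs

    job_template {
      metadata {
        name = "cloudquery-job"
      }

      spec {
        backoff_limit = 0
        completions   = 1
        parallelism   = 1

        template {
          metadata {
            name = "cloudquery-pod"
          }

          spec {
            restart_policy = "Never"

            container {
              name  = "cloudquery"
              image = "ghcr.io/cloudquery/cloudquery:latest" # pin a CLI version in production
              # The image's entrypoint is the cloudquery binary, so the subcommand
              # goes in args; "command" would replace the entrypoint entirely.
              args = ["sync", "/config/config.yaml"]

              # A ConfigMap volume is mounted as a directory: each key becomes a file.
              volume_mount {
                name       = "cq-config"
                mount_path = "/config"
                read_only  = true
              }
            }

            volume {
              name = "cq-config"

              config_map {
                # Referencing the ConfigMap creates an implicit dependency,
                # so no explicit depends_on is needed.
                name = kubernetes_config_map_v1.cq-config.metadata[0].name
              }
            }
          }
        }
      }
    }
  }
}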

This configuration creates a Kubernetes namespace, a ConfigMap that holds the rendered CloudQuery configuration, and a CronJob that runs CloudQuery on the specified schedule.
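The cq.config.tpl template has to render one source spec per source plugin and one destination spec, as separate YAML documents. If you prefer to avoid a separate template file, the sketch below builds the same kind of config in HCL; you could then use `local.cq_config` as the ConfigMap value in place of the `templatefile(...)` call. The spec field names follow the CloudQuery config format as we understand it, so check them against the docs for your plugin versions. The host, user, and database name in the connection string are hypothetical, and the `v` prefix assumes your tfvars hold bare `x.x.x` versions. Note that the password still lands in the ConfigMap this way.

```hcl
locals {
  # Hypothetical host, user and database; urlencode() escapes characters
  # such as "@" or "/" that would otherwise break the URI.
  cq_pg_connection_string = "postgresql://cloudquery:${urlencode(var.db_password)}@postgres.example.internal:5432/cloudquery?sslmode=require"

  cq_source_aws = {
    kind = "source"
    spec = {
      name         = "aws"
      path         = "cloudquery/aws"
      version      = "v${var.cq_aws_version}" # assumes tfvars omit the "v"
      tables       = ["*"]
      destinations = ["postgresql"]
    }
  }

  cq_source_awspricing = {
    kind = "source"
    spec = {
      name         = "awspricing"
      path         = "cloudquery/awspricing"
      version      = "v${var.cq_awspricing_version}"
      tables       = ["*"]
      destinations = ["postgresql"]
    }
  }

  cq_destination_pg = {
    kind = "destination"
    spec = {
      name    = "postgresql"
      path    = "cloudquery/postgresql"
      version = "v${var.cq_postgres_version}"
      spec = {
        connection_string = local.cq_pg_connection_string
      }
    }
  }

  # One file, several YAML documents separated by "---".
  cq_config = join("\n---\n", [
    yamlencode(local.cq_source_aws),
    yamlencode(local.cq_source_awspricing),
    yamlencode(local.cq_destination_pg),
  ])
}
```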

A Note on Security Best Practices

When dealing with sensitive information such as database passwords, it’s important to follow security best practices. Storing secrets as plain text in your code or version control system can expose your infrastructure to attack. For simplicity, this guide keeps the password in terraform.tfvars and renders it into a ConfigMap, which Kubernetes does not treat as confidential. Marking the variable `sensitive` only hides it from Terraform’s output. This setup is not recommended for real-world deployments.

In a production environment, consider retrieving the password from a secret management service such as HashiCorp Vault or AWS Secrets Manager through a Terraform data source. Keep in mind that values Terraform reads are still written to its state file, so store state in an encrypted backend with restricted access.
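As a sketch of that approach, the configuration below reads the password from AWS Secrets Manager and stores it in a Kubernetes Secret instead of a ConfigMap. It assumes an `aws` provider is configured, that the secret (hypothetically named `cloudquery/postgres`) holds only the password as a plain string, and that the Secret lives in the same namespace as the CronJob.

```hcl
# Read the password at plan time (secret name is hypothetical).
data "aws_secretsmanager_secret_version" "cq_db" {
  secret_id = "cloudquery/postgres"
}

resource "kubernetes_secret_v1" "cq_db" {
  metadata {
    name      = "cq-db"
    namespace = "cloudquery" # must match the CronJob's namespace
  }

  data = {
    # The provider base64-encodes these values for the Secret object.
    password = data.aws_secretsmanager_secret_version.cq_db.secret_string
  }
}

# Then, inside the CronJob's container block, expose the password as an
# environment variable (the variable name is hypothetical):
#
#   env {
#     name = "CQ_DB_PASSWORD"
#     value_from {
#       secret_key_ref {
#         name = "cq-db"
#         key  = "password"
#       }
#     }
#   }
```

According to the CloudQuery documentation, config files support `${VAR_NAME}` environment-variable expansion, so the connection string could reference `${CQ_DB_PASSWORD}` instead of embedding the password. Inside a `templatefile` template, write it as `$${CQ_DB_PASSWORD}` so Terraform leaves it for CloudQuery to expand.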

Viewing Your Data

With this setup, CloudQuery runs at the top of every hour (the "0 * * * *" CronJob schedule) and syncs your cloud resource data into your PostgreSQL database. This gives you your cloud infrastructure data in a queryable format, so you can make informed decisions about governance, security, and cost.
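If hourly is more often than you need, or a full sync takes longer than an hour, the schedule and related settings live in the CronJob’s top-level `spec` block. The fragment below is not a complete resource: the values are illustrative, and `job_template` stays as it is in main.tf.

```hcl
# Fragment of the CronJob resource's spec block.
spec {
  schedule                      = "0 */6 * * *" # every six hours instead of hourly
  concurrency_policy            = "Forbid"      # skip a run if the previous sync is still going
  starting_deadline_seconds     = 600           # give up on a run that can't start within 10 minutes
  successful_jobs_history_limit = 3
  failed_jobs_history_limit     = 3

  # job_template { ... } unchanged
}
```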

You can view this data with a visualization tool such as Grafana, or any other tool of your choice, and build custom dashboards and alerts from your cloud resource data.

Wrapping Up

Running CloudQuery as a Kubernetes CronJob makes it easy to automate the process of syncing your data, ensuring you always have up-to-date information about your cloud infrastructure.

Remember to adjust the schedule and the configuration to suit your needs, and make sure the CloudQuery pod has credentials for the cloud accounts it syncs and network access to your PostgreSQL server. Always follow security best practices when dealing with sensitive data, and don’t forget to refer back to the official CloudQuery documentation whenever necessary.
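On Amazon EKS, one common way to give the pod AWS credentials is IAM Roles for Service Accounts (IRSA). The sketch below assumes your cluster already has an IAM OIDC provider and an IAM role with read-only permissions (for example, the AWS-managed ReadOnlyAccess policy); the role ARN and names are hypothetical.

```hcl
# Sketch for EKS: a service account bound to an IAM role via IRSA.
resource "kubernetes_service_account_v1" "cq" {
  metadata {
    name      = "cloudquery"
    namespace = "cloudquery" # must match the CronJob's namespace
    annotations = {
      # Hypothetical role ARN; the role's trust policy must allow this service account.
      "eks.amazonaws.com/role-arn" = "arn:aws:iam::123456789012:role/cloudquery-readonly"
    }
  }
}

# Then, in the CronJob's pod spec, run the pod as that service account:
#
#   service_account_name = kubernetes_service_account_v1.cq.metadata[0].name
```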

Happy querying!
