Can you run Tetragon on HashiCorp Nomad? — Part 2

In Part 1 of my guide, I covered the setup. Part 2 focuses on applying Tracing Policies to secure your Nomad workloads.

Glen Yu
5 min read · Jun 20, 2024

If you have not checked out Part 1 of my guide for setting up Nomad with Tetragon, please do so; otherwise, I am going to dive right into things!

Tetragon Policies

While Tetragon offers basic observability by default, capturing process starts and exits provides limited insight. You want actionable intel on what is going on inside your workloads and to be able to proactively address security concerns. For this, you will need to apply Tracing Policies.

These policies are defined in YAML, the same format typically used in Kubernetes manifests. Fortunately, you will not have to convert or rewrite them into a different format to use them outside of Kubernetes. The policies simply need to reside in the /etc/tetragon/tetragon.tp.d/ directory to be picked up and applied when Tetragon starts.

Applying the policies

The following is the Tetragon jobspec from Part 1, but with a few extra configuration blocks:

  • artifact block is used to download our Tracing Policies from a remote location to the container
  • volumes option within the task/container config will mount the downloaded Tracing Policies to the appropriate location (see image below)

NOTE: for demo purposes, I am using a publicly readable Google Cloud Storage bucket to store my Tracing Policies, but there are other supported options such as a private AWS S3 bucket or a private GitHub repo, so please use the option that best fits your organization’s security requirements.

TIP: you can toggle which Tracing Policies get applied by controlling which files get mounted (the corresponding artifact block that downloads a file can stay in place even when the file is not mounted).

Sample file system mounts within Nomad Tetragon agent

What goes into a Tracing Policy?

A deep dive into the anatomy of a Tracing Policy is beyond the scope of this article, but I will cover some essentials. To illustrate some key elements of a Tracing Policy, let’s examine a sample policy commonly deployed in my clusters:

  • security_bprm_creds_from_file is the kernel function we are probing. It takes two arguments (denoted by index 0 and 1) and is one of the functions called before a file/binary is executed, to prepare the credentials for execution
  • matchArgs here is used to compare the file being passed (index 1) against a list of values we specify. In this policy, that list contains various Linux package managers
  • matchActions here simply specifies what action to take when a match in the corresponding matchArgs is found. If none is provided, then no action will be taken, but you will still have a log of the event occurring
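The elements described above fit together roughly as follows. This is a minimal sketch rather than the exact policy shown in this article: the policy name and the list of binary paths are illustrative assumptions, and index 0 is declared as nop on the assumption that only the file argument needs to be inspected.

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-package-managers"   # hypothetical name
spec:
  kprobes:
  - call: "security_bprm_creds_from_file"   # kernel function being probed
    syscall: false                          # a kernel function, not a syscall
    args:
    - index: 0
      type: "nop"     # first argument is not inspected in this sketch
    - index: 1
      type: "file"    # the file about to be executed
    selectors:
    - matchArgs:
      - index: 1
        operator: "Equal"
        values:       # illustrative paths; adjust for your base images
        - "/usr/bin/apt"
        - "/usr/bin/apt-get"
        - "/usr/bin/dpkg"
        - "/usr/bin/yum"
        - "/usr/bin/dnf"
        - "/sbin/apk"
      matchActions:
      - action: Sigkill   # omit matchActions to only log the event
```

Saved as a file under /etc/tetragon/tetragon.tp.d/, a policy of this shape would be loaded when Tetragon starts.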

NOTE: SIGKILL is not always the best action to take when trying to prevent actions. Override is a better action to take when you want to prevent writes to files within your workloads. However, this requires your kernel to be compiled with the CONFIG_BPF_KPROBE_OVERRIDE configuration option.
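As a rough illustration of the difference, switching from Sigkill to Override only changes the matchActions portion of a selector. The fragment below is a sketch, not a complete policy. The argError value of -1 (which the caller sees as EPERM) is an assumed choice, and whether the probed function can be overridden at all depends on your kernel.

```yaml
# Fragment only: this sits inside a selector, next to matchArgs.
matchActions:
- action: Override
  argError: -1   # assumed value; the probed call returns this error instead of running
```

Unlike Sigkill, the calling process keeps running and simply sees the operation fail.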

To summarize: security_bprm_creds_from_file is a kernel function that is called when a file is executed, and this particular Tracing Policy will send a SIGKILL to any process trying to execute one of the package manager binaries listed. You can see an example of this (and another policy) in action below:

Tetragon Tracing Policies at work!

Caveats for containerized workloads

  1. Tracing Policies can be namespaced in scope (kind: TracingPolicyNamespaced), but because Nomad’s implementation of namespaces is not recognized, all policies applied within Nomad automatically become cluster-wide.
  2. As of Tetragon v1.1.0, you can add selector statements to your Tracing Policies so that they target only pods with specific labels. This also does not work with Nomad’s implementation of labels.
  3. Networking within a Nomad cluster is different from Kubernetes, so any policies that revolve around network traffic will need to account for Docker’s bridge network subnet CIDR (default: 172.17.0.0/16). Please see my block-internet-egress policy for an example of this. It is also the other active policy in the image above, where it blocks curl commands to www.google.com but still allows internal traffic.
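The third caveat might look something like this in practice. The sketch below is not the block-internet-egress policy itself. The policy name is hypothetical, and it assumes the tcp_connect kernel function as the hook and Docker’s default bridge CIDR as the only internal range worth allowing.

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-egress-sketch"   # hypothetical name
spec:
  kprobes:
  - call: "tcp_connect"         # assumed hook for outbound TCP connections
    syscall: false
    args:
    - index: 0
      type: "sock"
    selectors:
    - matchArgs:
      - index: 0
        operator: "NotDAddr"    # match destinations outside the listed ranges
        values:
        - "127.0.0.1"
        - "172.17.0.0/16"       # Docker's default bridge subnet
      matchActions:
      # Caution: with no further filtering this applies to every process on
      # the node, so add any other ranges your Nomad client needs to reach.
      - action: Sigkill
```

If your Docker daemon uses a non-default bridge subnet, the CIDR in values would need to change accordingly.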

Will Tetragon work with other Nomad task drivers?

Outside of Docker, I have only tried the Java task driver, and policies apply there the same way they do for containerized workloads. I would expect Tracing Policies to work the same regardless of the task driver used, as Tetragon operates based on kernel events/functions.

When working with different task drivers in Nomad, networking becomes a key consideration. Docker, for instance, utilizes its own bridge network for containers, requiring specific policy adjustments. Java jobs, on the other hand, operate at the node/host level, making it more challenging to restrict their internet egress since traffic originates from the Nomad client’s IP address. If you are not careful with your policy rules, you can end up entirely preventing your Nomad client from downloading the artifacts required to run your jobs.

Logging, monitoring, and alerting

I would be remiss if I did not include a small section about logging, monitoring, and alerting. Tetragon logs are written back to the Nomad client, where they will persist. It is up to you, the user, to integrate this with your existing logging solution.

As I work largely in the Google Cloud (GCP) space, I added a logging block to the export-stdout container/task definition to forward the logs to the Fluentd agent running on the Nomad client. From there the logs go to GCP, where I can filter for specific log entries and, if I wish, create alerting rules based on those events.

Google Cloud Log Explorer filtering on Tetragon log entries

Continuing the journey

I hope you found this small two-part series on running Tetragon on Nomad insightful, and I hope you will consider applying Tracing Policies to your Nomad cluster. If you would like to explore more examples, my Nomad on GCP GCE repository offers additional Tetragon Tracing Policies in the ‘examples/tetragon’ folder.


Glen Yu

Cloud Engineering @ PwC Canada. I'm a Google Cloud GDE, HashiCorp Ambassador and HashiCorp Core Contributor (Nomad). Also an ML/AI enthusiast!