Can you run Tetragon on HashiCorp Nomad? — Part 1
Tetragon
While Tetragon has gained recognition within the Kubernetes community, its core functionality transcends that specific environment. It offers tools like Custom Resource Definitions (CRDs) and APIs to simplify deploying and managing Tracing Policies within Kubernetes, but its true nature is better described as “Kubernetes-aware.” This means Tetragon isn’t limited to Kubernetes. You can leverage it in other settings like Docker, run it independently as a systemd service, or even use it with Nomad, another popular workload orchestrator.
Nomad
Nomad, a versatile workload orchestrator, has always impressed me with its ability to manage diverse workloads. Beyond containerized applications, Nomad can handle various task types as long as you have the corresponding binary or executable (referred to as Task Drivers in Nomad). In fact, I’ve written several articles exploring Nomad’s flexibility and user-friendliness. If you’re interested in learning more, feel free to check them out!
Setup requirements
There are a few notable settings in the Nomad client’s configuration that are required for this to work. I have included them in my Packer & GCP with GitHub Actions repo (see the example client config after the list):
- Enable privileged Docker jobs
- Add host volume mount for BTF (read-only)
- Add host volume mount for Tetragon JSON log path (read-write)
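For illustration, here is a minimal sketch of what those Nomad client settings can look like. The host volume names (btf, tetragon-logs) and paths below are assumptions for this example, not pulled from the repo above; use whatever names your own client config defines:

```hcl
# Nomad client configuration (sketch). Volume names and paths are
# illustrative assumptions.
client {
  enabled = true

  # Kernel BTF type information, mounted read-only
  host_volume "btf" {
    path      = "/sys/kernel/btf"
    read_only = true
  }

  # Tetragon's JSON log path, mounted read-write so logs persist
  host_volume "tetragon-logs" {
    path      = "/var/log/tetragon"
    read_only = false
  }
}

# Allow jobs to request privileged Docker containers
plugin "docker" {
  config {
    allow_privileged = true
  }
}
```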
Additional (optional/recommended) setup requirements
The Override action in Tracing Policies (more on Tracing Policies in Part 2) requires a kernel compiled with the CONFIG_BPF_KPROBE_OVERRIDE option. This option is not currently available on Google Compute Engine VMs, and its presence may vary across cloud providers. Unfortunately, in public cloud environments, you typically lack control over kernel configuration. However, if you manage your own datacenter and have the freedom to customize your kernel, enabling CONFIG_BPF_KPROBE_OVERRIDE during compilation is recommended.
Tetragon deployment
The following Nomad job specification (jobspec) defines a Tetragon deployment similar to its Kubernetes counterpart. It includes an agent container responsible for the core Tetragon functionality. A separate container, named export-stdout, continuously reads the agent’s JSON log file (/var/log/tetragon/tetragon.log) and writes it to its own standard output.
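Here is a minimal sketch of what such a jobspec can look like. The image tags, volume names, and the agent’s export flag below are illustrative assumptions rather than an exact spec; adjust them to match your client config:

```hcl
job "tetragon" {
  datacenters = ["dc1"] # adjust to your cluster
  type        = "system"

  group "tetragon" {
    # Host volumes defined in the Nomad client config (see Setup requirements)
    volume "btf" {
      type      = "host"
      source    = "btf"
      read_only = true
    }

    volume "tetragon-logs" {
      type      = "host"
      source    = "tetragon-logs"
      read_only = false
    }

    # Core Tetragon agent
    task "agent" {
      driver = "docker"

      config {
        image      = "quay.io/cilium/tetragon:v1.1.0" # tag is an assumption
        privileged = true   # requires allow_privileged in the client config
        pid_mode   = "host" # observe host processes, not just the container

        # Assumption: write the JSON event log where the sidecar can read it
        args = ["--export-filename", "/var/log/tetragon/tetragon.log"]
      }

      volume_mount {
        volume      = "btf"
        destination = "/sys/kernel/btf"
        read_only   = true
      }

      volume_mount {
        volume      = "tetragon-logs"
        destination = "/var/log/tetragon"
      }
    }

    # Sidecar that tails the agent's JSON log to its own stdout
    task "export-stdout" {
      driver = "docker"

      config {
        image   = "busybox:stable" # any image with tail works; an assumption
        command = "tail"
        args    = ["-F", "/var/log/tetragon/tetragon.log"]
      }

      volume_mount {
        volume      = "tetragon-logs"
        destination = "/var/log/tetragon"
        read_only   = true
      }
    }
  }
}
```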
There’s a lot to unpack, so let me break things down for you:
- type = "system" is the equivalent of a DaemonSet in Kubernetes: Nomad will ensure there is one instance running per Nomad client
- group is the equivalent of a pod in Kubernetes, while task is the equivalent of a container. Here we are running two containers: the aforementioned agent and export-stdout
- volume_mount blocks reference the volume blocks, and the source name is not arbitrary: it is defined in the Nomad client config, as these are host volume mounts (see Setup requirements above). One of the volume mounts is for Tetragon’s logs so that they can persist
Below is an image of the Nomad UI showing the stdout of the export-stdout container:
Dude, where’s my networking?
You may notice I did not specify a network block in the Tetragon jobspec. This is because, unlike its Kubernetes counterpart, where pods within a cluster share a pod-specific CIDR range, each Tetragon agent is isolated to the Nomad client it is deployed on (i.e. standalone mode). Hence there are no ports that need exposing; it works directly with kernel probes (kprobes), which you will learn about in more detail in Part 2.
Additional Tetragon on Nomad caveats
While Tetragon functions in standalone mode on VMs, some advanced features leverage Kubernetes concepts like pods and namespaces for event filtering. Tetragon does not recognize Nomad’s equivalent concepts, so you will not be able to filter your events by pod/group name or by namespace.
Nomad, as a versatile workload orchestrator, manages various tasks through task drivers (e.g., Java, QEMU). It also includes built-in health and version checks for these drivers, which generate event entries visible to Tetragon. These checks are a normal part of Nomad’s operation, but they will unfortunately show up as noise in your Tetragon logs.
Below is an example of the events output using the Tetragon CLI, tetra, from within the Tetragon agent container:
What’s next?
I’m currently working on ways to expand Tetragon’s functionality within the Nomad environment. The goal is a user experience comparable to what Tetragon offers on Kubernetes.
Part 2 of this guide (arriving next week) will delve into the power of Tracing Policies. You will learn how to leverage them to enforce runtime security within your Nomad workloads.
EDIT 2024–06–20: Part 2 of my guide can be found here!