Containing Windows Executables with Damon
In my previous post, “A Short Introduction to Windows Containers”, I went into some detail about the trade-offs and constraints you have to deal with when working with Windows Containers. But what if you don’t want to deal with any of that? If you want true isolation, there isn’t much choice. However, if you are willing to give up isolation, you can get resource constraints without all the extra baggage of Windows Containers.
We recently open sourced Damon, a supervisor process that constrains Windows executables. Jet runs thousands of .NET microservices on Windows, all supervised by Damon without using containers. Before going into detail, however, it’s important to understand the fundamental building blocks of constraining a Windows process and what you would need to do to use them yourself. That background explains why you might choose Damon over doing it by hand.
Job Objects on Windows
Windows has had the concept of a Job Object since XP/Server 2003. Job Objects are a fundamental component of how containers are constructed.
In the words of the Microsoft documentation: “A job object allows groups of processes to be managed as a unit … enforcing limits such as working set size and process priority or terminating all processes associated with a job.”
Job Objects can enforce memory and CPU constraints, somewhat like Linux cgroups. They can also prevent processes from escaping the job, which means every member process can be terminated automatically when the job is closed. The compute and memory resources allocated to a job are shared between its members, including child processes spawned by a process associated with the Job Object.
However, Job Objects can’t provide any form of resource isolation. Processes within a Job Object can view and interact with resources outside of it. Isolation is provided by separate mechanisms in Windows (Namespaces / Silos). Theoretically, you can get some isolation by interacting with the Silos API, but the documentation is almost non-existent, so we’ll stick with the well-documented APIs.
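To make the constraints above a little more concrete, here is a minimal Go sketch of how the behaviours map onto the LimitFlags bitmask of a Job Object's basic limit information. The constant values are the ones defined in the Win32 headers; the limitFlags helper and its parameters are hypothetical names invented for this illustration, and nothing here is taken from Damon's source.

```go
package main

import "fmt"

// Flag values from the Win32 headers for
// JOBOBJECT_BASIC_LIMIT_INFORMATION.LimitFlags.
const (
	jobObjectLimitJobMemory      uint32 = 0x00000200 // cap committed memory for the whole job
	jobObjectLimitBreakawayOK    uint32 = 0x00000800 // children may leave the job if they ask to
	jobObjectLimitKillOnJobClose uint32 = 0x00002000 // kill all members when the last job handle closes
)

// limitFlags builds the bitmask. It is a hypothetical helper for this sketch,
// not part of the Win32 API or of Damon.
func limitFlags(capMemory, killOnClose, allowBreakaway bool) uint32 {
	var flags uint32
	if capMemory {
		flags |= jobObjectLimitJobMemory
	}
	if killOnClose {
		flags |= jobObjectLimitKillOnJobClose
	}
	if allowBreakaway {
		flags |= jobObjectLimitBreakawayOK
	}
	return flags
}

func main() {
	// Memory cap, kill-on-close, and no breakaway: processes cannot escape the job.
	flags := limitFlags(true, true, false)
	fmt.Printf("LimitFlags = 0x%04X\n", flags) // prints "LimitFlags = 0x2200"
}
```

Leaving the breakaway flag unset is what keeps child processes inside the job, and the kill-on-close flag is what ties their lifetime to the job handle.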
Using Job Objects
The only way to interact with a Job Object is to use the Win32 API. .NET has bindings to the Windows API, so you don’t exactly have to write C code, but the .NET code you end up writing won’t look much different. The standard formula is as follows:
- Create a Job Object with CreateJobObject. This returns a handle to a new, empty Job Object.
- Call SetInformationJobObject on the Job Object Handle. This API will be called multiple times with different parameters to set all the various resource constraints that should be applied.
- Call CreateProcess with the CREATE_SUSPENDED flag. This creates a new process with its main thread in a suspended state. This part is important, because we don’t want the process to run before it is attached to the Job Object. This API returns a Process Handle and a Thread Handle.
- Call AssignProcessToJobObject with your Process Handle. This will bind the process to the Job Object and therefore apply all of the various resource constraints.
- Finally, call ResumeThread on your Thread Handle. This starts your main thread which is now contained within a Job Object.
Simple! Would you like to see some C code? No! Of course not. Nobody wants to interact with the Win32 API if they don’t have to. How about if we made this easier?
Containing a process with Damon
Damon is an open-source, stand-alone binary written in Go that we at Jet created to help us contain our Windows applications running on HashiCorp Nomad. Damon does the opposite of Nomad: while Nomad wants to scale up and out to keep your service running, Damon is responsible for scaling it back and preventing it from monopolizing your system resources. Damon is meant to be used in conjunction with the raw_exec Nomad Task-Driver on Windows, but it can also be used without it.
We’ve been using a variant of this internally to constrain our microservices since 2016. We run thousands of .NET microservices on Windows, all constrained by Damon. We constantly collect Prometheus metrics from Damon and use this information to give development teams feedback when services consume more than they have been allocated. We can also use the metrics to report on excess costs incurred by services that allocate too many resources and use only a small fraction of them.
Under the hood, Damon uses Job Objects to constrain executables to the resources specified in environment variables. It can also run processes with restricted permissions, similar to dropping capabilities in Docker. Additionally, it exposes Prometheus metrics so that operators can see how resources are being used and how often the process violates its quota.
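For readers who have not seen what a Prometheus scrape returns, the following Go sketch renders a few gauges in the Prometheus text exposition format using only the standard library. The metric names are placeholders made up for this example and are not Damon's actual metric names; a real exporter would more likely use the official Prometheus client library.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderGauges formats gauges in the Prometheus text exposition format:
// a "# TYPE" line followed by "<name> <value>" for each metric.
// Names are sorted so the output is stable between scrapes.
func renderGauges(gauges map[string]uint64) string {
	names := make([]string, 0, len(gauges))
	for name := range gauges {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "# TYPE %s gauge\n%s %d\n", name, name, gauges[name])
	}
	return b.String()
}

func main() {
	// Placeholder metric names and values, chosen only for illustration.
	fmt.Print(renderGauges(map[string]uint64{
		"example_memory_limit_bytes": 268435456,
		"example_memory_used_bytes":  104857600,
	}))
}
```

Comparing a "used" gauge against a "limit" gauge like this is the kind of signal that makes it possible to spot services that are over or under their allocation.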
Constraining your process with Damon is as simple as prefixing it with damon.exe and setting some environment variables to specify how many resources the executable should be given.
However, Damon was designed for running executables on Nomad.
We’ve tried to pick good default options to keep the configuration minimal. This is why we can drop Damon into the command line and that’s all that is required. It will get the CPU and memory limits directly from the Nomad job resource settings via environment variables. It will log to and rotate its own log files so as not to clobber any log output your services may have. All of these options are customizable through environment variables, which are documented in the README on GitHub.
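As a rough illustration of what picking up limits from Nomad involves, the Go sketch below reads NOMAD_MEMORY_LIMIT (in MB) and NOMAD_CPU_LIMIT (in MHz), the variables Nomad sets for a task, and converts them into the units Job Object limits use: bytes for memory, and cycles per 10,000 for the CPU rate (so 2000 means 20%). How Damon actually performs this conversion is not described here, so treat the limitsFromEnv function, the jobLimits type, and the totalMHz parameter (cores multiplied by clock speed) as assumptions of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// jobLimits holds values in the units Job Object limits expect.
// The type is hypothetical and exists only for this sketch.
type jobLimits struct {
	MemoryBytes uint64 // memory limits are expressed in bytes
	CPURate     uint32 // cycles per 10,000 cycles: 1 to 10000
}

// limitsFromEnv converts Nomad's task limits into Job Object units.
// getenv is injected so the function can be exercised without Nomad.
// totalMHz is the machine's total compute and is an assumed input.
func limitsFromEnv(getenv func(string) string, totalMHz uint64) (jobLimits, error) {
	if totalMHz == 0 {
		return jobLimits{}, errors.New("totalMHz must be greater than zero")
	}
	memMB, err := strconv.ParseUint(getenv("NOMAD_MEMORY_LIMIT"), 10, 64)
	if err != nil {
		return jobLimits{}, fmt.Errorf("NOMAD_MEMORY_LIMIT: %w", err)
	}
	cpuMHz, err := strconv.ParseUint(getenv("NOMAD_CPU_LIMIT"), 10, 64)
	if err != nil {
		return jobLimits{}, fmt.Errorf("NOMAD_CPU_LIMIT: %w", err)
	}

	// Share of the machine, scaled to cycles per 10,000 and clamped,
	// because a rate of 0 is not a valid Job Object CPU rate.
	rate := cpuMHz * 10000 / totalMHz
	if rate < 1 {
		rate = 1
	}
	if rate > 10000 {
		rate = 10000
	}

	return jobLimits{
		MemoryBytes: memMB * 1024 * 1024, // treats Nomad's MB as MiB (assumption)
		CPURate:     uint32(rate),
	}, nil
}

func main() {
	// 4 cores at 2000 MHz is an assumed machine size for the example.
	limits, err := limitsFromEnv(os.Getenv, 4*2000)
	if err != nil {
		fmt.Println("could not read Nomad limits:", err)
		return
	}
	fmt.Printf("%+v\n", limits)
}
```

With NOMAD_MEMORY_LIMIT=256 and NOMAD_CPU_LIMIT=500 on that assumed 8000 MHz machine, the sketch yields 268435456 bytes and a CPU rate of 625, which is 6.25% of the machine.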
If you want to get Damon’s metrics into Prometheus from Nomad, you have to do a little bit of extra work in your Nomad job spec:
- You have to ask for a port labeled
- You’ll have to create a service entry to advertise that port to your Prometheus scraper via Consul Service Discovery.
Damon is still fairly new, and we are looking for ways to make it more useful by providing more constraint and isolation features. If you are interested in this, Pull Requests are welcome!