No Workload Left Behind: How to orchestrate with HashiCorp Nomad, Part 3

Juan Luis Baptiste
Published in Globant
8 min read · Apr 24, 2023

This is the third and last part of my introductory article to HashiCorp Nomad, a generic workload orchestrator that can be used to orchestrate anything in addition to containers. In the first part, we reviewed Nomad’s features and architecture and how it compares with other orchestrators like Kubernetes. In the second part, we reviewed some important concepts needed to understand how Nomad works, explained the different installation methods available, and walked through a demo in which we installed Nomad locally and ran a basic Nomad job to test it.

Now we will continue with that demo, running more jobs to test some of Nomad’s more advanced and unique features, including running workloads other than containers.

Prerequisites

To perform the tasks in this article, you need to have completed the demo from the previous article so that you have a Nomad cluster already running. Here is a quick recap of how to run Nomad locally.

From a terminal, launch nomad in dev mode:

$ sudo nomad agent -dev -bind 0.0.0.0 -log-level INFO

After Nomad starts, we can continue with the demo of this article.

Testing basic orchestrator features

First, we will test a set of features expected in any modern orchestrator: how to scale a running workload, how to update it to a new version, and how to do a rollback in case something goes wrong with an update.

Job Scaling

We will scale the Nomad job executed in the previous article. If you don’t already have the example job running, let’s review how to do it first. To create the example job file:

$ nomad job init -short

To run the job:

$ nomad job run example.nomad

Now we are ready to continue!

We can scale jobs up or down as required. To scale a job, we need to add the count parameter to the group section of the job definition.
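
For reference, this is roughly what the relevant part of the job file looks like once the count is set. It is a minimal sketch: the group and task names match the job generated by nomad job init -short, and the rest of the file is omitted.

group "cache" {
  count = 3   # run three instances (allocations) of this group instead of the default of one

  # ... the task "redis" definition and the rest of the group stay as generated
}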

Now we will scale up this job, but this time we will use the web UI for that. Remember the address format:

http://IP_ADDRESS:4646/ui/

Replace IP_ADDRESS with the address where you are running the Nomad binary. If you are doing it on your laptop, you can use localhost or 127.0.0.1; if you are running it remotely or in a VM, you will need to find out the corresponding IP address, for example:

http://192.168.0.22:4646/ui/

Go back to the Jobs page and click on the “Run Job” button:

Nomad job list

Copy the contents of the example.nomad file and add the count = 3 parameter to the task group stanza to scale the job up to three replicas instead of one:

Nomad example job file

Now we click on the “Plan” button, and the UI will show us the planned changes to the job (for Terraform developers, it is similar to running terraform plan):

Nomad job plan

The plan output will show us the proposed changes to validate their correctness. Click on the “Run” button and wait a little bit for the new allocations to be created. After a moment, the three allocations will appear with a running status:

Nomad job allocations

Now, let’s change it back to 1, this time from the command line:

$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (2 destroy, 1 in-place update)
+/- Count: "3" => "1" (forces destroy)
Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 21
To submit the job with version verification run:

nomad job run -check-index 21 example.nomad

When we run a job from the command line, things work slightly differently. If you look carefully at the plan output, you will see that the proposed command to run the job is not just nomad job run; it adds another parameter, the -check-index parameter with a number. Nomad uses this index to guarantee that you are running the latest version of the job and that no other user has modified it in the cluster since we ran the plan. If another user has submitted a newer version of the job before we submit our changes, that index number would be different, making our plan invalid and aborting the job update. The -check-index parameter is not mandatory but is recommended when more than one user can submit jobs.

Let’s apply the plan with the suggested command:

$ nomad job run -check-index 21 example.nomad
==> 2022-12-15T11:59:50-05:00: Monitoring evaluation "89892feb"
2022-12-15T11:59:50-05:00: Evaluation triggered by job "example"
2022-12-15T11:59:50-05:00: Evaluation within deployment: "e9ab122d"
2022-12-15T11:59:50-05:00: Allocation "a5eb1f27" modified: node "956f8047", group "cache"
2022-12-15T11:59:50-05:00: Evaluation status changed: "pending" -> "complete"
==> 2022-12-15T11:59:50-05:00: Evaluation "89892feb" finished with status "complete"
==> 2022-12-15T11:59:50-05:00: Monitoring deployment "e9ab122d"
✓ Deployment "e9ab122d" successful

2022-12-15T12:00:01-05:00
ID = e9ab122d
Job ID = example
Job Version = 2
Status = successful
Description = Deployment completed successfully

Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
cache 1 1 1 0 2022-12-15T12:10:00-05:00

Now we can see that only one allocation is in the running state. The others have a desired status of stop; once they have stopped, their client status changes to complete:

$ nomad job status example

[...]

Allocations
ID Node ID Task Group Version Desired Status Created Modified
4f66a1a1 956f8047 cache 1 stop complete 8m1s ago 3m58s ago
fe22f72a 956f8047 cache 1 stop complete 8m1s ago 3m59s ago
a5eb1f27 956f8047 cache 2 run running 16m29s ago 3m49s ago
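
As a side note, recent Nomad versions also include a dedicated scale command that changes a group’s count without editing the job file. Assuming the example job and its cache group, the equivalent of the change above would be:

$ nomad job scale example cache 1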

Job Update

Now we are going to test the update feature. Let’s return to the web UI and edit the example job again. We will change the Redis image version and deploy it again:

Nomad job plan after job update
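
The edit itself is just a new image tag in the task’s config block; a minimal sketch (the specific tags here are only illustrative):

config {
  image = "redis:7.0.7"   # was "redis:7.0.4"; both tags are illustrative examples
  # ... the rest of the config block stays as generated
}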

We can see that the plan is changing the Redis version. After deploying this new version, it will take some time to download the new image, but after that, the logs of the allocation task will show it is using the new Redis version:

Nomad job allocation task logs

Job Rollback

But what if we want to roll back to the previous version? On the job’s versions page, we can see the list of previous versions. For our example job, there are multiple versions from all the tests done so far:

Nomad job versions list

Let’s roll back the last version to the previous one, so we go back to Redis 7:

Nomad job version changes

On the allocation log, we can see that the Redis version is back to 7.0.4:

Nomad job allocation task logs for selected version
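
Rollbacks can also be done from the command line with the nomad job revert command, passing the job name and the version to go back to. Versions can be listed with nomad job history; the version number below is only illustrative:

$ nomad job history example   # list the stored versions of the job
$ nomad job revert example 2  # revert the job to version 2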

Testing other kinds of workloads

Now let’s do a few more tests to showcase one of Nomad’s main features: executing non-containerized workloads, like running a static binary and a Java application.

Static Binary

Let’s start with a very basic job using a static binary. We are going to run a ping command!

This is the job file:

Static binary example job
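
Since the job file is shown above as an image, here is a minimal sketch of its shape. The job, group, and task names, the ping target, and the batch type are assumptions for illustration:

job "ping" {
  datacenters = ["dc1"]
  type        = "batch"   # ping runs a fixed number of times and then exits

  group "ping" {
    task "ping" {
      driver = "raw_exec"

      config {
        command = "/usr/bin/ping"
        args    = ["-c", "1000", "google.com"]
      }
    }
  }
}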

The job uses the raw_exec driver to run a static binary. With the command parameter, we set the path to the ping command, and with the args parameter, we set the arguments passed to that command. With the configured arguments, we let ping run 1000 times, so it runs long enough to see its output in the allocation logs. Go back to the Jobs page, click on the “Run Job” button, paste the previous job, and then click the “Run” button. Wait until the job appears in the running state before checking out the allocation task logs. We can see the ping command is running:

Nomad job allocation task logs

This is a fairly simple example, as the ping command is present on any OS. For a different binary, we would need to provision it somehow. Terraform or Packer could be used for that, but Nomad jobs also have a built-in mechanism: the artifact block, which downloads a file when the job runs, before the binary is executed. We will use this feature in the following example, running a Java application.

Java Application

The Java application for this example is pretty basic; it only prints “Hello Nomad!” and then sleeps for one minute before exiting. Our focus here is to showcase that Nomad can orchestrate any Java application. When running a Java application, you can execute a jar file or a single compiled class file. We will run a jar file containing the class file of our example app.

This is the job file:

Java example job
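
Again, the original job file is an image, so here is a minimal sketch of its structure. The job name, download URL, jar name, checksum, and JVM options are placeholders for illustration:

job "hello-java" {
  datacenters = ["dc1"]
  type        = "batch"

  group "hello" {
    task "hello" {
      driver = "java"

      artifact {
        source      = "https://example.com/hello-nomad.jar"   # placeholder URL for the jar file
        destination = "local/"                                # relative to the task's isolated directory

        options {
          checksum = "sha256:abc123..."                       # optional integrity check, placeholder value
        }
      }

      config {
        jar_path    = "local/hello-nomad.jar"                 # path inside the isolated directory
        jvm_options = ["-Xmx64m", "-Xms64m"]                  # illustrative JVM settings
      }
    }
  }
}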

This job executes a jar file using the Java task driver and uses the artifact block to download the jar file into an isolated directory on the client node where the job is launched. This is mandatory with the Java driver because, for security reasons, the driver does not allow us to point to an absolute path on the client node.

Inside the artifact block, the source parameter sets the URL from which the jar file will be downloaded. The destination parameter sets the location, relative to the isolated directory, where the file should be downloaded. (The isolated directory is configured in the agent’s configuration file.) You can use an options block to set other optional settings, like the checksum of the downloaded file. Then, in the task’s config block, we set the jar_path parameter to the jar file’s path inside the isolated directory. With the jvm_options parameter, we can pass any parameters the Java virtual machine needs to run the application.

Finally, let’s run this job by pasting it into the Run Job window on the Jobs page, as we did with the previous examples. After that, check out the allocation logs of the new job:

Nomad job allocation task logs

Conclusions

In this last hands-on article, we tested advanced features expected in any orchestrator, like workload scaling, updates, and rollbacks. We also tested the main feature that differentiates Nomad from other orchestrators: the ability to run workloads other than containers, like a static binary and a Java application. As demonstrated, running other types of workloads is just as easy as running containers: use the correct driver for the kind of workload and configure its options; the rest (resource configuration, scaling, rollbacks, etc.) is the same for all types of workloads.

There are other important things that would be nice to test, too, like ACL configuration, a must for a production deployment, and a hybrid cluster spanning multiple cloud providers (or self-hosted infrastructure) using Nomad’s native federation feature. Hopefully, this set of articles sparked enough interest for you to want to evaluate Nomad for current or future projects.

