
Job Scheduling — Quick and Slow

Bruno Oliveira · Published in hurb.labs · May 13, 2021

A simple and effective way to schedule and monitor job executions using HTTP APIs.

Introduction

Oftentimes we reach a point where the application needs to keep up with higher throughput demands, be it because of a growing business or new integrations consuming data.

Scaling ‘everything’ up is the usual answer. The problem is how well your application is prepared to scale, because ‘everything’ includes your state-updating machines, your database connections, your idle time and even your error rates (!).

With that in mind, the idea is to build something that scales the work rather than the whole project. For this, we need a structure that lets us perform quick operations and slow (heavy) ones in a tuned way, so you can get the most out of your not-so-cheap server.

Scale unprepared apps and you might unleash the beast.


tl;dr “Talk is cheap. Show me the code” — Linus Torvalds

https://github.com/brunofurmon/job-scheduler

The code example for this article is available on the link above. Feel free to love/hate and expand it :) Gists may get outdated over time as the repo updates.

Motivation

Let’s say you’re tasked with performing heavy data processing, periodic cache loading or the like on an existing system. Or you simply want to fiddle with a different paradigm, language or architecture.

As developers, we want to make sure our application has some of the interesting perks below:

  • Schedule jobs with ease;
  • Monitor their status;
  • Read historical executions;
  • Write code that is mostly language agnostic;
  • Keep your HTTP server from hanging itself after receiving a job request;
  • Scale your workers individually, keeping your server as low-cost as possible;
  • Write business logic that can be shared across multiple targets.

In this approach, we’ll build a web application that receives orders to perform a heavy process running within a “job” context, and a worker application that executes jobs on demand, both sharing the domain and infrastructure layers.

Requirements

The code reference for this article will be based on some concepts listed below:

For the source code, we’ll be using:

Project Structure

Our folder structure (see the repository) breaks down into the following responsibilities:

Application — The two application targets, which run separately:

  • Server — Our KoaJs HTTP server handling schedule, read and cancellation requests
  • Worker — Our console application that receives work commands through a message bus.

Domain — The things we want to solve: the core of your application, the things you should really worry about. The core methods receive names from the ubiquitous language and have real-life meaning.

  • Business — In our case, the heavy module that knows how to performHeavyTaskInMs.
  • JobHandlers — The way our application understands job scheduling events and what should be executed whenever something comes in.
  • Jobs — Simple structures that uniquely name their prefix and also describe their initial_state structure;

Infrastructure — Connections to our message bus and database, plus any code that strays from a generic implementation. For example, if your database of preference is MongoDb, this will be the only place where Mongo is dealt with. Same with NSQ, etc.

  • Database — The low-level representation of a job’s data, a repository to manipulate it and everything we need to connect to our collection.
  • JobScheduler — Here we have all the methods we need to manipulate jobs on our system.
  • Logger — A colored logger that uses the built-in console.
  • Messaging — All the caveats we need to publish and subscribe to messages on a JOB_TOPIC.

*Note on HTTP not being part of infrastructure — In some applications, it is good practice to declare HTTP clients and handlers on the infrastructure side, in order to isolate adapter logic and strategy implementations. For simplicity, I’ve chosen not to.

Outside the folder structure, we have a container.js that bootstraps the system so that we only need to declare how to build our dependencies once for both server and worker.

Both server and worker are brought to life simply by resolving them from the container.
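As a rough sketch (the real repo may well use a dependency-injection library, and the module paths and registration names below are hypothetical), the container could look something like this:

```javascript
// container.js: a minimal hand-rolled sketch; the actual repo may use a DI library,
// and the module paths / registration names here are hypothetical.
const createContainer = () => {
  const factories = {};
  const cache = {};
  const container = {
    register: (name, factory) => { factories[name] = factory; },
    resolve: (name) => {
      if (!(name in cache)) cache[name] = factories[name](container);
      return cache[name];
    }
  };
  return container;
};

const container = createContainer();

// Shared infrastructure and domain pieces, declared once for both targets
container.register('logger', () => require('./src/infrastructure/logger')());
container.register('jobsRepository', (c) =>
  require('./src/infrastructure/database/jobsRepository')({ logger: c.resolve('logger') }));
container.register('jobScheduler', (c) =>
  require('./src/infrastructure/jobScheduler')({
    jobsRepository: c.resolve('jobsRepository'),
    logger: c.resolve('logger')
  }));

// The two application targets
container.register('server', (c) =>
  require('./src/application/server')({ jobScheduler: c.resolve('jobScheduler') }));
container.register('worker', (c) =>
  require('./src/application/worker')({ jobScheduler: c.resolve('jobScheduler') }));

module.exports = container;
```

Each entry point then only needs something along the lines of require('./container').resolve('server').start().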

What is Job? (baby don’t hurt me)

This is the structure where we’ll address the tasks’ execution information:

job_id: A unique key (uuidv4, for instance);

job_key: Job’s complete key, which is prefixed with jobPrefix;

job_type: jobPrefix variable in a Job configuration. See (src/domain/jobs/heavyJobInfo.js);

created_at: Creation date;

updated_at: A list of dates where an update occurred. This list could grow large if a task runs in a big loop;

status: The task’s current status; a job starts as QUEUED and is updated by the worker as it runs to completion;

data: Important information regarding the execution of the job. It could be the number of rows affected, load size or anything of interest. This item may be updated often if a task runs in a loop;

completed_at: Completion date;
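Putting those fields together, a job document might look roughly like this (the values and the job_key format are illustrative, not taken from the repo):

```javascript
// Illustrative job document, assembled from the fields described above
const exampleJob = {
  job_id: '7769be0c-a23a-4a96-996a-2b01f0fd7295',
  job_key: 'heavy:7769be0c-a23a-4a96-996a-2b01f0fd7295', // jobPrefix + job_id (format assumed)
  job_type: 'heavy',                                      // the jobPrefix from heavyJobInfo.js
  created_at: '2021-05-13T12:00:00.000Z',
  updated_at: ['2021-05-13T12:00:05.000Z'],               // one entry per update
  status: 'QUEUED',                                        // updated by the worker as it runs
  data: { rowsAffected: 0 },                               // anything of interest to the caller
  completed_at: null
};
```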

The server and the client’s odyssey

Our HTTP server provides the following routes:
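(The list below is a sketch based on the endpoints mentioned throughout the article; the controller/handler names and the cancellation verb are assumptions, so check the repo for the authoritative version.)

```javascript
// routes.js: sketch of the KoaJs routes; controller and handler names are hypothetical
const Router = require('@koa/router');

module.exports = ({ jobsController }) => {
  const router = new Router({ prefix: '/jobs' });

  // Schedule a new heavy job; replies with the address where its status can be read
  router.post('/heavy', jobsController.scheduleHeavyJob);

  // Read a job's current status and data
  router.get('/heavy/:job_id', jobsController.getJob);

  // Cancel a job (assumed verb for the cancellation requests handled by the server)
  router.delete('/heavy/:job_id', jobsController.cancelJob);

  return router;
};
```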

So, whenever a client performs a POST to /jobs/heavy, our controller calls the jobScheduler interface to queue a new job, returning to the client the public address where its status will become available for reading.

Posting a new job!
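For illustration only (the port and the response field names are assumptions), the exchange could look like this from the client's point of view:

```javascript
// Posting a new heavy job: illustrative; port and response field names are assumptions
(async () => {
  const res = await fetch('http://localhost:3000/jobs/heavy', { method: 'POST' });
  const body = await res.json();
  console.log(body);
  // e.g. { job_id: '7769be0c-a23a-4a96-996a-2b01f0fd7295',
  //        status_url: '/jobs/heavy/7769be0c-a23a-4a96-996a-2b01f0fd7295' }
})();
```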

And when performing a GET on /jobs/heavy/7769be0c-a23a-4a96-996a-2b01f0fd7295 before any worker receives the job request, the client will find the job in a QUEUED status:

Reading a newly posted job

From here on, our client has an interface to read the job status whenever it finds convenient*.

*note — I know long polling strategies are bad practice (even though necessary in some cases). The intention of this work is to fire tasks that run in a standalone and reliable way. By “convenient”, I mean that you can read the job status at any time, as a way to check progress or monitor execution times. If you need another system to receive an update/completion/failure report, then I’d recommend that your domain logic receive an interface to a notification bus, where it will announce them.

The worker and the client’s patience

Ok, now it is time for our worker to work.

Its responsibility is to keep a connected readBus through which it receives job requests and dispatches them, by their JobType, to the corresponding domain business logic.

The domain job dispatcher consists of a switch statement that selects what to do with the incoming job request.
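A sketch of what that might look like (module paths and handler names are assumptions based on the folder structure above):

```javascript
// src/domain/jobHandlers/dispatcher.js: sketch; paths and names are hypothetical
const heavyJobInfo = require('../jobs/heavyJobInfo');
const heavy = require('../business/heavy');

module.exports = ({ jobScheduler, logger }) => async (jobRequest) => {
  switch (jobRequest.job_type) {
    case heavyJobInfo.jobPrefix:
      // Hand the scheduler over so the business code can report its own progress
      await heavy.performHeavyTaskInMs(jobRequest, { jobScheduler, logger });
      break;
    default:
      logger.warn(`Unknown job type: ${jobRequest.job_type}`);
  }
};
```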

Once our domain method is called, we pass in our jobScheduler interface as a tool for it to eventually update the job status, which will be available at /jobs/heavy/<job_id>.
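Inside the business method, those status updates could look roughly like this (the jobScheduler method names and the status values are assumptions):

```javascript
// src/domain/business/heavy.js: sketch; jobScheduler method names and statuses are assumed
const performHeavyTaskInMs = async (job, { jobScheduler, logger }) => {
  await jobScheduler.updateJob(job.job_id, { status: 'RUNNING' });

  // ... the actual heavy work, reporting progress along the way ...
  await jobScheduler.updateJob(job.job_id, { data: { progress: 0.5 } });

  await jobScheduler.updateJob(job.job_id, {
    status: 'COMPLETED',
    completed_at: new Date().toISOString()
  });
  logger.info(`Job ${job.job_id} completed`);
};

module.exports = { performHeavyTaskInMs };
```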

After the job is completed, it will be shown as follows:

Top View

OK, that’s a lot of stuff going on. Now, through the looking glass, from the outside, this is what we’ve got:

Simple scenario, where 1 worker is enough to deal with the load:

Heavy scenario, where there is little average idle time among workers: 1 server and 4 workers (3 busy):

Conclusion

We arrived at a topology where we can schedule in a way that scales the work instead of the whole project, and where a client can check and monitor a job’s status without the impact of an overloaded system.

On the infrastructure side, although we used a minimal configuration, it is necessary (and good practice) to configure message requeueing on failures, reconnection to a database that has gone away, among other things that will ensure your environment is A-OK.
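As one example, with an NSQ reader (assuming the nsqjs client; topic, channel and option values are illustrative), a failing handler can requeue the message instead of silently dropping it:

```javascript
// Requeue-on-failure sketch using nsqjs; client choice and option values are assumptions
const nsq = require('nsqjs');

const reader = new nsq.Reader('JOB_TOPIC', 'worker', {
  lookupdHTTPAddresses: ['127.0.0.1:4161'],
  maxAttempts: 5 // stop retrying after a few attempts
});

reader.on('message', async (msg) => {
  try {
    await handleJobRequest(msg.json()); // hypothetical dispatcher from the previous section
    msg.finish();
  } catch (err) {
    // Requeue with a delay so transient failures (e.g. a gone database) can recover
    msg.requeue(30 * 1000);
  }
});

reader.connect();
```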

The main objective is to get to know an architecture that can be applied in different ecosystems, languages and flavors.

Next steps

As an exercise in a language I’m not fluent in, I’ve set myself the personal challenge of trying to implement the saga pattern on a shared-domain job scheduler :]
