Microprocesses: a new architectural design pattern for background jobs on a microservice architecture

juanjolainez · Published in CreditorWatch · 9 min read · May 9, 2021

Background processes are the most commonly overlooked element when implementing microservices, yet they are needed in the vast majority of applications.

Most of the literature about micro-services gravitates around how to split the monolith, domain-driven design, orchestration vs. choreography, how to split the database, etc., but it usually doesn’t mention background processes or how to implement them in a micro-services architecture.

On that note, I’d recommend the books Building Microservices and Monolith to Microservices, both by Sam Newman, which cover pretty much everything mentioned above except, of course, background jobs.

As a bit of background, at CreditorWatch we currently run several micro-services, and each micro-service is responsible for a piece of data (a domain).

Our background processes are not limited to actions triggered by user interaction (for those, we raise an event that goes through our event-driven architecture and so on, which is pretty standard): some (in fact, most) of our background tasks are scheduled and take care of data ingestion, data updates, emails, and more.

In terms of scale, at CreditorWatch we run around 40M Microprocesses per month.

Many of these processes are VERY long and heavy (some of them can take up to a week to complete), such as the process that calculates the credit scores for all the companies in Australia in our OLTP system. At CreditorWatch, we have quite an efficient CI/CD pipeline that triggers multiple deployments per day, and we use Docker containers for our micro-services.

That pretty much sums up the infrastructure that we have and the problem we needed to solve.

Ideally, we wouldn’t want a box that can’t be updated because it’s running a long-term process, so this solution keeps that in mind and tries to solve it too.

In this small article, we’ll explain the Microprocess pattern (we coined the term from micro-services and background processes) and how we successfully implemented it using AWS services. We call it a design pattern because it’s a proven solution (we’ve successfully implemented it many times) to a common problem (implementing long background processes in a micro-services architecture).

Microprocessing consists mainly of dividing very big tasks (one process) into much smaller ones (Microprocesses) and processing those using our micro-services logic and architecture.

This concept is not new and is widely used in other disciplines (MapReduce in Big Data clusters, or divide-and-conquer algorithms), but this is an application of the same technique to a micro-service architecture, which gives us many benefits with very few drawbacks.

To do so, we’d have one process (it could be scheduled or manually triggered) whose only job would be to gather and trigger all the jobs that need to be processed. Note that this process won’t actually produce any of the end results (in our earlier example, calculating all the credit scores for all the Australian companies). It will simply queue a job to update the credit score for every single company.

This is notably faster than calculating all the credit scores, since breaking the work into small pieces takes a few minutes while calculating all the credit scores takes several days. In our case, calculating a credit score takes, on average, half a second (how we achieve such speed for the amount of data we process is worthy of another article entirely), so, taking into account that we have nearly 19M companies in our database, the whole calculation would take ~100 days to complete in a single process. Given that we do this every single month, a single process is not viable.

Many times, the process of dividing the work is so lightweight that we are able to implement it in a Lambda function (note that Lambda functions have an execution time limit of 15 minutes), which means we don’t have to worry about crontab configurations on a server or a virtual machine.
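To make the splitting step concrete, here is a minimal Python sketch of what such a divider Lambda could look like. Everything here is illustrative, not our actual implementation: the queue URL, the `recalculate_score` task name, and the `all_company_ids` helper are stand-ins.

```python
import json

def chunk(items, size=10):
    """Yield batches of at most `size` items (SQS send_message_batch caps at 10)."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def build_entries(company_ids):
    """One queue message per company: the payload of a single Microprocess."""
    return [
        {"Id": str(i),
         "MessageBody": json.dumps({"task": "recalculate_score", "company_id": cid})}
        for i, cid in enumerate(company_ids)
    ]

def all_company_ids():
    """Stand-in for the real data access that streams the ~19M company ids."""
    return range(100)

def split_handler(event, context):
    """Hypothetical Lambda entry point: its only job is to enumerate the
    companies and queue one small job each -- it computes no scores itself."""
    import boto3  # available in the Lambda runtime
    sqs = boto3.client("sqs")
    queue_url = "https://sqs.ap-southeast-2.amazonaws.com/123456789012/score-jobs"  # placeholder
    queued = 0
    for batch in chunk(all_company_ids()):
        sqs.send_message_batch(QueueUrl=queue_url, Entries=build_entries(batch))
        queued += len(batch)
    return {"queued": queued}
```

Batching matters at this scale: `send_message_batch` accepts at most 10 messages per call, so queuing millions of Microprocesses in batches helps keep the divider comfortably within the 15-minute Lambda limit.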

At this point, we have a lot (millions, maybe) of small tasks in our queue waiting to be processed, but no “real work” has been done yet.

There are, of course, many ways to parallelize the executions of jobs once you have them all in a queue.

Traditionally, we might have a box with supervisord (or similar) and multiple processes pulling messages from the queue, but that would mean we’d have a box constantly running both the code to pull the messages AND the code to process them, the latter belonging to the micro-service.

Even though this approach (and any other that runs the micro-service code and the queue-polling code in the same environment) is valid and would work, we found that having two different environments (a virtual or physical server for background processes AND our Docker containers for live traffic) created a lot of overhead.

In some configurations, such as a virtual machine, deploying would mean stopping supervisord and waiting for the processes to finish before spinning up a new instance with the new code and destroying the previous one, which would make deployments much more complex, since we would need to track all the background processes.

Also, we would have to come up with two different ways to monitor our application (background processes and live endpoints), make sure that our logger can properly trace all the logs across two different environments, make sure that the dependencies are correct in both places, and so on.

Note that I didn’t even mention having two different codebases to calculate credit scores, one for the background process and one for the micro-service, so I hope your mind wasn’t wandering into those forbidden territories of code duplication.

Ideally, we’d want to:

  • Avoid duplicating code
  • Avoid multiple system configurations (that we’d need to test)
  • Be able to monitor the wellbeing and progress of our background processes
  • Scale up and down (to process faster outside working hours, for example)
  • Be able to deploy quickly and use the latest version of the code as soon as possible
  • Have simple deployments and low maintenance

The solution that we came up with to process Microprocesses is native to the micro-services architecture: we leveraged SQS + Lambda to create a push queue that calls a micro-service endpoint to execute the task of each Microprocess.

We talk about the SQS + Lambda approach in more detail here: https://medium.com/creditorwatch/how-to-successfully-create-a-push-queue-using-sqs-lambda-57f299056fe7
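As a rough illustration of the push-queue half, a Lambda subscribed to the SQS queue can forward each message to the micro-service over HTTP, so the job runs inside the exact same codebase that serves live traffic. The endpoint path and payload shape below are hypothetical.

```python
import json
import urllib.request

# Illustrative endpoint on the background-jobs subdomain; the /jobs/execute path is made up.
ENDPOINT = "https://creditscore-queue-service.creditorwatch.com.au/jobs/execute"

def build_request(record, endpoint=ENDPOINT):
    """Turn one SQS record into a POST against the micro-service endpoint."""
    return urllib.request.Request(
        endpoint,
        data=record["body"].encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def forward_handler(event, context):
    """Hypothetical Lambda: SQS invokes it with a batch of records, and each
    one is pushed to the service. An HTTP error raises, so SQS retries the message."""
    for record in event["Records"]:
        with urllib.request.urlopen(build_request(record), timeout=30) as resp:
            resp.read()
    return {"forwarded": len(event["Records"])}
```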

Microprocesses pattern schema

With just these 3 elements:

  • A process that divides the big process into very small Microprocesses
  • A push queue (implemented in our case with SQS + a Lambda function)
  • An endpoint embedded in the micro-service

With just that, we achieved most of what we wanted:

  • No duplicated code (all the code resides in the micro-service codebase)
  • No multiple system configurations to test (we only have the micro-services infrastructure)
  • Monitoring of the wellbeing and progress of our background processes (we can always see how many pending messages are in the queue)
  • Scaling up and down (when implementing Lambda functions, we can scale up and down on demand; more about it in this article: https://medium.com/creditorwatch/aws-lambda-facts-you-wish-to-know-before-processing-2-billion-lambda-executions-2021-78fe77183c80)
  • Quick deployments that use the latest version of the code as soon as possible (through our current deployment process)
  • Simple deployments and low maintenance (we deploy as usual and no extra overhead is needed)

This solution has its drawbacks, though:

  • Microprocesses are limited to 15 minutes (if using Lambda, as we are)
  • Live traffic and traffic from background jobs going to the same infrastructure can obfuscate monitoring and impact live traffic (solution below)
  • The process may simply not be divisible, in which case this approach won’t help much

It is possible that Microprocess requests behave differently from live traffic (they can be slower, for instance), and we want to make sure that we can correctly monitor the wellbeing of both types of processes.

To avoid that monitoring obfuscation and to avoid the impact that Microprocesses can have on live traffic (they can consume resources needed to satisfy live traffic, such as memory, max processes per container, etc.), we built a clone infrastructure (same Docker container image) under another subdomain.

In our case, for the credit score example we’d have two subdomains:

creditscore-service.creditorwatch.com.au

creditscore-queue-service.creditorwatch.com.au

Background jobs point at creditscore-queue-service.creditorwatch.com.au, while live traffic points to creditscore-service.creditorwatch.com.au.

With this small tweak, we can scale up & down just live traffic (or background processes) capacity on demand without affecting the other and also monitor more effectively, since we can easily filter by the host.

That covers the map, but where’s the reduce?

The previous process covers how we process all the small pieces of the big process, but how do we glue them together?

Let’s say, to stick to this example, that the goal of the background process is to get a report with all the credit scores for all the companies that we have and send them via email to the data science team so they can run stats on them.

This is a very well-known problem in software engineering when working with concurrent processes, and it has many solutions (the prisoners problem is already a classic in concurrency and a great exercise if you want to code it using the monitor pattern); we’ll just propose a simple one using the pieces we’ve already discussed.

When launching all the processes, we will create a record in the database for the triggering process, with its own process id: this is the parent process. For each of the others, we’ll create a record as well, with its own process id AND a reference to the parent one. This record will hold the result of the process (in this case, the credit score). Note that you might need to store big chunks of information (we actually have a process that stores a text file that then needs to be merged with the others to complete the whole task). In that case, you can put it in a filer (a mounted volume, an S3 folder, …) and store a reference to it.
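Here is a minimal sketch of that bookkeeping, using SQLite for brevity (any relational database works the same way); the table and column names are made up for illustration:

```python
import sqlite3
import uuid

SCHEMA = """CREATE TABLE IF NOT EXISTS processes (
    id TEXT PRIMARY KEY,
    parent_id TEXT,          -- NULL for the parent process itself
    status TEXT NOT NULL DEFAULT 'pending',
    result TEXT              -- the credit score, or a reference (S3 key, file path) for big payloads
)"""

def launch(conn, company_ids):
    """Create the parent record plus one child record per queued Microprocess,
    returning the parent id so every child can report back to it."""
    parent_id = str(uuid.uuid4())
    conn.execute("INSERT INTO processes (id, parent_id) VALUES (?, NULL)", (parent_id,))
    conn.executemany(
        "INSERT INTO processes (id, parent_id) VALUES (?, ?)",
        [(str(uuid.uuid4()), parent_id) for _ in company_ids],
    )
    conn.commit()
    return parent_id
```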

Now, when a child process runs and finishes, it needs to notify the parent, which will check whether all the other processes have finished; if they have, it will run the task of putting all the credit scores in a file and sending the email.

Of course, there are different ways to notify the parent process. In our example, it seems reasonable to use the existing architecture: queue a job, then let the push queue execute code in the micro-service that assesses whether everything is finished and, if so, gathers the results and sends the email.

Just a reminder: when working with concurrent processes, make sure you lock the tables you are using to guarantee mutual exclusion between processes. Otherwise, you are in for a few not-so-great surprises.
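Putting the last two ideas together, this is a sketch of the child-completion step, again with SQLite standing in for the real database; `BEGIN IMMEDIATE` plays the role of the table/row lock mentioned above:

```python
import sqlite3

SCHEMA = """CREATE TABLE IF NOT EXISTS processes (
    id TEXT PRIMARY KEY,
    parent_id TEXT,        -- NULL for the parent record
    status TEXT NOT NULL DEFAULT 'pending',
    result TEXT
)"""

def finish_child(conn, child_id, result):
    """Mark a child Microprocess done and report whether it was the last one.

    BEGIN IMMEDIATE grabs the write lock up front, so two children finishing
    at the same time cannot both see "everything is done". The connection
    should be opened with isolation_level=None so the explicit BEGIN is not
    fought by Python's implicit transaction handling."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("UPDATE processes SET status = 'done', result = ? WHERE id = ?",
                     (result, child_id))
        (parent_id,) = conn.execute(
            "SELECT parent_id FROM processes WHERE id = ?", (child_id,)).fetchone()
        (remaining,) = conn.execute(
            "SELECT COUNT(*) FROM processes WHERE parent_id = ? AND status != 'done'",
            (parent_id,)).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return remaining == 0  # True -> this child triggers the reduce: build the file, send the email
```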

Conclusions

Long-running background processes can be tricky to implement in a micro-service architecture and present several challenges, so, to overcome them, we created a new design pattern that we called Microprocesses.

The Microprocesses pattern consists of:

  • Creating a process that divides a long-running process into small Microprocesses
  • Queuing all the Microprocesses in a push queue
  • Forwarding the messages to your micro-service for processing
  • Monitoring with your existing APM tools and logs

Push queues and Lambda functions can be scary, so feel free to have a quick read of the articles linked above and you’ll know all you need to know beforehand.

Hope you like it, and please feel free to reach out with doubts and suggestions! I’d be more than happy to chat! :)

Remember to follow me and/or CreditorWatch on Medium for more content, and feel free to clap for the story if you liked it; that way, it will help somebody else in the future. Thank you so much for your attention and participation.

If you have any questions, please leave a comment or ask me on my LinkedIn (https://www.linkedin.com/in/juanjo-lainez-reche/) and I’ll get back to you asap.
