A Serverless Approach to a Cloud-Based Batch Job Processor Service

Harshavardhan Ghorpade
6 min read · Jun 2, 2020


In most cases, a large-scale application has some piece of code or service that acts as a batch/job processor: it accepts some input and processes it to produce output. These kinds of processes are generally long-running and should run asynchronously (on a background thread). We will discuss the different possible solutions available when migrating such legacy applications to the cloud or implementing them from scratch.

True meaning of Serverless

Before we begin, let's get our basics clear about the real meaning of Serverless. It does NOT mean that no servers are required to run the given jobs; servers are still needed. It's just that you (as a developer) don't have to provision them, use them directly, maintain them, or scale them up and down. The cloud provider does that for you. Serverless services typically charge per usage, so we pay only for what we use and not for idle time, which is a big step towards cost optimization.

Different Possible Designs

  • Monolith service pattern: in this approach, one writes a single web service that exposes the required APIs (preferably REST) and starts a background thread for each new job. The number of requests such a system can handle is directly proportional to the capacity of the machine (EC2 instance) the service is deployed on and to how well the service code handles parallel requests. Such an architecture is difficult to scale dynamically; it is also not cost optimal, and its reliability is low. (A minimal sketch of this pattern appears after this list.)
  • Microservice pattern: in this approach, one divides the single monolith into multiple small stateless modules (microservices). For example, we can split the above service into three parts: 1] a front-end service (which accepts batch jobs over REST APIs); 2] a job-scheduling service (which receives job requests from the front-end and schedules them for the back-end); 3] a job-processing back-end service (which receives jobs from the scheduling service and processes them). This is a much better design than the first: individual modules can be scaled independently and automatically based on load, and reliability improves because different modules persist the job request along the way, so data can be recovered from different points. However, this design is not cost optimal, since all three modules must keep running 24/7 and there will be some idle time depending on the workload. Maintaining such a system to achieve the desired elasticity and reliability is also a difficult job.
  • Serverless microservice pattern: this approach is an improvement over the second design, addressing both the cost issue and the maintenance problem. If you observe carefully, the first two microservices from the second design are needed only when a new job request arrives, and then only for a very short time (typically milliseconds): the front-end accepts the job request and passes it on to the scheduling service, which queues the job for the back-end service. That's all; their role is finished. The serverless approach fits both cases perfectly, for example AWS API Gateway + Lambda functions and SQS (a queue service). Only the third service, which processes the actual job, requires long-running infrastructure (EC2, Docker containers, etc.).
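To make the first pattern concrete, here is a minimal sketch in Python using Flask and a thread pool. It is illustrative only; `process_job` is a hypothetical placeholder for the real job logic.

```python
from concurrent.futures import ThreadPoolExecutor
from uuid import uuid4

from flask import Flask, jsonify, request

app = Flask(__name__)
# The pool size is bounded by this one machine's capacity, which is
# exactly the scaling limitation discussed above.
executor = ThreadPoolExecutor(max_workers=8)

def process_job(job_id: str, payload: dict) -> None:
    """Hypothetical long-running job logic; replace with real work."""

@app.route("/jobs", methods=["POST"])
def submit_job():
    job_id = str(uuid4())
    # Fire-and-forget: if this process dies, in-flight jobs are lost,
    # which is why reliability is weak in this design.
    executor.submit(process_job, job_id, request.get_json())
    return jsonify({"jobId": job_id}), 202
```

Everything lives in one process on one machine, which is exactly where the scaling and reliability limits of this pattern come from.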

Serverless Approach

We will explore the third, serverless approach in more detail here. In this approach we use AWS API Gateway + a Lambda function to deploy the front-end service, which accepts jobs over REST APIs and then puts them in a queue (AWS SQS) for the back-end service. API Gateway is configured to proxy all API calls to the Lambda function, and a controller inside the Lambda routes each request to the appropriate handler. The job request is then pushed to a queue for scheduling. The back-end service can be thought of as Docker containers or a cluster of EC2 instances (depending on the needs of the service) that continuously polls the queue for jobs. When a job is available, the first free node fetches it and removes it from the queue so the same job is not processed in parallel (with SQS this is handled via the visibility timeout, with the message deleted for good once processing succeeds), and then starts the actual processing.
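As a rough illustration, the front-end Lambda could look like the sketch below. This assumes an API Gateway proxy integration, boto3 available in the Lambda runtime, and a hypothetical QUEUE_URL environment variable pointing at the SQS queue; the jobId field is likewise just an illustrative choice.

```python
import json
import os
import uuid

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # hypothetical: set in the Lambda configuration

def handler(event, context):
    """Accept a job over the API Gateway proxy integration and enqueue it."""
    body = json.loads(event.get("body") or "{}")
    job_id = str(uuid.uuid4())
    # Enqueue the job and return immediately; the back-end workers
    # pick it up asynchronously.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"jobId": job_id, **body}),
    )
    return {"statusCode": 202, "body": json.dumps({"jobId": job_id})}
```

Returning 202 (Accepted) signals to the caller that the job was queued, not completed.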

Architecture Overview

Here is a simplistic representation of how the system design will look.

[Architecture diagram]
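The back-end half of that picture, a worker node polling the queue, could look roughly like the following sketch. Again, process_job is a hypothetical stand-in for the real job logic, and QUEUE_URL is the same assumed environment variable as above.

```python
import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # hypothetical: the same queue the front-end fills

def process_job(job: dict) -> None:
    """Hypothetical long-running job logic; replace with real work."""

while True:
    # Long polling (up to 20s) keeps the number of empty responses low.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,
    )
    for msg in resp.get("Messages", []):
        process_job(json.loads(msg["Body"]))
        # Delete only after successful processing; until then the SQS
        # visibility timeout keeps other workers off this message.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```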

Deep dive into the architecture and variations of this pattern

There can be a lot of variations of the above design, based on the scenario and the needs of the application. One possible variation is a more intelligent scheduler service (the middle one) instead of just a dumb queue. When would we need a more intelligent scheduler? When there are different kinds of job processors. In that case there will be a separate queue per job processor, each serving messages to the appropriate group of back-end server nodes (a cluster), so some logic must identify the message type and put it in the proper queue. We could easily write this logic in the front-end service itself, but there is a problem with that design.

Doing so creates tight coupling between the front-end service and the scheduler service: the front-end would have direct dependencies in the form of the different queue endpoints where job request messages need to be put, and adding or removing a job processor would become difficult. We would be imposing additional responsibilities on the front-end service beyond its main one (to accept a job request and pass it to the scheduler). To solve this problem we should apply the Inversion of Control design principle and delegate the responsibility of distributing job messages to the appropriate queues to a new microservice. Hence, we can add one more Lambda function that receives requests from the front-end service and schedules them in the proper queue. Furthermore, instead of calling this scheduler Lambda directly from the front-end Lambda, we can introduce a notification service in between, which abstracts the scheduler Lambda dependency away from the front-end service entirely. Here is the elaborated final architecture diagram.

[Elaborated final architecture diagram]
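A sketch of such a scheduler Lambda follows. The assumptions here: it is subscribed to an SNS topic that the front-end publishes to, each job message carries a jobType field, and the per-processor queue URLs arrive via a hypothetical QUEUE_URLS_JSON environment variable. None of these names come from the design itself; they are just one way to wire it up.

```python
import json
import os

import boto3

sqs = boto3.client("sqs")
# Hypothetical mapping of job type to queue URL, e.g.
# '{"image": "https://sqs.../image-jobs", "video": "https://sqs.../video-jobs"}'
QUEUE_URLS = json.loads(os.environ["QUEUE_URLS_JSON"])

def handler(event, context):
    """Route each job published on the SNS topic to its processor's queue."""
    for record in event["Records"]:
        job = json.loads(record["Sns"]["Message"])
        # The front-end never sees these queue endpoints; only the
        # scheduler knows the routing table.
        queue_url = QUEUE_URLS[job["jobType"]]
        sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(job))
```

Adding a new job processor now means adding a queue and an entry in the routing table; the front-end service stays untouched.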

Some quick advantages of the above architecture design are:

  1. Using the serverless model automatically makes our front-end service highly scalable. AWS can launch thousands of Lambda functions concurrently (and the limit is easily configurable), so as a developer you don't need to worry about scaling your front-end and scheduling services.
  2. Individual modules of the system are loosely coupled, so each component can scale as per its own need. For example, the front-end service may be hit with a large spike that starts many concurrent Lambdas, while the number of scheduling-service Lambda invocations may be much lower. Remember one thing here: each service has to be stateless.
  3. Less maintenance effort and cost are needed compared to maintaining long-running services or even a cluster of EC2 instances.
  4. Less effort is needed to enforce security: under the shared responsibility model, most of the heavy lifting of securing the underlying infrastructure on which Lambda runs is taken care of by Amazon.
  5. In terms of availability this system does well (most cloud providers offer serious SLAs), and it is also easy and cost-efficient to build a disaster recovery mechanism into this architecture (I will talk about disaster recovery best practices in a serverless context in another blog).
  6. Reliability can also be incorporated into this design; that might be another topic of discussion, and I will add another blog for it as well.
  7. The performance of Lambda functions may be a topic of debate and is perhaps the only disadvantage of this serverless design; however, there are ways to improve it, such as Lambda provisioned concurrency. I will talk about this as well in a separate article 🙂
  8. Last but not least, this system offers some serious cost savings by cutting idle-time costs down to nearly zero.

Conclusion

The software industry is rapidly moving towards cloud adoption, and the recent buzzword in the cloud world is “Serverless”. That does not mean one should try to fit everything into the serverless model; you can't do that! So make your decisions wisely: there should be a reason behind every design choice you make as a cloud architect/developer. That being said, you should not hold yourself back from thinking Serverless 🙂
