A Reference Architecture for Responsive Asynchronous Task Execution
How to notify UI clients about the progress of long-running async tasks that goes beyond the scope of an HTTP request.
In typical API-based communication, the processing of some API requests goes beyond the scope of an HTTP request. Processing a large file, performing a calculation, or executing a workflow exceeds the regular HTTP request timeout.
To keep the UI responsive, we often carry out these tasks asynchronously. But in some cases, the UI demands a confirmation about the task’s execution status. Based on that, the UI shows a notification to the user about how the task execution went.
Usually, the backend updates a database once the task is completed. The front end keeps polling the database until it sees the task has been completed. But that type of polling is exhaustive and not fitting for a good user experience.
In this article, I’ll present you with a reference architecture that can be used to notify task clients about task completion in real-time over a WebSocket connection.
The architecture consists of the following components.
The Task client can be any HTTP client like a web or mobile application or batch process. It does two things.
- Formulate a task definition with all execution parameters and submit it by making an HTTP POST call to the Task API.
- When the Task API responds with the WebSocket endpoint URL, establish a connection to it and listen for the task completion status.
- Update the UI or perform a similar task based on the completion status.
The Task API is an HTTP endpoint that accepts task definitions from task owners for execution.
A task takes a considerable amount of time and resources for its execution. If the task definition is erroneous, it must be rejected as early as possible to save valuable CPU cycles at the backend. The Task API does this by validating the task for its payload content and business rule violations. If there are any errors, the API rejects the task by responding to the owner with HTTP 403 (bad request).
A sample task definition would look like the following:
The API also checks for an authentication token in the task request to ensure that only authorized clients submit tasks.
Once the task is authenticated and validated, the API creates a task object with a unique ID. Then it is placed in the Task queue for processing. If the placement is successful, an HTTP 202 (request has been accepted for processing) is sent back to the task client.
That response contains an endpoint URL with the generated task ID. That’s where the task client should connect and listen for any status updates on that particular task. Don’t worry about this for now. I’ll discuss that later in detail.
The following shows a sample response from the Task API:
Once accepted, the Task API enqueues the task in the task queue.
An accepted task is a liability of the system. Hence, it is put into a message queue to satisfy the following needs.
- We must never lose an accepted task.
- The task must be delivered to its processor at least once.
- A task must not be processed by more than one processor.
Technically, you can use any message broker that supports point-to-point messaging semantics. Some examples are RabbitMQ, AWS SQS, etc.
Task processor polls the task queue to pick up new tasks and processes them.
The processor carries the business logic required to process a task and can be implemented in any programming language of your choice. There can be multiple processor instances to increase the processing throughput. The message queue ensures that only one instance will process a task.
Once the processor completes a task, the outcome can be a success or failure. The important thing is to notify the task client about the completion. That is done by the processor sending a message into the status notification topic.
A typical message could look like this:
Status notification topic
This topic holds the completion status of tasks, and it enables the communication between tasks processors and the WebSocket endpoint.
You can use any publish-subscribe messaging system for this. RabbitMQ, Redis, and AWS SNS are few examples. Later, you’ll see that the WebSocket endpoint subscribes to this topic to get notified about task statuses.
The WebSocket endpoint does two things.
- Accept incoming WebSocket connections from task clients.
- Subscribes to the status notification topic to receive updates and dispatches to relevant connections.
The following figure illustrates this better.
The task client connects to this endpoint with the task ID in the URL. Then the endpoint keeps the connection in a map, keyed by the task ID.
Upon receiving a task completion notification, we can find the associated connection by querying that map with the task ID of the notification. Once found, the endpoint writes back a message back to the client connection, indicating the task processing status.
An example could be:
Flow of events
- The Task Client submits a task to the Task API and waits for a response.
- Task API accepts, validates, and enqueues the received task in the Task queue. If everything goes fine, it responds to the client with HTTP 202. Otherwise, an appropriate error is returned.
- Task Client extracts the URL of the status endpoint (the WebSocket server) from the response. Then establishes a WebSocket connection to receive updates.
- The WebSocket endpoint accepts the connection from the client. It has also subscribed to the status notification topic to receive updates.
- A Task Processor picks up the task from the Task Queue and executes it in the background. Once the execution is finished, it publishes a message to the status notification topic indicating the outcome of the processing.
- The WebSocket endpoint who had previously subscribed to that topic receives the task completion message. It then checks whether the task ID of the completed task belongs to an active connection. If yes, the endpoint writes a message back to the connection indicating the outcome.
- The Task Client receives the task response over the WebSocket channel and updates its UI accordingly. For example, it could display a notification to indicate that the submitted task has been completed.
Scaling out the architecture
There are a couple of options to scale the above architecture.
The API layer is stateless, and adding more nodes will scale it out. The same goes for task processing as well. Add more processor instances to increase the processing throughput. They will follow the competing consumer pattern to compete for incoming tasks.
The tricky part here is the WebSocket endpoint. As per my understanding, you can front a cluster of WebSocket servers with a load balancer like Nginx or HAProxy to do TCP level load balancing for WebSockets.
What could’ve done better?
Although the above describes the bare-minimal architecture, you can apply some tweaks to make it better.
Add support for polling and task auditing
Sometimes, the Task client may not be interested in getting to know the task’s status right away. It may come back after a few hours, days, or even a month. So it’s necessary to record the complete status of the task’s lifecycle in a database to allow clients to query later.
You can do that by adding a log record at each critical stage of the task, such as accepting, validating, processing, and completing. That helps anyone to retrieve these records later and diagnose how did the task processing go.
Adding support for progress bar based UIs
Sometimes, the UI expects aggressive progress on task completion. It may not wait till the task completion but expects the backend to stream the progress at critical milestones.
For example, if a large file is being processed, the processor can send a stream of events to the status notification topic such as 25% done, 40% done, etc. Then the WebSocket endpoint will relay it back to the UI to render a progress bar.
Add support for prioritizing and canceling tasks
Not all tasks are made equal — some need to have a higher priority and need to be processed soon. Consider using priority queue semantics to implement that. Perhaps, the task API can assign them priorities based on the business case.
Also, the ability to cancel a submitted task is essential.
Clients of long-running API requests often demand a real-time mechanism to get notified about the task’s status.
Clients can poll a status endpoint, but with carefully planned architecture like above, you can notify the UI over a WebSocket connection to keep the user experience intact, also keep users engaged with your application.
I’m open for your feedback on this. Please leave comment here or DM me on Twitter.