Awesome Background Task Processing in Python with Dramatiq
Detailed documentation of my data merging experiment (and an easy-to-follow tutorial).
For the past few weeks, I’ve been working on a Credit Risk Decisioning Engine (CRDE). As its uninspiring name clearly states, the idea is for the CRDE to assess and score a loan application based on a set of policies and a machine-learning model.
However, before a loan application can be scored, we need to assemble its data from two sources: the Bank and the Credit Bureau. While it’s easy enough to get data from the Bank (since we own the data anyway), getting data from the Credit Bureau is slightly less straightforward.
To simplify things a little:
- For each loan application ID, there may or may not be loan application information from the Credit Bureau
- Information from the Credit Bureau arrives in batches, which can take 1 to 2 hours
In other words, when a loan application is submitted, we’d have to wait up to, say, 3 hours to see whether the Credit Bureau half of the loan application exists. If it does, we combine it with the Bank half of the loan application and then finally send the merged application to the CRDE for assessment and scoring.
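Before bringing in any library, the decision described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `merge_or_wait`, the status strings, and the `MAX_WAIT` cut-off (taken from the "say, 3 hours" figure) are all hypothetical, and what should happen once the wait runs out is left open here.

```python
from datetime import datetime, timedelta

# Assumed cut-off, based on the "say, 3 hours" figure above.
MAX_WAIT = timedelta(hours=3)


def merge_or_wait(bank_half, bureau_half, submitted_at, now):
    """Decide what to do with one loan application (hypothetical helper).

    Returns a (status, payload) tuple, where status is one of
    "ready", "waiting", or "timed_out".
    """
    if bureau_half is not None:
        # Both halves are present: combine them for the scoring step.
        return "ready", {"bank": bank_half, "bureau": bureau_half}
    if now - submitted_at < MAX_WAIT:
        # The Credit Bureau half may still arrive in a later batch.
        return "waiting", None
    # Waited long enough; the follow-up action is outside this sketch.
    return "timed_out", None


submitted = datetime(2024, 1, 1, 9, 0)
status, payload = merge_or_wait(
    {"application_id": "A1"}, None, submitted, submitted + timedelta(hours=1)
)
print(status)
```

The "waiting" branch is the part a task queue takes over: instead of polling by hand, the task can fail and be retried later.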
The rest of this post will detail how I implemented this data merging in Dramatiq, a background task-processing library in Python.
My main requirements were:
- Simple yet performant
- Automatic retries, backoffs, etc.
- Easy to deploy
- Works with Redis/GCP Memorystore
Dramatiq’s user guide gives an excellent overview of what the library is all about. One line I could identify with (and which certainly gave me a chuckle) was, “If you’ve ever had to use Celery in anger, Dramatiq could be the tool for you.”
That line sold me on Dramatiq immediately. So let’s see if it lives up to its promise.
At a high level, this is what’s supposed to happen:
- A loan application is submitted via an API endpoint. It contains the information coming from the…