Machine Learning for Production — Deploying at Scale with Python

NoBroker.com
NoBroker Engineering
8 min read · Dec 14, 2018

There is enough hype on the internet about how cool Machine Learning is, how powerful it is, and how magical its results are. But very little is said about how to bring that power into use for the masses.

At NoBroker, we believe that the fruits of Machine Learning can be harvested only when we make its value available on a large scale. Organizations should not practice Machine Learning because it is cool, but because it makes sense for their business. Even though businesses are trying to incorporate Machine Learning into their practices, they are failing to make it available to users in real time.

It’s time that we deploy Machine Learning, like full-fledged production applications with high throughput, reliability and availability.

About 2 years ago, at nobroker.com, we decided to set up an infrastructure for deploying Machine Learning so that it would scale in the long run and enable us to bring value to our users with our data.

As of now, we have a handful of machine learning models that are user facing and real time. Our Machine Learning application processes around 200,000 requests every day with high throughput and low latency. All this is achieved even when our production application and machine learning application are deployed in completely different setups.

We have our main application running in Java and our Machine Learning infrastructure running fully in Python. Here, I will explain how we have achieved this: the architecture, the practices, and the technologies we have adopted.

When I joined NoBroker about 22 months back, things were just taking off; it was a startup in its early stage. Even at that stage, data was respected very much, and the company already had ML deployed in production. There were predictive models and decision APIs facing the users.

The Machine Learning deployment was separate from the production NoBroker application and was called the Wiser Project. Wiser was a node.js app spawning R scripts as child processes. This might sound very nasty, but it was ok, considering the stage of the company.

The NoBroker Java app communicated with the node.js Wiser app via APIs. But things did not go well for very long. Every ML API was a network call, and blocked a Jetty thread. I remember that we had a locality score calculator that made 3 network calls in R to calculate a very simple score. This R script was in turn served via a node.js API and was called over the network. This blocked one full Jetty thread. And this API was invoked for every fresh load of a property details page. This wreaked havoc at times and took down the entire site when the accumulated Jetty threads exceeded the configured thread limits.

But things have grown leaps and bounds since then, and we now house a spectacular ML deployment infrastructure, all built in house. Around the same time last year, we moved our Data Science stack into Python. We re-wrote our whole Data Science architecture, established design philosophies and conventions, and adopted the best practices and technologies. We call this the Falcon Project. The Falcon Project now processes more than 200,000 requests every day.

Falcon is our parent Data Science project which houses many sub-projects. We have a fully Pythonic Machine Learning application called Cerebro, another project that houses daily analytics cron jobs, the deprecated Wiser project, and a few other projects coming in. Each sub-project is containerized with Docker and deployed independently. We packed all our DS dependencies into a base Docker image and build our deployment containers from it.

Here we will talk about Cerebro, since most of our recent DS releases happen under this subproject.

Architecture

Our Machine Learning project, Cerebro, houses many ML projects. Each of these ML projects is split into 2 parts: an Offline part, where the data cleansing and model building happen, and an Expose part, which exposes the model to a service, either an API or a Consumer. This doctrine is followed in each of the ML modules in Cerebro. Since Cerebro runs as a Docker container, we also created a Data Volume that houses the models. The Expose part looks into the data volume for models instead of its own directories. This allows online learning within containers.
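To make this split concrete, here is a minimal sketch of what the Expose part of a hypothetical module might look like. The module name, file path, and pickle format are assumptions for illustration, not our actual code.

# expose.py of a hypothetical "price_estimator" module (illustrative only)
import pickle

# Models live in the shared data volume, not in the code directory,
# so a running container can pick up a freshly built model without a redeploy.
MODEL_PATH = "/data/price_estimator/model.pkl"

def load_model(path=MODEL_PATH):
    """Load the latest model artifact from the data volume."""
    with open(path, "rb") as f:
        return pickle.load(f)

def predict(features):
    """Called by the API server, task worker, or consumer."""
    model = load_model()
    return model.predict([features])[0]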

The Expose part is in turn invoked by one of the Cerebro central services: either an API server, a task manager, or a consumer. The NoBroker application communicates with the Cerebro application via APIs and events in a Pub-Sub. We wrote a single API server and a single subscriber/consumer that invoke the right ML module. The consumer consumes events from Kafka topics and, once an event is processed, takes action by calling API endpoints on the NoBroker application.

This serverless kind of architecture, we believe, will be the future of Falcon. We also have a REST API server. When creating a response involves external network calls and dependencies, the API responds to the NoBroker app with just an acknowledgement of receipt and puts the request into a message queue. A distributed task worker takes up these requests from the queue and invokes an endpoint at the NoBroker application when the task finishes. Hence, no Jetty threads get accumulated and no requests are put on hold.

We plan to replace all such APIs in the future with a Pub-Sub implementation. APIs that have no external dependencies, for example those that involve a plain model invocation or a formula calculation, respond instantaneously without a task manager in the middle. Great, isn’t it?

We use the Tornado web framework for our API server. Tornado has asynchronous capabilities and can scale to tens of thousands of open connections, making it ideal for long polling and WebSockets. It was designed to address the C10k problem. This alone was tempting enough for us to choose Tornado.
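As a rough sketch of the acknowledge-now, work-later pattern described above, a Tornado handler might look like the following; the route, handler, and task names are hypothetical, not our actual endpoints.

import json
import tornado.ioloop
import tornado.web

from tasks import score_property  # hypothetical Celery task (see the Celery sketch below)

class ScoreHandler(tornado.web.RequestHandler):
    def post(self):
        payload = json.loads(self.request.body)
        # Hand the heavy lifting (external calls, model scoring) to a
        # distributed task worker and respond with just an acknowledgement.
        score_property.delay(payload)
        self.set_status(202)
        self.write({"status": "accepted"})

def make_app():
    return tornado.web.Application([(r"/score", ScoreHandler)])

if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()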

Cerebro’s consumer is a Kafka consumer that consumes records from topics published by the production Kafka producer.
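A rough sketch of such a consumer loop, using the kafka-python client, is shown below; the topic name, bootstrap servers, group id, and event handler are assumptions for illustration.

import json
from kafka import KafkaConsumer

def handle_event(event):
    """Dispatch the event to the right ML module and call back the
    NoBroker application with the result (stubbed out for this sketch)."""
    ...

consumer = KafkaConsumer(
    "property-events",                        # hypothetical topic
    bootstrap_servers=["localhost:9092"],
    group_id="cerebro-consumer",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for record in consumer:
    handle_event(record.value)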

Cerebro’s APIs with external dependencies put the requests in a RabbitMQ message broker. Celery workers take up tasks from this queue and invoke an endpoint on finishing the task. We also use Flower to monitor the workers and the tasks in Celery.
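A minimal Celery sketch of such a worker task might look like this; the broker URL, the model invocation, and the callback endpoint are assumptions, not our actual code.

# tasks.py (illustrative only)
import requests
from celery import Celery

app = Celery("cerebro", broker="amqp://guest@localhost//")

def run_model(payload):
    """Placeholder for the actual model invocation."""
    return {"id": payload.get("id"), "score": 0.0}

@app.task
def score_property(payload):
    """Do the heavy lifting off the request path, then call back."""
    result = run_model(payload)
    # Notify the NoBroker application once the task finishes
    # (the callback URL is hypothetical).
    requests.post("https://example.com/internal/score-callback", json=result)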

We use Supervisor to run the Tornado server, the Celery worker, the Flower monitor, the RabbitMQ server and the Kafka consumer. Supervisor provides a nice web-based client for monitoring, controlling, and logging the processes.

Also, we wrote a generic exception logger called Sherlock that logs exceptions into a file. We set up a cron inside the container that sends these exceptions, well formatted and with tracebacks, to the team as emails. The subject line of the emails goes like this: “Sherlock detected 9 exceptions in Falcon in last 24.0 hours on 2017-10-09”.
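A simplified sketch of how such a logger could work is below: a decorator that writes formatted tracebacks to a log file, which a cron job can then pick up and email out. The log path and decorator name are illustrative.

import functools
import logging

logging.basicConfig(
    filename="/var/log/falcon/exceptions.log",   # hypothetical path
    format="%(asctime)s %(levelname)s %(message)s",
    level=logging.ERROR,
)

def sherlock(func):
    """Log any exception (with traceback) raised by the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logging.exception("Exception in %s", func.__name__)
            raise
    return wrapper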

I love these exception emails. We hardly get any exceptions from Falcon these days though. My boss doesn’t like them, and so I have to fix these exceptions as and when I see them.

We recently had a project in Cerebro called Vigilante. It is a huge project that spans around 7,000 lines of Python code alone. It does a lot of heavy lifting with many different kinds of structured and unstructured data, models and algorithms. We felt it would be better to run it as an independent container, separate from the Cerebro project. The architecture allowed us to slice Vigilante out of Cerebro, replicate a few files, and deploy it independently. It took us just an evening and Vigilante was up and running. Clean, right?

Conventions

We believe in the Ruby on Rails design paradigm, Convention over Configuration. We don’t want our developers to hate their jobs just because they have to rack their brains making design decisions.

Falcon design conventions

Within Cerebro, we make a folder for each ML module. A folder with the same name also exists in the data volume. Each module has a src folder that houses all the offline work code, the code related to data cleansing and model building. We follow the naming conventions eda.ipynb, extract.py, transform.py, model.py, utilities.py, all self-explanatory. In case we have multiple eda files, we put them in an eda folder. In case we have multiple utilities, we put them in a utilities folder. We also usually have a build.py that does extract, transform and model. This build.py should be able to update the model at any time just by running it (see the sketch below). Under a folder with the same name inside the data volume, we put the models and relevant data files.
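As an illustration of this convention, a build.py might look roughly like the following; the module contents, function names, and output path are assumptions.

# build.py of a hypothetical module (illustrative only)
import extract
import transform
import model

def build():
    raw = extract.run()              # pull the raw data
    features = transform.run(raw)    # clean and featurize
    trained = model.run(features)    # train the model
    # Write the artifact into the module's folder in the data volume
    # so that the Expose part picks it up on the next call.
    model.save(trained, "/data/price_estimator/model.pkl")

if __name__ == "__main__":
    build()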

Outside the src folder we have an expose.py; this has functions that expose the models from the data volume to the Tornado server, the Celery worker or the Kafka consumer.

Most importantly, we have a README.md that gives a full introduction and guide to the module.

Also, each Python file in Falcon starts with the Author, Date, Subject and, most importantly, a Tagline to express the artistic language of the author without Pythonic syntax. All modules, classes, methods and functions need to have docstrings, just like Open Source Software.
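For example, a file header under this convention might look like the following; the author, date and tagline here are invented for illustration.

"""
Author  : Jane Doe
Date    : 2018-12-14
Subject : Locality score calculation
Tagline : Every locality has a story; we just put a number on it.
"""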

We are obedient to the BDFL, and we obey the PEPs.

And finally, let’s not forget the Python doctrine, the Zen of Python:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one — and preferably only one — obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea — let’s do more of those!

We have taken the time and effort to build this. It may not be the best yet. But I believe evolution is the key to any masterpiece.
