Machine Learning in Production

From trained models to prediction servers

What you should avoid doing

Assuming you have a project where you train your model, you might think of adding a server layer to the same project. This is called a monolithic architecture, and it belongs to the mainframe era. Training models and serving real-time predictions are very different tasks and should therefore be handled by separate components. I also think that having to load all the server requirements when you just want to tweak your model isn’t convenient and, vice versa, having to deploy all your training code on the server side, where it will never be used, is (wait for it) useless. Last but not least, there is a proverb that says “Don’t s**t where you eat”, so there’s that too.

What you could do

Our reference example will be a logistic regression on the classic Pima Indians Diabetes Dataset, which has 8 numeric features and a binary label. The following Python code gives us train and test sets.
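A minimal sketch of such a split, assuming the dataset sits in a headerless CSV file. The column names, the 80/20 ratio and the function names are assumptions made for illustration, not part of the dataset itself.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed column names: the dataset is often distributed as a headerless CSV,
# so these labels are illustrative only.
FEATURES = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
            "insulin", "bmi", "pedigree", "age"]
LABEL = "label"


def load_dataset(csv_path):
    """Read the 8 numeric features and the binary label from a headerless CSV."""
    return pd.read_csv(csv_path, header=None, names=FEATURES + [LABEL])


def make_train_test(df, test_size=0.2, seed=42):
    """Return X_train, X_test, y_train, y_test, keeping the class balance."""
    X = df[FEATURES]
    y = df[LABEL]
    return train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
```

Stratifying on the label keeps the proportion of positive cases roughly the same in both sets, which matters for a small medical dataset like this one.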

Model coefficients transfer approach

After we split the data, we can train our LogReg and save its coefficients in a JSON file.
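Here is one way this could look. It is a sketch, not the original script: the function names and the JSON layout (`coef`, `intercept`) are assumptions. The training side needs Scikit-learn, while the server side only needs the standard library to rebuild the prediction.

```python
import json
import math

from sklearn.linear_model import LogisticRegression


def train(X_train, y_train):
    """Training side: fit a plain logistic regression."""
    return LogisticRegression(max_iter=1000).fit(X_train, y_train)


def export_coefficients(model, path):
    """Training side: dump the learned weights of a binary LogReg to JSON."""
    payload = {
        "coef": model.coef_[0].tolist(),
        "intercept": float(model.intercept_[0]),
    }
    with open(path, "w") as f:
        json.dump(payload, f)


def load_predictor(path):
    """Server side: rebuild P(label = 1 | row) from the JSON file alone."""
    with open(path) as f:
        payload = json.load(f)
    coef, intercept = payload["coef"], payload["intercept"]

    def predict_proba(row):
        # Linear score followed by the logistic (sigmoid) function.
        z = intercept + sum(w * x for w, x in zip(coef, row))
        return 1.0 / (1.0 + math.exp(-z))

    return predict_proba
```

Note that the server has to reimplement everything that happens before the dot product. Any preprocessing applied during training must be coded a second time on the server side.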

Figure 2. The 387301st release of a prediction server (yeah, I’m exaggerating) due to a simple change in the feature engineering which doesn’t impact how the server works. Not good.

PMML approach

Another solution is to use a library or a standard that lets you describe your model along with the preprocessing steps. One such standard is PMML (Predictive Model Markup Language), an XML-based format for describing ML pipelines. It provides a way to describe predictive models along with their data transformations. Let’s try it!
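To give an idea of what such a description contains, the sketch below builds a simplified PMML-style document for a binary logistic regression by hand, using only the standard library. It is illustrative: the function name is hypothetical, the XML namespace is omitted, and the output has not been validated against the PMML schema. In practice you would let a converter library generate the file from your trained pipeline.

```python
import xml.etree.ElementTree as ET


def logreg_to_pmml(feature_names, coef, intercept, target="label"):
    """Build a simplified, unvalidated PMML-style tree for a binary LogReg."""
    root = ET.Element("PMML", version="4.4")
    ET.SubElement(root, "Header", description="Hand-written sketch")

    # The data dictionary declares every field the model reads or predicts.
    dictionary = ET.SubElement(root, "DataDictionary")
    for name in feature_names:
        ET.SubElement(dictionary, "DataField", name=name,
                      optype="continuous", dataType="double")
    ET.SubElement(dictionary, "DataField", name=target,
                  optype="categorical", dataType="integer")

    # A classification RegressionModel with a logit link is a logistic regression.
    model = ET.SubElement(root, "RegressionModel",
                          functionName="classification",
                          normalizationMethod="logit")
    schema = ET.SubElement(model, "MiningSchema")
    for name in feature_names:
        ET.SubElement(schema, "MiningField", name=name)
    ET.SubElement(schema, "MiningField", name=target, usageType="target")

    # One table holds the learned weights, the other is the reference class.
    table = ET.SubElement(model, "RegressionTable",
                          intercept=str(intercept), targetCategory="1")
    for name, weight in zip(feature_names, coef):
        ET.SubElement(table, "NumericPredictor",
                      name=name, coefficient=str(weight))
    ET.SubElement(model, "RegressionTable", intercept="0", targetCategory="0")
    return root
```

Calling `ET.tostring(root, encoding="unicode")` on the result gives the XML text that a PMML-aware server would load instead of raw coefficients.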

Custom DSL/Framework approach

One thing you could do instead of PMML is build your own PMML, yes! I don’t mean a PMML clone; it could be a DSL or a framework that lets you translate what you did on the training side to the server side. Aaand bam! Months of work, just like that. It is a good solution, but not everyone has the resources to build such a thing. If you do, it may be worth it. You could even use it to launch a machine-learning-as-a-service platform. How cool is that! (Speaking of ML SaaS solutions, I think it is a promising technology that could actually solve many of the problems presented in this article. However, it is always beneficial to know how to do it on your own.)
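As a toy illustration of the idea, and nothing like a real framework, the sketch below describes preprocessing as a list of declarative steps that the training side serializes to JSON and the server side interprets. The operation names, the step format and the `apply_steps` function are all hypothetical.

```python
import json


def greater_than(value, threshold):
    return float(value > threshold)


def scale(value, factor):
    return value * factor


# Hypothetical vocabulary of operations that both sides must implement.
OPERATIONS = {"greater_than": greater_than, "scale": scale}


def apply_steps(record, steps):
    """Interpret the declarative steps against one raw input record."""
    features = dict(record)
    for step in steps:
        operation = OPERATIONS[step["op"]]
        features[step["output"]] = operation(features[step["input"]], step["arg"])
    return features


# Training side: describe the feature engineering and ship it as JSON.
steps = [{"op": "greater_than", "input": "age", "arg": 18, "output": "is_adult"}]
serialized = json.dumps(steps)

# Server side: load the description and apply it to raw input.
restored_steps = json.loads(serialized)
```

The catch is visible even at this scale: every new kind of transformation means extending the shared vocabulary on both sides.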

What we chose to do

Now, I want to draw your attention to one thing the previously discussed methods have in common: they all treat the predictive model as a “configuration”. Instead, we could consider it a “standalone program”, or a black box that has everything it needs to run and that is easily transferable (see Figure 3).

Figure 3. Top: Model description transfer approach. The server loads the config and uses it to create the model. Bottom: Black box transfer approach. The server loads the standalone model itself.

The black box approach

To transfer your trained model along with its preprocessing steps to your server as an encapsulated entity, you will need what we call serialization or marshalling, which is the process of transforming an object into a data format suitable for storage or transmission. You should be able to put anything you want in this black box, and you will end up with an object that accepts raw input and outputs the prediction (see Figure 4).

Figure 4. Standalone trained model ready to be integrated transparently in the server side.
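
A minimal sketch of the black box idea, assuming a Scikit-learn `Pipeline` and the standard `pickle` module. The scaling step and the function names are illustrative choices, not the exact pipeline used here.

```python
import pickle

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def train_black_box(X_train, y_train):
    """Fit preprocessing and model together so they travel as one object."""
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=1000)),
    ])
    return pipeline.fit(X_train, y_train)


def save(black_box, path):
    """Training side: serialize the whole fitted pipeline."""
    with open(path, "wb") as f:
        pickle.dump(black_box, f)


def load(path):
    """Server side: restore an object that maps raw input to predictions."""
    # Only unpickle files you produced yourself: unpickling can run arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)
```

The server only calls `load(path).predict(raw_input)` and never needs to know which preprocessing steps are inside. It does need the same library versions as the training environment, since pickled Scikit-learn objects are not guaranteed to load across versions.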

  1. Avoid using lambdas, because they are generally not easy to serialize. While Dill is able to serialize lambdas, the standard Pickle lib cannot. You could say that you can use Dill then. This is true, but beware! Some components in Scikit-learn, like GridSearchCV, use the standard Pickle for parallelisation. So whatever you want to parallelize should be not only “dillable” but also “picklable”. Here is an example of how to avoid using lambdas: say that instead of is_adult you have def is_bigger_than(x, threshold): return x > threshold. In the DataFrameMapper you want to apply x -> is_bigger_than(x, 18) to the column “age”. So, instead of writing FunctionTransformer(lambda x: is_bigger_than(x, 18)), you could write FunctionTransformer(partial(is_bigger_than, threshold=18)). Voilà!
  2. When you are stuck, don’t hesitate to try different pickling libraries, and remember, everything has a solution. However, when you are really stuck, ping-pong or foosball could really help.
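
The first tip can be checked directly. The sketch below assumes the code lives in a regular script or module, because the standard Pickle stores functions by reference to their module. The `can_pickle` helper is a hypothetical name introduced for the demonstration.

```python
import pickle
from functools import partial

import numpy as np
from sklearn.preprocessing import FunctionTransformer


def is_bigger_than(x, threshold):
    return x > threshold


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Both transformers compute the same thing...
with_lambda = FunctionTransformer(lambda x: is_bigger_than(x, 18))
with_partial = FunctionTransformer(partial(is_bigger_than, threshold=18))

ages = np.array([[12], [18], [35]])
same_output = np.array_equal(
    with_lambda.fit_transform(ages), with_partial.fit_transform(ages)
)

# ...but only the partial version survives the standard pickle.
lambda_ok = can_pickle(with_lambda)
partial_ok = can_pickle(with_partial)
```

A `partial` works because it only stores a reference to the named, module-level function plus its arguments, whereas a lambda has no importable name for Pickle to look up.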

The demo

For the demo, I will try to write a clean version of the above scripts. We will use Sklearn and Pandas for the training part and Flask for the server part. We will also use a parallelised GridSearchCV for our pipeline. Without further delay, here is the demo repo. It contains two packages: the first simulates the training environment and the second simulates the server environment.

Contentsquare Engineering

Stories from the people building Contentsquare

Written by Amine Baatout
