15 Things You Should Know Before Building Your Model In A Production Environment.

Oussama Errabia · Published in Analytics Vidhya · May 11, 2019 · 10 min read

A data scientist in a production environment

Introduction:

The world of building machine learning models can be overwhelming: cracking problems, analyzing data, fitting models, and an end-to-end process that needs to be managed properly in order to make the most of it, so that at the end you have something that is production ready.

This article highlights 15 key aspects of that process that every data scientist needs to keep in mind while building models in a production environment.

1 — It is all about generalization:

The goal of building any predictive model is to make use of it in a production environment, where new data is generated by the second, and your model is expected to do well on those new samples of unseen data.

So be aware of the curse of Over-fitting. Keeping this in mind will help you build a solid cross-validation scheme in order to measure the real performance of your model. No one wants to spend 2 months building a model that eventually fails in production, so spend some time preparing a good validation environment to score the performance of your model before you start fitting.
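To make that concrete, here is a minimal sketch of a cross-validation scheme, assuming a scikit-learn style workflow; the dataset and model are placeholders for illustration:

```python
# A minimal cross-validation sketch, assuming a scikit-learn workflow.
# The dataset and model are placeholders for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Stratified folds keep each fold's class balance close to the full data,
# which makes the reported score a fairer estimate of production performance.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=cv)

print(f"CV score: {scores.mean():.3f} +/- {scores.std():.3f}")
```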

2 — The 3 Judges:

Every fitted model needs to be judged. In general, when we speak about measuring the performance of a model, we speak about 3 things:

  • Cross-validation score: This is the score reported while you fit your model, so make sure to have a good cross-validation scheme.
  • In-production score: This is the score we compute once we put the model in production (beta) and score it on live data.
  • Interpretability: The ability to explain a model's predictions is crucial in certain areas, like banking and insurance. If the model is too complex for such explanations, it may cause your model to step down from production.

These are the 3 scores that assess whether a model will survive in a production environment or not (depending on the business, interpretability can sometimes be waived).
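As a quick illustration of the third judge, permutation importance is one common way to probe what a fitted model relies on. This is only a sketch assuming scikit-learn; the data and model are placeholders:

```python
# A minimal interpretability sketch using permutation importance (scikit-learn).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle one feature at a time and measure the score drop: the features
# whose shuffling hurts the most are the ones the model relies on.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1][:5]:
    print(f"feature {i}: importance = {result.importances_mean[i]:.4f}")
```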

3 — Feature engineering is the Key:

At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.

If you have many independent features that each correlate well with the class, learning is easy. On the other hand, if the class is a very complex function of the features, you may not be able to learn it. Often, the raw data is not in a form that is amenable to learning, but you can construct features from it that are.

This is typically where most of the effort in a machine learning project goes. It is often also one of the most interesting parts, where intuition, creativity and “black art” are as important as the technical stuff.
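To make this concrete, here is a small sketch of constructing learnable features from raw data with pandas; the raw columns and derived features are hypothetical examples, not a prescription:

```python
# A small feature-engineering sketch; the columns are hypothetical examples.
import pandas as pd

raw = pd.DataFrame({
    "signup_date": pd.to_datetime(["2019-01-05", "2019-03-20", "2019-04-02"]),
    "last_seen":   pd.to_datetime(["2019-05-01", "2019-05-01", "2019-05-01"]),
    "n_orders":    [12, 1, 0],
    "total_spend": [340.0, 15.0, 0.0],
})

features = pd.DataFrame({
    # Recency: how long the customer has been around.
    "days_active": (raw["last_seen"] - raw["signup_date"]).dt.days,
    # Average basket value; guard against division by zero.
    "avg_order_value": raw["total_spend"] / raw["n_orders"].clip(lower=1),
    # A binary flag the raw data only encodes implicitly.
    "is_inactive": (raw["n_orders"] == 0).astype(int),
})
print(features)
```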

4 — Machine learning is not a one-shot process:

Machine learning is not a one-shot process of gathering your data, cleaning it, processing it, analysing it and running a model on top of it. Rather, it is an iterative process of fitting the model, getting your results, analysing those results, modifying the data (or getting more data), modifying the model, and repeating the whole process all over again.

Fitting the model is often the quickest and easiest part of this, but that's because we've already mastered it pretty well! Getting the data, preparing it and doing feature engineering on top of it is often more difficult because it's domain-specific, while fitting a model can be largely general-purpose (you hit the same button, which is fit).
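The "same button" point is easy to see in scikit-learn, where very different models share one fit/predict interface; the models below are arbitrary examples:

```python
# Very different models, one general-purpose interface: fit, then score.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=1)

for model in (LogisticRegression(max_iter=1000),
              KNeighborsClassifier(),
              GradientBoostingClassifier()):
    model.fit(X, y)  # the same button every time
    print(type(model).__name__, f"{model.score(X, y):.3f}")
```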

5 — There is Over-fitting but there is also Under-fitting:

In simple words:

  • Overfitting means that your model is starting to learn from the noise in the training data (the training score improves, but the validation score does not improve, or even degrades).
  • Underfitting means your model is not able to learn/approximate/model the relationship between your input features and your target feature (if there is any).

It is easy to spot Overfitting (just compare your train score with your validation score, as the sketch after the list below shows), but it is not as straightforward for Underfitting. In simple words, Underfitting is when your model is not learning, and the reason behind it is one of two things:

  • Your model is too simple and you need to increase complexity (add more features, increase the number of layers in the case of NNs, increase the max depth in the case of XGBoost...). If any of the above works, then congrats, you were Underfitting.
  • There is just nothing to learn in the data that you have, so your solution is to change your data (keep the target, but change the features).
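To make the comparison concrete, here is a sketch that sweeps model complexity and prints both scores; the model and the complexity knob (max_depth) are placeholders:

```python
# Diagnosing over- vs. under-fitting by comparing train and validation scores
# across model complexities; max_depth is the complexity knob here.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=7)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=7)

for depth in (1, 5, None):  # too simple, reasonable, unconstrained
    model = RandomForestClassifier(max_depth=depth, random_state=7)
    model.fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={model.score(X_tr, y_tr):.3f} "
          f"val={model.score(X_val, y_val):.3f}")

# Low train and low validation scores suggest under-fitting; a high train
# score with a much lower validation score suggests over-fitting.
```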

6 — Looking for the best parameters/architecture for your model? The secret is Over-fitting plus regularization:

When fitting your models, it can be time-consuming (sometimes even tricky) to find the best parameters/architecture for them.

An approach based on my personal experience that works most of the time to speed up this process: first deliberately over-fit your train data (increase the complexity), then work on regularizing your model. Examples: L1, L2, dropout, min_child_weight, lambda, gamma, decreasing the number of layers a bit, decreasing the max depth a bit, and so on.
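Here is a sketch of that over-fit-then-regularize loop, assuming XGBoost is installed; the parameter values are illustrative starting points, not a recipe:

```python
# Step 1: over-fit on purpose; step 2: pull the model back with regularization.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=3)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=3)

# Step 1: crank up complexity until the train set is (almost) memorized.
overfit = XGBClassifier(max_depth=10, n_estimators=500).fit(X_tr, y_tr)

# Step 2: regularize (and back off the depth a bit).
regularized = XGBClassifier(max_depth=6, n_estimators=500, reg_lambda=5.0,
                            gamma=1.0, min_child_weight=5).fit(X_tr, y_tr)

for name, model in (("over-fit", overfit), ("regularized", regularized)):
    print(f"{name}: train={model.score(X_tr, y_tr):.3f} "
          f"val={model.score(X_val, y_val):.3f}")
```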

7 — The objective function is only a proxy:

This is another thing to keep in mind while modeling a target feature: the function your model learns is just an estimate of the real target, so you may not need to fully optimize it in order to get a good model into production.

8 — Always start by having a Benchmark:

To be straight: let us say you have gathered all the data you need and done a first-stage EDA (Exploratory Data Analysis). Now you are so excited to fit that first model, which can always be overwhelming. You start crafting features and ideas that you think will work and end up with some 20 new features added to your train set. You then fit that first model on top of those features and cross-validate to get a score. At this point, I would ask you a question:

What is the value added by those 20 new features ?

The simple answer is this: you don't know, because you have no benchmark.

Always fit a basic model on top of the raw data that you have (no feature engineering) and make sure to optimize it. From that point on, you have an idea of the value and gain added to your model by everything new you add to your train set.
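Here is a minimal sketch of that benchmarking habit; the "engineered" features are hypothetical, the point is the score comparison:

```python
# Benchmark a basic model on the raw features first, then measure whether
# engineered features actually add value.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X_raw, y = make_classification(n_samples=1000, n_features=15, random_state=5)

baseline = cross_val_score(LogisticRegression(max_iter=1000), X_raw, y, cv=5)
print(f"baseline (raw features): {baseline.mean():.3f}")

# Hypothetical engineered features appended to the raw ones.
X_eng = np.hstack([X_raw, X_raw[:, :3] ** 2])
engineered = cross_val_score(LogisticRegression(max_iter=1000), X_eng, y, cv=5)
print(f"with engineered features: {engineered.mean():.3f}")
# The gap between the two scores is the value added by the new features.
```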

9 — Understand your noise:

Given a data set, we always expect that similar examples should have similar classes. Anything that does not respect that, we label as noise, and based on this assumption we move on to define the noise in our data.

Understanding your noise means understanding what causes two similar data points to have two different classes, and consequently, understanding why your model is not performing well.

In general, the detected noise comes down to one of these two reasons:

  • Your space of features is not enough to define similar points: those two points are actually different, but the difference only appears when you include more features (not feature engineering, but genuinely new features).
  • The second reason is what we always assume: there is a mistake in the data, and it should be cleaned out.

So ask around, understand your business and what situation you are facing, and handle your data correctly; don't expect your model to do all the work for you (although sometimes it does).
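One way to surface candidate noise is to look for very similar points that carry different labels, and then investigate them instead of assuming they are errors. A sketch on synthetic data:

```python
# Flag points whose nearest neighbor carries a different label: each such pair
# is either genuine noise or a sign that the feature space is missing the
# dimension that actually separates the two points.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors

X, y = make_classification(n_samples=500, n_features=8, flip_y=0.05,
                           random_state=11)

# For each point, find its nearest neighbor (index 0 is the point itself).
nn = NearestNeighbors(n_neighbors=2).fit(X)
_, idx = nn.kneighbors(X)
neighbor = idx[:, 1]

disagree = np.where(y != y[neighbor])[0]
print(f"{len(disagree)} points disagree with their nearest neighbor")
```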

10 — Understand your metric:

Every model we fit is a business model that is meant to solve a business problem, and the way to know how good our model is at solving that problem is to measure it with the business problem's metric. That metric can be accuracy, it can be the AUC, it can be the RMSE, or it can be something else: a custom metric that we need to build on our own.

Understanding and knowing your metric is important. A simple example: you won't fit your model to optimize accuracy if your business metric is the AUC. So make sure to define and understand your business metric before you start throwing data into your model.
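If the business metric is custom, you can wire it directly into model evaluation. Here is a sketch using scikit-learn's make_scorer; the cost values are hypothetical:

```python
# Score models with a custom, asymmetric business cost instead of accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

def business_cost(y_true, y_pred):
    # Hypothetical costs: a missed positive hurts 5x more than a false alarm.
    false_neg = np.sum((y_true == 1) & (y_pred == 0))
    false_pos = np.sum((y_true == 0) & (y_pred == 1))
    return 5 * false_neg + 1 * false_pos

X, y = make_classification(n_samples=1000, random_state=2)
scorer = make_scorer(business_cost, greater_is_better=False)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring=scorer)
# greater_is_better=False makes the scorer negate the cost, so flip the sign.
print(f"mean business cost per fold: {-scores.mean():.1f}")
```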

11 — Be lean in your work:

In a production world, there is nothing worse than trying to re-invent the wheel and getting stuck at it.

A healthy building strategy is to start simple, be accurate in your development, and focus on results. In a production environment, creativity comes later: first try the obvious simple solutions, benchmark them, deploy them, measure them, interpret them, then be creative with a focus on improving those results. It is all about being lean and not re-inventing the wheel.

12 — Be prepared to be rejected:

Not every model that you build is meant to have a production phase and make an impact, and even then, not every deployed model is actually being used.

Machine learning is an art, and a data scientist building a model is like an artist building a piece of art. Therefore, it is normal to have good impressions about that piece of art, the values it holds and how impactful it can be on the world out there (the business). However, the validation of that impact does not rest on the data scientist's side alone; it also includes the business people who will interact with that piece of art, so things can go wrong and the model gets rejected. Some of the reasons behind that rejection can be:

  • Your model failed the production test (could be because of the results (Over-fitting), the scale, etc.)
  • Your model is too complex (interpretability is a key factor for a successfully deployed model in some businesses).
  • Your model is deployed but never actually used (this depends on the business, but it can happen).

13 — The validation score is important, and so is the train score:

  • Train score is the score that we get by evaluating the model on the data it was trained on.
  • Validation score is the score we get by evaluating the model on the validation set (the holdout set)

Inspecting both the train score and the validation score is important: it gives signals about over-fitting and under-fitting, and about when to early-stop the model (see the sketch after the list below).

  • A model with a 99% train accuracy and a 71% validation accuracy is worse than one with an 80% train accuracy and a 70% validation accuracy. The first model most likely memorized an extra 18 points of noise just to gain 1 point of generalization accuracy.
  • You can tell you are overfitting by inspecting the training score vs. the validation score (a large gap between the two).
  • You can tell you are underfitting the same way (both scores stay low).
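Here is a sketch of that monitoring in practice: train incrementally, print both scores, and stop early once the validation score stalls. The model and the patience value are placeholders:

```python
# Manual early stopping: watch train and validation scores every epoch and
# stop once the validation score has not improved for `patience` epochs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=13)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=13)

model = SGDClassifier(loss="log_loss", random_state=13)
best_val, patience, bad_rounds = -np.inf, 5, 0

for epoch in range(100):
    model.partial_fit(X_tr, y_tr, classes=np.unique(y))
    train, val = model.score(X_tr, y_tr), model.score(X_val, y_val)
    print(f"epoch {epoch}: train={train:.3f} val={val:.3f}")
    if val > best_val:
        best_val, bad_rounds = val, 0
    else:
        bad_rounds += 1
        if bad_rounds >= patience:  # validation stalled: stop early
            break
```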

14 — There is EDA (Exploratory Data Analysis) and there is waste-of-time EDA:

As a data scientist in production, you are going to be asked (or you are doing it anyway) to prepare reports, reports that speak about the health of the business, where the objective is to bring insight, true insight.

So, what is insight?

Insight, in simple English, is a graph, a chart, or a piece of statistical info that has 2 properties:

1 — It brings something new.

2 — There is value in knowing it.

Keep these 2 properties in mind when finalizing your reports and you will notice that most of the time you will be deleting up to 50% of the things you have prepared for that report. Remember that too much info kills the info, so stay accurate and focus on what brings value.

15 — Always scale your deployments:

Moving models into production is one of the stages that every data scientist should master, and within this stage there is scaling. Your deployed system needs to scale at the business level: if your model is going to receive 1,000 prediction requests per second, your deployed system needs to have the ability to handle them, so scaling is very important.
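As a minimal sketch of a prediction service: the framework choice (Flask), model file name, and input format are all assumptions here, and a real deployment would add a production WSGI server, multiple workers, and a load balancer to scale:

```python
# A bare-bones prediction endpoint; not production-grade on its own.
import joblib
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # hypothetical pre-trained model

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [[0.1, 0.2, ...], ...]}.
    features = np.array(request.get_json()["features"])
    return jsonify(predictions=model.predict(features).tolist())

if __name__ == "__main__":
    # Flask's built-in server is for development only; scale out with a
    # production WSGI server (e.g. gunicorn) and horizontal replicas.
    app.run(host="0.0.0.0", port=8000)
```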

These were my 15 key aspects for a productive data scientist; if you found them insightful, make sure to follow me on Medium to hear more.

About Me

I am a Principal Data Scientist @ Clever Ecommerce Inc. We help businesses create and manage their Google Ads campaigns with a powerful technology based on Artificial Intelligence.

You can reach out to me on LinkedIn or Gmail: errabia.oussama@gmail.com.

Reference

A Few Useful Things to Know about Machine Learning, Pedro Domingos: https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
