A “commercial” Data Scientist life for the business by a Data Scientist… is valuation!

Laurae: This post is about how someone excelling at machine learning should transition to a business environment. It includes an extra post by inversion which summarizes an essential skill you need to master: going through the learning curve. The post was originally at Kaggle.

DrNuke wrote:
Here again. I’m having chats with employers and headhunters in London. They want “commercial awareness” in order to be considered for jobs, not only technical proficiency from courses or competitions.

Valuing what you can bring to a company, given the elements you are handed, is extremely important. It is probably the most important non-data-science skill they will look for. If your portfolio shows models that have practically no business value (e.g. a Frankenstein model made of 5000 models), they will not care much about it (unless they need someone who can build a massive model for the highest possible performance). At most it shows you have the technical skills for one specific part of CRISP-DM, while overlooking what is essential for a business.

Example: the business you work in wants you to build a predictive model to raise the purchase rate on an e-commerce website. The parts below may not be done in this exact order:

Project part, business model, problem definition, and decision knowledge:

  • You need to know why you have to do that: what intrinsic value does this idea have? What extrapolated valuation does it have? Do you reach a breakeven point given the potential cost (both fixed + variable) and a potential increase in purchase rate? (this is where it is mandatory to talk about the minimal objectives and requirements)
  • Given all the previously gathered conditions, what are the benchmarks? Are you looking for high value with low performance, or low value with high performance? In businesses, value matters more than performance. A low-performance model is most of the time better than a high-performance one if the value it gives back is much higher (value > performance).
  • Given the previous benchmarks, what is the cost matrix (or any appropriate cost measurement)? E.g. if you measure accuracy, the (mis)classification costs attached to the confusion matrix. Are you able to quantify the value of the model?
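To make the three questions above concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the function names, the costs, the margin per purchase, the value attached to each confusion matrix cell, and the two models' confusion matrices. It only shows the shape of the reasoning (breakeven first, then value versus accuracy), not a real valuation.

```python
# All figures below are hypothetical and only illustrate the reasoning.


def breakeven_uplift(fixed_cost, variable_cost_per_visitor, visitors, margin_per_purchase):
    """Absolute increase in purchase rate needed to pay back the project cost."""
    total_cost = fixed_cost + variable_cost_per_visitor * visitors
    return total_cost / (visitors * margin_per_purchase)


def accuracy(confusion):
    """Share of visitors classified correctly; keys are (actual, predicted)."""
    correct = confusion[("buy", "buy")] + confusion[("no", "no")]
    return correct / sum(confusion.values())


def model_value(confusion, value_matrix):
    """Total value of a model: each cell count times the value of that cell."""
    return sum(count * value_matrix[cell] for cell, count in confusion.items())


# Assumed value per visitor for each (actual, predicted) cell:
# a targeted buyer brings a margin of 30 minus an offer cost of 2,
# a targeted non-buyer only costs the offer, untargeted visitors bring nothing.
VALUE_MATRIX = {
    ("buy", "buy"): 28.0,
    ("buy", "no"): 0.0,
    ("no", "buy"): -2.0,
    ("no", "no"): 0.0,
}

# Two made-up models scored on the same 10,000 visitors.
MODEL_A = {("buy", "buy"): 200, ("buy", "no"): 800, ("no", "buy"): 100, ("no", "no"): 8900}
MODEL_B = {("buy", "buy"): 800, ("buy", "no"): 200, ("no", "buy"): 2000, ("no", "no"): 7000}

if __name__ == "__main__":
    uplift = breakeven_uplift(
        fixed_cost=20_000,
        variable_cost_per_visitor=0.05,
        visitors=100_000,
        margin_per_purchase=30,
    )
    print(f"Breakeven uplift in purchase rate: {uplift:.4%}")
    for name, confusion in (("A", MODEL_A), ("B", MODEL_B)):
        print(name, f"accuracy={accuracy(confusion):.2f}",
              f"value={model_value(confusion, VALUE_MATRIX):.0f}")
```

With these invented numbers, model A is the more accurate one (0.91 against 0.78), yet model B is worth more (18400 against 5400), which is the "value > performance" situation described above. Changing the value matrix changes the conclusion, which is why it has to be agreed with the business before any modelling starts.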

Modelling part:

  • Given the previous cost matrix, what data do you need to gather? Why?
  • Given the data you know you need, how do you gather it?
  • Etc.: the typical CRISP-DM steps.

Valuation comes automatically if you have the business knowledge, which may be what you are currently lacking.

In France, I found that when international employers already know your technical skills (“why employers are still using blackboards to assess that…”), what they want to know is your business knowledge. This comes along with business understanding, “sort-of” project management, accurate business problem definition, knowledge of the company… This may (should) also be the case in other countries.

I host recruitment preparation workshops for Data Science, and it’s awful to go to websites like Indeed and have to cross out 95% of the Data Science job postings because employers are looking for Engineers, not Data Scientists. Too bad: recruiters have a budget, and if they can cheaply get one (wo)man doing the work of two (Data Science + Engineering/Developer), they will take that one (wo)man. All of my participants get a very good placement and they are all very happy with their work (except for DATA CLEANING/GATHERING, which is awful for most of them). I can’t imagine a real, pure Data Scientist being put in a pure Engineering/Developer job with the title “Data Scientist”; it does not make sense.

Extra note: some companies will not let you use their computers to train models overnight (for obvious reasons).


inversion also wrote a nice post about that question.

inversion wrote:
Hiring managers use (often stupid) heuristics when considering candidates. Data Science and Machine Learning are changing so rapidly that looking for someone who can “do XYZ technology” is short-sighted. They should be looking for someone who learns whatever they need to in order to get the job done.
Always be prepared with examples of smashing the learning curve, and make it clear it doesn’t matter what gets thrown at you.