Questions that Data Scientists Hate Getting
This is a variation on a Quora answer.
When I am asked how data scientists can be effective, a few things come to mind:
- Skills: A curiosity and sufficient skill in data analysis methods and techniques
- Fundamental needs: the data, and access to the tools and environments needed to perform analysis
- Performance needs: Sufficient resources, time, and good enough processes to validate or invalidate hypotheses and build models based on them
- Excitement needs: Sufficient support and latitude to independently deploy projects based on successfully tested hypotheses and the models built on them
Note that while the list above begins with the fundamental skills required to do data science, the second, third and fourth items shift the focus to what data scientists need in order to be effective. The first of these is the fundamental needs: the data itself, and access to the required tools, be they statistical or machine learning tools, databases, visualization libraries, or other resources. The second is the performance needs, which help data scientists do their work better than they currently do. These include processes and systems that enable data scientists to improve their own capabilities. Finally, there are the excitement needs, which enable data scientists to do outstanding work; a large part of this is being able to reuse what has been built, through deployment of various kinds.
It is in this context that we can discuss how managers of data science teams can help them be effective.
If there is one kind of behaviour in analytics managers that I wish would change, it is the one I describe below.
A lot of what data scientists do is experimental, throw-away analysis. However, many managers (a number of whom have already made up their minds that some hypothesis holds true, or will work) are tempted to assume that they are right, and that all that is required from the data scientist is a detailed model formalizing the relationship.
This kind of assumption makes for poorly designed projects, and it does not make adequate use of the data scientist's time for exploratory analysis, for developing and evaluating different kinds of models, and for finding out what actually works, given the dataset.
Naturally, given the time-bound nature of businesses and the poor understanding of analytics at the executive level in many organizations, such clients are commonplace, and such managers also find themselves pushing for results without the right underlying systems, data or resources. Sometimes they begin projects with data scientists who lack the specific skills needed to build the kinds of models required to solve the problem. Whatever the cause, the challenge many data scientists in business and consulting face is dealing with such unreasonable expectations.
In this specific context, some questions that shouldn’t be posed to data scientists might be along the following lines:
- “Assuming that hypothesis X works, how long would it take to build a full-fledged application using hypothesis X?”
- “The domain experts are convinced that this hypothesis X is true. Why don’t your results reflect this too?”
- “The values of R² or precision/recall I see here don’t reflect what can be done with the data. Aren’t better results possible?”
These kinds of questions are simplistic in the initial stages of a data science activity or experiment, and in some situations they can be dangerous too (although they are innocent mistakes that any manager new to analytics initiatives might make).
For the same reason that “a little knowledge is a dangerous thing”, these project managers may be gambling with the fortunes of the entire analytics program they serve, because they base even large projects on such naive and unverified assumptions. Were they to change their behaviour by giving due consideration to exploratory data analysis, and to what the data actually says about which models and applications are viable, they would be putting their data scientists and engineers on the path to success.