The T-Shaped Data Scientist
At this point, it’s pretty clear that there is still no definitive consensus on what the expression “data science” refers to. Many definitions have been suggested, but the most memorable is a variation of the tongue-in-cheek “Statistics performed in San Francisco,” which is hardly an exhaustive description. While an exact definition may still escape us, the term arguably refers largely to traditional advanced analytics practiced in a more modern context, with the focus being on data and applications made possible by the widespread use of the Internet.
As someone who has practiced analytics for several years, I view it (and data science) as cross-disciplinary fields, combining:
> Statistics/Machine Learning/Data Mining
> Database Infrastructure
> Communication/Business Acumen
> Domain Expertise
into a skill set that will often map to something resembling:
[Figure: diagram of the combined skill set]
Some time ago, while reading a history of McKinsey & Company, I encountered the idea of the T-shaped consultant. The general idea is that you, as an employee, should have skills shaped like the letter “T,” with a wide breadth of general knowledge but deep expertise in one specific area. This balance allows you to truly understand not only what you bring to the table, but also where your specific skills fit into the larger ecosystem in which you work.
It occurred to me sometime later that this was an apt mental model with which to view the data science profession. For a few years, the focus (as outlined in mainstream press) was on hiring individuals who possessed deep knowledge in each of the four above categories. The flaw with that approach is fairly obvious: in this field, very few people possess high-caliber deep knowledge across several verticals. For example, the data scientist who sets up your Hadoop cluster probably isn’t going to be the same person who develops your classification algorithms. Likewise, this may not be the same person who handles all of the meetings regarding where the solution being created fits into the related business objectives. That’s not to say that such individuals do not exist; it’s just that they are not abundant. In fact, the three tasks given in the above example are typically performed by people traveling along three distinct career paths.
I find it interesting that though this specific construct (of the T-shaped data scientist) has received some attention, it isn’t more widely discussed. The voices in support of hiring cross-functional teams have (fortunately) increased in volume, but it is in staffing these teams that we must look for T-shaped employees. Sure, our database architect may not be the one going through the model-building process, but if she possesses the breadth of knowledge to have a good idea of what our statistician is planning, she may be able to secure data that would have otherwise seemed unimportant. Likewise, if our statistician understands that our product manager will eventually need to explain the predictors of some outcome in detail, she may choose a logistic regression over an approach like the random forest.
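To make the statistician's trade-off a little more concrete, here is a minimal sketch in Python of why a logistic regression is comparatively easy to explain. Everything in it is an assumption made for illustration: the churn data is invented, the feature names (`support_tickets`, `annual_plan`) and the helper `fit_logistic` are hypothetical, and the plain gradient-descent fit simply stands in for whatever library a real team would use.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.1, epochs=5000):
    """Fit an unregularized logistic regression by batch gradient descent.

    Hypothetical helper for illustration only; returns (weights, bias).
    """
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log-loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up customer data: (support_tickets, annual_plan) -> churned (1) or not (0).
feature_names = ["support_tickets", "annual_plan"]
rows = [
    (0, 1), (1, 1), (0, 0), (1, 0), (2, 1), (3, 0),
    (4, 0), (2, 0), (5, 1), (3, 1), (4, 1), (1, 0),
]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1]

weights, bias = fit_logistic(rows, labels)

# exp(coefficient) is the multiplicative change in the odds of churn
# for a one-unit increase in that predictor, holding the other fixed.
odds_ratios = [math.exp(w) for w in weights]

for name, w, ratio in zip(feature_names, weights, odds_ratios):
    direction = "raises" if w > 0 else "lowers"
    print(f"{name}: coefficient {w:+.2f}, odds ratio {ratio:.2f} ({direction} churn odds)")
```

Each predictor ends up with one signed number that a product manager can repeat in a meeting. A random forest fit to the same data might predict as well or better, but it would not yield a single signed effect per predictor in this way, which is the kind of consideration a T-shaped statistician would weigh.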
The goal in assembling a data science team should not center on looking for individuals who can do it all, as few exist. It should rather focus on gathering those who together hold a portfolio of deep skills, grounded by an awareness of how those skills can work together to accomplish something great.
NB: Though I did arrive at the construct of the T-Shaped Data Scientist independently, I am definitely not the first person to have done so. Google around for the takes of different authors.