Data Science Venn Diagram V 2.0

Greg Werner
IllumiDesk

--

Drew Conway’s Data Science Venn Diagram, created in 2010, is still relevant today. We have reinterpreted it with only slight updates to the terminology he used to describe the combination of skills and expertise a data scientist requires.

Conway’s “Data Science Venn Diagram” characterizes data scientists as people with a combination of skills and know-how in three categories:

1) Hacking

2) Math and statistics

3) Domain knowledge

We’ve updated this to be more specific:

1) Computer science

2) Math and statistics (no way around this one!)

3) Subject matter expertise

The difficulty in defining these skills is that the line between substance and methodology is blurry, so it is not clear how to distinguish among hackers or computer science experts, statisticians, and subject matter experts, how their skills intersect, or where data science fits.

What is clear from both Conway’s diagram and this updated one is that a data scientist needs to learn a great deal to become a well-rounded, competent professional.

It is important to note that each of these skills is very valuable on its own, but a combination of only two of them is at best not data science and at worst hazardous.

Conway has very specific thoughts on data science that differ from much of what has already been said on the topic. In a recent interview, he said: “To me, data plus math and statistics only gets you machine learning, which is great if that is what you are interested in, but not if you are doing data science. Science is about discovery and building knowledge, which requires some motivating questions about the world and hypotheses that can be brought to data and tested with statistical methods.”

Along those lines, Conway’s original Venn diagram includes a Danger Zone (which we call Traditional Software so that it sounds less ominous).

In that area he placed people who know enough to be dangerous, and he saw it as the most problematic area of the diagram. People in this area are perfectly capable of extracting and structuring data, and may even be able to run a regression and report its coefficients, but they lack any understanding of what those coefficients mean. Whether through ignorance or malice, this overlap of skills gives people the ability to create what appears to be a legitimate analysis without any understanding of how they got there.
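The overlaps described above can be sketched as a simple lookup from a set of skills to a region of the diagram. This is only an illustration, not anything Conway or we actually use: the function name `classify` and the skill keys are hypothetical, the two-skill labels follow Conway's original diagram as we understand it, and "Traditional Software" is our relabeling of his Danger Zone.

```python
# Illustrative sketch only: all names here are hypothetical.
SKILLS = frozenset(
    {"computer_science", "math_statistics", "subject_matter_expertise"}
)

# Two-skill overlaps. Labels are assumed from Conway's original diagram,
# except "Traditional Software", this article's name for his Danger Zone.
OVERLAPS = {
    frozenset({"computer_science", "math_statistics"}): "Machine Learning",
    frozenset({"math_statistics", "subject_matter_expertise"}): "Traditional Research",
    frozenset({"computer_science", "subject_matter_expertise"}): "Traditional Software (Danger Zone)",
}


def classify(skills):
    """Return the diagram region for the given collection of skills."""
    held = frozenset(skills)
    unknown = held - SKILLS
    if unknown:
        raise ValueError(f"unknown skills: {sorted(unknown)}")
    if held == SKILLS:
        return "Data Science"          # all three skills together
    if len(held) == 2:
        return OVERLAPS[held]          # exactly two skills: an overlap region
    if len(held) == 1:
        return next(iter(held))        # a single skill, valuable on its own
    return "No skills listed"


print(classify(["computer_science", "subject_matter_expertise"]))
```

The point of the sketch is that only the full set of three skills maps to data science; every two-skill combination lands somewhere else.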

We can conclude that the Data Science Venn Diagram can help pinpoint these risky combinations of skills and therefore identify professionals who are not properly prepared for the challenge of doing excellent data science. It is a useful tool for understanding who is likely to excel in their job and who might instead become an obstacle to completing a specific task, or a risk to a company.

--