There is a popular belief among data enthusiasts that data science is all about the tools. While being a data scientist does involve knowing the tools, there is more to it than that.
The data science community is slowly becoming tool-centered rather than value-centered: we are so concerned with the sophistication of the tools used in data science that we forget a tool is a means of driving value.
There is a delicate balance to maintain: as data enthusiasts learn a tool, they must also understand that the tool is what helps drive value, i.e., a means rather than an end in itself.
This is not to say that learning a tool is not an integral part of the field, but the field goes deeper than that. Ask people why they think they need any of these tools, and most answers will point to simplicity, sophistication, or appealing presentation.
The concept of data science is founded on the premise that there is a wealth of possibilities in the massive amounts of data now available, a wealth capable of driving an organization’s decision-making processes.
Data isn’t a new concept; in fact, it has been around for a long while. However, in today’s digital world, data is increasingly becoming a key input into companies’ decisions.
Cheaper technologies for data collection and better computational power have made data far more available. Data from IoT devices is one of the driving forces behind the shift to the data economy.
As more “things” become instrumented, interconnected, and intelligent, data will grow exponentially. Almost every business metric is now being measured: consumer behavior, user downloads, location, browser history (digital footprints), and so on.
All this information is available and ready to be leveraged for the benefit of an organization. It is becoming increasingly clear that the real value of any data science professional lies not in their mastery of a tool, but ultimately in their ability to help people and organizations find insights in their data and drive business processes.
Why Domain Knowledge is Important
Anyone skilled in an analytical tool can get hold of a dataset and, perhaps through guided questions, find insights. The data professional’s added advantage comes from their domain knowledge.
Domain knowledge refers to knowledge of a specific, specialized discipline or field, in contrast to general knowledge. Domain knowledge is very important as it guides a lot of processes in the data science pipeline.
For example, when a data analyst is presented with a database, how do they know which variables are most likely to be key metrics? What guides the data retrieval process?
In terms of feature engineering for data models, how does the analyst know which variables are likely to improve the model? Domain knowledge is also what guides how you interpret the results of an exploratory data analysis (EDA).
Say, for instance, you learn how to classify dogs and cats using a deep learning framework of some sort. Great! But how does that model translate into the real world?
Another example is a Kaggle competition where you’ve made about 20 submissions to climb the leaderboard. It takes a lot of trial and error to reach the top, but in a real-world setting, given the volatility of business, you do not have that luxury.
These are a few cases that point to the fact that data science is not just about the tool but about the ability to derive actionable insights from any kind of data.
We could say the community’s bias toward amplifying the means (analytical tools) over the end goal (data-driven decision-making) is a major contributor to this deficiency.
Domain knowledge can be picked up on the job and isn’t especially difficult to acquire, but neglecting it may be considered utterly irresponsible. As a data professional, you’re trained to draw insights, and that you must do.