More Than a Buzzword: Driving Business Value with Data Science
This is the first in a series of contributions from Wizeline’s very own Data Science team! Learn more about the authors below, and be sure to check back every other week for posts that aim to demystify the complex, fascinating field of Data Science.
Data Science is the science of learning from data, with all that this entails. This definition from renowned mathematician David Donoho — while broad — is spot on.
At a high level, Data Science (DS) is the combination of approaches for taking sets of data, analyzing them, and enabling people to learn from the resulting insights. This “learning” can mean different things, such as using descriptive methods to understand the past or using predictive methods to anticipate what will happen (think of the Netflix recommendation algorithms).
DS has become the new paradigm to drive business value, and the possibilities seem endless.
At its core, Data Science is a science-driven approach to data analysis. That sounds redundant, but it’s worth highlighting that DS is closer to a science than, say, project management or similar business endeavors. Depending on the project, it includes the following tasks, listed in no particular order of importance:
- Data Collection: Collection or creation of data from one or various sources, e.g. manual input, online sources or a new experiment’s output
- Data Preparation: Conversion of the recorded data into something actionable and meaningful (this is also referred to as Data Munging)
- Data Exploration: Analysis of data to summarize their salient aspects, which in turn leads to hypothesis formulation
- Data Modeling: This may include prediction, forecasting, and other related algorithms falling under machine learning, deep learning, statistics, mathematics, and related fields helping us understand new observations
- Data Visualization: Tells stories about the data, specifically the outcome of analysis
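To make the tasks above more concrete, here is a minimal sketch of how they might fit together in Python, using only the standard library. The sales records, the function names, and the choice of a straight-line model are all assumptions made for illustration. A real project would pick its data sources, cleaning rules, and models to suit the problem.

```python
from statistics import mean

# Data collection: hypothetical monthly records (month, units sold as text).
# None marks a value that was never recorded.
raw_records = [(1, "10"), (2, "12"), (3, None), (4, "16"), (5, "18")]


def prepare(records):
    """Data preparation: drop missing values and convert text to numbers."""
    return [(month, float(units)) for month, units in records if units is not None]


def explore(rows):
    """Data exploration: summarize the salient aspects of the cleaned data."""
    values = [units for _, units in rows]
    return {"count": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}


def fit_line(rows):
    """Data modeling: least-squares fit of units = slope * month + intercept."""
    xs = [month for month, _ in rows]
    ys = [units for _, units in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def predict(model, month):
    """Use the fitted line to estimate units sold for a given month."""
    slope, intercept = model
    return slope * month + intercept


rows = prepare(raw_records)
summary = explore(rows)      # descriptive: what happened so far
model = fit_line(rows)
forecast = predict(model, 6)  # predictive: what might happen next

# Data visualization: a very rough text bar chart of the cleaned data.
for month, units in rows:
    print(f"month {month}: {'#' * int(units)}")
print(f"summary: {summary}")
print(f"forecast for month 6: {forecast:.1f}")
```

The summary step corresponds to the descriptive side of “learning” from data, while the fitted line corresponds to the predictive side. In practice each step is usually far more involved than this toy example suggests.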
Data Science is often thrown around as a catch-all buzzword. Some definitions reduce it to “a new form of business intelligence” incorporating big data, or to a rebranding of analytics, statistics, or data mining. Each of these areas of knowledge plays a role in DS, but may or may not come into play within a given project. DS is not exclusively any one of them.
A non-exhaustive list of the fields most commonly placed within Data Science by organizations such as NYU, Berkeley, Columbia University, Amsterdam Data Science, and Edinburgh Data Science includes: computer science, statistics, modeling, data analytics, mathematics, and domain knowledge. Although you may have read that big data is a characteristic of DS, it is not a requirement, since all data presents challenges regardless of its volume. It is true that for most analyses, more observations are better, but this should be addressed on a project-by-project basis.
From this you can conclude that DS is not groundbreaking knowledge. In fact, DS is by no means new: its core ideas were proposed some 50 years ago. William S. Cleveland’s 2001 publication ‘Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics’ conceptualized it as a direct evolution of statistics with six basic areas that include computing with data, assessment of DS tools and techniques, and theory to support the tools generated. Realizing that roughly fifty years of sustained effort have passed helps us understand that this is not an easy task. Although DS is now widely known, the field still requires a lot of attention, knowledge, and talent to progress.
About the authors:
Ana has held Data Science and Engineering roles at Accenture and Tec de Monterrey. She earned an M.Sc. in Big Data Science from Queen Mary University of London, and her main interests are machine learning, network analysis, and user behavior modeling.
Juan spent several years at HP Labs as a Research Assistant. He earned an M.Sc. in Statistics and Operational Research, with distinction, at the University of Edinburgh. He is currently a lecturer in the Industrial Engineering department of ITESM.