Hype Management for Data Science & IoT

Ulrich Eitschberger
grandcentrix
Apr 24, 2019

Quite a few data science topics have passed through the Gartner Hype Cycle in recent years. Big Data, for example, was already declining from the peak in 2014, while deep learning, according to Gartner, sits at the top of the hype cycle right now. In our everyday work with industry partners, however, we find that there is still no common understanding of most of the terms related to data science, even though they have been around for years. In this blog entry, we present the grandcentrix view on data science, the most important terms in this context, and our approach to creating value with and from data.

Photo by Kevin Ku on Pexels

Let's start by shedding some light on the terminology. First of all: what is "data science"? We understand data science as the umbrella term for all the ways to create value from data. What this value can be is practically endless: it ranges from optimizing processes, to predicting failures or demand, to creating completely new business models. Furthermore, it is important to us that value is created in a scientific way. This implies that the insights generated from data are robust and sustainable, that uncertainties in predictions are quantified and controlled, and that algorithms are not used as black boxes, but with reason and understanding. In practice, this means that goals are reached fastest by building up knowledge about the given domain, visualizing data to understand the relations among data sources (see the sketch below), and applying statistical methods based on educated assumptions.
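As a hedged illustration of what "visualizing data to understand relations among data sources" can look like in practice, here is a minimal Python sketch that plots a correlation matrix over simulated sensor readings. All column names and the signal model are assumptions made for this example, not data from a real project:

```python
# Sketch: a correlation matrix over hypothetical sensor columns.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
n = 500
temperature = rng.normal(20, 2, n)
power_draw = 0.8 * temperature + rng.normal(0, 1, n)  # correlated with temperature
vibration = rng.normal(0, 1, n)                       # independent source

df = pd.DataFrame({"temperature": temperature,
                   "power_draw": power_draw,
                   "vibration": vibration})

# A quick look at pairwise correlations often reveals which data
# sources carry related information.
plt.imshow(df.corr(), cmap="coolwarm", vmin=-1, vmax=1)
plt.xticks(range(len(df.columns)), df.columns, rotation=45)
plt.yticks(range(len(df.columns)), df.columns)
plt.colorbar(label="Pearson correlation")
plt.tight_layout()
plt.show()
```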

Now you could say: you didn't mention machine learning! That's true, and the reason is that machine learning is only one way within data science to generate insights, enable automated decisions, and ultimately create value. Machine learning itself is a scientific discipline that deals with teaching a machine something so that it can act autonomously in a defined environment. It combines mathematics, statistics, and computer science to create and use algorithms (like logistic regression), models (like neural networks or support vector machines), and learning strategies (like supervised learning or reinforcement learning). As an example, it is nowadays quite easy to teach a computer to recognize whether a picture shows a dog or a cat by training a neural network on a set of labeled photos, i.e., training in a supervised way; a minimal sketch of this supervised workflow follows below.
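To make the idea of supervised learning concrete, here is a minimal sketch using scikit-learn and logistic regression (the library choice is ours; the post does not prescribe one). The synthetic features stand in for labeled examples such as image features with cat/dog labels:

```python
# Sketch: supervised learning with logistic regression in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data standing in for labeled examples (e.g. image features
# with cat/dog labels).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# "Teach" the model on labeled data: this is supervised learning.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate on data the model has never seen.
print(f"Accuracy on held-out data: "
      f"{accuracy_score(y_test, model.predict(X_test)):.2f}")
```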

Another frequently used expression in the field of data science is the already mentioned "Big Data". Having done data analyses at CERN for more than seven years, I always have to smile at the differing views on whether a certain amount of data is "big". At CERN, especially in the LHC experiments, data rates reach several gigabytes per second. In today's industry, such rates are rarely, if ever, reached. However, since there is in fact no clear definition of "Big Data", it is perfectly fine to understand it as a concept for amounts of data that a human can no longer easily access or understand without the support of computers and algorithms. To us, more important than the sheer size of a dataset are its quality and its usability with regard to the goals one wants to reach and the questions one wants to answer with it.

Furthermore, there is, of course, "AI", which originally meant artificial intelligence. Don't worry, though: in industry the term does not refer to a "true" artificial intelligence like Skynet that aims to wipe out humanity. Instead, only a limited intelligence is meant, sometimes implying that an algorithm teaches itself during a learning process. Two of the few fitting examples are AlphaGo and AlphaStar, developed by Google's DeepMind team. Both were trained by playing against themselves, reaching skill levels high enough to beat the best human players in Go and StarCraft, respectively. Speaking of deep, let's cover deep learning as well. While it is one of the more recent hype topics, it is really just an advancement within machine learning: it boils down to increasing the complexity of certain machine learning models, for example by adding hidden layers to a neural network (see the sketch below). Our experience is that applying deep learning in industry is often not the fastest way to create value, mostly because it demands very large amounts of data.
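To show what "adding hidden layers" means structurally, here is a hedged sketch in Keras (an assumed framework choice; the post names none) contrasting a shallow network with a deeper variant of the same model. The deeper model has many more parameters, which is one reason deep learning tends to need much more training data:

```python
# Sketch: a shallow network versus a "deep" variant of the same model.
from tensorflow import keras
from tensorflow.keras import layers

# A shallow network: a single hidden layer.
shallow = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# A deeper variant: more hidden layers, hence more parameters and a
# larger appetite for training data.
deep = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Compare the parameter counts of the two architectures.
shallow.summary()
deep.summary()
```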

Having introduced our view on the most important buzzwords in the field of data science, how do we apply data science in the real world to create value from data? First, it is important to note that our customers are at very different levels of maturity in understanding the relevance of data for their projects and businesses. Hence, we are flexible in how we approach and support them. In principle, we follow the Cross-Industry Standard Process for Data Mining, CRISP-DM for short.

Graphical illustration of the Cross-Industry Standard Process for Data Mining (CRISP-DM).

CRISP-DM defines the steps of a data science project. It always starts with gaining business understanding and domain knowledge. The following steps, however, depend on whether data is already available, whether data collection has only just started, or whether the device that will serve as the data source has not even been developed yet. If data sets are available, we can start analyzing right away in order to understand the data, reveal hidden potential, and test first use cases. If no data is available yet, we still develop the first use cases, start testing with simulated data (a sketch follows below), and think through every detail of the data collection process so that no time is wasted once the device starts delivering data. The ultimate goal in any project is to ensure that the developed processing steps and algorithms are deployed into newly built or existing environments of our customers, so that insights become available to the end users and use cases they belong to.
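As a hedged example of working with simulated data before a device exists, here is a minimal Python sketch that generates a day of hypothetical temperature readings and prototypes a simple processing step on them. The signal model, column names, and the rolling-mean anomaly flag are all assumptions made for illustration:

```python
# Sketch: prototyping a processing step on simulated sensor data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Simulate one day of readings from a hypothetical temperature sensor:
# a daily cycle plus measurement noise, sampled once per minute.
timestamps = pd.date_range("2019-04-24", periods=24 * 60, freq="min")
daily_cycle = 20 + 5 * np.sin(2 * np.pi * np.arange(len(timestamps)) / (24 * 60))
readings = daily_cycle + rng.normal(scale=0.5, size=len(timestamps))

df = pd.DataFrame({"timestamp": timestamps, "temperature": readings})

# Processing steps developed on simulated data can later run unchanged
# on the real device stream, e.g. a simple rolling-mean anomaly flag.
deviation = (df["temperature"] - df["temperature"].rolling(60).mean()).abs()
df["anomaly"] = deviation > 2
print(df.head())
```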

To sum up: data science does not mean using the most complicated tools and black boxes to gain knowledge. Instead, it is important to start early, to understand the data and its sources, and to continuously improve every step of data collection. We are convinced that data-driven and data-supported business models are the future for the majority of our customers. As hype managers for data science, we are here to help and to reveal the true potential of your data in your project.
