Data-Driven Decisions

Jesper Slik
Published in Pon.Tech.Talk
Nov 25, 2021

Why do we need data? And how can we take advantage of it?

Each minute in 2020, over 250,000 online meetings were held, more than 500 hours of video were uploaded, and USD 1M was spent online [1]. By the end of the year, approximately 64 ZB of data had been created [2]. In the time it took you to read this far, more than 21 PB of data were generated, roughly the storage of 21,000 high-end laptops*. Even though these figures are estimates, they make clear that an enormous amount of data is being generated. Data is becoming a major part of our lives, and continued growth is expected. But why do we need such a vast amount of data? And how can we take advantage of it?
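The laptop comparison can be checked with a few lines of arithmetic. The sketch below is an illustration, not part of the original analysis. It assumes decimal units (1 ZB = 10^21 bytes), a 366-day year for 2020, and data being generated at a constant rate; the 1 TB laptop and the reading speed of 238 words per minute come from the footnote.

```python
# Decimal storage units (an assumption; binary units would shift the numbers slightly).
ZETTABYTE = 10**21
PETABYTE = 10**15
TERABYTE = 10**12

bytes_per_year = 64 * ZETTABYTE            # estimate for 2020 [2]
seconds_in_2020 = 366 * 24 * 60 * 60       # 2020 was a leap year
bytes_per_second = bytes_per_year / seconds_in_2020

generated = 21 * PETABYTE                  # the amount quoted in the text
seconds_elapsed = generated / bytes_per_second
words_read = seconds_elapsed * 238 / 60    # 238 words per minute [3]
laptops = generated // TERABYTE            # laptops with 1 TB of storage each

print(f"{bytes_per_second / PETABYTE:.2f} PB per second")
print(f"{seconds_elapsed:.1f} seconds, about {words_read:.0f} words of reading")
print(f"{laptops:,} laptops")
```

Under these assumptions the world generates roughly 2 PB per second, so 21 PB corresponds to about ten seconds of reading, or roughly 40 words.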

Data is the oil of the 21st century, according to various experts around the world [4,5,6]. It promises to support any organization in making better decisions, so its applications are not bound to a specific sector. Examples include cancer detection, supply chain optimization, vehicular automation, churn prediction, and speech recognition. These seem to have little in common; however, the underlying technologies are similar. Across all applications, the big data and business analytics market is estimated to be valued at USD 274 billion by 2022 [7].

Data is a resource, like oil, which does little on its own. It needs to be converted to information, knowledge, or wisdom to deliver value. Some data is easy to process. For example, the data generated by a motion sensor can directly be used to turn on a bathroom light. Other data is difficult to interpret. For example, should an autonomous vehicle brake when it observes a nearby pedestrian? It should when the pedestrian is expected to cross paths with the vehicle. But no sensor exists that directly predicts human behavior, so additional data processing is required. Thanks to recent developments in mathematics and computer science, we can now interpret more data, where "more" applies to all four V’s of big data: volume, variety, velocity, and veracity.

Different methods for converting data suit different scenarios. A common approach is to classify methods by their purpose, distinguishing between descriptive, predictive, and prescriptive analytics. Descriptive analytics attempts to describe what happened in the past, for example by computing financial metrics that show how well an organization performed. Predictive analytics attempts to predict what is going to happen, for example the sales volume of a product in the upcoming year. Prescriptive analytics attempts to prescribe what action to take, for example that a company should produce more bicycles because this market segment is expected to grow.
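To make the three categories concrete, here is a minimal sketch that applies each of them to the same toy series. The sales figures, the linear trend model, and the decision rule are all hypothetical and chosen only for illustration; they are not taken from the analyses discussed in this blog.

```python
from statistics import mean

# Hypothetical yearly bicycle sales in units; the numbers are made up.
sales = [100, 110, 125, 135, 150]

# Descriptive: summarise what happened.
average_sales = mean(sales)

# Predictive: fit a least-squares trend line and extrapolate one year ahead.
years = range(len(sales))
year_mean = mean(years)
slope = (
    sum((t - year_mean) * (s - average_sales) for t, s in zip(years, sales))
    / sum((t - year_mean) ** 2 for t in years)
)
intercept = average_sales - slope * year_mean
forecast = intercept + slope * len(sales)

# Prescriptive: turn the prediction into an action (a hypothetical rule).
action = "increase production" if forecast > sales[-1] else "hold production"

print(average_sales, forecast, action)
```

For this series the trend line extrapolates to 161.5 units, which is above last year's 150, so the rule prescribes increasing production. Real prescriptive models weigh costs and uncertainty rather than applying a single threshold.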

Despite its growth and recent technological developments, data remains a challenge for decision-makers. Unlike oil, data is practically infinite, reusable, and becoming increasingly available. Additionally, substantial investment is often required to generate data before its value is certain or even recognized. The main challenge lies in identifying the decisions to support and designing methodology that supports them directly. Auxiliary challenges include data integration, analytical skills, security and privacy, infrastructure, and synchronization [8].

In this blog, we highlight examples of analyses that take advantage of diverse datasets. Most of them support real-life decisions within Pon through their implementation. The analyses are part of my Ph.D. thesis and are, or will be, published in various journals. The examples are classified into three categories: descriptive, predictive, and prescriptive analyses.

Descriptive analyses

The first category within the analytics framework is descriptive analytics. As stated in the previous section, it concerns describing what happened. Typically, this consists of registering, storing, and presenting data as accurately as possible. Most of these tasks are executed by traditional business intelligence departments. However, sometimes it is challenging to describe what happened, and the raw data needs to be processed first. This could be caused by data quality issues such as missing data or human mistakes. Additionally, the raw data in its original format might not be insightful or answer any questions.

We introduce three descriptive analyses. The first concerns detecting outliers in univariate time series, the second understanding human mobility choices, and the third measuring the global effect of Covid-19 on mobility.

Detection of Additive Outliers in Univariate Time Series
Describing what happened is straightforward if the collected data captures it perfectly. However, this is often not the case in practice. The inspiration for this research is keeping track of inventory levels across stores scattered throughout the Netherlands. Each store reports daily inventory levels, which generates a one-dimensional series ordered in time. However, through system failures and communication issues, data might go unreported or be sent in duplicate. This generates so-called additive outliers. We aim to detect these and improve data quality by resolving the issues they create.
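As an illustration of the problem, the sketch below flags isolated spikes and drops in a daily inventory series. It is not the method developed in the thesis. It uses a common generic technique, comparing each point with the median of its neighbours and scaling by the median absolute deviation (MAD). The series, the window size, and the threshold are assumptions made for this example.

```python
from statistics import median

def flag_additive_outliers(series, window=3, threshold=3.5):
    """Return the indices of points that deviate strongly from their neighbours."""
    outliers = []
    for i, value in enumerate(series):
        lo = max(0, i - window)
        hi = min(len(series), i + window + 1)
        neighbours = series[lo:i] + series[i + 1:hi]  # exclude the point itself
        centre = median(neighbours)
        mad = median(abs(x - centre) for x in neighbours)
        scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
        if scale == 0:
            scale = 1.0  # fall back when most neighbours are identical
        if abs(value - centre) / scale > threshold:
            outliers.append(i)
    return outliers

# Hypothetical daily inventory: day 4 was reported in duplicate, day 7 was not reported.
inventory = [50, 49, 51, 50, 100, 50, 49, 0, 51, 50]
print(flag_additive_outliers(inventory))
```

On this toy series the function flags days 4 and 7. Because medians are used, a single extreme value barely influences the reference level of its neighbours, which is why the normal days around the spike are not flagged.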

Understanding human mobility for data-driven policymaking
An advantage of collaboration with the industry is having access to unique datasets. For this research, we were granted access to a rich dataset containing mobility transactions. These contain daily choices made by individuals with respect to their travel behavior by car or public transport. Combining this with publicly available data related to congestion, we present a unique view on human mobility behavior.

On the Relation between Covid-19, Mobility, and the Stock Market
My dissertation was written in turbulent times, as a result of the worldwide Covid-19 outbreak. This outbreak has consequences for many aspects of human behavior, including mobility usage. In this research, we describe the effects of the Covid-19 outbreak on mobility usage and the impact of the measures taken in response. We measure mobility usage globally in terms of vessel, flight, and vehicle activity, combined with online search behavior related to trains and bicycles.

Predictive analyses

The second category within the analytics framework is predictive analytics. These analyses build on the descriptive analyses and go one step further. After describing what happened, they aim to predict what is going to happen. This requires a different methodology and a slightly different point of view. A major challenge in any predictive analysis is balancing performance on historic data (train) and performance on future data (test). Often, it is relatively easy to achieve high train performance through overfitting. However, a proper predictive analysis balances train and test performance and generates reliable results.
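The trade-off between train and test performance can be shown with a small, fully deterministic sketch. The data points are made up: the train targets follow y = 2x with alternating noise, and the test targets follow y = 2x exactly. The "memoriser" is a deliberately overfitted model that returns the target of the nearest training point; the alternative is an ordinary least-squares line.

```python
from statistics import mean

# Hypothetical data: (x, y) pairs.
train = [(0, 1), (1, 1), (2, 5), (3, 5), (4, 9), (5, 9)]
test = [(0.4, 0.8), (1.4, 2.8), (2.4, 4.8), (3.4, 6.8), (4.4, 8.8)]

def memoriser(x):
    """Overfitted model: copy the target of the nearest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def fit_line(points):
    """Least-squares line through the points, returned as a function of x."""
    xs, ys = zip(*points)
    x_mean, y_mean = mean(xs), mean(ys)
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in points)
        / sum((x - x_mean) ** 2 for x in xs)
    )
    intercept = y_mean - slope * x_mean
    return lambda x: intercept + slope * x

def mse(model, points):
    """Mean squared error of a model on a set of points."""
    return mean((model(x) - y) ** 2 for x, y in points)

line = fit_line(train)
for name, model in [("memoriser", memoriser), ("line", line)]:
    print(name, "train:", round(mse(model, train), 3), "test:", round(mse(model, test), 3))
```

The memoriser reaches zero error on the train set because it reproduces the noise, but it does clearly worse than the line on the test set. The line accepts some train error in exchange for a pattern that carries over to unseen data.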

In this section, we introduce three predictive analyses. The first concerns predicting the travel behavior of individuals. The second aims to predict the effects of changing physical store locations on sales volumes. The third analysis aims to forecast future sales.

Predicting Travel Behavior by Analyzing Mobility Transactions
After describing human mobility behavior in the second descriptive analysis, we extend this research toward predicting that behavior in the future. The choices of individuals are analyzed on an aggregated level to take privacy into account. The resulting model can be used to support human decision-making by proposing suitable mobility types for any requested travel plan.

Optimal Store Placement through Predicting Network Effects
Predicting human behavior can improve a wide range of business decisions. Extending our mobility-related analysis, we aim to predict the effects of physical store placement on sales volume. A balance must be struck between the number of stores and how swiftly individuals can reach them. Placing too many stores results in high operational costs, whereas placing too few results in lost sales. We achieve a balance by thoroughly analyzing travel duration, predicting the willingness to travel, and predicting network effects such as cannibalization between stores.

Robust Forecasting through Hierarchical Clustering
A common use case for predictive analysis is sales forecasting. Any organization dependent on product sales would benefit tremendously from knowing what their customers require in the coming weeks, months, or years. Various decisions can be improved by knowing what is going to happen. However, the future is uncertain. And this uncertainty might vary across sectors or product groups. Sales forecasting aims to predict future sales as accurately as possible through finding and extrapolating patterns in historic sales data. We contribute to this field by finding robust seasonal patterns through applying hierarchical clustering.
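The idea of pooling products with similar seasonality can be sketched as follows. This is not the procedure from the thesis: the products, the quarterly figures, the Euclidean distance, and the average-linkage rule are all assumptions chosen to keep the example small. Each product's sales are first normalized to a seasonal profile, the profiles are merged bottom-up into clusters, and each cluster's mean profile then serves as a pattern that is less sensitive to noise in any single product.

```python
from statistics import mean

# Hypothetical quarterly sales (Q1..Q4) per product; the numbers are made up.
sales = {
    "city_bike": [10, 40, 40, 10],
    "e_bike": [22, 78, 82, 18],
    "winter_tyres": [45, 5, 5, 45],
    "snow_chains": [80, 12, 8, 100],
}

def seasonal_profile(series):
    """Share of the yearly total sold in each quarter."""
    total = sum(series)
    return [value / total for value in series]

def distance(a, b):
    """Euclidean distance between two seasonal profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def linkage(left, right, profiles):
    """Average-linkage distance between two clusters of product names."""
    return mean(distance(profiles[a], profiles[b]) for a in left for b in right)

def cluster(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]], profiles))
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i] = clusters[i] + merged
    return clusters

profiles = {name: seasonal_profile(series) for name, series in sales.items()}
clusters = cluster(profiles, n_clusters=2)

# Pooled seasonal pattern per cluster: the mean profile of its members.
patterns = [
    [mean(quarter) for quarter in zip(*(profiles[name] for name in members))]
    for members in clusters
]
print(clusters)
print(patterns)
```

With these numbers the two bicycle products end up in one cluster with a summer peak, and the two winter products in another with peaks in the first and last quarter.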

Prescriptive analyses

The third category within the analytics framework is prescriptive analytics. These analyses build on the predictive analyses and again go one step further. After predicting what will happen, they aim to prescribe what to do. Thus, a prescriptive analysis partly contains a descriptive and predictive analysis. Converting the predictions to actions also requires a slightly different point of view. Interesting challenges arise, such as the exploration-exploitation trade-off. It seems tempting to fully exploit current knowledge and only execute the best action. However, it might be better to explore other actions and learn from their consequences. In the long run, a well-balanced approach will find the best possible action and adapt to a changing environment.
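One of the simplest ways to balance exploration and exploitation is the epsilon-greedy strategy, sketched below. It is a generic textbook technique and not necessarily what the analyses in this blog use. The three actions, their success rates, and the value of epsilon are hypothetical; in practice the success rates are unknown and must be learned from feedback, which is exactly what the running estimates do.

```python
import random

def epsilon_greedy(success_rates, rounds=5000, epsilon=0.1, seed=42):
    """Simulate epsilon-greedy action selection; return pull counts and value estimates."""
    rng = random.Random(seed)
    counts = [0] * len(success_rates)
    estimates = [0.0] * len(success_rates)
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a random action to keep learning about all of them.
            action = rng.randrange(len(success_rates))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(estimates)), key=estimates.__getitem__)
        # Simulated feedback: 1 for success, 0 for failure.
        reward = 1.0 if rng.random() < success_rates[action] else 0.0
        counts[action] += 1
        # Incremental update of the running mean reward for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return counts, estimates

# Hypothetical true success rates of three actions, unknown to the algorithm.
counts, estimates = epsilon_greedy([0.2, 0.5, 0.8])
print(counts)
print([round(e, 2) for e in estimates])
```

Most rounds go to the action that currently looks best, while the small exploration share keeps the estimates of the other actions up to date. That ongoing exploration is what allows the approach to notice when the environment changes.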

In this section, we introduce two prescriptive analyses. The first concerns prescribing which email to send to which person at what time. The second concerns prescribing to a robot which actions to take to perform a task.

Approximate Dynamic Programming for Optimal Direct Marketing
As illustrated in the first paragraph of this blog, a large part of communication nowadays is digital. Especially for corporations, email has grown to become an important channel. Interestingly, any email can be tracked by using so-called tracking pixels. By doing so, the company can measure whether an email was opened, whether it was interacted with, and whether it resulted in an online purchase. This data is highly suitable for analysis. We aim to improve email marketing effectiveness by prescribing which email to send next on an individual basis. Based on the historic behavior of each user, we predict their interest in various email types and subsequently prescribe which email to send next.

Benefits of Social Learning in Physical Robots
The final analysis in my dissertation is unique, as the algorithm’s actions are directly executed in real life. It concerns controlling a robot in a protected environment. The robot has to execute a task, but it first needs to learn how to do so. It learns by ‘trying’ different actions and discovering which actions are useful in which situation. This behavior is stored in a neural network, which is evolved over generations through an evolutionary algorithm. We combine the experience of a single robot with that of others, such that the robots can learn socially and in parallel.
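The combination of evolution and social learning can be sketched in a heavily simplified form. Everything in this example is an assumption made for illustration: the task, the one-neuron "network", the mutation size, and the sharing rule are hypothetical and much simpler than a physical robot setup. Each robot mutates its own weights and keeps the mutant only if it performs better; after each generation, the worst robot copies the weights of the best one.

```python
import math
import random

# Hypothetical task: map two sensor readings to a target motor command.
SITUATIONS = [((0.0, 1.0), 0.5), ((1.0, 0.0), -0.5), ((1.0, 1.0), 0.0)]

def act(weights, sensors):
    """A one-neuron 'network': weighted sum of the sensors passed through tanh."""
    return math.tanh(sum(w * s for w, s in zip(weights, sensors)))

def fitness(weights):
    """Negative squared error over all situations (higher is better, 0 is perfect)."""
    return -sum((act(weights, sensors) - target) ** 2 for sensors, target in SITUATIONS)

def evolve(n_robots=4, generations=50, share=True, seed=0):
    """Return the best fitness in the population after each generation."""
    rng = random.Random(seed)
    robots = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_robots)]
    history = []
    for _ in range(generations):
        # Individual learning: each robot tries a mutated copy of its own weights.
        for i, genome in enumerate(robots):
            mutant = [w + rng.gauss(0, 0.1) for w in genome]
            if fitness(mutant) > fitness(genome):
                robots[i] = mutant
        # Social learning: the worst robot adopts the weights of the best robot.
        if share:
            best = max(robots, key=fitness)
            worst = min(range(n_robots), key=lambda i: fitness(robots[i]))
            robots[worst] = list(best)
        history.append(max(fitness(genome) for genome in robots))
    return history

history = evolve()
print(round(history[0], 4), round(history[-1], 4))
```

Because a robot only ever replaces its weights with better ones, the best fitness in the population never decreases from one generation to the next. Calling `evolve(share=False)` runs the same robots without social learning, which gives a simple way to compare both settings.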

In conclusion, this blog shows examples of methodology to interpret diverse data. These methodologies enable organizations such as Pon to drive decisions based on data. Curiously, the order in which this research was carried out is opposite to the order implied by the framework. One would expect to start with descriptive analyses and end with prescriptive ones. However, there is a gap between research and practice. University courses typically focus on predictive and prescriptive analyses. Throughout the development of my thesis, however, I observed that plenty of descriptive challenges exist in the industry.

This article is part of the Ph.D. dissertation of Jesper Slik, to be defended in 2022.

*Assuming a laptop with 1 TB of storage and an average reading speed of 238 words per minute [3].
