Process Mining, Data Science & Process Science

Published in The Outlier by Pattern, Nov 28, 2018

With no less than a four-person team of interviewers and videographers, I recently paid a visit to a highly requested interviewee: Dirk Fahland. Originally from Berlin, he moved to Singapore and eventually ended up in Eindhoven, where he has been an assistant professor in the Analytics for Information System group since 2013. In his work, he now focuses on what some describe as the intersection of data science and process science: process mining. It all started in his youth, though, when he already had an above-average interest in computers, which showed, for example, in his fascination with programming video games as a little child. Only years later did he discover his true academic ambitions, after a professor had recognised his passion for science and offered him a research assistantship. Drawn by the freedom and the desire to make a tangible impact by working together with students, he obtained his PhD under the supervision of Wil van der Aalst and Wolfgang Reisig and has stayed at TU/e ever since. Not surprisingly, he has always been closely involved in education. As he is one of the founding fathers of the Data Challenges, we asked him about his vision of 21st-century education, the lessons he has learned from teaching, and of course his own research interests.

Educating the Eindhoven Engineer

One of Dirk’s main reasons to settle down in Eindhoven was TU/e’s education set-up, which involves regular hands-on work. “I’m really a person who doesn’t want to do things just for the drawer or the books but make it work in practice. That doesn’t mean doing things for myself but also teaching other people to be good data scientists for that matter.” Although this idea is now very appealing to him, he admits he had to get used to it at first: “Here students really like seeing why it matters right from the start. This challenges you as a teacher quite a lot. Even for fundamental things you suddenly have to think: What’s the use of it? How can I motivate students to see applications?”

So how does he do that in practice? “I would rather spend 45 minutes discussing concepts and ideas than presenting definitions. This requires me to bring students to a level where they can actually have a substantive discussion with me. What works well is asking students to prepare a video lecture, quiz or exercise at home. Then, in class, I give them a mini-assignment that is actually too difficult for them. This may be a bit mean to students, but it forces them to start thinking about the problem and eventually get the hang of it.”

TU/e has been experimenting with video lectures and clickers from the very start. What is his vision of the role of technology in the modern-day education system? “Technology alone won’t do it; I really need to know the subject so well that I know what I can give to students as preparation. As every teacher will tell you: you need a couple of years to get this right. The big challenge is to make it interesting enough for students to invest their time upfront.”

Process mining shines in identifying business problems and reveals when, where, and why they occurred. This way companies can spend more time on transforming actual problems, rather than endlessly looking for the needle in the haystack.

Process Mining vs Data Science

Given his experience in both disciplines, I asked him how they differ from one another: “Process mining deals with time and causality in complex dynamics; we’re not just looking at a time-series of values, but we are looking at how entire organisations move and operate. This leads to discrete events which usually follow several patterns, for example a sequence. However, an organisation is not sequentially structured; not everybody does things in sequence, things are happening concurrently. This is where process science brings in unique concepts and provides models and techniques to describe these dynamics that do not exist in classical data mining.

Modern data mining is dominated by machine learning techniques that allow you to classify or predict. In data science, we are happy to use any model that sufficiently approximates what you’d like to do. Especially with neural networks, we’re fine with these black boxes as they are. That will never work, however, for an organisation that tries to understand how its people are operating. In process mining, we need glass box models to understand where the organisation is not working well and can potentially improve. This allows us to x-ray an organisation. We can see deviations or ways of working that are costly, like data-entry errors, or highlight bottlenecks that delay a process, like waiting for a signature from a business partner.

Further, the dynamics we investigate are extremely complex: there is never one model that will do. Depending on whom you ask within the organisation, they will have a different view on what matters to them. While algorithms often spit out a single answer, you need different answers depending on the question people have. So your solution should be multidimensional and this is what makes process mining so unique.”
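To make the point about sequences, concurrency and bottlenecks a bit more concrete, here is a minimal Python sketch. It is not taken from the interview and is not Dirk's method: it simply counts which activity directly follows which within each case, a common first step when exploring an event log. The event log, the activity names and the helper functions `directly_follows`, `unordered_pairs` and `slowest_step` are all hypothetical and made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: (case id, activity, timestamp). All values are invented.
EVENT_LOG = [
    ("order-1", "receive order", "2018-11-01 09:00"),
    ("order-1", "check stock",   "2018-11-01 09:30"),
    ("order-1", "send invoice",  "2018-11-01 10:00"),
    ("order-1", "ship goods",    "2018-11-03 10:00"),
    ("order-2", "receive order", "2018-11-02 09:00"),
    ("order-2", "send invoice",  "2018-11-02 09:15"),
    ("order-2", "check stock",   "2018-11-02 09:45"),
    ("order-2", "ship goods",    "2018-11-02 11:45"),
]


def directly_follows(log):
    """Count how often activity a is directly followed by b within a case,
    and collect the waiting time in hours for every such step."""
    traces = defaultdict(list)
    for case_id, activity, ts in log:
        traces[case_id].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity))

    counts = Counter()
    waits = defaultdict(list)
    for events in traces.values():
        events.sort()  # order the events of each case by timestamp
        for (t1, a), (t2, b) in zip(events, events[1:]):
            counts[(a, b)] += 1
            waits[(a, b)].append((t2 - t1).total_seconds() / 3600)
    return counts, waits


def unordered_pairs(counts):
    """Pairs seen in both orders: a hint, not proof, that they run concurrently."""
    return {tuple(sorted(pair)) for pair in counts if (pair[1], pair[0]) in counts}


def slowest_step(waits):
    """Step with the highest mean waiting time: a candidate bottleneck."""
    return max(waits, key=lambda step: sum(waits[step]) / len(waits[step]))


counts, waits = directly_follows(EVENT_LOG)
print(unordered_pairs(counts))
print(slowest_step(waits))
```

In this toy log, "check stock" and "send invoice" occur in both orders, which suggests the two are not strictly sequential, and the step from "send invoice" to "ship goods" has the longest average wait. Real process mining techniques go well beyond such counts, but the example shows why a single linear sequence rarely describes how an organisation actually works.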

*Relative* Google Search Volume comparison between **“process mining”** (blue) and **“data science”** (red)

Process mining may be a unique field, but it’s still far from data science in terms of media attention. Why is that? Dirk explains: “On the one hand, process mining addresses a problem that usually takes place in the back-offices. It doesn’t happen on the front page of a company. Another reason is that the field of process mining is relatively young. The first process mining papers were published in the late 90s, and the first algorithm was invented in the early 2000s. So we’re not even 15 years from that point. There are now a couple of start-ups that are beginning to commercialise these concepts. The market is still being developed, not by taking customers from one another but by simply convincing new customers that it’s a useful technology. Recently, a market study found that the total market will grow into a multi-billion market in the coming years, and we’ve only just started.”

The process mining market will grow into a multi-billion market in the coming years and we’ve only just started.

People & Politics

“What makes adoption of process mining so difficult is that the moment you start analysing a company’s processes through data, you start providing facts to a discussion that so far has been dominated by opinions, people’s interests, and company politics. In the past, managers were often responsible for internal process optimisation. All of a sudden, some young data scientist comes along and tells them where the problems in their processes are. In that sense, the authority of who can say whether a process is done right or wrong has changed. It’s a very slow transformation that is happening bit by bit all around the world and that takes plenty of time.”

The authority of who can say whether a process is done right or wrong has changed.

Networking Effects of Processes

“In the past we didn’t have enough data for process mining. It was very difficult to convince customers to provide their data to develop new process mining techniques. With the market growing, the chance of industry projects is increasing.

You now see commercial process mining being used to analyse and improve local small-scale processes. What I personally expect to happen in the near future is that companies start looking not just at one process in isolation but at how processes within an organisation relate to one another: networking effects of processes. In the long term you want to be able to analyse entire value chains of industries to improve their efficiency.”

Responsible Data Scientist

After we had finished our discussion about process mining, we talked about a topic that’s getting more and more media attention these days: data science ethics. Dirk starts off with the obvious Volkswagen scandal to illustrate how much more complicated matters can be in data science: “If you train a model of job applications over the last 10 years to predict whether a particular application will most likely lead to a hire, and then let this model help you in this hiring process, you’re probably not doing something against the law. But what you have to realise is that there may be bias in how people have been hired in the last few years. It was in the news yesterday that Amazon had to pull an AI program that was doing this based on training data from the last 10 years. It was biased against women and favoured men because they were predominantly hired.

What we did wrong in the past is not in the data, it’s outside the data.

I wouldn’t know how to properly regulate this on a legal level because you can’t prescribe the neural network architecture or the validation you need to do; there is always something that doesn’t fit the criteria. It comes down to the responsibility of each individual data scientist to constantly be aware that when we try to learn from the past, we have to understand what we’ve done wrong in the past. What we did wrong in the past is not in the data; it’s outside the data. Knowing this, and embracing this, as you do data science, is what a data scientist must do. You can’t argue your way out by saying there is no law.”

The Flipside of the Machine Learning Revolution

“What happened to Facebook, letting its algorithms decide which content to show and promote in, for example, Myanmar, is a good example of what I’m worried about. The moment you automate anything and leave it running by itself without additional human checks and balances, you leave it open to being used in ways you never intended. If we trust sophisticated algorithms because they seem so smart that they have human-like capabilities, then we’re making a mistake.

There are many situations where, after we have solved the problem, we automate it, so that we can focus on more interesting ones. This has always been the promise of the machine learning revolution. However, algorithms don’t always have an extra eye to detect when things go out of line. Everybody makes mistakes, but machine learning models can’t correct them, only humans can.”

After we had shot the article photo and waved goodbye, I walked down the hallway to the elevator to go all the way down from the seventh floor. While cautiously observing the floor indicator counting down, I told my fellow interviewers: “As you said, that was indeed a very nice guy.” Still somewhat puzzled why nobody responded, I turned around once the elevator finally stopped and suddenly noticed a familiar face in the corner of my eye, whispering in my ear: ‘have a nice day!’. Red-faced, I left the elevator and concluded: even in process mining, perfect timing doesn’t exist…