Process Mining with Python tutorial: A healthcare application — Part 2

c3d3
Wonderful World of Data Science
4 min read · Aug 4, 2020

This article is the second of a tutorial series made up of the following parts:

  • Part 1: Introduction to process mining, data preprocessing and initial data exploration.
  • Part 2 (this article): Primer on process discovery using the PM4Py (Python) library to apply the Alpha Miner algorithm.
  • Part 3: Other process discovery algorithms and model representations.
  • Part 4: More holistic models which integrate control flow, time (e.g. bottlenecks, wait times), resources (e.g. personnel capacity and performance, inter-personnel relationships, department/ward capacity and performance) and case attributes (e.g. patient demographics, clinical condition).

You can find the complete source code and data for this tutorial series here. In Part 1 of the tutorial, you saw how to prepare and explore your data to get a high-level ‘feel’ for the processes. In this part, you will learn how to use the PM4Py library to discover process models by working with an example dataset (the same one used to illustrate exploration in Part 1).

Formatting the event log for PM4Py

The pm4py library works with event logs stored either as CSV files (loaded into a standard pandas DataFrame) or in the XES format. XES is a standard format for storing event logs, but it is not adopted in all contexts. The two relevant objects are converter from pm4py.objects.conversion.log (here aliased log_converter), which…
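
To make the target shape concrete, here is a minimal sketch of what turning a flat CSV table into an event log involves: each row becomes an event, events are grouped into one trace per case, and each trace is ordered by timestamp. It uses only the Python standard library and is not PM4Py's implementation. The sample data and the source column names (patient, action, DateTime) are hypothetical and are not taken from the tutorial dataset. The keys case:concept:name, concept:name and time:timestamp are the XES-style names that PM4Py conventionally expects for the case identifier, activity and timestamp.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical flat event table: one row per event.
# Column names and values are illustrative only.
RAW_CSV = """patient,action,DateTime
p1,Registration,2020-01-01 09:00:00
p2,Registration,2020-01-01 09:05:00
p1,Triage,2020-01-01 09:10:00
p2,Discharge,2020-01-01 09:45:00
p1,Discharge,2020-01-01 10:00:00
"""

# Map the (assumed) source columns onto XES-style attribute keys.
XES_KEYS = {
    "patient": "case:concept:name",  # case identifier
    "action": "concept:name",        # activity name
    "DateTime": "time:timestamp",    # event timestamp
}


def to_event_log(csv_text, column_map):
    """Group CSV rows into traces: one time-ordered list of events per case."""
    traces = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Rename every column to its XES-style key.
        event = {column_map[col]: value for col, value in row.items()}
        # Parse the timestamp so events can be ordered reliably.
        event["time:timestamp"] = datetime.strptime(
            event["time:timestamp"], "%Y-%m-%d %H:%M:%S"
        )
        traces[event["case:concept:name"]].append(event)
    # Order the events of each case chronologically.
    for events in traces.values():
        events.sort(key=lambda e: e["time:timestamp"])
    return dict(traces)


event_log = to_event_log(RAW_CSV, XES_KEYS)
for case_id, events in event_log.items():
    print(case_id, [e["concept:name"] for e in events])
```

Each printed line is one trace, that is, the ordered sequence of activities for a single case. Process discovery algorithms such as the Alpha Miner take sequences of this kind as their input.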
