Exploring the intersection of BPMN with Process Mining: Insights from integrating bpmn-visualization with pm4py
Introduction
You know what they say: “you can’t improve what you can’t measure,” and that’s where process mining has a role. Process mining is a powerful set of techniques for analyzing and optimizing business processes based on data. Similar to machine learning and data analytics, process mining uses data-driven methodologies to extract insights from process execution data, also known as event data.
One of the key requirements for effective process mining is the ability to visualize and interact with analysis results directly on process models. This is especially crucial because process mining findings are often used by business experts who may not have a technical background. However, many of the available industrial process mining tools have limited support for BPMN-based visualization, even though BPMN (Business Process Model and Notation) is a widely used standard for process modeling and execution. As a result, there is a clear need for tools and libraries that provide more comprehensive BPMN-based visualization capabilities.
To fill this need, Process Analytics, an open-source initiative under the Apache 2.0 license led by Bonitasoft, is committed to developing developer-centric BPMN-based visualization components that can be integrated into process mining projects. A pivotal component of this initiative is the bpmn-visualization library, which lets developers visualize the results of process mining techniques on BPMN process models. With support for both TypeScript and R, bpmn-visualization provides developers with a straightforward means to render, style, and interact with BPMN elements.
On the analysis side, pm4py is a popular Python process mining library that offers a range of methods for process discovery, conformance checking, and performance analysis. By integrating bpmn-visualization with pm4py, we aim to take a step towards enabling BPMN-based process mining and visualization. The source code and documentation of this integration are available on GitHub and are open for testing, exploration, and contribution.
In this article, we will describe our experience in integrating these two libraries, and discuss the challenges we encountered due to the limited support for BPMN-based process mining. Our goal is to provide high-level insights and takeaways for the process mining community, including researchers, process mining experts, editors, and developers. Furthermore, we aim to engage anyone interested in the fusion of BPMN with process mining, including individuals from the BPM domain or other related fields. By doing so, we hope to raise awareness about the importance of native BPMN-based process mining and its potential impact across different domains.
Integration
Architecture
The integration consists of two main components: the frontend and the backend.
The frontend component is written in JavaScript and uses bpmn-visualization to display BPMN process models together with the results of the backend analysis.
The backend component is built using pm4py and is responsible for processing data and performing process mining analysis. More details about the architecture, the technologies used, and the application setup are available on GitHub.
Use cases
Three process mining techniques are used in this integration example: process discovery, conformance checking, and enhancement.
Process discovery: The inductive miner algorithm is used to automatically create a process model from event data stored in an event log using the XES storage standard.
Conformance checking: The event log is compared with the BPMN process model to detect deviations or non-conformities.
Enhancement: The BPMN process model is enriched with statistics data computed from the event log, such as the frequency of execution of BPMN elements and the conformance data obtained from the conformance checking technique.
Figure 2 below shows an example of a BPMN process model discovered from a simple synthetic event log and enriched with frequency data using our developed integration example.
Approach
BPMN-based process mining is an emerging research topic that has not yet fully made its way into industry. Most process mining techniques originate from research efforts based on Petri nets, a formalism for modeling concurrent systems. This is true of pm4py, as most of its implemented algorithms operate on Petri nets.
Therefore, to develop the integration, we followed the approach described in enabling BPMN-based process mining, which consists of converting between BPMN and Petri nets. Existing process mining algorithms developed for Petri nets can then be used, and their results are post-processed so that they can be visualized on BPMN diagrams.
To give a sense of this approach, Figure 3 below illustrates the workflow of generating the visualization shown in Figure 2 above (the discovered BPMN enriched with frequency information).
The first step consists of importing an event log. In the backend, two components are then required: Process Discovery, which discovers the process model, and Enhancement, which computes the frequency information.
The Process Discovery component first creates a Petri net from the imported event log, which is then converted to BPMN. In our integration, this conversion is performed by the BPMN variant of the inductive miner using the discover_bpmn_inductive method. It’s important to note that newer process discovery algorithms that can directly create BPMN process models are emerging (e.g. see the Model Generation Application developed as part of the Process Analytics project or the Split miner used by Apromore).
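As a rough sketch, the discovery step could look like the following with pm4py's simplified interface. The wrapper function and its parameter names are our own for illustration; `read_xes`, `discover_bpmn_inductive`, and `write_bpmn` are pm4py functions, but their optional parameters vary between pm4py versions, so treat this as a starting point rather than the integration's actual code.

```python
def discover_bpmn_from_xes(xes_path, bpmn_path, noise_threshold=0.0):
    """Discover a BPMN model from an XES event log and save it as BPMN XML."""
    # Imported lazily so this module still loads where pm4py is not installed.
    import pm4py

    # Load the event log stored in the XES format.
    log = pm4py.read_xes(xes_path)
    # Run the BPMN variant of the inductive miner.
    bpmn_model = pm4py.discover_bpmn_inductive(log, noise_threshold=noise_threshold)
    # Serialize the discovered model to a BPMN XML file.
    pm4py.write_bpmn(bpmn_model, bpmn_path)
    return bpmn_model
```

A call such as `discover_bpmn_from_xes("log.xes", "model.bpmn")` (both file names are placeholders) would produce a BPMN file that can then be passed to the layout and rendering steps.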
Next, a layout generation algorithm is employed to determine the positions (i.e., x and y coordinates) of elements on the BPMN diagram. In our example, we used the bpmn-layout-generators library developed as part of the Process Analytics project. Finally, this component outputs a serialized BPMN XML file that is sent to bpmn-visualization for rendering.
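To illustrate what "determining positions" means, here is a deliberately naive layered layout. It is not the algorithm used by bpmn-layout-generators; the function name, the spacing values, and the assumption that the flow graph has no cycles are ours.

```python
from collections import defaultdict


def layered_layout(nodes, flows, x_gap=150, y_gap=100):
    """Return {node_id: (x, y)} for an acyclic flow graph (illustrative only)."""
    predecessors = defaultdict(list)
    for source, target in flows:
        predecessors[target].append(source)

    layer = {}

    def depth(node):
        # Longest path from any start node; assumes the graph has no cycles.
        if node not in layer:
            layer[node] = 1 + max((depth(p) for p in predecessors[node]), default=-1)
        return layer[node]

    next_row = defaultdict(int)  # next free row in each column
    positions = {}
    for node in nodes:
        column = depth(node)
        positions[node] = (column * x_gap, next_row[column] * y_gap)
        next_row[column] += 1
    return positions


# Hypothetical model: start -> A -> split -> (B | C) -> join -> end
layout_nodes = ["s", "A", "split", "B", "C", "join", "e"]
layout_flows = [("s", "A"), ("A", "split"), ("split", "B"), ("split", "C"),
                ("B", "join"), ("C", "join"), ("join", "e")]
positions = layered_layout(layout_nodes, layout_flows)
```

Each element is placed in the column given by its distance from the start, and elements sharing a column are stacked vertically. A real layout generator also has to route edges, size shapes, and handle loops.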
The Enhancement component enriches the BPMN model with statistical data, such as frequency and performance information. To achieve this, a mapping between the event log and the process model must first be established using replay algorithms. Once the mapping is established, the statistics can be computed for the Petri net elements and then mapped back to the corresponding BPMN elements. Finally, bpmn-visualization is used to display the results on top of the BPMN model.
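The final mapping step can be sketched as follows. This simplified version skips replay entirely and just counts activity occurrences, which only matches replay-based frequencies when the log fits the model and every activity label corresponds to exactly one BPMN task. The function names and the BPMN element ids are hypothetical.

```python
from collections import Counter


def activity_frequencies(traces):
    """Count how often each activity occurs across all traces."""
    return Counter(activity for trace in traces for activity in trace)


def frequencies_by_bpmn_id(traces, name_to_bpmn_id):
    """Key the activity counts by BPMN element id so a frontend can style them."""
    counts = activity_frequencies(traces)
    return {bpmn_id: counts.get(name, 0)
            for name, bpmn_id in name_to_bpmn_id.items()}


# Hypothetical event log (one list of activity names per case) and id mapping.
traces = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "approve"],
]
name_to_bpmn_id = {
    "register": "Activity_1",
    "check": "Activity_2",
    "approve": "Activity_3",
    "reject": "Activity_4",
}
frequencies = frequencies_by_bpmn_id(traces, name_to_bpmn_id)
```

The resulting dictionary, keyed by BPMN element id, is the kind of payload the backend can send to the frontend so that each element is styled according to its frequency.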
Challenges & limitations
No native support for BPMN-based process mining
One of the main challenges we encountered is the lack of native support for BPMN-based process mining. While most process mining techniques rely on Petri nets, these can be challenging for non-technical business users to comprehend and utilize effectively. Existing industry-based solutions are primarily built on non-formal representations of process maps. Although process maps offer initial insights, they have limitations in representing the full process behavior and cannot support thorough and detailed analysis. In contrast, BPMN already has widespread acceptance within the business community, making it a more suitable and comprehensive approach for process mining.
Converting BPMN to Petri nets is not a perfect solution for BPMN-based process mining. It has several limitations:
Loss of information: The conversion from BPMN to Petri nets may lose information, which can affect the accuracy of the analysis. To avoid this, the approach we followed restricts itself to a subset of BPMN elements that can be converted without loss.
Performance issues: The conversion process can be computationally intensive, which can lead to performance issues, especially when dealing with large process models.
Maintenance overhead: The use of Petri nets requires additional maintenance overhead, as the models need to be updated whenever changes occur in the business process.
Furthermore, the enhancement technique, which involves establishing a mapping between the process model and the event log and computing various statistics, relies on replay algorithms that have been developed for Petri nets. To the best of our knowledge, there is currently no open-source implementation of replay for BPMN. Since event logs cannot be replayed directly against BPMN models, the models must first be converted to Petri nets before the replay can be performed.
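For readers unfamiliar with replay, the sketch below shows the core idea of token-based replay on a Petri net. It is a strongly simplified illustration, not pm4py's implementation: it assumes every activity in the trace has exactly one matching transition and that the net has no silent transitions. All names are ours.

```python
from collections import Counter


def token_replay(trace, transitions, initial_marking, final_marking):
    """Replay one trace on a Petri net (illustrative only).

    transitions: {activity_label: (input_places, output_places)}
    Returns (missing_tokens, remaining_tokens, fired_counts).
    """
    marking = Counter(initial_marking)
    fired = Counter()
    missing = 0
    for activity in trace:
        inputs, outputs = transitions[activity]
        for place in inputs:
            if marking[place] > 0:
                marking[place] -= 1
            else:
                missing += 1  # the token had to be created artificially
        for place in outputs:
            marking[place] += 1
        fired[activity] += 1
    # Consume the final marking; tokens left over afterwards are "remaining".
    for place in final_marking:
        if marking[place] > 0:
            marking[place] -= 1
        else:
            missing += 1
    remaining = sum(marking.values())
    return missing, remaining, fired


# Hypothetical sequential net: source -> A -> p1 -> B -> sink
net = {"A": (["source"], ["p1"]), "B": (["p1"], ["sink"])}
fitting = token_replay(["A", "B"], net, ["source"], ["sink"])
deviating = token_replay(["B"], net, ["source"], ["sink"])
```

A fitting trace produces no missing or remaining tokens, while a trace that skips an activity produces both. The per-transition firing counts are the raw material for the frequency statistics that later get mapped back to BPMN elements.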
Despite these limitations, the conversion approach is currently the most viable solution for BPMN-based process mining, given the lack of native support for BPMN in the process mining community.
BPMN layout generation
Another challenge in BPMN-based process mining is generating a layout for the discovered process model. While process discovery algorithms can automatically extract a process model from event logs, they do not generate a visual layout that represents the model in a way that is easily understandable by humans. Process mining has primarily relied on existing hierarchical graph layout algorithms that do not take into account specific process properties, such as centralizing the most frequent path or the most performant path.
Recent efforts have emerged to address these issues in process maps, such as the Tracy algorithm by UiPath. To our knowledge, however, no layout algorithm has yet been proposed that is specifically tailored to BPMN. In our bpmn-layout-generators library, we currently focus on experimenting with and generating more comprehensible BPMN layouts. In the future, we will continue exploring ways to improve the layout of BPMN models, including taking process properties into account.
Conclusion & takeaways
Despite the current lack of native support for BPMN-based process mining, integrating BPMN with process mining techniques offers valuable insights that outweigh the discussed limitations.
BPMN, with its higher level of detail and formality, surpasses the limitations of process maps, which are commonly used by existing process mining tools for result visualization. At Process Analytics, we are dedicated to developing BPMN-based visualization libraries like bpmn-visualization and bpmn-layout-generators.
We envision a future, as hinted by Prof. Wil van der Aalst, where there will be a seamless transition from process maps to BPMN.
As this transition unfolds, we anticipate a growing demand for BPMN-based process mining tools capable of harnessing the full potential of this formalism, providing more accurate and comprehensive insights.
If you want to stay on top of the latest news and releases from the Process Analytics project, follow us through:
- Website: https://process-analytics.dev
- Twitter: @ProcessAnalyti1
- GitHub: https://github.com/process-analytics
- Discord: https://discord.com/invite/HafnBYsRXd
- YouTube: Process Analytics YouTube channel
Useful resources
- Official process mining website
- Getting started tutorial to bpmn-visualization
- Limitations of process maps from a practitioner’s perspective by Wil van der Aalst
- More articles about BPMN process discovery and layout generation by the Process Analytics project can be found at: https://process-analytics.dev/model-generation-application/