About the Project
Thanks to the increasing digitalization of contemporary organizations, event data about the execution of processes are continuously collected. Process mining, the frontier of process intelligence, transforms these data into insights into how processes are actually executed. Solid process mining techniques have been developed for the automated discovery of process models from event data, for conformance checking of event data to detect deviations between expected and observed behavior, and for predictive monitoring to anticipate what will likely happen next.
In spite of the reported effectiveness of these approaches for process intelligence, the whole process mining lifecycle suffers from two main issues. First, it lacks documentation and traceability, due to the heterogeneity of its steps, the presence of ad-hoc procedures, and the use of black-box components that do not provide interpretable insights into the produced results. Second, process mining pipelines do not integrate and exploit domain knowledge to steer learning and inference algorithms towards meaningful results.
Building on recent advancements in explainable artificial intelligence and in multi-perspective declarative languages and techniques, PINPOINT aims to develop a full-fledged set of techniques for explainable, knowledge-aware process intelligence. This is instrumental to creating auditable, verifiable, and trustworthy process mining results, and, in turn, to making them actionable.
Technically, the project focuses on two major, intertwined research threads. The first thread is about empowering process mining techniques with background knowledge reflecting multiple process perspectives at once (data, control flow, and uncertainty). The goal is to inform and guide process mining tasks so as to improve the quality, effectiveness, and interpretability of their output. The second thread is about making process mining pipelines explicit, transparent and, in turn, explainable. This involves the explicit tracking of all transformation steps: from the processing of raw, low-level input sources, to their conversion into compound, high-level event data, to the generation of process intelligence outputs. At the same time, it concerns the extraction of interpretable, relevant process knowledge components that help explain why a certain result has been produced by a process mining task: for instance, why a specific sequencing has been discovered, why a prediction has been generated by a black-box monitor, or what the root cause of a detected non-conformance is.
To fulfil these objectives, PINPOINT builds on the integrated expertise of the project units, which brings together AI, BPM, data management, and process mining. Results will be validated from a formal-theoretical perspective and experimentally evaluated on real-world data from the customer care and logistics domains.
State of the Art
We briefly survey the state of the art in process mining and data-driven process intelligence, focusing in particular on declarative, constraint-based languages to specify process knowledge and on the main process mining tasks relevant to the project.
Process mining is a collection of process intelligence techniques combining model-based and data-oriented analysis to obtain insights into how business processes are actually executed. Several process mining tasks exist to discover process models from event data, check the conformance of observed behaviours against expected ones, provide runtime operational support, and enhance models with insights extracted from data. Our main focus is on declarative process mining, where processes are declaratively represented using temporal rules and their extensions.
Multi-perspective declarative processes
Declarative process models implicitly characterise all acceptable courses of execution as the traces that satisfy a given set of constraints. A prominent representative of this approach is Declare [38]. While Declare models have traditionally focused only on the temporal, control-flow dimension, recent efforts consider multi-perspective models incorporating additional key dimensions such as data, structural constraints, and uncertainty. To properly formalise and use such multi-perspective models, knowledge representation formalisms and their corresponding reasoning techniques have to be suitably selected and adjusted. Natural candidates towards this end are logical languages capturing structure, time, and uncertainty, as well as logic programming formalisms such as ASP with extensions tailored to dynamic systems.
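To make the constraint-based representation concrete, the following minimal Python sketch (not tied to any specific tool; activity names are hypothetical) checks a trace against two classical Declare templates, response(a, b) and precedence(a, b): a trace is acceptable exactly when it satisfies every constraint in the model.

```python
# Illustrative sketch: a trace is a list of activity labels; a declarative model
# is a set of constraints, each mapping a trace to True/False.

def response(a, b):
    """response(a, b): every occurrence of a is eventually followed by b."""
    return lambda trace: all(
        b in trace[i + 1:] for i, x in enumerate(trace) if x == a)

def precedence(a, b):
    """precedence(a, b): b may occur only after a has occurred."""
    def check(trace):
        seen_a = False
        for x in trace:
            if x == a:
                seen_a = True
            elif x == b and not seen_a:
                return False
        return True
    return check

model = [response("register", "notify"), precedence("register", "pay")]
trace = ["register", "pay", "notify"]
print(all(constraint(trace) for constraint in model))  # True: trace is accepted
```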
Process mining tasks
Data extraction and preparation is the first step in any process mining pipeline, transforming raw data in heterogeneous formats and legacy data sources into high-level event data that can be processed by mining techniques. It is a critical phase for the quality of the entire pipeline and it is typically carried out with manual, ad-hoc techniques and handcrafted extract-transform-load procedures. Research in this subfield is consequently very active, with proposals covering conceptual methodologies, pattern-based extraction and transformation tools, matching techniques between low-level events and business-level activities, as well as ontology-based data access and semantic technologies.
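As a purely illustrative example (field names, the activity mapping, and the timestamp format are hypothetical stand-ins for a real extraction procedure), the following Python sketch shows the kind of transformation such a step performs: low-level records from a legacy source are mapped to business-level activities and grouped into timestamp-ordered traces by case identifier.

```python
# Illustrative extraction sketch: raw rows -> high-level event log.
from collections import defaultdict
from datetime import datetime

raw_records = [  # low-level rows, e.g. exported from a legacy ticketing system
    {"ticket": "T1", "ts": "2023-05-02 10:45", "action": "ASSIGN"},
    {"ticket": "T1", "ts": "2023-05-02 09:10", "action": "OPEN"},
    {"ticket": "T2", "ts": "2023-05-02 11:00", "action": "OPEN"},
]

# mapping from low-level actions to business-level activities
activity_map = {"OPEN": "Register request", "ASSIGN": "Assign operator"}

event_log = defaultdict(list)          # case id -> list of events
for record in raw_records:
    event_log[record["ticket"]].append({
        "activity": activity_map.get(record["action"], record["action"]),
        "timestamp": datetime.strptime(record["ts"], "%Y-%m-%d %H:%M"),
    })
for trace in event_log.values():       # order events within each trace
    trace.sort(key=lambda event: event["timestamp"])
```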
Process discovery refers to the extraction of process knowledge from event logs. To that end, a large number of research endeavours have focused on end-to-end models in the form of Petri nets and other control-flow models. Over the last years, the inference of process constraints specifying the rules that govern process behavior has emerged. Several techniques have been studied towards this goal, such as those based on inductive logic reasoning, automata replay, and statistical analysis. In addition, online and a-posteriori improvement techniques have also been incorporated, for example to deal with key issues such as consistency and redundancy resolution.
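A minimal sketch of how a constraint-based discovery technique may operate, under the simplifying assumption that a constraint is reported when its support over the traces that activate it exceeds a threshold (the template, threshold, and example log are hypothetical):

```python
# Illustrative sketch: a response(a, b) constraint is reported when it holds in
# at least `threshold` of the traces in which it is activated (i.e., a occurs).
from itertools import permutations

def holds_response(trace, a, b):
    return all(b in trace[i + 1:] for i, x in enumerate(trace) if x == a)

def discover_response(log, threshold=0.9):
    activities = {x for trace in log for x in trace}
    discovered = []
    for a, b in permutations(activities, 2):
        activated = [t for t in log if a in t]
        if not activated:
            continue
        support = sum(holds_response(t, a, b) for t in activated) / len(activated)
        if support >= threshold:
            discovered.append((a, b, support))
    return discovered

log = [["register", "check", "notify"], ["register", "notify"], ["check", "archive"]]
print(discover_response(log))  # [('register', 'notify', 1.0)]
```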
Conformance checking compares abstract representations of processes with their recorded executions, relating how the process is understood with how it actually unfolds. Its input is a process model and an event log, and its output is twofold: (i) the extent to which the two match, expressed through quality measures such as fitness, recall, and precision, and (ii) an artefact evidencing the measures. Such artefacts include alignments and behavioural footprints. Recently, specific techniques for checking the conformance of event logs against declarative processes have been proposed, exploiting automata theory and probabilistic temporal logics.
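The following illustrative sketch, assuming a purely constraint-based model and a naive fitness definition (fraction of satisfied constraints), shows the two kinds of output mentioned above: a quality measure per trace and the violated constraints as the evidencing artefact.

```python
# Illustrative sketch: naive conformance check against a constraint-based model.

def response(a, b):
    name = f"response({a},{b})"
    check = lambda t: all(b in t[i + 1:] for i, x in enumerate(t) if x == a)
    return name, check

model = [response("register", "notify"), response("check", "archive")]
log = [["register", "check", "notify"], ["register", "notify", "archive"]]

for trace in log:
    violated = [name for name, check in model if not check(trace)]
    fitness = 1 - len(violated) / len(model)
    print(trace, fitness, violated)
# ['register', 'check', 'notify'] 0.5 ['response(check,archive)']
# ['register', 'notify', 'archive'] 1.0 []
```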
Predictive (process) monitoring forecasts properties (e.g., future events, outcomes, performances) of ongoing process instances, based on the events accumulated so far within the instance as well as on past, completed executions. This topical field is dominated by the induction and use of ensemble and deep learning models, which suffer from limited interpretability. Recent proposals have tried to overcome this issue mainly by leveraging “post-hoc” local explanations. Although the use of background knowledge in learning and prediction tasks is presumed to improve their results [26,17], this avenue is underexplored in predictive monitoring, where only simple forms of temporal control-flow constraints have been used for suffix prediction.
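As an illustration of how background knowledge could inform prediction (the frequency-based predictor, the constraint, and the log are hypothetical simplifications, not the project's actual technique), the following Python sketch predicts the next activity of a running case from historical successor frequencies and filters out candidates that would violate a known control-flow constraint.

```python
# Illustrative sketch: predict the next activity as the most frequent historical
# successor of the last observed activity, discarding candidates that violate a
# background-knowledge constraint ("archive" requires a previous "notify").
from collections import Counter, defaultdict

history = [["register", "check", "archive"],
           ["register", "check", "archive"],
           ["register", "check", "notify", "archive"]]

successors = defaultdict(Counter)
for trace in history:
    for current, nxt in zip(trace, trace[1:]):
        successors[current][nxt] += 1

def predict_next(prefix):
    for candidate, _ in successors[prefix[-1]].most_common():
        if candidate == "archive" and "notify" not in prefix:
            continue  # the prediction would violate the background constraint
        return candidate
    return None

print(predict_next(["register", "check"]))
# prints 'notify': 'archive' is more frequent but is filtered out by the constraint
```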