Open Topics

The Workflow Systems and Technologies Group offers topics for bachelor and master theses as well as for the Master Praktikum. The following list contains current topic suggestions.

To discuss one of these or any other topic in this research field, please contact the respective supervisor named below.

This list will be updated continuously!

Master thesis topics

Supervision Univ.-Prof. Han van der Aa, Ph.D.

Application instructions: When applying for projects with Prof. Van der Aa, please send an e-mail including your transcript of records and a short CV. Beyond the listed topics, you are also welcome to propose your own topic or to indicate general directions that interest you. Note that, unless stated otherwise, all listed research directions can be scoped to suit P1, P2, and master projects. Ideally, you work on the same direction from P1 through P2 up to your master project.

P1-P2-Master topic

Complexity-based activity DFG filtering (P1-P2 only). Directly-follows graphs (DFGs) are commonly used to visualize how a process has been executed. While their notation is simple, DFGs themselves can be incredibly complex, yielding so-called spaghetti processes. To deal with this complexity, process mining tools provide filtering functionality, where users can apply filters to reduce the number of nodes or edges that are displayed. Currently, these filters are almost exclusively based on the frequency of nodes or edges. As a result, a node-based filter may have to omit many activities before a DFG becomes understandable. This project sets out to develop and apply complexity-based filters for DFGs, i.e., filters that select activities to omit based on their impact on the complexity of a DFG. The idea is to first exclude the activities that contribute the most complexity (e.g., in terms of the edges they are involved in), thus yielding the largest gains in understandability.
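The following is a minimal sketch of the greedy idea. It assumes that an event log is given as a list of traces, where each trace is a list of activity labels. It measures an activity's contribution to complexity simply as the number of distinct DFG edges the activity touches. Removing an activity is modelled as projecting the log onto the remaining activities and rebuilding the DFG. All function names are illustrative, and the project itself would explore richer complexity measures.

```python
from collections import Counter


def build_dfg(traces):
    """Count directly-follows pairs (a, b) over all traces."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def edge_degree(dfg):
    """Number of distinct DFG edges each activity touches (a self-loop counts once)."""
    degree = Counter()
    for a, b in dfg:
        degree[a] += 1
        if b != a:
            degree[b] += 1
    return degree


def complexity_filter(traces, n_remove):
    """Greedily remove the activity with the most incident edges, n_remove times.

    After each removal the log is projected onto the remaining activities, so
    new directly-follows edges can appear between former neighbours.
    """
    removed = []
    for _ in range(n_remove):
        degree = edge_degree(build_dfg(traces))
        if not degree:
            break
        # max() keeps the first maximum, so sorting makes ties deterministic.
        victim = max(sorted(degree), key=degree.__getitem__)
        removed.append(victim)
        traces = [[act for act in trace if act != victim] for trace in traces]
    return traces, removed


log = [["a", "x", "b", "c"], ["a", "b", "x", "c"], ["a", "c", "x"], ["a", "b", "c"]]
filtered, removed = complexity_filter(log, 1)
print(len(build_dfg(log)), removed, len(build_dfg(filtered)))  # 8 ['x'] 3
```

Projecting the log, rather than only deleting the node from the graph, is a design choice. It keeps the directly-follows relation consistent with the filtered log, at the cost of possibly introducing new edges between former neighbours.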

Contact: Han van der Aa

Machine learning-based concept drift detection. Concept drift in process mining refers to a situation where a process changes over time, so that a single event log contains data from multiple process versions. To avoid mixing these versions up during analysis, various techniques have been proposed to detect concept drifts. Since annotated training data was long missing for this task, the first ML-based technique for concept drift detection in process mining was proposed only recently. To deal with the complexity of encoding process changes, this approach takes an indirect route. It encodes process changes in images, after which a computer vision model is trained on these images to detect drift. In this project, the aim is to consider and test other (more direct) encoding strategies and ML pipelines for this task, striving to increase the accuracy of concept drift detection.

Literature:

Alexander Kraus, Han van der Aa. Looking for Change: A Computer Vision Approach for Concept Drift Detection in Process Mining. International Conference on Business Process Management (BPM 2024)

Contact: Han van der Aa

Detecting problematic cases for remaining time prediction (reserved): Remaining time prediction is a key task in predictive process monitoring, striving to provide insights into the expected remaining duration of ongoing cases. Although a broad range of techniques that tackle this task have been developed, considerable inaccuracies are still common. This project will investigate the hypothesis that these inaccuracies are primarily related to outliers and that a subset of them can be detected by looking for patterns that indicate "problematic cases" (e.g., cases that are highly complex or involve unresponsive customers). To this end, the project will start by gaining insights into mispredictions, before developing techniques that help identify problematic cases or account for them during prediction.

Literature:

Many papers exist on remaining time prediction; see, e.g.: Keyvan Amiri Elyasi, Han van der Aa, Heiner Stuckenschmidt. PGTNet: A Process Graph Transformer Network for Remaining Time Prediction of Business Process Instances. International Conference on Advanced Information Systems Engineering (CAiSE 2024)

Contact: Han van der Aa

Clustering directly-follows-graph collections (reserved): Directly-follows graphs (DFGs) are commonly used to visualize how a process has been executed. While their notation is simple, DFGs themselves can be incredibly complex, yielding so-called spaghetti processes.
This is because event logs often capture data on different versions or variants of a process, for instance, because the process is performed in different geographic locations, it handles different customer types, or it was subject to change throughout the timeline of the event log. Merging all versions into a single DFG obscures differences, giving an incomplete view. A known solution is splitting event logs by version and visualizing each separately, creating multiple DFGs. While this clarifies differences, it also burdens analysts, who must manage several DFGs instead of one to understand the overall process.
To make such exploration easier, this project will test how clustering techniques can be used to identify groups of similar graphs, allowing users to quickly gain insights into a DFG collection. In particular, the project will focus on assessing and comparing different clustering techniques, including ones based on graph visualizations.
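One possible baseline works as follows. Each DFG is reduced to its set of edges, DFGs are compared with the Jaccard distance, and they are grouped by single-linkage clustering cut at a distance threshold. This baseline is an assumption for illustration, not a technique prescribed by the project. The sketch uses only the standard library, and the location names and edges in the example are made up.

```python
from itertools import combinations


def jaccard_distance(edges_a, edges_b):
    """1 - |A ∩ B| / |A ∪ B| for two DFG edge sets."""
    union = edges_a | edges_b
    if not union:
        return 0.0
    return 1.0 - len(edges_a & edges_b) / len(union)


def single_linkage_clusters(dfgs, threshold):
    """Cut a single-linkage clustering of the DFGs at `threshold`.

    This equals the connected components of the graph that links every pair
    of DFGs whose distance is at most `threshold` (computed with union-find).
    """
    parent = {name: name for name in dfgs}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(dfgs, 2):
        if jaccard_distance(dfgs[a], dfgs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in dfgs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


dfgs = {
    "vienna": {("a", "b"), ("b", "c"), ("c", "d")},
    "graz": {("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")},
    "linz": {("x", "y"), ("y", "z")},
}
print(single_linkage_clusters(dfgs, threshold=0.3))  # [['graz', 'vienna'], ['linz']]
```

Edge-set distances ignore frequencies and layout. Comparing such structural measures with visualization-based ones is the kind of question the project would address.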

Contact: Han van der Aa

Privacy-aware conformance checking: Conformance checking in process mining aims to identify where the behavior of a process deviated from its expectations or requirements. However, to adhere to privacy regulations and ethical considerations, it is typically undesirable (or not allowed) to have conformance insights that can be traced back to individuals involved in the process. Rather, conformance checking should provide insights that can lead to improvements of the process overall.
This project aims to develop and evaluate tailored techniques for privacy-preserving conformance checking, striving to preserve as much information as possible about deviations in the process, while adhering to certain defined privacy guarantees.
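To make the notion of a privacy guarantee more concrete, here is a sketch of one standard building block: the Laplace mechanism, applied to aggregated conformance results. It assumes, hypothetically, that conformance checking has already produced a set of deviation labels per case and that the list of possible deviation types is public. This is only an illustration of differential privacy, not the technique this project would develop.

```python
import random


def laplace_noise(scale, rng):
    """The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_deviation_counts(case_deviations, deviation_types, epsilon,
                             max_types_per_case=1, seed=None):
    """Release per-deviation-type case counts with epsilon-DP (add/remove one case).

    case_deviations: one set of deviation labels per case (hypothetical format).
    deviation_types: public, fixed list of types to report (including zeros).
    Each case contributes to at most `max_types_per_case` counts, which bounds
    the L1 sensitivity of the count vector by that number.
    """
    rng = random.Random(seed)  # seeded only for reproducibility in this sketch
    counts = {dev: 0 for dev in deviation_types}
    for devs in case_deviations:
        kept = [dev for dev in sorted(devs) if dev in counts][:max_types_per_case]
        for dev in kept:
            counts[dev] += 1
    scale = max_types_per_case / epsilon
    return {dev: count + laplace_noise(scale, rng) for dev, count in counts.items()}


cases = [{"skipped approval"}, {"skipped approval", "late payment"}, set()]
types = ["skipped approval", "late payment", "rework"]
noisy = private_deviation_counts(cases, types, epsilon=1.0, max_types_per_case=2, seed=7)
```

Fixing the set of reported deviation types in advance matters, because releasing only the types that actually occur would itself leak information. A production implementation would also need a secure, floating-point-safe noise source rather than Python's `random` module.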

Literature:

Conformance checking, see e.g.: Wil van der Aalst. Process Mining: Data Science in Action (Chapter 8).
Privacy-aware process mining, see e.g.: Stephan A. Fahrenkrog-Petersen, Han van der Aa, Matthias Weidlich. Optimal Event Log Sanitization for Privacy-Preserving Process Mining. Data & Knowledge Engineering, 145: 102175, 2023

Contact: Han van der Aa

Event log privacy auditing (Master thesis only): Event logs are often anonymized to ensure privacy. Most anonymization algorithms are hand-crafted by researchers and aim to achieve formal privacy guarantees, such as differential privacy. A risk here is that the algorithm or its implementation contains errors. Consequently, anonymized event logs might not provide the targeted privacy guarantee, meaning that the data of individuals may not be as well protected as expected (or promised). In other domains, differential privacy auditing techniques have been designed that allow one to check whether an algorithm actually fulfills its differential privacy guarantee.
The aim of this project is to adjust existing Differential Privacy Auditing techniques to the anonymization of event logs in order to evaluate anonymization techniques from the process mining domain.
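Differential privacy auditing commonly works as follows. A distinguishing attack is run many times on two neighbouring inputs; here these would be two event logs that differ in one case. The attack's error rates are then turned into an empirical lower bound on ε. The sketch below shows only that final step, using point estimates. The function name and inputs are hypothetical, and a sound audit would replace the point estimates with confidence bounds (e.g., Clopper-Pearson intervals).

```python
import math


def empirical_epsilon(false_pos, false_neg, trials_d, trials_d_prime, delta=0.0):
    """Point estimate of a lower bound on epsilon from a distinguishing attack.

    The attack guesses whether its input came from mechanism(D) or mechanism(D').
    false_pos: runs on D wrongly labelled D' (out of trials_d)
    false_neg: runs on D' wrongly labelled D (out of trials_d_prime)
    Any (epsilon, delta)-DP mechanism forces every attack to satisfy
        FPR + e^eps * FNR >= 1 - delta   and   FNR + e^eps * FPR >= 1 - delta,
    so each inequality yields eps >= ln((1 - delta - one_rate) / other_rate).
    """
    fpr = false_pos / trials_d
    fnr = false_neg / trials_d_prime
    bounds = [0.0]
    for one_rate, other_rate in ((fpr, fnr), (fnr, fpr)):
        if 1 - delta - one_rate > 0:
            if other_rate == 0:
                bounds.append(math.inf)  # no finite epsilon satisfies the inequality
            else:
                bounds.append(math.log((1 - delta - one_rate) / other_rate))
    return max(bounds)


# 10 false positives and 20 false negatives in 1000 runs each:
print(round(empirical_epsilon(10, 20, 1000, 1000), 2))  # 4.58
```

Suppose an anonymization algorithm claims ε = 1. An estimate of about 4.58 would then point to a possible bug, provided the estimate survives the statistical correction.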

Literature:

Tramer, F., Terzis, A., Steinke, T., Song, S., Jagielski, M., & Carlini, N. (2022). Debugging differential privacy: A case study for privacy auditing. arXiv preprint arXiv:2202.12219.
Stephan A. Fahrenkrog-Petersen, Han van der Aa, Matthias Weidlich. PRIPEL: Privacy-Preserving Event Log Publishing Including Contextual Information. International Conference on Business Process Management (BPM 2020)

Contact: Han van der Aa

Label generation for event abstraction: Event abstraction is a process mining task that aims to group together related events (or event classes). While such abstraction helps to reduce the complexity of event logs or models discovered from them, it is generally unclear how the nodes that represent such abstracted groups of events should be labeled. This is highly problematic, given that these node labels form the basis for what model readers can understand about a process.

Therefore, this project aims to investigate how to generate meaningful labels for abstracted events. This will involve conceptual considerations of what information an abstracted label should capture, using LLMs to generate labeling suggestions, and assessing the quality of generated labels with users.

Literature:

van Zelst, S. J., Mannhardt, F., de Leoni, M., & Koschmider, A. (2021). Event abstraction in process mining: literature review and taxonomy. Granular Computing, 6, 719-736.

Contact: Han van der Aa

Bachelor thesis topics

Supervision Univ.-Prof. Dr. Erich Schikuta

Analysing pass sequences in football

Goal: Evaluate patterns of attacking and defending in football using a graph representation of pass sequences. Based on tracking data (all positions of the players and the ball during a match) and event data (passes, shots, tackles, ...) from football matches, we can model passes and the defensive involvements within these passes. In this topic, we want to create a comprehensive action-by-action graph representation of attack sequences based on these models of passing and defending, blended with a description of player roles within a team. These sequences can reveal interesting patterns, for example dangerous attacks of a team may often start in a certain zone or involve certain defenders. By clustering and analyzing the pass sequences, we want to create new insights into the complex interactive behaviours that determine success in football and develop tools for coaches and analysts to improve the playing performance of their teams.
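To give an idea of the graph representation, the following sketch does two things. It builds a weighted passing network, and it counts in which zone the possessions that end in a shot start. The event format is an assumption: `possession`, `type`, `player`, `recipient`, and `zone` keys, with events in chronological order. The provided tracking and event data will likely look different.

```python
from collections import Counter, defaultdict


def pass_network(events):
    """Weighted directed graph as a Counter: (passer, receiver) -> number of passes."""
    return Counter(
        (e["player"], e["recipient"])
        for e in events
        if e["type"] == "pass" and e.get("recipient")  # passes with a known receiver
    )


def shot_sequence_start_zones(events):
    """For possessions that end in a shot, count the zone of their first action."""
    possessions = defaultdict(list)
    for e in events:  # events are assumed to be in chronological order
        possessions[e["possession"]].append(e)
    return Counter(
        seq[0]["zone"] for seq in possessions.values() if seq[-1]["type"] == "shot"
    )


events = [
    {"possession": 1, "type": "pass", "player": "GK", "recipient": "CB", "zone": "own third"},
    {"possession": 1, "type": "pass", "player": "CB", "recipient": "CM", "zone": "own third"},
    {"possession": 1, "type": "shot", "player": "CM", "zone": "final third"},
    {"possession": 2, "type": "pass", "player": "CM", "recipient": "W", "zone": "middle third"},
    {"possession": 2, "type": "shot", "player": "W", "zone": "final third"},
    {"possession": 3, "type": "pass", "player": "CB", "recipient": "CM", "zone": "own third"},
]
print(pass_network(events)[("CB", "CM")])  # 2
print(shot_sequence_start_zones(events))
```

The thesis would go considerably further, modelling defensive involvement and player roles on top of such a basic structure.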

The necessary data for the project will be provided.

Requirements:

- Python

- Interest in and basic knowledge of football (soccer) are a plus

Number of Students: 1-2

Resources:

Hassard, P. and Kerr, D. (2024). Predicting Football Match Outcomes Using Event Data and Machine Learning Algorithms. 35th Irish Signals and Systems Conference (ISSC), pp. 1-6, 10.1109/ISSC61953.2024.10603147.

Wäsche, H., Dickson, G., Woll, A., & Brandes, U. (2017). Social network analysis in sport research: an emerging paradigm. European Journal for Sport and Society, 14(2), 138–165. https://doi.org/10.1080/16138171.2017.1318198

Topic: Individual

It is also possible to propose individual topics. To do so, please send an abstract to your supervisor via email. If the topic fits the requirements for a bachelor thesis, its scope is agreed upon with the supervisor at the beginning.

Supervisor:

Erich SCHIKUTA, erich.schikuta@univie.ac.at

Supervision and thesis in German or English

Supervision Marian Lux

Topic: AI 1 – Using a Large Language Model (LLM) to build a Knowledge Graph from scratch to improve Retrieval-Augmented Generation (RAG) for answering questions on custom data

Goal:

Develop a knowledge graph from scratch by identifying named entities and the relations between them. Store the knowledge graph and use it to improve answers to questions using a local LLM, e.g., Llama [1] or Mistral [4], and the RAG approach [2].

For asking questions and receiving answers, a simple chatbot should be implemented using a UI framework (e.g., [3]) or an existing chat UI (e.g., [5]).

The data for RAG and the knowledge graph is extracted from web URLs or documents from a particular domain. The user can define the domain via the UI (e.g., sports). An existing knowledge graph can then be loaded from the file system or a database, or a new one can be created from indexed documents and manual interactions. In this way, the application state is persisted.

The code is open source and will be hosted publicly on GitHub.

Determining how to build the knowledge graph first requires a literature review in which different approaches are evaluated. Based on this review, one approach, a combination of several, or an entirely new approach (e.g., one incorporating LLMs) is chosen.
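For orientation only, the retrieval side of such a system might work as follows. Triples are stored in memory, and entities mentioned in a question are linked by naive substring matching. Their graph neighbourhood is then turned into a text context for the local LLM. The class and method names are hypothetical, and the LLM call itself is omitted. Extracting the triples (e.g., with an LLM) is exactly what the thesis would investigate.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal in-memory store of (subject, relation, object) triples."""

    def __init__(self):
        self.by_entity = defaultdict(set)  # lower-cased entity -> triples it occurs in

    def add(self, subj, rel, obj):
        triple = (subj, rel, obj)
        self.by_entity[subj.lower()].add(triple)
        self.by_entity[obj.lower()].add(triple)

    def neighbourhood(self, entity, hops=1):
        """All triples reachable from `entity` within `hops` steps."""
        frontier, seen, facts = {entity.lower()}, set(), set()
        for _ in range(hops):
            next_frontier = set()
            for ent in frontier - seen:
                seen.add(ent)
                for subj, rel, obj in self.by_entity.get(ent, ()):
                    facts.add((subj, rel, obj))
                    next_frontier.update((subj.lower(), obj.lower()))
            frontier = next_frontier
        return facts

    def context_for(self, question, hops=1):
        """Naive entity linking: every known entity name contained in the question."""
        text = question.lower()
        facts = set()
        for ent in list(self.by_entity):
            if ent in text:
                facts |= self.neighbourhood(ent, hops)
        return "\n".join(f"{s} {r} {o}." for s, r, o in sorted(facts))


kg = KnowledgeGraph()
kg.add("Rapid Wien", "plays in", "Vienna")
kg.add("Vienna", "is capital of", "Austria")
kg.add("Sturm Graz", "plays in", "Graz")
print(kg.context_for("Where does Rapid Wien play?", hops=2))
# Rapid Wien plays in Vienna.
# Vienna is capital of Austria.
```

The returned string would be prepended to the question in the prompt, alongside the passages found by the regular RAG retrieval.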

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of memory, or a free Google Colab account

Interest in building LLM AI solutions

Supervisor:

Marian LUX – marian.lux@univie.ac.at

Supervision and thesis in German or English

References:

[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[3] https://streamlit.io/

[4] https://mistral.ai/

[5] https://core.telegram.org/bots/api

Additional References:

Fatehkia, M., Lucas, J. K., & Chawla, S. (2024). T-RAG: lessons from the LLM trenches. arXiv preprint arXiv:2402.07483.

Topic: AI 2 – Improvement of a Multi-Agent approach which uses Large Language Models (LLMs) for responses by incorporating Retrieval-Augmented Generation (RAG) to consider own data in queries

Goals:

Improve an agentic architecture in which each agent has a particular role, in order to improve answers to questions using a local LLM, e.g., Llama [1] or Mistral [4], and the RAG approach [2].
Improve the existing pipeline and workflow of the chatbot [6].
A simple chatbot for questions and answers is already implemented using the Telegram UI [5]; improve the user interface of the Telegram bot (buttons, structured output, etc.).

Data for RAG with the agentic architecture (i.e., multiple ReAct [7] agents) is extracted from web URLs (defined at runtime) or documents (uploaded at runtime). The multi-agent approach [3] is currently developed for a blogger use case. Further tasks:
- Research alternative state-of-the-art approaches and evaluate whether the current implementation is already the best option
- Make the code flexible so that it works with different configurations and with questions from other use cases/domains, e.g., a lawyer
- Bug fixes

The code is hosted publicly on GitHub.

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of memory, or a free Google Colab account

Interest in building LLM AI solutions

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

References:

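[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.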

[3] https://docs.crewai.com/introduction

[4] https://mistral.ai/

[5] https://core.telegram.org/bots/api

[6] https://github.com/annavalentinakatharina/RAG-MAS-Blog-Chatbot

[7] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.

Topic: AI 3 – Using a Multimodal Large Language Model (LLM) to improve Retrieval-Augmented Generation (RAG) by answering questions on custom data

Goal:

Develop an agentic architecture in which each agent has a particular role, in order to improve answers to questions using a local LLM or large multimodal model (LMM), e.g., Llama [1], Mistral [4], or LLaVA [8-9], and the RAG approach [2].

For asking questions and receiving answers, a simple chatbot should be implemented using a UI framework (e.g., [3]) or an existing chat UI (e.g., [5]).

The data for RAG, including images such as diagrams, is extracted from web URLs or documents from a predefined domain (free to choose, e.g., a company knowledge system or the EU AI Act [7]).

Fine-tuning on the Vienna Scientific Cluster (VSC) [6] is considered a promising approach to help the LMM understand images from a particular domain.

The approach is evaluated with automated tests based on LLM inference.

The code and full documentation will be hosted publicly on GitHub as open source.

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of memory, or a free Google Colab account

Interest in building LLM AI solutions

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

References:

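[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.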

[3] https://streamlit.io/

[4] https://mistral.ai/

[5] https://core.telegram.org/bots/api

[6] https://vsc.ac.at/access/

[7] https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai

[8] https://llava-vl.github.io/

[9] Liu, H., Li, C., Li, Y., & Lee, Y. J. (2024). Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 26296-26306).

Topic: AI 4 – Building an In Context Learning (ICL) framework for Large Language Models (LLMs) to improve Retrieval-Augmented Generation (RAG) and their accuracy of answers

Goal:

Develop an In-Context Learning (ICL) framework [3], [5-6] that works with state-of-the-art frameworks such as LlamaIndex or LangChain. The framework contains workflows and pipelines that adapt prompts based on the query or on a particular tool to call (e.g., vector search, answer grading) in order to achieve highly accurate answers.

The approach will be evaluated with a local LLM, e.g., Llama [1] or Mistral [4], and the RAG approach [2], whose implementation is part of the work.
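One common ICL building block is dynamic few-shot selection: the examples most similar to the incoming query are picked and placed in the prompt. The sketch below uses token-overlap (Jaccard) similarity as a stand-in for the embedding-based vector search that a LlamaIndex or LangChain pipeline would normally use. The function names and example data are hypothetical.

```python
import re


def tokens(text):
    """Lower-cased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_examples(query, examples, k=2):
    """Return the k (question, answer) pairs whose questions overlap most with the query."""
    query_tokens = tokens(query)

    def similarity(example):
        example_tokens = tokens(example[0])
        union = query_tokens | example_tokens
        return len(query_tokens & example_tokens) / len(union) if union else 0.0

    # sorted() is stable, so equally similar examples keep their original order.
    return sorted(examples, key=similarity, reverse=True)[:k]


def build_prompt(query, examples, k=2, instruction="Answer the question concisely."):
    """Assemble a few-shot prompt from the selected examples."""
    parts = [instruction]
    for question, answer in select_examples(query, examples, k):
        parts.append(f"Q: {question}\nA: {answer}")
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)


examples = [
    ("How many vacation days do employees get?", "25 days per year."),
    ("Who approves travel requests?", "The team lead."),
    ("How many sick days require a doctor's note?", "More than three."),
]
print(build_prompt("How many vacation days can I carry over?", examples, k=1))
```

In the framework, the instruction itself could also be switched per query type or per tool. This is the prompt-adaptation part described above.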

The code and full documentation will be hosted publicly on GitHub as open source.

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of memory, or a free Google Colab account

Strong interest in building LLM AI solutions and in dynamic prompt generation.

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

References:

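[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.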

[3] https://www.lakera.ai/blog/what-is-in-context-learning

[4] https://mistral.ai/

[5] https://arxiv.org/abs/2005.14165

[6] https://medium.com/generative-ai/how-a-small-language-model-can-achieve-100-accuracy-323a789ffa83

Topic: AI 5 – Evaluation Approaches for Retrieval Augmented Generation (RAG).

Goal:

Develop an open-source framework which evaluates state-of-the-art (SOTA) RAG applications where users chat with their own data.

SOTA RAG [3] applications are available in [1], but a custom application could also be developed with an open-source LLM such as Llama [2] using Ollama [4].

The framework should use at least three different evaluation approaches, including Ragas [5], to evaluate RAG applications. Evaluation goals and metrics for the framework have to be defined. The framework should be usable like automated tests, with an easy-to-use API for various Python LLM applications.

The code is open source and will be hosted publicly on GitHub [1].

Evaluating RAG applications first requires a literature review in which different evaluation approaches are compared.

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of memory, or a free Google Colab account

Interest in building LLM AI solutions

Supervisor:

Marian LUX – marian.lux@univie.ac.at

Supervision and thesis in German or English

References:

[1] https://github.com/MLUX-University-of-Vienna?tab=repositories

[2] https://ai.meta.com/llama/

[3] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[4] https://ollama.com/

[5] https://docs.ragas.io/

Topic: AI 6 – Developing a ReAct-inspired Agent from scratch for use with Large Language Models (LLMs).

Goal:

Develop a ReAct (Reason + Act) inspired agent [1], [2] from scratch that splits a query into subtasks and selects from a pool of tools to answer the query in as few iterations as possible.

The agent works with local open-source LLMs (max. 8B parameters); using Ollama [3], [6] is recommended. Furthermore, the implementation should be easy to extend and maintain.

To evaluate the agent on a simple chatbot implementation (an existing one can be used as well), provide at least the following tools:

Vector search for documents on a particular topic
Web search for current data
Wikipedia search for encyclopedic knowledge
Calculator for mathematical expressions

The code is open source and will be hosted publicly on GitHub. The agent is expected to be published as a PyPI package, so that it can be used like SOTA ReAct agent frameworks, e.g., LlamaIndex [4] or LangChain [5].
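A stripped-down Thought/Action/Observation loop may help clarify what "from scratch" involves. In this sketch, `llm` is any callable that maps the transcript so far to the next model output. In practice, it would wrap a local model, e.g., via the Ollama Python library [6]. The `Action: tool[input]` and `Final Answer:` formats are a simplified convention chosen for this sketch, loosely modelled on the ReAct paper [1]. Only the calculator tool is implemented, and `scripted_llm` is a hypothetical stand-in for a real model.

```python
import ast
import operator
import re

# Arithmetic operators the calculator tool accepts.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg,
}


def calculator(expression):
    """Evaluate + - * / ** on numeric literals without using eval()."""
    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    return str(evaluate(ast.parse(expression, mode="eval")))


ACTION = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def react(question, llm, tools, max_steps=5):
    """Run Thought -> Action -> Observation steps until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION.search(step)
        if match is None:
            observation = "no valid action found"
        else:
            name, argument = match.groups()
            tool = tools.get(name)
            if tool is None:
                observation = f"unknown tool {name}"
            else:
                try:
                    observation = tool(argument)
                except Exception as exc:  # report tool errors back to the model
                    observation = f"error: {exc}"
        transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted


def scripted_llm(transcript):
    """Stand-in for a real model: first calls the calculator, then answers."""
    if "Observation:" not in transcript:
        return "Thought: I should compute this.\nAction: calculator[17 * 3 + 4]"
    last_observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have the result.\nFinal Answer: {last_observation}"


print(react("What is 17 * 3 + 4?", scripted_llm, {"calculator": calculator}))  # 55
```

The other required tools (vector, web, and Wikipedia search) would be registered in the same `tools` dictionary. Splitting a query into subtasks would be the model's job inside its Thought steps.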

Recommended requirements:

Implementation in Python

Interest in building LLM AI solutions

Supervisor:

Marian LUX – marian.lux@univie.ac.at

Supervision and thesis in German or English

References:

[1] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023, January). ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR).

[2] ReAct Prompting: https://www.promptingguide.ai/techniques/react

[3] Ollama: https://ollama.com/

[4] LlamaIndex ReAct Agent: https://docs.llamaindex.ai/en/stable/examples/agent/react_agent/

[5] LangChain ReAct Agent: https://python.langchain.com/v0.1/docs/modules/agents/agent_types/react/

[6] Ollama Python Library: https://github.com/ollama/ollama-python

Topic: Process Mining – Process Discovery Visualization Tool in Python

Goal:

Improving, maintaining, and adding new features to the existing process mining [6] tool from previous bachelor theses [1], [5].

Different topics based on this process mining tool are possible. Some of them are listed below and the scope of the work is agreed with the supervisor at the beginning. Please write an email with your preferences to the supervisor:

Bug fixing and improvements of the existing solution [10]
Incorporating/merging code from other branches that were developed in prior bachelor projects
Development of further Inductive Miner variants (including Inductive Miner infrequent) [2]
Incorporating/merging node and edge filtering based on selected metrics into the existing algorithms
Improving the algorithms with additional visualization methods (e.g., for the Fuzzy Miner [3], [4])
Decision tool that recommends which process mining algorithm fits a particular use case or data set best [6], [7]
- Integration of a local open-source LLM that helps with the decision
- Questionnaire-based assessment of the use case in combination with data analysis
Development of an object-centric process mining integration [8], [9]
Improving the UI
XES importer and exporter tool [6]

The entire work, including the contributions made during the bachelor thesis, is open source. This means that the code must be easy to maintain and extend. The code should also include unit tests.

The code will be hosted publicly on GitHub.

Because of the different topics, multiple students may work simultaneously on the same code base. This implies that at the end of the thesis, all changes must be merged into the main project (GitHub).

Recommended requirements:

Implementation in Python

Only basic frameworks (e.g., NumPy, scikit-learn) may be used. Using other, more sophisticated frameworks requires the supervisor's permission.

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

References:

[1] https://github.com/MLUX-University-of-Vienna/ProcessMiningVisualization_SS24_Frauenberger/tree/master

[2] van Detten, J. N., Schumacher, P., & Leemans, S. J. (2023, October). An approximate inductive miner. In 2023 5th International Conference on Process Mining (ICPM) (pp. 129-136). IEEE.

[3] Okoye, K., Naeem, U., & Islam, S. (2017). Semantic fuzzy mining: Enhancement of process models and event logs analysis from syntactic to conceptual level. International Journal of Hybrid Intelligent Systems, 14(1-2), 67-98.

[4] Günther, C. W., & Van Der Aalst, W. M. (2007, September). Fuzzy mining–adaptive process simplification based on multi-perspective metrics. In International conference on business process management (pp. 328-343). Berlin, Heidelberg: Springer Berlin Heidelberg.

[5] https://github.com/MLUX-University-of-Vienna?tab=repositories

[6] Van der Aalst, W. (2016). Data science in action. In Process Mining: Data Science in Action (pp. 3-23). Springer Berlin Heidelberg.

[7] https://research.aimultiple.com/process-mining-algorithms/

[8] https://www.ocpm.info/ocel_demo.html

[9] https://encyclopedia.pub/video/video_detail/785

[10] https://github.com/fabianf00/ProcessMiningVisualization_WS23/issues/41

Topic: Individual

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English
