Open Topics


The Workflow Systems and Technologies Group offers topics for bachelor and master theses as well as Master Praktikum. The following list contains a number of current suggestions for topics.

To discuss one of these or any other topic in this research field, please contact the supervisor named for the respective topic.

This list will be updated continuously!

Master thesis topics

Supervision Univ.-Prof. Han van der Aa, Ph.D.

Application instructions: When applying for projects with Prof. Van der Aa, please send an e-mail that includes a transcript of records and a short CV. Beyond the listed topics, you are also welcome to propose your own topic or indicate general directions that interest you when applying. Note that, unless stated otherwise, all listed research directions can be scoped to be suitable for P1, P2, and master projects. Ideally, you work on the same direction from P1, through P2, up to your master project.

P1-P2-Master topics

  • Detecting problematic cases for remaining time prediction: Remaining time prediction is a key task in the context of predictive process monitoring, striving to provide insights into the expected remaining duration of ongoing cases. Although a broad range of techniques that tackle this task have been developed, considerable inaccuracies are still common. This project will investigate the hypothesis that these inaccuracies are primarily related to outliers and that a subset of these can be detected by looking for patterns that indicate "problematic cases" (e.g., cases that are highly complex or involve unresponsive customers). To this end, the project will start by gaining insights into mispredictions, before diving into the development of techniques that can help identify problematic cases or account for them during prediction.

    Literature:

    • Many papers exist on remaining time prediction, but see e.g.: Keyvan Amiri Elyasi, Han van der Aa, Heiner Stuckenschmidt. PGTNet: A Process Graph Transformer Network for Remaining Time Prediction of Business Process Instances. International Conference on Advanced Information Systems Engineering (CAISE 2024)
      Contact: Han van der Aa
       

  • Clustering directly-follows-graph collections (reserved): Directly-follows graphs (DFGs) are commonly used to visualize how a process has been executed. While their notation is simple, DFGs themselves can be incredibly complex, yielding so-called spaghetti processes.
    This is because event logs often capture data on different versions or variants of a process, for instance, because the process is performed in different geographic locations, it handles different customer types, or it was subject to change throughout the timeline of the event log. Merging all versions into a single DFG obscures differences, giving an incomplete view. A known solution is splitting event logs by version and visualizing each separately, creating multiple DFGs. While this clarifies differences, it also burdens analysts, who must manage several DFGs instead of one to understand the overall process.
    To make such exploration easier, this project will test how clustering techniques can be used to identify groups of similar graphs, allowing users to quickly gain insights into a DFG collection. In particular, the project will focus on the assessment and comparison of different clustering techniques, including ones based on graph visualizations.

    Contact: Han van der Aa
  • Privacy-aware conformance checking: Conformance checking in process mining aims to identify where the behavior of a process deviated from its expectations or requirements. However, to adhere to privacy regulations and ethical considerations, it is typically undesirable (or not allowed) to have conformance insights that can be traced back to individuals involved in the process. Rather, conformance checking should provide insights that can lead to improvements of the process overall.
    This project aims to develop and evaluate tailored techniques for privacy-preserving conformance checking, striving to preserve as much information as possible about deviations in the process, while adhering to certain defined privacy guarantees.

    Literature:

    • Conformance checking, see e.g.: Wil van der Aalst. Process Mining: Data Science in Action (Chapter 8).
    • Privacy-aware process mining, see e.g.: Stephan A. Fahrenkrog-Petersen, Han van der Aa, Matthias Weidlich. Optimal Event Log Sanitization for Privacy-Preserving Process Mining. Data & Knowledge Engineering, 145: 102175, 2023
      Contact: Han van der Aa
       

  • Event log privacy auditing (Master thesis only): Event logs are often anonymized to ensure privacy. Most anonymization algorithms are hand-crafted by researchers, aiming to achieve formal privacy guarantees, such as differential privacy. A risk here is that the algorithm or its implementation could contain errors. Consequently, anonymized event logs might not have the targeted privacy guarantee, meaning that the data of individuals may not be as secure as expected (or promised). In other domains, Differential Privacy Auditing techniques have been designed that allow one to check whether an algorithm actually fulfills the differential privacy guarantee.
    The aim of this project is to adjust existing Differential Privacy Auditing techniques to the anonymization of event logs in order to evaluate anonymization techniques from the process mining domain.

    Literature:

    • Tramer, F., Terzis, A., Steinke, T., Song, S., Jagielski, M., & Carlini, N. (2022). Debugging differential privacy: A case study for privacy auditing. arXiv preprint arXiv:2202.12219.
    • Stephan A. Fahrenkrog-Petersen, Han van der Aa, Matthias Weidlich. PRIPEL: Privacy-Preserving Event Log Publishing Including Contextual Information. International Conference on Business Process Management (BPM 2020)
      Contact: Han van der Aa
       

  • Label generation for event abstraction: Event abstraction is a process mining task that aims to group together related events (or event classes). While such abstraction helps to reduce the complexity of event logs or models discovered from them, it is generally unclear how the nodes that represent such abstracted groups of events should be labeled. This is highly problematic, given that these node labels form the basis for what model readers can understand about a process.

    Therefore, this project aims to investigate how to generate meaningful labels for abstracted events. This will involve conceptual considerations of what information an abstracted label should capture, using LLMs to generate labeling suggestions, and assessing the quality of generated labels with users.

    Literature:

    • van Zelst, S. J., Mannhardt, F., de Leoni, M., & Koschmider, A. (2021). Event abstraction in process mining: literature review and taxonomy. Granular Computing, 6, 719-736.
      Contact: Han van der Aa
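
To give a feel for the DFG clustering topic listed above, the following is a minimal sketch, assuming an event log has already been split into sub-logs (one per process version). The sub-log names, the activity names, and the choice of Jaccard distance over edge sets are illustrative assumptions, not the method prescribed for the project. The resulting pairwise distances could serve as input to a clustering technique.

```python
from collections import Counter
from itertools import combinations


def build_dfg(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def jaccard_distance(dfg_a, dfg_b):
    """Distance between two DFGs based on their edge sets (0.0 = same edges)."""
    edges_a, edges_b = set(dfg_a), set(dfg_b)
    union = edges_a | edges_b
    if not union:
        return 0.0
    return 1.0 - len(edges_a & edges_b) / len(union)


# Hypothetical sub-logs, one per process version (e.g., per location).
sublogs = {
    "vienna": [["register", "check", "approve"], ["register", "check", "reject"]],
    "berlin": [["register", "check", "approve"]],
    "online": [["register", "auto-approve"]],
}

dfgs = {name: build_dfg(traces) for name, traces in sublogs.items()}

# Pairwise distances between all DFGs in the collection.
distances = {
    (a, b): jaccard_distance(dfgs[a], dfgs[b])
    for a, b in combinations(dfgs, 2)
}
```

Edge-set overlap ignores frequencies and graph layout; comparing such simple measures against richer ones is the kind of question the project would investigate.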

Bachelor thesis topics


Supervision Marian Lux

Topic: AI 1 – Using a Large Language Model (LLM) to build a Knowledge Graph from scratch to improve Retrieval-Augmented Generation (RAG) for answering questions on custom data

id=NEW67977ce2bc178482495505

Goal:

Develop a knowledge graph from scratch by identifying named entities and the relations between them. Store the knowledge graph and use it to improve answers to questions with a local LLM, e.g., Llama [1] or Mistral [4], and the RAG approach [2].

For questions and answers, a simple chatbot should be implemented using a UI framework (e.g., [3]) or an existing chat UI (e.g., [5]).

The data for RAG and the knowledge graph is extracted from web URLs or documents from a particular domain. The user is free to define a domain via the UI (e.g., sports). An existing knowledge graph can be loaded from the file system or a database, or a new one can be created from indexed documents and manual interactions. The application state is thus stored.

The code is open source and will be hosted publicly on GitHub.

Building the knowledge graph first requires a literature review in which different approaches are evaluated. Based on this review, one approach, a combination of several, or even a new approach (for example, one that incorporates LLMs) is selected and implemented.
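
As a rough illustration of how a stored knowledge graph could feed into RAG, here is a minimal sketch, assuming the graph is a set of (subject, relation, object) triples and that retrieval is a plain substring match on entity names. The class `KnowledgeGraph`, the function `build_prompt`, and all entities are hypothetical; entity extraction and the actual LLM call are deliberately left out.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory triple store (hypothetical; no persistence)."""

    def __init__(self):
        self.triples = set()
        self.by_entity = defaultdict(set)

    def add(self, subject, relation, obj):
        triple = (subject, relation, obj)
        self.triples.add(triple)
        # Index the triple under both of its entities for lookup.
        self.by_entity[subject.lower()].add(triple)
        self.by_entity[obj.lower()].add(triple)

    def facts_for(self, question):
        """Return triples whose subject or object is mentioned in the question."""
        text = question.lower()
        hits = set()
        for entity, triples in self.by_entity.items():
            if entity in text:
                hits |= triples
        return sorted(hits)


def build_prompt(question, kg):
    """Prepend retrieved facts as context; the LLM call itself is not shown."""
    facts = "\n".join(f"- {s} {r} {o}" for s, r, o in kg.facts_for(question))
    return f"Known facts:\n{facts}\n\nQuestion: {question}\nAnswer:"


# Made-up example data from a sports domain.
kg = KnowledgeGraph()
kg.add("FC Example", "plays_in", "Example League")
kg.add("FC Example", "based_in", "Exampletown")
kg.add("SC Sample", "plays_in", "Example League")

prompt = build_prompt("Where is FC Example based?", kg)
```

In the thesis, the substring match would likely be replaced by entity linking or embedding-based retrieval, and the triples would be extracted from the indexed documents rather than entered by hand.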

 

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of RAM or a free Google Colab account

Interest in building LLM AI solutions

 

Supervisor:

Marian LUX – marian.lux@univie.ac.at

Supervision and thesis in German or English

 

References:

[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[3] https://streamlit.io/

[4] https://mistral.ai/

[5] https://core.telegram.org/bots/api

 

Additional References:

Fatehkia, M., Lucas, J. K., & Chawla, S. (2024). T-RAG: lessons from the LLM trenches. arXiv preprint arXiv:2402.07483.

 

Topic: AI 2 – Improvement of a Multi-Agent approach which uses Large Language Models (LLMs) for responses by incorporating Retrieval-Augmented Generation (RAG) to consider own data in queries

id=4864

Goals:

  • Improve an agentic architecture in which each agent has a particular role, in order to improve answers to questions using a local LLM, e.g., Llama [1] or Mistral [4], and the RAG approach [2].
  • Improve the existing pipeline and workflow of the chatbot [6].
  • A simple question-answering chatbot is already implemented on top of the Telegram UI [5]; you will improve the user interface of the Telegram bot (buttons, structured output, etc.).

  • Data for RAG with the agentic architecture (i.e., multiple ReAct [7] agents) is extracted from web URLs (defined at runtime) or documents (uploaded at runtime). The use case of the multi-agent approach [3] is currently developed for a blogger:

    • Research alternative state-of-the-art approaches and evaluate whether the current implementation is already the best option.
    • Make the code flexible so that it works with different configurations and with questions on different use cases/domains, e.g., a lawyer.

  • Bug fixes

The code is hosted publicly on GitHub.

 

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of RAM or a free Google Colab account

Interest in building LLM AI solutions

 

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

 

References:

[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[3] https://docs.crewai.com/introduction

[4] https://mistral.ai/

[5] https://core.telegram.org/bots/api

[6] https://github.com/annavalentinakatharina/RAG-MAS-Blog-Chatbot

[7] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.

 

Topic: AI 3 – Using a Multimodal Large Language Model (LLM) to improve Retrieval-Augmented Generation (RAG) by answering questions on custom data

id=4865

Goal:

Develop an agentic architecture in which each agent has a particular role, in order to improve answers to questions using a local LLM or large multimodal model (LMM), e.g., Llama [1], Mistral [4], or LLaVA [8-9], and the RAG approach [2].

For questions and answers, a simple chatbot should be implemented using a UI framework (e.g., [3]) or an existing chat UI (e.g., [5]).

The data for RAG, including images such as diagrams, is extracted from web URLs or documents predefined for a particular domain (free to choose, e.g., a company knowledge system or the AI Act [7]).

Fine-tuning on the Vienna Scientific Cluster (VSC) [6] is considered a promising approach to help the LMM understand images from a particular domain.

The approach is evaluated with automated tests based on LLM inference.

 

The code and the complete documentation will be hosted publicly on GitHub as open source.

 

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of RAM or a free Google Colab account

Interest in building LLM AI solutions

 

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

 

References:

[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[3] https://streamlit.io/

[4] https://mistral.ai/

[5] https://core.telegram.org/bots/api

[6] https://vsc.ac.at/access/

[7] https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai

[8] https://llava-vl.github.io/

[9] Liu, H., Li, C., Li, Y., & Lee, Y. J. (2024). Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 26296-26306).

 

Topic: AI 4 – Building an In-Context Learning (ICL) framework for Large Language Models (LLMs) to improve Retrieval-Augmented Generation (RAG) and the accuracy of its answers

id=4866

Goal:

Develop an In-Context Learning (ICL) framework [3], [5-6] that builds on state-of-the-art frameworks such as LlamaIndex or LangChain. The framework contains workflows and pipelines that adapt prompts to the query or to a particular tool being called (e.g., vector search, grading of answers), in order to achieve highly accurate answers.

The approach will be evaluated with a local LLM, e.g., Llama [1] or Mistral [4], and a RAG pipeline [2], which is also part of the work.
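
One way to picture "changing prompts based on the query" is dynamic few-shot selection. The following is a minimal sketch, assuming a small pool of question/answer demonstrations and using word overlap as a stand-in for embedding similarity. All function names and the example pool are hypothetical and independent of LlamaIndex or LangChain.

```python
import re


def tokens(text):
    """Lowercased word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard overlap of word sets; a stand-in for embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def select_examples(query, pool, k=2):
    """Pick the k demonstrations whose questions are most similar to the query."""
    ranked = sorted(pool, key=lambda ex: similarity(query, ex["question"]), reverse=True)
    return ranked[:k]


def build_icl_prompt(query, pool, k=2):
    """Assemble a few-shot prompt: selected demonstrations, then the query."""
    shots = select_examples(query, pool, k)
    blocks = [f"Q: {ex['question']}\nA: {ex['answer']}" for ex in shots]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


# Made-up demonstration pool.
pool = [
    {"question": "What is the deadline for thesis registration?",
     "answer": "See the registration form."},
    {"question": "Which GPU is needed for the project?",
     "answer": "One with at least 8 GB of memory."},
    {"question": "Who supervises the thesis?",
     "answer": "The supervisor named in the topic."},
]

prompt = build_icl_prompt("Which GPU is needed for my project?", pool, k=2)
```

A framework along the lines of the topic would generalize this idea, for example by choosing different prompt templates per tool call and by grading the answers.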

 

The code and the complete documentation will be hosted publicly on GitHub as open source.

 

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of RAM or a free Google Colab account

Strong interest in building LLM AI solutions and in dynamic prompt generation.

 

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

 

References:

[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[3] https://www.lakera.ai/blog/what-is-in-context-learning

[4] https://mistral.ai/

[5] https://arxiv.org/abs/2005.14165

[6] https://medium.com/generative-ai/how-a-small-language-model-can-achieve-100-accuracy-323a789ffa83

 

Topic: AI 5 – Evaluation Approaches for Retrieval-Augmented Generation (RAG)

id=4867

Goal:

Develop an open-source framework which evaluates state-of-the-art (SOTA) RAG applications where users chat with their own data.

SOTA RAG [3] applications are available at [1], but you may also develop your own application with an open-source LLM such as [2], served via Ollama [4].

The framework uses at least three different evaluation approaches, including Ragas [5], to evaluate RAG applications. Define the evaluation goals and the metrics to use in the framework. The framework should be usable like automated tests, with an easy-to-use API for various LLM Python applications.

The code is open source and will be hosted publicly on GitHub [1].

Evaluating RAG applications first requires a literature review in which different evaluation approaches are compared.
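
To illustrate what "usable like automated tests" might look like, here is a minimal sketch, assuming a RAG application can be wrapped as a function that returns an answer together with its retrieved contexts. The three metrics are deliberately simple placeholders and do not reproduce Ragas or any other library. All names, including `dummy_rag_app`, are hypothetical.

```python
import re


def _tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(answer, reference, contexts):
    """1.0 if answer and reference are identical up to case and whitespace."""
    return float(answer.strip().lower() == reference.strip().lower())


def token_recall(answer, reference, contexts):
    """Share of reference tokens that also occur in the answer."""
    ref = set(_tokens(reference))
    return len(ref & set(_tokens(answer))) / len(ref) if ref else 0.0


def context_support(answer, reference, contexts):
    """Share of answer tokens found in the retrieved contexts (crude groundedness proxy)."""
    ans = set(_tokens(answer))
    ctx = set(_tokens(" ".join(contexts)))
    return len(ans & ctx) / len(ans) if ans else 0.0


METRICS = {
    "exact_match": exact_match,
    "token_recall": token_recall,
    "context_support": context_support,
}


def evaluate(rag_app, dataset, metrics=METRICS):
    """Run rag_app on every question and average each metric (dataset must be non-empty)."""
    totals = {name: 0.0 for name in metrics}
    for sample in dataset:
        answer, contexts = rag_app(sample["question"])
        for name, metric in metrics.items():
            totals[name] += metric(answer, sample["reference"], contexts)
    return {name: total / len(dataset) for name, total in totals.items()}


def dummy_rag_app(question):
    """Stand-in for a real RAG application; always returns the same answer."""
    contexts = ["The thesis is supervised in German or English."]
    return "German or English", contexts


dataset = [
    {"question": "In which languages is supervision offered?", "reference": "German or English"},
    {"question": "Which language is used for the implementation?", "reference": "Python"},
]

scores = evaluate(dummy_rag_app, dataset)
```

Keeping metrics as plain callables with a shared signature makes it straightforward to plug in additional evaluation approaches and to assert on score thresholds in a test suite.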

 

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of RAM or a free Google Colab account

Interest in building LLM AI solutions

 

Supervisor:

Marian LUX – marian.lux@univie.ac.at

Supervision and thesis in German or English

 

References:

[1] https://github.com/MLUX-University-of-Vienna?tab=repositories

[2] https://ai.meta.com/llama/

[3] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[4] https://ollama.com/

[5] https://docs.ragas.io/
 

Topic: Process Mining – Process Discovery Visualization Tool in Python

id=4868

Goal:

Improve, maintain, and extend the existing process mining [6] tool developed in previous bachelor theses [1], [5].

Different topics based on this process mining tool are possible. Some are listed below; the scope of the work is agreed with the supervisor at the beginning. Please send an e-mail with your preferences to the supervisor:

  • Bug fixing and improvements of the existing solution [10]
  • Incorporating/merging code from other branches developed in prior bachelor projects
  • Development of more inductive miner variants (including infrequent) [2]
  • Incorporating/merging node and edge filtering based on selected metrics into the existing algorithms
  • Improving the algorithms with additional visualization methods (e.g., for Fuzzy Miner [3], [4])
  • Decision tool that recommends which process mining algorithm fits best for a particular use case or data set [6], [7]

    • Integration of a local open-source LLM that helps with the decision
    • Questionnaire-based assessment of the use case, combined with data analysis

  • Development of an Object Centric Process Mining integration [8], [9]
  • Improving the UI
  • XES importer and exporter tool [6]

The whole work is open source, including the contributions made during the bachelor thesis. This means that the code must be easy to maintain and extend. The written code should also include unit tests.

The code will be hosted publicly on GitHub.

Because of the different topics, multiple students may work on the same code base simultaneously. This implies that, at the end of the thesis, all changes must be merged into the main project (GitHub).
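
As an example for the XES importer item and the unit-test requirement above, here is a minimal sketch, assuming only the control-flow perspective is needed (one activity name per event, taken from the `concept:name` attribute). It uses the standard library only; `parse_xes`, `SAMPLE`, and the test are hypothetical and are not taken from the existing tool.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an optional XML namespace, e.g. '{ns}trace' -> 'trace'."""
    return tag.rsplit("}", 1)[-1]


def parse_xes(xml_text):
    """Return a list of traces, each a list of activity names (concept:name)."""
    root = ET.fromstring(xml_text)
    traces = []
    for trace in root:
        if _local(trace.tag) != "trace":
            continue
        activities = []
        for event in trace:
            # Trace-level attributes (e.g., the case id) are skipped here.
            if _local(event.tag) != "event":
                continue
            for attr in event:
                if attr.get("key") == "concept:name":
                    activities.append(attr.get("value"))
                    break
        traces.append(activities)
    return traces


# Made-up two-case log in XES syntax.
SAMPLE = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event><string key="concept:name" value="register"/></event>
    <event><string key="concept:name" value="approve"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="case-2"/>
    <event><string key="concept:name" value="register"/></event>
  </trace>
</log>"""


def test_parse_xes():
    """Unit test in pytest style, as expected for contributions to the tool."""
    assert parse_xes(SAMPLE) == [["register", "approve"], ["register"]]
```

A full importer would also need timestamps, resources, extensions, and global attribute declarations; those are outside the scope of this sketch.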

 

Recommended requirements:

Implementation in Python

Only basic frameworks (numpy, scikit-learn, etc.) may be used. More sophisticated frameworks require permission from the supervisor.

 

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

 

References:

[1] https://github.com/MLUX-University-of-Vienna/ProcessMiningVisualization_SS24_Frauenberger/tree/master

[2] van Detten, J. N., Schumacher, P., & Leemans, S. J. (2023, October). An approximate inductive miner. In 2023 5th International Conference on Process Mining (ICPM) (pp. 129-136). IEEE.

[3] Okoye, K., Naeem, U., & Islam, S. (2017). Semantic fuzzy mining: Enhancement of process models and event logs analysis from syntactic to conceptual level. International Journal of Hybrid Intelligent Systems, 14(1-2), 67-98.

[4] Günther, C. W., & Van Der Aalst, W. M. (2007, September). Fuzzy mining–adaptive process simplification based on multi-perspective metrics. In International conference on business process management (pp. 328-343). Berlin, Heidelberg: Springer Berlin Heidelberg.

[5] https://github.com/MLUX-University-of-Vienna?tab=repositories

[6] van der Aalst, W. (2016). Data science in action. In Process Mining: Data Science in Action (pp. 3-23). Springer Berlin Heidelberg.

[7] https://research.aimultiple.com/process-mining-algorithms/

[8] https://www.ocpm.info/ocel_demo.html

[9] https://encyclopedia.pub/video/video_detail/785

[10] https://github.com/fabianf00/ProcessMiningVisualization_WS23/issues/41

 


Topic: Individual

 

It is also possible to submit individual topics. To do so, please submit an abstract to your supervisor via e-mail. If the topic fits the requirements for a bachelor thesis, the scope of the work is agreed with the supervisor at the beginning.

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English