Open Topics


The Workflow Systems and Technologies Group offers topics for bachelor and master theses as well as Master Praktikum. The following list contains a number of current suggestions for topics.

To discuss one of these or any other topic in this research field, please contact the supervisor named for the respective topic.

This list will be updated continuously!

Master thesis topics

Supervision Univ.-Prof. Han van der Aa, Ph.D.

P1-P2-Master topics

  • Privacy-aware conformance checking. Conformance checking in process mining aims to identify where the behavior of a process deviated from its expectations or requirements. However, to adhere to privacy regulations and ethical considerations, it is typically undesirable (or not allowed) to have conformance insights that can be traced back to individuals involved in the process. Rather, conformance checking should provide insights that can lead to improvements of the process overall.
    This project aims to develop and evaluate tailored techniques for privacy-preserving conformance checking, striving to preserve as much information as possible about deviations in the process, while adhering to certain defined privacy guarantees.

    Literature:

    • Conformance checking, see e.g.: Wil van der Aalst. Process Mining: Data Science in Action (Chapter 8).
    • Privacy-aware process mining, see e.g.: Stephan A. Fahrenkrog-Petersen, Han van der Aa, Matthias Weidlich. Optimal Event Log Sanitization for Privacy-Preserving Process Mining. Data & Knowledge Engineering, 145: 102175, 2023

     
    Contact: Han van der Aa

  • Semantics-aware process discovery. Process discovery traditionally aims to derive a process model from available data, capturing how a process was historically executed. However, given the existence of large repositories of process models and the potential of Large Language Models (LLMs), it is also possible to train models that learn how a process should be executed based on the semantics of the involved activities.
    This project aims to develop and evaluate techniques for semantics-aware process discovery, which exploit semantic information to derive desired execution orders of a process, given a set of activities. Such semantic process models can be used, for instance, as a basis for conformance checking, allowing for the comparison of actual to desired behavior.

    Literature:

    • Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa. Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks. International Conference on Process Mining (ICPM 2024)
    • Julian Caspary, Adrian Rebmann, Han van der Aa. Does This Make Sense? Machine Learning-based Detection of Semantic Anomalies in Business Processes. International Conference on Business Process Management (BPM 2023)


    Contact: Han van der Aa

  • Product Data Model derivation from event logs. Process mining uses data that records the sequences of activities performed to execute process instances. Underlying these executions, there is typically an informational or physical product (e.g., a bank loan or a bicycle) that represents the outcome of a case. Stemming from the area of process redesign, a Product Data Model (PDM) is a means to capture the informational elements involved in achieving the outcome of a process, along with their interrelations. Yet, no work has so far attempted to derive such PDMs from event data, despite the opportunities this provides for recognizing how processes achieve their goals.
    This project aims to develop a technique to derive a PDM from event data and use it as a basis for downstream process mining tasks, such as process discovery or abstraction.

    Literature:

    • Example paper that covers the PDM: Han van der Aa, Hajo A. Reijers, Irene Vanderfeesten. Designing Like a Pro: The Automated Composition of Workflow Activities. Computers in Industry, 75(1): 162-177, 2016.

    Contact: Han van der Aa

  • Detecting problematic cases for remaining time prediction. Remaining time prediction is a key task in predictive process monitoring, striving to provide insights into the expected remaining duration of ongoing cases. Although a broad range of techniques that tackle this task have been developed, considerable inaccuracies are still common. This project will investigate the hypothesis that these inaccuracies are primarily related to outliers and that a subset of these can be detected by looking for patterns that indicate "problematic cases" (e.g., cases that are highly complex or involve unresponsive customers). To this end, the project will start by gaining insights into mispredictions, before moving on to the development of techniques that can help identify problematic cases or account for them during prediction.

    Literature:

    • Many papers exist on remaining time prediction, but see e.g.: Keyvan Amiri Elyasi, Han van der Aa, Heiner Stuckenschmidt. PGTNet: A Process Graph Transformer Network for Remaining Time Prediction of Business Process Instances. International Conference on Advanced Information Systems Engineering (CAiSE 2024)

    Contact: Han van der Aa

Bachelor thesis topics


Supervision Marian Lux

Topic: AI 1 – Using a Large Language Model (LLM) to build a Knowledge Graph from scratch to improve Retrieval-Augmented Generation (RAG) for answering questions on custom data

Goal:

Develop a knowledge graph from scratch by identifying named entities and the relations between them. Store the knowledge graph and use it to improve answers to questions by using a local LLM, e.g., Llama [1] or Mistral [4], and the RAG approach [2].

For questions and answers, a simple chatbot should be implemented by using a UI framework (e.g., [3]) or an existing chat UI (e.g., [5]).

The data for RAG and the knowledge graph is extracted from web URLs or documents which are predefined from a particular domain (open to choose, e.g., sports).

The code will be hosted publicly on GitHub.

Deciding how to build the knowledge graph first requires a literature review in which different approaches are evaluated and one is chosen, or a new one is created (or adapted from an existing approach).
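As a rough illustration of how a stored knowledge graph could feed the RAG step, the following minimal sketch indexes (subject, relation, object) triples and pulls the facts whose entities appear in a question into a prompt. It is not a prescribed design: the triples, entity names, and function names are hypothetical, and simple substring matching stands in for the LLM-based entity and relation extraction the thesis would develop.

```python
from collections import defaultdict

# Hypothetical triples; in the thesis these would be extracted from the source documents.
TRIPLES = [
    ("FC Example", "won", "Example Cup"),
    ("Example Cup", "held_in", "Sampletown"),
    ("FC Example", "coached_by", "Alex Doe"),
]

def build_graph(triples):
    """Index each triple under its subject and its object."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((subj, rel, obj))
        graph[obj].append((subj, rel, obj))
    return graph

def retrieve_facts(graph, question):
    """Return triples attached to any entity whose name occurs in the question.

    Assumption: case-insensitive substring matching replaces real entity linking.
    """
    q = question.lower()
    facts = []
    for entity, edges in graph.items():
        if entity.lower() in q:
            for edge in edges:
                if edge not in facts:
                    facts.append(edge)
    return facts

def build_prompt(question, facts):
    """Assemble the retrieved facts and the question into a prompt for a local LLM."""
    context = "\n".join(f"{s} {r} {o}." for s, r, o in facts)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    question = "Where was the Example Cup held?"
    print(build_prompt(question, retrieve_facts(graph, question)))
```

The resulting prompt would then be sent to the local model; how the graph is persisted and queried at scale is left open here.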

 

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of RAM, or a free Google Colab account

Interest in building LLM AI solutions

 

Supervisor:

Marian LUX – marian.lux@univie.ac.at

Supervision and thesis in German or English

 

References:

[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[3] https://streamlit.io/

[4] https://mistral.ai/

[5] https://core.telegram.org/bots/api

 

Additional References:

Fatehkia, M., Lucas, J. K., & Chawla, S. (2024). T-RAG: lessons from the LLM trenches. arXiv preprint arXiv:2402.07483.

 

Topic: AI 2 – Build a Multi-Agent approach for a Large Language Model (LLM) to improve responses, using Retrieval-Augmented Generation (RAG) to take custom data into account in queries

Goal:

Develop an agentic architecture in which each agent has a particular role, in order to improve answers to questions by using a local LLM, e.g., Llama [1] or Mistral [4], and the RAG approach [2].

For questions and answers, a simple chatbot will be implemented by using a UI framework (e.g., [3]) or an existing chat UI (e.g., [5]).

The data for RAG with the agentic architecture (i.e., multiple ReAct [7] agents) is extracted from web URLs (defined at runtime) or documents (uploaded at runtime). The use case can be chosen freely (e.g., blogger, lawyer).

The code will be hosted publicly on GitHub.

Deciding how to build the agentic architecture first requires a literature review in which different approaches are evaluated and one is chosen, or a new one is created (or adapted from an existing approach, e.g., [6]).
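To make the ReAct idea [7] more concrete, here is a minimal sketch of a single agent's loop: the model alternates between a thought, an action on a tool, and an observation, until it emits a final answer. Everything in it is an assumption for illustration: `fake_llm` is a scripted stand-in for a local LLM, `search` is a hypothetical retrieval tool over an in-memory store, and the `Action: tool[input]` text format is just one possible convention.

```python
import re

def fake_llm(transcript):
    """Scripted stand-in for a local LLM (illustration only)."""
    if "Observation:" not in transcript:
        return "Thought: I need to look this up.\nAction: search[thesis language]"
    return "Thought: I have what I need.\nFinal Answer: German or English"

def search(query):
    """Hypothetical retrieval tool backed by a tiny in-memory document store."""
    docs = {"thesis language": "Supervision and thesis in German or English"}
    return docs.get(query, "no result")

TOOLS = {"search": search}

def react(question, llm=fake_llm, max_steps=5):
    """Run a thought/action/observation loop; return (answer, transcript)."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += "\n" + reply
        final = re.search(r"Final Answer:\s*(.*)", reply)
        if final:
            return final.group(1).strip(), transcript
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", reply)
        if action is None:
            break  # the model produced neither an action nor an answer
        name, arg = action.groups()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"\nObservation: {observation}"
    return None, transcript

if __name__ == "__main__":
    answer, log = react("In which languages can the thesis be written?")
    print(log)
```

In a multi-agent setup, several such loops with different roles and tool sets would be coordinated; that coordination is exactly what the literature review should inform.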

 

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of RAM, or a free Google Colab account

Interest in building LLM AI solutions

 

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

 

References:

[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[3] https://streamlit.io/

[4] https://mistral.ai/

[5] https://core.telegram.org/bots/api

[6] https://abvijaykumar.medium.com/multi-agent-architectures-e09c53c7fe0d

[7] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.

 

Topic: AI 3 – Using a Multimodal Large Language Model (LLM) to improve Retrieval-Augmented Generation (RAG) for answering questions on custom data

 

Goal:

Develop an agentic architecture in which each agent has a particular role, in order to improve answers to questions by using a local LLM, e.g., Llama [1], Mistral [4], or LLaVA [8-9], and the RAG approach [2].

For questions and answers, a simple chatbot should be implemented by using a UI framework (e.g., [3]) or an existing chat UI (e.g., [5]).

The data for RAG, including images such as diagrams, is extracted from web URLs or documents which are predefined from a particular domain (open to choose, e.g., company knowledge system, AI Act [7]).

 

The code will be hosted publicly on GitHub.

Deciding how to build the agentic architecture first requires a literature review in which different approaches are listed and one is chosen, or a new one is created (or adapted from an existing approach, e.g., [6]).

 

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of RAM, or a free Google Colab account

Interest in building LLM AI solutions

 

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

 

References:

[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[3] https://streamlit.io/

[4] https://mistral.ai/

[5] https://core.telegram.org/bots/api

[6] https://abvijaykumar.medium.com/multi-agent-architectures-e09c53c7fe0d

[7] https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai

[8] https://llava-vl.github.io/

[9] Liu, H., Li, C., Li, Y., & Lee, Y. J. (2024). Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 26296-26306).

 

Topic: AI 4 – Evaluation Approaches for Retrieval-Augmented Generation (RAG)

 

Goal:

Develop an open source, state-of-the-art RAG [2] application in which users can chat with their own data by using elastic search with [8] and local open source LLMs such as [1] or [4], served through Ollama [6]. The chatbot should be restricted to answering only on the basis of uploaded data (files, web URLs) for a custom predefined domain. Otherwise, it should answer “out of scope”.

Develop at least three different evaluation approaches for the RAG application, including Ragas [7].

The code will be hosted publicly on GitHub.

Evaluating RAG applications first requires a literature review in which different approaches are assessed. Possible UI techniques for the chatbot are [3] or [5].
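One way to picture the “out of scope” restriction together with a very simple evaluation is sketched below. It assumes a toy retriever based on Jaccard word overlap and a hypothetical threshold of 0.2. It does not use Ragas, Ollama, or any search backend, and the helper names are made up for illustration. A real application would retrieve with a proper index and pass the retrieved chunk to the LLM.

```python
import re

def tokens(text):
    """Lowercased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def best_chunk(question, chunks):
    """Return (score, chunk) for the chunk with the highest Jaccard overlap."""
    q = tokens(question)
    scored = []
    for chunk in chunks:
        c = tokens(chunk)
        union = q | c
        score = len(q & c) / len(union) if union else 0.0
        scored.append((score, chunk))
    return max(scored, key=lambda pair: pair[0]) if scored else (0.0, "")

def answer(question, chunks, threshold=0.2):
    """Refuse with 'out of scope' when no chunk is similar enough to the question."""
    score, chunk = best_chunk(question, chunks)
    if score < threshold:
        return "out of scope"
    return chunk  # placeholder: a real system would generate an answer from this context

def scope_accuracy(cases, chunks, threshold=0.2):
    """Fraction of (question, expected_in_scope) cases classified correctly."""
    correct = 0
    for question, in_scope in cases:
        predicted_in_scope = answer(question, chunks, threshold) != "out of scope"
        correct += predicted_in_scope == in_scope
    return correct / len(cases)

if __name__ == "__main__":
    chunks = [
        "The thesis code will be hosted publicly on GitHub.",
        "Supervision and thesis in German or English.",
    ]
    cases = [
        ("Where will the thesis code be hosted?", True),
        ("What is the capital of France?", False),
    ]
    print(scope_accuracy(cases, chunks))
```

A metric like this covers only the scope decision; answer quality, faithfulness to the retrieved context, and retrieval quality would need separate evaluation approaches.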

 

Recommended requirements:

Implementation in Python

Access to a GPU with at least 8 GB of RAM, or a free Google Colab account

Interest in building LLM AI solutions

 

Supervisor:

Marian LUX – marian.lux@univie.ac.at

Supervision and thesis in German or English

 

References:

[1] https://ai.meta.com/llama/

[2] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

[3] https://streamlit.io/

[4] https://mistral.ai/

[5] https://core.telegram.org/bots/api

[6] https://ollama.com/

[7] https://docs.ragas.io/

[8] https://r2r-docs.sciphi.ai/introduction

 

Topic: Process Mining – Process Discovery Visualization Tool in Python

Goal:

Improve, maintain, and extend the existing process mining tool developed in a previous bachelor thesis [1].

Different topics based on this process mining tool are possible. Some of them are listed below and the scope of the work is agreed with the supervisor at the beginning. Please write an email with your preferences to the supervisor:

  • Bug fixes and improvements to the existing solution [1]
  • Incorporating/merging node filtering (by using the spm metric [5]) and the Alpha Miner [5] from the existing solution [2]
  • Implementing the spm metric [5] in other existing algorithms for node filtering
  • Enhancing the Fuzzy Miner with additional filtering and visualization methods [3], [4]
  • Decision tool for determining which process mining algorithm fits best for a particular use case or data set [6-7]
    • Integration of a local open source LLM that helps with the decision
    • Questionnaire-based assessment of the use case, combined with data analysis
  • Development of an Object-Centric Process Mining integration [9], [10]
  • Improving the UI, e.g., by implementing the UI or custom elements with Google’s Mesop Framework [8]

The entire project is open source, including the contributions made during the bachelor thesis. This means that the code must be easily maintainable and extendable. The written code should also include unit tests.

The code will be hosted publicly on GitHub.

Because of the different topics, multiple students may work simultaneously on the same code base. This implies that at the end of the thesis, all changes must be merged into the main project (GitHub).
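For orientation, many discovery algorithms, including the Alpha Miner, start from the directly-follows relation between activities in an event log. The sketch below computes that relation and applies a simple frequency-based node filter, using only the standard library. It is an illustration under stated assumptions: the function names are hypothetical, the event log is a plain list of activity sequences, and the frequency threshold is a stand-in, not the spm metric [5] or the existing tool's implementation.

```python
from collections import Counter

def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def filter_nodes(log, min_count):
    """Remove activities occurring fewer than min_count times, keeping trace order."""
    freq = Counter(activity for trace in log for activity in trace)
    return [[a for a in trace if freq[a] >= min_count] for trace in log]

if __name__ == "__main__":
    # Hypothetical event log: each inner list is one case (trace).
    log = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
    print(directly_follows(log))
    print(directly_follows(filter_nodes(log, min_count=3)))
```

Small, pure functions like these are also straightforward to cover with the unit tests requested above.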

 

Recommended requirements:

Implementation in Python

Only basic frameworks (numpy, scikit-learn, etc.) may be used. The use of other, more sophisticated frameworks requires permission from the supervisor.

 

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

 

References:

[1] https://github.com/fabianf00/ProcessMiningVisualization_WS23

[2] https://github.com/rustemia98/ProcessMiningVisualization_WS23/

[3] Okoye, K., Naeem, U., & Islam, S. (2017). Semantic fuzzy mining: Enhancement of process models and event logs analysis from syntactic to conceptual level. International Journal of Hybrid Intelligent Systems, 14(1-2), 67-98.

[4] Günther, C. W., & Van Der Aalst, W. M. (2007, September). Fuzzy mining–adaptive process simplification based on multi-perspective metrics. In International conference on business process management (pp. 328-343). Berlin, Heidelberg: Springer Berlin Heidelberg.

[5] Lux, M., Rinderle-Ma, S., & Preda, A. (2018). Assessing the quality of search process models. In Business Process Management: 16th International Conference, BPM 2018, Sydney, NSW, Australia, September 9–14, 2018, Proceedings 16 (pp. 445-461). Springer International Publishing.

[6] Van der Aalst, W. (2016). Data science in action. In Process Mining: Data Science in Action (pp. 3-23). Springer Berlin Heidelberg.

[7] https://research.aimultiple.com/process-mining-algorithms/

[8] https://google.github.io/mesop/

[9] https://www.ocpm.info/ocel_demo.html

[10] https://encyclopedia.pub/video/video_detail/785

 


Topic: Individual

 

It is also possible to propose an individual topic. To do so, please send an abstract to your supervisor via email. If the topic fits the requirements for a bachelor thesis, the scope of the work is agreed with the supervisor at the beginning.

Supervisor:

Marian LUX - marian.lux@univie.ac.at

Supervision and thesis in German or English

 
