Publication
Generative AI and Conceptual Modeling
Hans-Georg Fill; Jennifer Horkoff; Peter Fettke; Julius Köppke
In: Oliver Hinz (Ed.). Business & Information Systems Engineering (BISE), Vol. 68, pp. 1-5, Springer Nature Link, Wiesbaden, 2/2026.
Abstract
The release of ChatGPT in November 2022 has caused generative artificial intelligence to become a global phenomenon that is being explored in many areas of science and practice including conceptual modeling. Once known mainly to specialists in deep learning and text generation, machine learning models that are trained on vast amounts of data and usable across many different downstream tasks have now become commodity tools. Today, a large number of high-quality, pre-trained large language models (LLMs) are available. These models allow users to issue natural language prompts in order to create and analyze text, conceptual models, code, images, videos, and audio data.
LLMs can be categorized based on how readily information is available about their training data and the weights that determine a model’s capabilities. Proprietary models such as OpenAI’s advanced GPT models, Google’s Gemini, or Anthropic’s Claude models are typically accessible only through a service endpoint, and neither their training data nor their weights are disclosed. Access to them is provided, paid or unpaid, through a provider. In contrast, the weights of open-weight models such as OpenAI’s GPT-OSS models, Meta’s Llama, Google’s Gemma, or Apple’s Ferret can be downloaded and run on one’s own machine. Although the training data is not available in this case, this allows for independent experimentation. Most recently, open models such as the Swiss Apertus model (Hernández-Cano et al. 2025) disclose their training data and procedure as well as their weights and are thus fully transparent. From the viewpoint of open science, open models are preferable, as they make it possible to inspect and validate all components, ensure the reproducibility of research, and contribute to the quality of scientific work. However, as of today, proprietary models are still far ahead in terms of generation quality, which makes it necessary for researchers to work with them when conducting experiments. This also applies to the field of conceptual modeling and generative AI.
Conceptual modeling is a pivotal academic field not only in business and information systems engineering, but also in (management) information systems, (business) informatics, software engineering, process science, and other disciplines (Frank et al. 2014; Michael et al. 2024; Mayr and Thalheim 2021). It supports human understanding and communication in general (Mylopoulos 1992), aligns business and IT aspects (Sandkuhl et al. 2018; Fill 2020), formalizes requirements for software-based systems (Horkoff and Yu 2016), and analyzes domain concepts and terminologies (Van Gils et al. 2022). Conceptual models are based on schemata in the form of modeling languages comprising syntax, semantics, and a visual or textual notation (Harel and Rumpe 2004). In addition, they are also used in model-driven engineering (Brambilla et al. 2017), for code generation (Sebastián et al. 2020), and for simulation (Rosenthal et al. 2021). Further, conceptual models may be processed semantically, thereby acting as an interface to knowledge graphs and reasoning (Smajevic and Bork 2021; Fill 2017). Conceptual models are the result of higher-order cognitive processes such as abstraction, relational reasoning, and the integration and maintenance of information. They are thus subject to individual differences between modelers and their interpretations (Wilmont et al. 2013).
Soon after the introduction of ChatGPT, it became apparent that large language models are highly capable of creating and analyzing conceptual models in a variety of modeling languages (Fill et al. 2024). Based on some textual input, LLMs can create conceptual models using existing syntax formats such as PlantUML or BPMN-XML as well as newly specified ones. At first this was surprising, since the models were not specifically trained for such tasks. However, the large amounts of training data evidently contained sufficient information about conceptual models to allow LLMs to reason about requests for creating models, both in established languages and in newly designed languages that are explained in a prompt at runtime using few-shot learning (Fill et al. 2024). Since then, many experiments have been conducted to determine the best approaches for generating conceptual models (e.g., Calamo et al. 2025; Reinhartz-Berger et al. 2025; Safan and Köpke 2025; Köpke and Safan 2024; Muff and Fill 2024, 2025; Kolev et al. 2025; Klievtsova et al. 2025). The scope of investigated modeling languages so far includes well-established languages such as the Unified Modeling Language (UML), Entity-Relationship (ER) diagrams, or business process models in BPMN notation, as well as newly developed and domain-specific languages such as Heraklit or ARWFML (Reinhartz-Berger et al. 2025; Fill et al. 2023; Muff and Fill 2025; Baumann et al. 2024).
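The few-shot approach described above can be sketched as follows. This is a minimal, illustrative example only: the instruction wording, the example pair, and the helper names (`build_prompt`, `looks_like_plantuml`) are assumptions, and any chat-completion API could consume the resulting prompt.

```python
# Sketch: assembling a few-shot prompt that asks an LLM to emit a conceptual
# model in PlantUML class-diagram syntax. The example pair and helper names
# are illustrative assumptions, not taken from the editorial.

FEW_SHOT_EXAMPLES = [
    (
        "A library lends books to members.",
        "@startuml\nclass Library\nclass Book\nclass Member\n"
        'Library "1" -- "*" Book : holds\n'
        'Member "*" -- "*" Book : borrows\n@enduml',
    ),
]

def build_prompt(description: str) -> str:
    """Assemble a few-shot prompt: instruction, worked examples, new task."""
    parts = ["Translate the following domain descriptions into PlantUML class diagrams.\n"]
    for text, model in FEW_SHOT_EXAMPLES:
        parts.append(f"Description: {text}\nModel:\n{model}\n")
    parts.append(f"Description: {description}\nModel:\n")
    return "\n".join(parts)

def looks_like_plantuml(output: str) -> bool:
    """Minimal syntactic check on a generated model before further processing."""
    out = output.strip()
    return out.startswith("@startuml") and out.endswith("@enduml")
```

In the same way, a newly designed domain-specific language can be taught at runtime by swapping the worked examples, which is what makes the approach applicable beyond established notations.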
From the viewpoint of conceptual modeling, the phenomenon of generative AI has several implications, which touch both upon the way conceptual models are technically created and upon how humans conceptualize the world and interact with AI. The mere generation of conceptual models from textual input has been explored in the field of natural language processing (NLP) for quite some time (e.g., Bellan et al. 2023; van der Aa et al. 2018). In this regard, the use of LLMs for model creation may be viewed as another tool that surpasses the performance and quality of previous approaches and does not require specific training or configurations. However, LLMs' ease of use and foundational nature allow them to be applied to many different formal and semi-formal languages and tasks, contributing to their disruptive potential (Buchmann et al. 2024; Storey et al. 2025). This has implications for the practice of information systems engineering. Requirements for systems no longer need to be manually elicited from textual descriptions and then translated into conceptual models. Rather, LLMs can generate a preliminary version of potential requirements and their formalization, which is subsequently refined by human actors as needed (Ronanki et al. 2024). Further, the advancement of LLMs has enabled them to perform well on agentic tasks: their capabilities in generating code allow for the derivation of actions, e.g., via calls to APIs, thus realizing agentic workflows (Fettke et al. 2025). From the viewpoint of teaching, the use of LLMs has many implications that are not yet fully understood. Some in the conceptual modeling community claim that education on conceptual modeling has to shift from model creation to the critical evaluation of models (Snoeck and Pastor 2025).
Others, however, put the focus on conceptual models as a potential interface to GenAI systems, in which both humans and AI systems are forced to formalize their thoughts in the form of conceptual models in order to better understand each other and to detect errors in their conceptualizations more easily (Fill et al. 2024).
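The agentic pattern mentioned above, where generated output is turned into API calls, can be illustrated with a minimal dispatch loop. The JSON action format, the tool registry, and the `create_order` tool are all hypothetical assumptions for this sketch; real agent frameworks use richer schemas and error handling.

```python
# Sketch: a minimal agentic dispatch step, assuming the LLM returns a JSON
# action of the form {"tool": ..., "args": {...}}. The registry and the
# action format are illustrative assumptions.

import json

# Hypothetical tool registry mapping action names to API calls.
TOOLS = {
    "create_order": lambda customer, product: f"order({customer},{product})",
}

def dispatch(llm_output: str) -> str:
    """Parse the model's JSON action and invoke the matching tool."""
    action = json.loads(llm_output)
    tool = TOOLS.get(action["tool"])
    if tool is None:
        raise ValueError(f"unknown tool: {action['tool']}")
    return tool(**action["args"])
```

The conceptual-model-as-interface idea fits naturally here: the tool registry itself is a small formalized conceptualization that both the human and the model must agree on for the workflow to function.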
Lastly, a major open issue is the evaluation of the output of LLMs when generating or interpreting conceptual models. Today, this is typically checked against a baseline in the form of ideal models created by human modelers (e.g., Calamo et al. 2025). However, given that conceptual modeling is a cognitive effort to structure the world as a subject or a group of subjects perceives it (Wilmont et al. 2013), the question remains whether such baselines are indeed valid for evaluations, as this would implicitly assume the existence of a single “ground truth” representation of a domain. The value of generated conceptual models may thus need to be evaluated along multiple dimensions beyond strict conformance metrics. A step in this direction may be the use of quality frameworks for models that take a holistic view, as proposed, for example, by Krogstie (2012).
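To make concrete what "strict conformance metrics" look like, the following is a sketch of an element-level comparison against a human reference model. The set-based matching on element names is an illustrative assumption; actual evaluations typically also match relations and attributes, often with fuzzy label similarity.

```python
# Sketch: strict conformance scoring of a generated model against a
# human-created reference model, over named elements only. Exact name
# matching is a simplifying assumption.

def element_f1(generated: set, reference: set) -> dict:
    """Precision, recall, and F1 over the named elements of two models."""
    if not generated or not reference:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    tp = len(generated & reference)  # elements present in both models
    precision = tp / len(generated)
    recall = tp / len(reference)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

The limitation discussed above is visible even in this toy metric: a generated model that structures the same domain with different but equally valid concept names scores zero, which is precisely why single-baseline evaluation is questioned.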
