
Modern NLP models and LLMs, despite being highly performant, have specific flaws. First, they are black boxes: the parameters of proprietary models are not accessible at all, and even non-proprietary models are largely opaque, in the sense that it is unclear where exactly specific knowledge is encoded among potentially billions of parameters. Second, there is a tendency to keep increasing the size of LLMs and their training data to improve performance, which is especially problematic for domains and languages with fewer resources.
The E&E group of DFKI's Research Department Multilinguality and Language Technology works on transparent and efficient NLP models. Our objective is to make the parameters and behaviour of LLMs more explainable and understandable to both end users and researchers. We aim to improve LLMs in terms of data consumption, e.g. for domains or languages where data is scarce, by using structured data, new learning techniques, or other modalities, and in terms of model size, e.g. for settings where powerful hardware is not available.
We are involved in Twinning projects, in which we provide knowledge transfer on both research topics and project management to newly established research institutions across Europe. We also take part in European procurement projects focusing on language resources, such as the European Language Resource Coordination and the Language Data Space.

GenSeC – Generative AI in a Security Context
GenSeC investigates how generative foundation models can be evaluated in security-relevant operational contexts where standard assumptions about clear tasks, stable ground truths, and harmless inputs do not apply. Instead, such environments are often characterized by incomplete, multilingual, time-critical, and potentially manipulated information. GenSeC is based on the premise that evaluation methods must explicitly reflect these conditions in order to be meaningful.
We are developing a large AI language model that will be made available to industry and society as open source. Building on this large language model (LLM), a so-called reasoning model will also be created using specialised techniques to increase the quality of the overall system and optimise resource consumption. In addition, initial use cases are to be implemented using AI agent technologies.
The main objective of the lorAI project is to upgrade the Kempelen Institute of Intelligent Technologies (KInIT) to a leading R&I institution in low resource artificial intelligence (LRAI) in Slovakia and Europe.
Duration: 08/01/2024 - 07/31/2027
In TRAILS we focus on three main research directions: (i) inclusion of underrepresented languages and cultures through multilingual and culturally sensitive NLP, (ii) robustness and fairness with respect to long-tail phenomena and classes and "trustworthy content", and (iii) robust and efficient NLP models that enable training and deployment of models for (i) and (ii). We also partially address economic inequality by aiming for more efficient models (objective (iii)), which translates directly into a lower resource and cost footprint.
Team Lead:
Dr. Simon Ostermann
simon.ostermann@dfki.de
Team Members:
Yusser al Ghussin
Tatiana Anikina
Tanja Bäumel
Daniil Gurgurov
Cennet Oguz
Stefania Racioppa
MSc Students and Research Assistants:
Khondoker Ittehadul Islam
Hyun Gu Kang
Eva Gavaller
Kaviya Ravichandran
Amelie Seyfried
Arushi Singhal