


Deep Fusion for Neural Networks



Focused research and development in the field of Deep Learning, especially in recent years, has produced a wide variety of network types, architectures, modules, training methods and data sets. Nevertheless, building and configuring a state-of-the-art Deep Learning system for a given visual recognition task remains challenging. Typically, such a task starts with choosing an appropriate network architecture and network parameters, and continues with the difficult problem of exploiting the multi-modal nature of the input data, which in many cases contains not only visual information but also sound, motion or text. Although deep learning approaches offer significant improvements in processing individual modalities, real-world systems should ideally use all available modalities. Because the end-to-end approach of Deep Learning combines feature extraction and classification into a single step, the traditional fusion concepts (Early Fusion, Late Fusion) need to be revisited. One of the central open questions is therefore: how and when should multiple modalities of an input signal (e.g., video data with visual, motion and acoustic information, text or knowledge) be merged? Furthermore, it is still completely open how static or dynamic external context information (e.g., special domain knowledge or eye-tracking information) can be fed into deep learning systems.

Project content

DeFuseNN focuses on three challenges for Deep Learning and defines the following areas of activity to address them:

  1. "Building a knowledge base" to improve the understanding of the Deep Learning landscape
  2. Investigation and development of new "multi-modal fusion concepts" for deep learning
  3. Use of additional "external signals" to improve classification

Building a knowledge base

The goal of this task area is to provide an overview of which tasks can be solved by which types of deep neural networks, and which architectures and configurations (e.g., layers, training parameters, blocks) are suitable for them.
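A minimal, purely illustrative sketch of how one entry of such a knowledge base might be structured, assuming a simple record per task/architecture pair. The class `KnowledgeEntry`, the helper `find_architectures` and the example entries are hypothetical and do not reflect the project's actual repository.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeEntry:
    """Hypothetical knowledge-base record: a task, a network type that can
    solve it, a concrete architecture, and its configuration."""
    task: str
    network_type: str
    architecture: str
    configuration: dict = field(default_factory=dict)  # layers, training parameters, blocks, ...


def find_architectures(entries, task):
    """Return every entry registered for the given task."""
    return [e for e in entries if e.task == task]


# Illustrative entries only, not project data.
entries = [
    KnowledgeEntry("image classification", "CNN", "ResNet-50",
                   {"layers": 50, "block": "bottleneck residual block"}),
    KnowledgeEntry("speech recognition", "RNN", "LSTM", {"layers": 2}),
]

for entry in find_architectures(entries, "image classification"):
    print(entry.network_type, entry.architecture, entry.configuration)
```

Keeping the configuration as a free-form dictionary lets entries for very different network types share one record type.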

Multi-modal fusion

The goal of this task area is to develop fusion approaches that fit the underlying deep neural network architectures. For example, early fusion can often be regarded as already built into CNNs, since their first layer combines the R, G and B channels, which can be seen as independent input signals. In contrast to Early Fusion, Late Fusion can be regarded as a combination of two networks of the same type (e.g., two CNNs), where the fully connected layers serve as the late fusion layer. In addition, the project will develop and investigate new fusion layers that can be placed inside the networks. These layers fuse signals from sub-networks during the forward pass (in-fusion). Combinations can also run in parallel, e.g., when sound and image content are processed simultaneously. Such constellations often require synchronized or interwoven connections.
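The three fusion points described above can be contrasted in a small dependency-free sketch. It assumes two hypothetical modalities (visual and acoustic feature vectors), uses random stand-in weights instead of trained ones, and the `in_fusion` function is only one possible form of an in-network fusion layer, not the layer developed in DeFuseNN.

```python
import random

random.seed(0)


def dense(x, w):
    """Fully connected layer without bias: x is a vector, w has one row per output unit."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in w]


def relu(v):
    return [max(0.0, a) for a in v]


def rand_weights(n_out, n_in):
    """Random stand-in weights; a real model would learn these."""
    return [[random.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def branch(x, w1, w2):
    """Two-layer sub-network processing one signal."""
    return relu(dense(relu(dense(x, w1)), w2))


def early_fusion(a, b, w1, w2, w_out):
    # Merge the inputs first (list + list concatenates), then run one network,
    # much like a CNN whose first layer combines the R, G and B channels.
    return dense(branch(a + b, w1, w2), w_out)


def late_fusion(a, b, wa1, wa2, wb1, wb2, w_fc):
    # One sub-network per modality; a fully connected layer merges their outputs.
    return dense(branch(a, wa1, wa2) + branch(b, wb1, wb2), w_fc)


def in_fusion(a, b, wa1, wb1, w_mix, wa2, wb2, w_fc):
    # Hypothetical in-fusion layer: after the first layer, intermediate features
    # of both sub-networks are mixed, and the fused signal is fed back into each
    # branch (interwoven connections) before the final fully connected layer.
    h_a, h_b = relu(dense(a, wa1)), relu(dense(b, wb1))
    fused = relu(dense(h_a + h_b, w_mix))
    h_a2 = relu(dense(h_a + fused, wa2))
    h_b2 = relu(dense(h_b + fused, wb2))
    return dense(h_a2 + h_b2, w_fc)


H, C = 5, 3                                          # hidden units, output classes
image = [random.gauss(0.0, 1.0) for _ in range(8)]   # hypothetical visual features
audio = [random.gauss(0.0, 1.0) for _ in range(6)]   # hypothetical acoustic features

scores_early = early_fusion(image, audio, rand_weights(H, 14), rand_weights(H, H),
                            rand_weights(C, H))
scores_late = late_fusion(image, audio, rand_weights(H, 8), rand_weights(H, H),
                          rand_weights(H, 6), rand_weights(H, H), rand_weights(C, 2 * H))
scores_in = in_fusion(image, audio, rand_weights(H, 8), rand_weights(H, 6),
                      rand_weights(H, 2 * H), rand_weights(H, 2 * H),
                      rand_weights(H, 2 * H), rand_weights(C, 2 * H))
```

All three variants produce one score per class; they differ only in where the two signals meet: before any processing, after all processing, or in the middle of the forward pass.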

External signals

Neural networks process an input signal by passing it through a series of layers. For such layer compositions, the use of context as an external signal is investigated as a way to combine formal knowledge with statistical learning.
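One established way to inject such an external signal into a layer is feature-wise linear modulation (FiLM), in which a context vector predicts a per-feature scale and shift for the layer's activations. The sketch below uses FiLM purely as an illustration; it is not necessarily the mechanism used in DeFuseNN, and the context encoding (a one-hot domain flag here, though it could equally summarize eye-tracking data) and all weights are hypothetical.

```python
def dense(x, w):
    """Fully connected layer without bias: one row of w per output unit."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in w]


def film(features, context, w_gamma, w_beta):
    """FiLM-style conditioning: the context predicts a scale (gamma) and a
    shift (beta) for every hidden feature, so the same features are
    processed differently depending on the external signal."""
    gamma = dense(context, w_gamma)
    beta = dense(context, w_beta)
    return [g * h + b for g, h, b in zip(gamma, features, beta)]


# Hypothetical hand-picked weights mapping a 2-d context to 3 features.
W_GAMMA = [[1.0, 0.5], [2.0, 0.5], [0.0, 0.5]]
W_BETA = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

hidden = [1.0, 2.0, 3.0]   # activations from some layer of the network
context_a = [1.0, 0.0]     # e.g. "domain A" encoded as a one-hot knowledge flag
context_b = [0.0, 1.0]     # e.g. "domain B"

print(film(hidden, context_a, W_GAMMA, W_BETA))  # → [1.0, 4.0, 0.0]
print(film(hidden, context_b, W_GAMMA, W_BETA))  # → [1.5, 2.0, 2.5]
```

In a trained model, `W_GAMMA` and `W_BETA` would be learned jointly with the rest of the network; a simpler alternative is to concatenate the context vector to the features before a fully connected layer.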

Benefits and utilization

The DeFuseNN project will provide results on several current scientific questions in the field of Deep Learning. For the first time, the knowledge base will provide an overview of this rapidly developing research field, making it easier to derive deep learning solutions for existing problems. Novel fusion concepts will also be developed and investigated, making new problems accessible to Deep Learning. Beyond that, the work in DeFuseNN will form the basis for further research on the understanding of Deep Learning. The results of DeFuseNN are shared with the scientific community through publications at renowned conferences and through collaborations with other scientific institutes. Economically, DeFuseNN offers opportunities for exploitation with established national and international cooperation partners by transferring the basic concepts into marketable solutions.


BMBF - Federal Ministry of Education and Research




DeFuseNN defines three main working areas: the Deep Learning Knowledge Repository, multi-modal fusion, and using context as a signal.

Publications about the project

Stanislav Frolov; Tobias Hinz; Federico Raue; Jörn Hees; Andreas Dengel

In: Neural Networks, Vol. 144, pp. 187-209, Elsevier, 12/2021.


Stanislav Frolov; Shailza Jolly; Jörn Hees; Andreas Dengel

In: Proceedings of the Second Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN), 28th International Conference on Computational Linguistics (COLING 2020), December 13, online conference (Barcelona, Spain), Association for Computational Linguistics, 12/2020.


Adriano Lucieri; Muhammad Naseer Bajwa; Andreas Dengel; Sheraz Ahmed

In: Proceedings of the 27th International Conference on Neural Information Processing (ICONIP 2020), November 18-22, Bangkok, Thailand, LNCS, Springer, 11/2020.
