Publication

Critical Evaluation of Biologically Informed Neural Networks: Validating Biological Pathway Representation

Tanya Amit Tyagi

Master's thesis, University of Saarland, 2026.

Abstract

High-dimensional omics data are widely used for biomarker discovery, but many predictive models struggle with interpretability and reproducibility. Standard machine learning and deep learning methods can achieve good performance, yet their predictions are often difficult to relate to known biological mechanisms. Biologically Informed Neural Networks (BINNs) aim to address this limitation by embedding curated biological pathway information directly into the model architecture, so that predictions can be traced back to genes, proteins, and pathways. This thesis critically evaluates whether BINNs offer practical advantages over conventional machine learning models in single-omics biomarker discovery. Specifically, it assesses whether biologically constrained architectures can preserve predictive performance while providing structured, interpretable representations of disease-related signals. BINNs are evaluated on three omics modalities (plasma proteomics, mRNA expression, and microRNA expression) covering septic acute kidney injury and two cancer cohorts. Their performance is compared against baseline models, including random forests, fully connected neural networks, and Bayesian hierarchical logistic regression. Using nested cross-validation and consistent evaluation metrics, the thesis examines predictive accuracy, generalisation behaviour, and pathway-level interpretability across datasets. The results show that BINNs can achieve competitive performance in settings with strong biological signals, particularly in proteomics, while offering transparent pathway-level insights. However, their advantages are less consistent in the noisier, more heterogeneous transcriptomic and microRNA datasets. Overall, the findings highlight both the strengths and the limitations of biologically informed neural architectures and identify the conditions under which they are most useful for biomarker discovery.
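To illustrate what "embedding pathway information into the architecture" can mean in practice, here is a minimal sketch of one common construction: a dense layer whose weight matrix is multiplied element-wise by a binary gene-to-pathway mask, so that each pathway node only receives input from its annotated member genes. This is not the implementation used in the thesis; the gene names, pathway names, and the `MaskedLinear` class are hypothetical and chosen for illustration only.

```python
import numpy as np

# Hypothetical gene-to-pathway membership (illustrative only, not from the thesis).
genes = ["G1", "G2", "G3", "G4"]
pathways = {"P_A": {"G1", "G2"}, "P_B": {"G2", "G3", "G4"}}

def build_mask(genes, pathways):
    """Binary mask M[g, p] = 1 if gene g is annotated to pathway p, else 0."""
    names = list(pathways)
    mask = np.zeros((len(genes), len(names)))
    for j, name in enumerate(names):
        for i, g in enumerate(genes):
            if g in pathways[name]:
                mask[i, j] = 1.0
    return mask, names

class MaskedLinear:
    """Hypothetical dense layer whose weights are zeroed outside the pathway mask."""

    def __init__(self, mask, rng):
        self.mask = mask
        self.W = rng.normal(scale=0.1, size=mask.shape)
        self.b = np.zeros(mask.shape[1])

    def forward(self, x):
        # Element-wise masking: a pathway node only sees its member genes.
        return np.maximum(0.0, x @ (self.W * self.mask) + self.b)

rng = np.random.default_rng(0)
mask, names = build_mask(genes, pathways)
layer = MaskedLinear(mask, rng)

x = rng.normal(size=(3, len(genes)))  # 3 samples x 4 genes
h = layer.forward(x)                  # 3 samples x 2 pathway nodes

# Perturbing G4 (not a member of P_A) leaves the P_A activations unchanged.
x_perturbed = x.copy()
x_perturbed[:, 3] += 10.0
h_perturbed = layer.forward(x_perturbed)
print(np.allclose(h[:, 0], h_perturbed[:, 0]))  # True
```

Because the mask multiplies the weights, the gradient with respect to any masked weight is also zero during training, so the constraint holds throughout optimisation and each hidden unit keeps a fixed pathway meaning that can be inspected afterwards.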

Projects

CurAISciD - curATime AI science and development