Publication

Few-Shot Whole Slide Pathology Classification with Multi-Granular Vision-Language Models

Anh-Tien Nguyen; Ho Minh Duy Nguyen; Nghiem Tuong Diep; Trung Nguyen; Nhat Ho; Jacqueline Michelle Metsch; Miriam Cindy Maurer; Daniel Sonntag; Hanibal Bohnenberger; Anne-Christin Hauschild

In: ICLR 2025 Workshop on Foundation Models in the Wild. International Conference on Learning Representations (ICLR-2025), OpenReview, 2025.

Abstract

In this study, we propose a novel architecture for a large vision-language model adapted with a multi-granular prompt learning method to advance few-shot pathology classification. Starting with the Prov-GigaPath foundation model, pre-trained on 1.3 billion pathology image patches, we extend it into a vision-language model by adding adaptors and aligning it with medical text encoders via contrastive learning on 923K image-text pairs. In contrast to previous approaches that combine prompts with frozen features using prefix embeddings or self-attention, our multi-granular attention mechanism evaluates interactions between learnable prompts, individual image patches, and patch groups, capturing both fine details and broader context. We further improve the precision with an unbalanced optimal transport-based visual-text distance that mitigates perturbations from data augmentation. Experiments on lung and kidney pathology imaging modalities show that our method outperforms state-of-the-art competitors and improves performance across various architectures, including CLIP, PLIP, and the Prov-GigaPath integrated PLIP.
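
The abstract does not specify how the unbalanced optimal transport distance is formulated, so the following is only a generic sketch of the idea and not the authors' implementation. It assumes an entropic formulation with KL-relaxed marginals solved by Sinkhorn-style scaling, a cosine cost between visual and text embeddings, and uniform masses. The function names (`cosine_cost`, `unbalanced_sinkhorn`, `uot_distance`) and the parameters `eps` and `rho` are hypothetical choices made for illustration.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cosine cost (1 - cosine similarity) between rows of x and y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def unbalanced_sinkhorn(cost, a, b, eps=0.1, rho=1.0, n_iters=200):
    """Entropic unbalanced OT with KL-relaxed marginals (assumed formulation).

    eps is the entropic regularization strength; rho controls how strongly
    the plan's marginals are pulled toward a and b. A smaller rho lets the
    plan discard mass, for example from outlier patches.
    """
    kernel = np.exp(-cost / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)  # exponent < 1 softens the marginal constraints
    for _ in range(n_iters):
        u = (a / (kernel @ v)) ** power
        v = (b / (kernel.T @ u)) ** power
    # Transport plan: diag(u) @ kernel @ diag(v)
    return u[:, None] * kernel * v[None, :]


def uot_distance(visual, text, eps=0.1, rho=1.0):
    """Transport cost between a set of visual and a set of text embeddings."""
    cost = cosine_cost(visual, text)
    a = np.full(visual.shape[0], 1.0 / visual.shape[0])  # uniform patch mass
    b = np.full(text.shape[0], 1.0 / text.shape[0])      # uniform prompt mass
    plan = unbalanced_sinkhorn(cost, a, b, eps=eps, rho=rho)
    return float(np.sum(plan * cost)), plan


# Toy usage with random stand-ins for patch and prompt embeddings.
rng = np.random.default_rng(0)
visual_feats = rng.normal(size=(4, 8))
text_feats = rng.normal(size=(3, 8))
distance, plan = uot_distance(visual_feats, text_feats)
```

Because the marginal constraints are soft, the plan is free to transport less than the full mass, which is one plausible reading of how such a distance could tolerate augmentation-induced perturbations.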
