Publication
Multimodal approach for imbalanced document classification
Mohammad Minouei; Reza Soheili; Didier Stricker
In: Wolfgang Osten (Ed.). Seventeenth International Conference on Machine Vision (ICMV 2024). International Conference on Machine Vision (ICMV-2024), October 10-13, Edinburgh, United Kingdom, Pages 347-358, SPIE Conference Proceedings, Vol. 13517, SPIE, 2024.
Abstract
Data scarcity in deep learning remains a significant, unresolved problem. Many existing works in this domain assume that models have access to a comprehensive, balanced dataset covering all conceivable class conditions. However, real-world scenarios often involve imbalanced and incomplete data, creating considerable challenges. In this study, we address the problem of document image classification in the context of imbalanced data.
We employ a strategy that merges effective techniques for managing data imbalance with a multi-modal approach that incorporates both image and text data. The experiments were carried out using a customized version of the RVL-CDIP benchmark, where we compared our approach against other methods. The results demonstrate substantial performance enhancements, including an overall accuracy boost of 13 percent and more than a 40 percent improvement in certain minority classes. Our research highlights the efficacy of tailored methods in overcoming the difficulties of imbalanced document classification.
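The abstract does not say which imbalance-handling techniques or fusion scheme the authors use. As a purely illustrative sketch, not the paper's method, the general idea of pairing an imbalance remedy with image and text inputs could look like the following. It assumes inverse-frequency class weighting and simple feature concatenation. All function names, feature values, and class counts are hypothetical.

```python
from collections import Counter
import math


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rare classes weigh more.

    Assumption: this is one common re-weighting scheme, chosen for
    illustration; the abstract does not name the technique it uses.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def fuse(image_features, text_features):
    """Combine the two modalities by concatenating their feature vectors.

    Assumption: plain concatenation stands in for whatever multimodal
    fusion the paper actually applies.
    """
    return list(image_features) + list(text_features)


def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -weights[label] * math.log(probs[label])


# Hypothetical imbalanced label set: 8 majority vs. 2 minority documents.
labels = ["letter"] * 8 + ["memo"] * 2
weights = inverse_frequency_weights(labels)

# Hypothetical per-document features from an image encoder and a text encoder.
fused = fuse([0.1, 0.2], [0.3])

# A mistake on the minority class costs more than one on the majority class.
loss_minority = weighted_cross_entropy({"letter": 0.5, "memo": 0.5}, "memo", weights)
loss_majority = weighted_cross_entropy({"letter": 0.5, "memo": 0.5}, "letter", weights)
```

With these weights, an equally uncertain prediction is penalized four times more heavily on the minority class than on the majority class, which is the effect re-weighting schemes aim for.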
