Publication

Rethinking Semantic Segmentation for Table Structure Recognition in Documents

Muhammad Shoaib Ahmed Siddiqui; Pervaiz Khan; Andreas Dengel; Sheraz Ahmed

In: Proceedings - The 15th IAPR International Conference on Document Analysis and Recognition. International Conference on Document Analysis and Recognition (ICDAR-2019), September 20-25, Sydney, Australia, Pages 1397-1402, ISBN 978-1-7281-3015-6, IEEE, 2019.

Abstract

Building on recent advances in semantic segmentation, Fully Convolutional Networks (FCNs) have previously been applied successfully to table structure recognition. We analyze the efficacy of semantic segmentation networks for this purpose and simplify the problem by proposing prediction tiling, based on the consistency assumption that holds for tabular structures. For an image of dimensions H × W, we predict a single column for the rows (ŷ_row, of length H) and a single row for the columns (ŷ_col, of length W). We use a dual-headed architecture in which the initial feature maps (from the encoder-decoder model) are shared, while the last two layers generate class-specific (row/column) predictions. This allows a single model to generate predictions for rows and columns simultaneously, whereas previous methods relied on two separate models for inference. With the proposed method, we achieve state-of-the-art results on the ICDAR-13 image-based table structure recognition dataset, with an average F-Measure of 92.39% (91.90% and 92.88% F-Measure for rows and columns, respectively). These results suggest that constraining the problem space of an FCN by imposing valid constraints can lead to significant performance gains.
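The prediction tiling step described in the abstract can be illustrated with a minimal sketch. It assumes the network has already produced a per-row vector of length H and a per-column vector of length W; the function name `tile_predictions` and the toy label values are hypothetical and are not taken from the paper. The sketch shows only how the two one-dimensional predictions could be expanded into full H × W masks, not the authors' model or training procedure.

```python
import numpy as np


def tile_predictions(y_row, y_col):
    """Expand 1-D row/column predictions into full H x W masks.

    Hypothetical helper for illustration only: y_row has length H
    (one label per image row) and y_col has length W (one label per
    image column).
    """
    y_row = np.asarray(y_row)
    y_col = np.asarray(y_col)
    h, w = y_row.shape[0], y_col.shape[0]

    # Repeat the single predicted column across the image width.
    row_mask = np.tile(y_row[:, None], (1, w))
    # Repeat the single predicted row across the image height.
    col_mask = np.tile(y_col[None, :], (h, 1))
    return row_mask, col_mask


# Toy labels for a 4 x 6 image (values are made up for the example).
y_row = np.array([0, 1, 1, 0])
y_col = np.array([1, 0, 1, 1, 0, 0])
row_mask, col_mask = tile_predictions(y_row, y_col)
```

Under the consistency assumption, every pixel in a given image row shares the same row label and every pixel in a given image column shares the same column label, which is why a single vector per axis is enough to reconstruct each mask.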

Further Links

https://ieeexplore.ieee.org/document/8978088/