Publication
Background Variability Modeling for Statistical Layout Analysis
Faisal Shafait; Joost van Beusekom; Daniel Keysers; Thomas Breuel
In: Proceedings of the 19th International Conference on Pattern Recognition. International Conference on Pattern Recognition (ICPR-2008), December 8-11, Tampa, Florida, USA, IEEE, 2008.
Abstract
Geometric layout analysis plays an important role in document image understanding. Many algorithms known in the literature work well on standard document images, achieving high text line segmentation accuracy on the UW-III dataset. These algorithms rely on certain assumptions about document layouts and fail when those assumptions are not met. In addition, they do not provide confidence scores for their output. These two problems limit the usefulness of general-purpose layout analysis methods in large-scale applications. In this contribution, we propose a statistically motivated, model-based, trainable layout analysis system that allows assumption-free adaptation to different layout types and produces likelihood estimates of the correctness of the computed page segmentation. The performance of our approach is tested on a subset of the Google 1000 books dataset, where it achieved a text line segmentation accuracy of 98.4% on layouts where other general-purpose algorithms failed to produce a correct segmentation.