Publication
Computer Understanding of Document Structure
Andreas Dengel; Frank Dubiel
In: International Journal of Imaging Systems and Technology (IJIST), Vol. 7, No. 4, Pages 271-278, 1996.
Abstract
We describe a system that is capable of learning how the logical structure of documents is presented, demonstrated here on business letters. Given a set of instances, the system clusters them into structural concepts and induces a concept hierarchy, which then serves as a reference for classifying future input. The article introduces the sequence of learning steps, describes how the resulting concept hierarchy is applied to logical labeling, and reports the results.
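The abstract does not state which features describe a letter or which clustering criterion is used. The following is therefore only a minimal sketch of the general idea, not the authors' method. It assumes that each letter is reduced to a set of hypothetical (logical object, coarse position) pairs. It also substitutes plain agglomerative clustering with Jaccard similarity, in which every merge creates a parent concept described by the features its children share. The names `Concept`, `build_hierarchy` and `classify` are illustrative. New input is classified by descending the hierarchy toward the most similar concept.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Concept:
    """A node in the concept hierarchy: the features shared by all its members."""
    features: frozenset
    children: list = field(default_factory=list)
    members: list = field(default_factory=list)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Similarity of two feature sets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def build_hierarchy(instances: list) -> Concept:
    """Agglomerative conceptual clustering (an assumed stand-in): repeatedly merge
    the two most similar concepts into a parent described by their intersection."""
    nodes = [Concept(features=inst, members=[inst]) for inst in instances]
    while len(nodes) > 1:
        i, j = max(
            combinations(range(len(nodes)), 2),
            key=lambda p: jaccard(nodes[p[0]].features, nodes[p[1]].features),
        )
        a, b = nodes[i], nodes[j]
        parent = Concept(features=a.features & b.features,
                         children=[a, b],
                         members=a.members + b.members)
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return nodes[0]


def classify(root: Concept, doc: frozenset) -> Concept:
    """Descend from the root, following the most similar child as long as it
    matches the input at least as well as the current concept."""
    node = root
    while node.children:
        best = max(node.children, key=lambda c: jaccard(c.features, doc))
        if jaccard(best.features, doc) < jaccard(node.features, doc):
            break  # no child describes the input better; stop here
        node = best
    return node


# Hypothetical training letters, each a set of (logical object, position) pairs.
letters = [
    frozenset({("sender", "top-left"), ("date", "top-right"), ("subject", "middle")}),
    frozenset({("sender", "top-left"), ("date", "top-right"), ("signature", "bottom")}),
    frozenset({("sender", "top-right"), ("date", "bottom"), ("subject", "top")}),
]
root = build_hierarchy(letters)
match = classify(root, frozenset({("sender", "top-left"), ("date", "top-right")}))
print(sorted(match.features))  # [('date', 'top-right'), ('sender', 'top-left')]
```

In this toy run, the first two letters merge into a concept "sender top-left, date top-right". A new letter with just those two features is assigned to that intermediate concept and not to either individual letter, because stopping when no child fits better keeps the result at the most specific concept that still matches the input well.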