Recognition of Handwritten Historical Documents: HMM-Adaptation vs. Writer Specific Training

E. Indermühle, Marcus Liwicki, Horst Bunke

In: Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition (ICFHR-2008), August 19-21, Montreal, QC, Canada, pages 186-191, 2008.


In this paper we propose a recognition system for handwritten manuscripts by writers of the 20th century. The proposed system first applies preprocessing steps to remove background noise. Next, the pages are segmented into individual text lines. After normalization, a hidden Markov model (HMM) based recognizer, supported by a language model, is applied to each text line. In our experiments we investigate two approaches for training the recognition system. The first approach trains the recognizer from scratch on writer-specific data, while the second adapts a recognizer previously trained on a large general off-line handwriting database. The second approach is unconventional in that the language of the texts used for training differs from that of the texts used for testing. In experiments with several training sets of increasing size, we found that the overall best strategy is to adapt the previously trained recognizer on a writer-specific data set of medium size. The final word recognition accuracy obtained with this training strategy is about 80%.
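The abstract does not say which adaptation method is used. As an illustration only, the sketch below shows maximum a posteriori (MAP) adaptation of a single Gaussian mean, one common way to adapt HMM output distributions. The function name `map_adapt_mean`, the weighting parameter `tau`, and the toy feature vectors are assumptions made for this example and are not taken from the paper.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0, weights=None):
    """MAP-adapt one Gaussian mean (illustrative sketch, not the paper's code).

    prior_mean: mean vector from the general, previously trained model.
    frames:     writer-specific feature vectors assigned to this Gaussian.
    tau:        hypothetical prior weight; larger values trust the prior more.
    weights:    optional per-frame occupation probabilities (default 1.0 each).
    """
    if weights is None:
        weights = [1.0] * len(frames)
    dim = len(prior_mean)
    occupancy = sum(weights)          # total adaptation evidence
    weighted_sum = [0.0] * dim
    for w, x in zip(weights, frames):
        for d in range(dim):
            weighted_sum[d] += w * x[d]
    # Interpolate between the prior mean and the writer-specific data.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Toy usage: ten writer-specific frames pull the mean halfway to the data.
adapted = map_adapt_mean([0.0, 0.0], [[1.0, 2.0]] * 10, tau=10.0)
print(adapted)  # [0.5, 1.0]
```

With no writer-specific frames the adapted mean equals the prior mean, and as more frames are observed it moves toward the mean of the writer's data. This is the general behavior of MAP mean adaptation, shown here under the stated assumptions.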


German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz