Publication
OCR Error Correction: State-of-the-art vs An NMT Based Approach
Kareem Mokhtar; Syed Saqib Bukhari; Andreas Dengel
In: DAS. IAPR International Workshop on Document Analysis Systems (DAS-2018), April 24-27, Vienna, Austria, IEEE, 2018.
Abstract
Although the performance of state-of-the-art
OCR systems is very high, they can still introduce errors for
various reasons. When it comes to historical documents
with old manuscripts, the performance of such systems gets even
worse. That is why Post-OCR error correction has been an open
problem for many years. Many state-of-the-art approaches have
been introduced in recent years.
This paper contributes to the field of Post-OCR Error
Correction by introducing two novel deep learning approaches to
improve the accuracy of OCR systems, and a post-processing
technique that can further enhance the quality of the output
results. These approaches are based on Neural Machine
Translation and were motivated by the great success of deep learning
in the field of Natural Language Processing. Finally,
we compare the state-of-the-art approaches in Post-OCR
Error Correction with the newly introduced systems and discuss
the results.