Feedback Learning: Automating the Process of Correcting and Completing the Extracted Information

Khurram Azeem Hashmi, Rakshith Bymana Ponnappa, Syed Saqib Bukhari, Martin Jenckel, Andreas Dengel

In: International Conference on Document Analysis and Recognition. International Conference on Document Analysis and Recognition Workshops (ICDARW), September 22-25, Sydney, NSW, Australia, ISBN 978-1-7281-5054-3, IEEE, 9/2019.


In recent years, with the increasing use of digital media and advances in deep learning architectures, most paper-based documents have been converted into digital versions. These advances have made state-of-the-art information extraction and digital mailroom technologies progressively more efficient. Although many efficient post-Information Extraction (IE) error rectification methods have been introduced in the recent past to improve the quality of digitized documents, they are still imperfect and need improvement in context-based error correction, particularly for documents involving sensitive information such as invoices. This paper describes a self-correction approach based on sequence-to-sequence Neural Machine Translation (NMT), applied to rectify errors in the results of any information extraction approach such as Optical Character Recognition (OCR). We accomplish this by exploiting the concepts of sequence learning with the help of feedback provided during each cycle of training. Finally, we compare state-of-the-art post-OCR error correction methods with our feedback learning approach. In our empirical results, the feedback learning approach outperforms these state-of-the-art post-OCR error correction methods.


German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz