
Publication

Robust Deep Linguistic Processing

Yi Zhang
PhD thesis, Saarland University, 2007.

Abstract

This dissertation deals with the robustness problem of deep linguistic processing. Hand-crafted deep linguistic grammars provide precise models of human languages, but are poorly equipped to handle ill-formed or extra-grammatical input. In this dissertation, we argue that with a series of robust processing techniques, coverage can be improved without sacrificing the efficiency or specificity of deep linguistic processing.

An overview of the robustness problem in state-of-the-art deep linguistic processing systems reveals that an insufficient lexicon and over-restricted constructions are the major sources of the lack of robustness. Targeting both, several robust processing techniques are proposed as add-on modules to existing deep processing systems.

For the lexicon, we propose a deep lexical acquisition model that automatically detects and acquires missing lexical entries online. The model is further extended to acquire multiword expressions that are syntactically and/or semantically idiosyncratic. The evaluation shows that our lexical acquisition model yields significantly improved grammar coverage without noticeable degradation in accuracy.
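
As an illustration of how such a module might plug into a deep parser's lexical lookup, the sketch below detects an unknown word and predicts a lexical type for it from simple morphological and contextual cues. The feature set, the toy type names, and the count-based scorer are illustrative assumptions only, not the classifier developed in the thesis.

# Illustrative sketch (assumptions, not the thesis model): predict a lexical
# type for unknown words so the parser can continue with a generated entry.
from collections import Counter, defaultdict

TOY_TYPES = ["count_noun_le", "trans_verb_le", "adj_le"]  # invented type names

def features(word, prev_tag):
    """Simple morphological/contextual features for an unknown word."""
    return [
        f"suffix3={word[-3:]}",
        f"suffix2={word[-2:]}",
        f"capitalised={word[0].isupper()}",
        f"prev_tag={prev_tag}",
    ]

class LexicalTypePredictor:
    """Naive count-based scorer standing in for a statistical classifier."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # feature -> counts over types

    def train(self, examples):
        # examples: iterable of (word, prev_tag, lexical_type)
        for word, prev_tag, lex_type in examples:
            for feat in features(word, prev_tag):
                self.counts[feat][lex_type] += 1

    def predict(self, word, prev_tag):
        scores = Counter()
        for feat in features(word, prev_tag):
            scores.update(self.counts[feat])
        return scores.most_common(1)[0][0] if scores else TOY_TYPES[0]

def lexicon_lookup(word, lexicon, predictor, prev_tag):
    """Fall back to a predicted generic entry when the word is unknown."""
    if word in lexicon:                      # lexicon: word -> lexical type
        return lexicon[word]
    return predictor.predict(word, prev_tag)  # generated entry for the parser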

For the constructions, we propose a partial parsing strategy to maximally recover intermediate results when a full analysis is not available. Partial parse selection models are proposed and evaluated. Experimental results show that the fragmentary semantic outputs recovered from the partial parses are of good quality and of high value for practical use. Efficiency issues are also carefully addressed through new extensions to existing efficient processing algorithms.
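
To illustrate the selection step, the sketch below picks a non-overlapping set of chart fragments covering the input via dynamic programming over token positions. The additive edge scores and the per-fragment penalty are assumed stand-ins for the partial parse selection models evaluated in the thesis.

# Illustrative sketch (assumed scoring): choose a partial parse when no
# spanning analysis exists. Chart edges are (start, end, score, label); we
# select a non-overlapping sequence covering the input that maximises total
# score while penalising each fragment (and each uncovered token).
def select_partial_parse(n_tokens, edges, fragment_penalty=1.0):
    """Dynamic programme over token positions 0..n_tokens."""
    # best[i] = (score, chosen edges) for the input prefix ending at i
    best = [(0.0, [])] + [(float("-inf"), [])] * n_tokens
    for i in range(1, n_tokens + 1):
        # Option 1: leave token i-1 uncovered (pay a small penalty)
        score, chosen = best[i - 1]
        best[i] = (score - fragment_penalty, chosen)
        # Option 2: end some chart edge at position i
        for start, end, edge_score, label in edges:
            if end == i and best[start][0] > float("-inf"):
                cand = best[start][0] + edge_score - fragment_penalty
                if cand > best[i][0]:
                    best[i] = (cand, best[start][1] + [(start, end, label)])
    return best[n_tokens][1]

# Toy usage: two fragments found by the parser over a 5-token input
edges = [(0, 2, 3.5, "NP"), (2, 5, 4.0, "VP")]
print(select_partial_parse(5, edges))
# -> [(0, 2, 'NP'), (2, 5, 'VP')]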