Spiralling towards perfection: an incremental approach for mutual lexicon-tagger improvement
Karlheinz Moerth; Stephan Procházka; Omar Siam; Thierry Declerck
In: Jelena Kallas; Iztok Kosem (eds.). Proceedings of eLex 2013. Biennial Conference on Electronic Lexicography (eLex-13): Electronic Lexicography in the 21st Century: Thinking outside the Paper, October 17-19, Tallinn, Estonia. Trojina, Ljubljana, 10/2013.
Abstract
Our paper describes an experiment in which four different digital language resources are used to add value to one another incrementally. The resources are a digital dictionary, a morphological analyser, a tagger and a digital corpus. We will show how the dictionary is used to improve the tagger; how the tagger is used to annotate a collaboratively produced digital text collection, namely the Egyptian Arabic Wikipedia, thus improving readily available open data; and, lastly, how the results of the annotation process are in turn used to enhance the dictionary. The paper touches on several issues related to the particular tasks involved in the process: we discuss problems of dealing with data retrieved from the internet; we give details on lemmatisation, the creation of word-class information and the generation of frequency data from the corpus; and we touch on issues of dictionary creation and aspects of the dictionary-corpus interface. A final topic is the use of standards for representing statistical information in the digital dictionary.
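The last step of the cycle, feeding frequency data from the annotated corpus back into the dictionary, can be illustrated with a minimal sketch. It assumes, purely for illustration, that the tagger emits one (form, lemma, word class) triple per token and that dictionary entries can be keyed by (lemma, word class). The names `TaggedToken`, `lemma_frequencies`, `relative_frequencies` and `enrich_entries` are hypothetical and not taken from the authors' tools, and the output is a plain dictionary rather than any of the standard encodings the paper discusses.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple

Key = Tuple[str, str]  # (lemma, word class)


class TaggedToken(NamedTuple):
    # Hypothetical tagger output record: surface form, lemma, word class.
    form: str
    lemma: str
    pos: str


def lemma_frequencies(tokens: Iterable[TaggedToken]) -> Counter:
    """Count occurrences of each (lemma, word class) pair in a tagged corpus."""
    return Counter((t.lemma, t.pos) for t in tokens)


def relative_frequencies(counts: Counter, per: int = 1_000_000) -> Dict[Key, float]:
    """Normalise raw counts to occurrences per `per` tokens (empty corpus -> {})."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n * per / total for key, n in counts.items()}


def enrich_entries(entries: Dict[Key, dict], counts: Counter) -> Dict[Key, dict]:
    """Return copies of the dictionary entries with a raw corpus frequency added.

    Entries whose (lemma, word class) never occurs in the corpus get 0,
    which also flags them as unattested candidates for editorial review.
    """
    return {key: {**entry, "freq": counts.get(key, 0)} for key, entry in entries.items()}


if __name__ == "__main__":
    # Toy tokens in Latin transcription, invented for illustration only.
    tokens = [
        TaggedToken("bēt", "bēt", "NOUN"),
        TaggedToken("buyūt", "bēt", "NOUN"),
        TaggedToken("katab", "katab", "VERB"),
    ]
    counts = lemma_frequencies(tokens)
    entries = {("bēt", "NOUN"): {"gloss": "house"}, ("ʔalam", "NOUN"): {"gloss": "pen"}}
    print(enrich_entries(entries, counts))
```

Keying on the (lemma, word class) pair rather than on the lemma alone keeps homographs with different word classes apart, which is where the tagger's word-class information pays off for the dictionary.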