Publication
Modeling Diachronic Change in Scientific Writing with Information Density
Raphael Rubino; Stefania Degaetano-Ortlieb; Elke Teich; Josef van Genabith
In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (COLING-2016), Osaka, Japan, December 11-16, 2016.
Abstract
Previous linguistic research on scientific writing has shown that language use in the
scientific domain varies considerably in register and style over time. In this paper we
introduce features inspired by information theory to study long-term diachronic change
on three levels: lexis, part-of-speech and syntax. Our approach is based on distinguishing
between sentences from 19th- and 20th-century scientific abstracts using supervised
classification models. To the best of our knowledge, the use of information-theoretic
features for this task is novel. We show that these features outperform more traditional
features, such as token or character n-grams, while leading to more compact models. We
present a detailed analysis of feature informativeness in order to gain a better
understanding of diachronic change on the different linguistic levels.
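The abstract does not say how the information-theoretic features are computed. As a rough illustration only, assume information density is operationalised as surprisal, -log2 P(unit | previous unit). The sketch below estimates that probability with a hypothetical add-one-smoothed bigram model, then summarises each sentence by the mean, maximum and standard deviation of its surprisal values. The names `BigramModel` and `density_features`, the bigram context and the three summary statistics are all assumptions made for this example, not the authors' method.

```python
import math
from collections import Counter
from statistics import mean, pstdev

BOS = "<s>"  # sentence-start symbol (assumed convention)


class BigramModel:
    """Add-one smoothed bigram model over any symbol sequence.

    Assumption: a simple stand-in for whatever language model the paper used.
    """

    def __init__(self, sentences):
        self.histories = Counter()  # how often each symbol occurs as a context
        self.bigrams = Counter()    # (context, symbol) pair counts
        for sent in sentences:
            seq = [BOS] + sent
            self.histories.update(seq[:-1])
            self.bigrams.update(zip(seq, seq[1:]))
        vocab = {w for s in sentences for w in s}
        # +1 reserves probability mass for a single "unseen symbol" class
        self.vocab_size = len(vocab) + 1

    def prob(self, prev, word):
        """Laplace-smoothed P(word | prev)."""
        return (self.bigrams[(prev, word)] + 1) / (
            self.histories[prev] + self.vocab_size
        )

    def surprisal(self, sentence):
        """Per-symbol surprisal in bits: -log2 P(w_i | w_{i-1})."""
        seq = [BOS] + sentence
        return [-math.log2(self.prob(p, w)) for p, w in zip(seq, seq[1:])]


def density_features(model, sentence):
    """Summarise a non-empty sentence's surprisal profile (hypothetical feature set)."""
    s = model.surprisal(sentence)
    return {"mean": mean(s), "max": max(s), "std": pstdev(s)}


if __name__ == "__main__":
    corpus = [["the", "cell", "divides"], ["the", "cell", "grows"]]
    model = BigramModel(corpus)
    print(density_features(model, ["the", "cell", "divides"]))
```

Because the model only sees sequences of symbols, the same code could be trained on part-of-speech tag sequences instead of word tokens; features at the syntactic level would need a different unit of analysis. Per-sentence feature dictionaries like these could then be passed to any supervised classifier that separates 19th- from 20th-century sentences.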