A Fully Coreference-annotated Corpus of Scholarly Papers from the ACL Anthology

Ulrich Schäfer; Christian Spurk; Jörg Steffen

In: Proceedings of the 24th International Conference on Computational Linguistics. International Conference on Computational Linguistics (COLING-2012), December 10-14, Mumbai, India, Pages 1059-1070, ICCL, 12/2012.


We describe a large coreference annotation task performed on a corpus of 266 papers from the ACL Anthology, a publicly, electronically available collection of scientific papers in the domain of computational linguistics and language technology. The annotation comprises mainly noun phrase coreference of the full textual content of each paper in the Anthology subset. It has been performed carefully and at least twice for each paper (initial annotation and secondary correction phase). The purpose of this paper is to summarize the comprehensive annotation schema and release the corpus publicly, along with this paper. The corpus is by far larger than the ACE coreference corpora. It can be used to train coreference resolution systems in the Computational Linguistics and Language Technology domain for semantic search, taxonomy extraction, question answering, citation analysis, scientific discourse analysis, etc.


Weitere Links

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz