Named Entities in Medical Case Reports: Corpus and Experiments

Sarah Schulz, Jurica Seva, Samuel Rodriguez, Malte Ostendorff, Georg Rehm

In: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Christopher Cieri, Khalid Choukri, Thierry Declerck, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis (eds.). Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). International Conference on Language Resources and Evaluation (LREC-2020), Marseille, France. Pages 4497-4502. ISBN 979-10-95546-34-4. European Language Resources Association (ELRA), 5/2020.


We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central’s open access library. In the case reports, we annotate cases, conditions, findings, factors and negation modifiers. Moreover, where applicable, we annotate relations between these entities. As such, this is the first corpus of this kind made available to the scientific community in English. It enables the initial investigation of automatic information extraction from case reports through tasks like Named Entity Recognition, Relation Extraction and (sentence/paragraph) relevance detection. Additionally, we present four strong baseline systems for detecting the medical entities annotated in the dataset.


Further Links

LREC-2020-Schulz-et-al-final.pdf (PDF, 308 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence