A Dataset of Offensive German Language Tweets Annotated for Speech Acts

Melina Plakidis; Georg Rehm

In: Nicoletta Calzolari; Frédéric Béchet; Philippe Blache; Christopher Cieri; Khalid Choukri; Thierry Declerck; Hitoshi Isahara; Bente Maegaard; Joseph Mariani; Jan Odijk; Stelios Piperidis (eds.). Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022). International Conference on Language Resources and Evaluation (LREC-2022), Marseille, France, Pages 4799-4807, European Language Resources Association (ELRA), 6/2022.


We present a dataset of offensive and non-offensive German tweets annotated for speech acts. The 600 tweets are a subset of the dataset by Struß et al. (2019) and carry three levels of annotation: six coarse-grained speech acts, 23 fine-grained speech acts, and 14 sentence types. We also provide both a qualitative and a quantitative evaluation. The dataset is publicly available under a CC-BY-4.0 license.


A_Dataset_of_Offensive_German_Language_Tweets_Annotated_for_Speech_Acts.pdf (pdf, 224 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence