Publication
Bootstrapped Extraction of Index Terms from Normalized User-Generated Content
Piroska Lendvai; Thierry Declerck
In: Michael Beißwenger; Torsten Zesch (eds.). Proceedings of the 2nd Workshop on Natural Language Processing for Computer-mediated Communication and Social Media. Workshop on Natural Language Processing for Computer-mediated Communication and Social Media (NLP4CMC-15), located at International Conference of the German Society for Computational Linguistics and Language - GSCL 2015, September 29, Essen - Duisburg, Germany, Pages 44-48, GSCL, 2015.
Abstract
We report on the extraction of key phrases
for news events, based on string alignment
between social media posts and user-linked
web documents. Hashtag normalization is
tested for enhancing string similarity, while
both token-based tweet similarity and manual
event annotations are tested for transferring
web links to posts that do not refer to
external documents. We are able to identify
more terms via web link transfer compared
to no link transfer, and obtain syntactically
and semantically more complex terms compared
to general document-based term extraction.
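
The abstract names two steps that a small example may make more concrete: normalizing hashtags before string comparison, and using token-based similarity to transfer web links to posts that carry none. The following is a minimal sketch under stated assumptions, not the authors' implementation: the camel-case splitting rule, the Jaccard measure, the 0.5 threshold, and all function names (`normalize_hashtags`, `jaccard`, `transfer_links`) are hypothetical choices made for illustration only.

```python
import re


def normalize_hashtags(text):
    """Drop the '#' and split camel-case hashtags into words (assumed rule)."""
    def split(match):
        tag = match.group(1)
        # Insert a space wherever a lowercase letter or digit is followed by a capital.
        return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", tag)
    return re.sub(r"#(\w+)", split, text)


def tokens(text):
    """Lowercased alphanumeric token set of the hashtag-normalized text."""
    return set(re.findall(r"[a-z0-9]+", normalize_hashtags(text).lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity, used here as a stand-in similarity measure."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def transfer_links(posts, threshold=0.5):
    """Give each link-less post the links of its most similar linked post.

    posts: list of dicts with keys 'text' (str) and 'links' (list of str).
    Returns one list of links per post; posts below the threshold get [].
    """
    linked = [p for p in posts if p["links"]]
    result = []
    for post in posts:
        if post["links"]:
            result.append(list(post["links"]))
            continue
        best = max(linked, key=lambda q: jaccard(post["text"], q["text"]), default=None)
        if best is not None and jaccard(post["text"], best["text"]) >= threshold:
            result.append(list(best["links"]))
        else:
            result.append([])
    return result
```

In this sketch, a post such as "#ClimateChange summit opens" would inherit the link of a post reading "climate change summit opens today", because normalization lets the hashtag match the plain-text tokens; without normalization the two posts would share fewer tokens.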