Publication
Hashtag Processing for Enhanced Clustering of Tweets
Dagmar Gromann; Thierry Declerck
In: Galia Angelova; Kalina Bontcheva; Ruslan Mitkov; Ivelina Nikolova; Irina Temnikova (Eds.). Proceedings of the INTERNATIONAL CONFERENCE RECENT ADVANCES IN NATURAL LANGUAGE PROCESSING 2017. Recent Advances in Natural Language Processing (RANLP-17), September 2-8, Varna, Bulgaria, ISSN 1313-8502, INCOMA Ltd, Shoumen, Bulgaria, 9/2017.
Abstract
The rich data provided by tweets has been analyzed, clustered, and explored in a variety
of studies. Typically, those studies focus on named entity recognition, entity
linking, and entity disambiguation or clustering. Tweets and hashtags are generally
analyzed at the sentence or word level, but not at the compositional level of concatenated
words. We propose an approach for a closer analysis of compounds in hashtags,
and in the long run also of other types of text sequences in tweets, in order to
enhance the clustering of such text documents. Hashtags have been used before as
primary topic indicators to cluster tweets; however, to the best of our knowledge,
their segmentation and its effect on clustering results have not been investigated. Our results
on a standard dataset from the Text REtrieval Conference (TREC) show that
segmented and harmonized hashtags improve clustering effectiveness.
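To make "segmentation of concatenated words in hashtags" concrete, the following is a minimal sketch, not the authors' method. It assumes two common heuristics: CamelCase boundaries are used when present, and otherwise a dynamic-programming word break over a lexicon picks the split with the fewest words. The function name `segment_hashtag` and the toy vocabulary `VOCAB` are hypothetical; a realistic system would need a large lexicon and likely frequency-based scoring.

```python
import re
from functools import lru_cache

# Hypothetical toy lexicon for illustration only; a real system would use a large word list.
VOCAB = {"black", "lives", "matter", "climate", "change", "new", "york", "city"}


def segment_hashtag(tag, vocab=VOCAB):
    """Split a hashtag into lowercase words; return None if no full segmentation exists."""
    body = tag.lstrip("#")

    # Heuristic 1: CamelCase hashtags (e.g. "#ClimateChange") already mark word boundaries.
    camel = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", body)
    if len(camel) > 1:
        return [w.lower() for w in camel]

    text = body.lower()

    # Heuristic 2: dictionary-based word break via memoized recursion over suffixes.
    @lru_cache(maxsize=None)
    def best(i):
        # Best segmentation of text[i:] as a tuple of words (fewest words wins), or None.
        if i == len(text):
            return ()
        candidates = []
        for j in range(i + 1, len(text) + 1):
            word = text[i:j]
            if word in vocab:
                rest = best(j)
                if rest is not None:
                    candidates.append((word,) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None
```

For example, `segment_hashtag("#blacklivesmatter")` yields `["black", "lives", "matter"]`, while `segment_hashtag("#ClimateChange")` is split on its capital letters. Segmented hashtags of this kind could then be added to a tweet's tokens before clustering, which is the general idea the abstract evaluates.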