Publication
SentiMerge: Combining Sentiment Lexicons in a Bayesian Framework
Guy Emerson; Thierry Declerck
In: Proceedings of the Workshop on Lexical and Grammatical Resources for Language Processing. Workshop on Lexical and Grammatical Resources for Language Processing (LG-LP-14), located at the 25th International Conference on Computational Linguistics, August 24, Dublin, Ireland, The COLING 2014 Organizing Committee, Dublin, 8/2014.
Abstract
Many approaches to sentiment analysis rely on a lexicon that labels words with a prior polarity.
This is particularly true for languages other than English, where labelled training data is not
easily available. Efforts to produce such lexicons already exist, and to avoid duplicated effort, a
principled way to combine multiple resources is required. In this paper, we introduce a Bayesian
probabilistic model, which can simultaneously combine polarity scores from several data sources
and estimate the quality of each source. We apply this algorithm to a set of four German sentiment
lexicons, to produce the SentiMerge lexicon, which we make publicly available. In a simple
classification task, we show that this lexicon outperforms each of the underlying resources, as
well as a majority vote model.
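
The idea of combining scores while estimating source quality can be illustrated with a small sketch. This is not the model from the paper: it is a simplified stand-in that assumes each source's score is the word's true polarity plus Gaussian noise with a source-specific variance, places a zero-mean Gaussian prior on the true polarity, and alternates point estimates rather than performing full Bayesian inference. The function name `combine_lexicons` and the parameters `prior_var` and `min_var` are hypothetical.

```python
from typing import Dict, List, Tuple


def combine_lexicons(
    scores: List[Dict[str, float]],
    iterations: int = 20,
    prior_var: float = 1.0,   # assumed variance of the zero-mean prior on polarity
    min_var: float = 1e-3,    # floor that keeps a source's variance from collapsing to 0
) -> Tuple[Dict[str, float], List[float]]:
    """Merge word -> polarity dictionaries and estimate a noise variance per source.

    scores[i] maps each word covered by source i to its polarity score.
    Words missing from a source are simply skipped for that source.
    """
    variances = [1.0] * len(scores)
    words = sorted(set().union(*scores))
    merged: Dict[str, float] = {}

    for _ in range(iterations):
        # Step 1: precision-weighted mean per word (the prior pulls towards 0).
        for word in words:
            numerator = 0.0
            precision = 1.0 / prior_var
            for i, source in enumerate(scores):
                if word in source:
                    numerator += source[word] / variances[i]
                    precision += 1.0 / variances[i]
            merged[word] = numerator / precision

        # Step 2: a source's variance is its mean squared deviation
        # from the current merged scores.
        for i, source in enumerate(scores):
            if source:
                squared = [(v - merged[w]) ** 2 for w, v in source.items()]
                variances[i] = max(sum(squared) / len(squared), min_var)

    return merged, variances
```

Called with several dictionaries, the function returns the merged scores together with one variance per source; a source that often disagrees with the others ends up with a larger variance and therefore less influence on the merged lexicon. Unlike a majority vote, this weighting uses the magnitude of each score and not only its sign.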