Publication
A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Weiwei Sun
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), June 19-24, Portland, Oregon, USA, pages 1385-1394, ACL, 6/2011.
Abstract
The large combined search space of joint word segmentation and Part-of-Speech (POS) tagging makes efficient decoding very hard. As a result, effective high-order features representing rich contexts are inconvenient to use. In this work, we propose a novel stacked sub-word model for this task that addresses both efficiency and effectiveness. Our solution is a two-step process. First, one word-based segmenter, one character-based segmenter, and one local character classifier are trained to produce coarse segmentation and POS information. Second, the outputs of the three predictors are merged into sub-word sequences, which are further bracketed and labeled with POS tags by a fine-grained sub-word tagger. The coarse-to-fine search scheme is efficient, while in the sub-word tagging step rich contextual features can be approximately derived. Evaluation on the Penn Chinese Treebank shows that our model yields improvements over the best system reported in the literature.
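
Illustration
The two-step process described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's implementation. The abstract does not give the merging rule, so the code assumes a sub-word boundary is placed wherever any predictor places a word boundary. The function names (boundaries, merge_into_subwords, bracket) and the "B-TAG" / "I-TAG" label scheme are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple


def boundaries(words: List[str]) -> Set[int]:
    """Return the character offsets at which a segmentation ends a word."""
    cuts, offset = set(), 0
    for word in words:
        offset += len(word)
        cuts.add(offset)
    return cuts


def merge_into_subwords(segmentations: List[List[str]]) -> List[str]:
    """Merge several segmentations of one sentence into a sub-word sequence.

    Assumed rule (not stated in the abstract): cut wherever ANY predictor
    cuts, so each predicted word is a concatenation of whole sub-words.
    """
    sentence = "".join(segmentations[0])
    for seg in segmentations[1:]:
        if "".join(seg) != sentence:
            raise ValueError("all segmentations must cover the same characters")
    cuts = sorted(set().union(*(boundaries(seg) for seg in segmentations)))
    subwords, start = [], 0
    for end in cuts:
        subwords.append(sentence[start:end])
        start = end
    return subwords


def bracket(subwords: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Combine sub-words into POS-tagged words from hypothetical B-/I- labels.

    "B-NN" starts a new word tagged NN; "I-NN" extends the previous word.
    """
    words: List[Tuple[str, str]] = []
    for sub, label in zip(subwords, labels):
        position, tag = label.split("-", 1)
        if position == "B" or not words:
            words.append((sub, tag))
        else:
            prev, prev_tag = words[-1]
            words[-1] = (prev + sub, prev_tag)
    return words


# Step 1: three coarse predictors disagree on where the words are.
coarse = [["ab", "cd"], ["abc", "d"], ["ab", "c", "d"]]
subwords = merge_into_subwords(coarse)

# Step 2: a sub-word tagger (stubbed here by fixed labels) brackets and tags.
tagged = bracket(subwords, ["B-NN", "I-NN", "B-VV"])
```

With this rule the sub-word sequence is never coarser than any single predictor's output, so the second-stage tagger can still recover every word that any predictor proposed. In a real system the fixed labels above would come from a trained sequence tagger.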