Quality estimation-guided data selection for domain adaptation of SMT

Pratyush Banerjee, Raphael Rubino, Johann Roturier, Josef van Genabith

In: Machine Translation (MT), pages 101-108, Springer, 2014.


Supplementary data selection is a well-motivated approach to domain adaptation of statistical machine translation (SMT) systems. In this paper we report a novel approach to data selection guided by automatic quality estimation. In contrast to the conventional approach of using the entire target-domain data as the reference for data selection, we restrict the reference set to sentences that are poorly translated by the baseline model. Automatic quality estimation is used to identify such poorly translated sentences in the target domain. Our experiments reveal that this approach provides statistically significant improvements over the unadapted baseline and achieves scores comparable to those of conventional data selection approaches while using significantly less selected data.
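
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a quality estimation score is already available for each target-domain sentence (higher meaning better, with a hypothetical `threshold`), and it swaps in a simple token-overlap ranking as a stand-in for the data selection criterion, which the abstract does not specify. All function and parameter names are hypothetical.

```python
def build_reference_set(target_sentences, qe_scores, threshold):
    """Keep only target-domain sentences whose estimated quality is below threshold.

    Assumption: qe_scores[i] is the quality estimate for the baseline
    translation of target_sentences[i], and higher means better.
    """
    return [s for s, q in zip(target_sentences, qe_scores) if q < threshold]


def overlap_score(sentence, reference_vocab):
    """Fraction of the sentence's tokens that occur in the reference vocabulary.

    Assumption: this is a stand-in similarity measure, not the paper's criterion.
    """
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in reference_vocab) / len(tokens)


def select_supplementary(pool, reference_set, top_k):
    """Rank supplementary sentences by similarity to the reference set, keep top_k."""
    reference_vocab = {t for s in reference_set for t in s.lower().split()}
    ranked = sorted(
        pool,
        key=lambda s: overlap_score(s, reference_vocab),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data, invented purely for illustration.
target = [
    "restart the backup agent",
    "click ok",
    "the scan engine failed to update",
]
qe = [0.3, 0.9, 0.2]
pool = [
    "the weather is nice today",
    "update the scan engine",
    "restart the agent after backup",
]

reference = build_reference_set(target, qe, threshold=0.5)
selected = select_supplementary(pool, reference, top_k=2)
```

Restricting the reference set to the low-scoring sentences is the only step taken from the abstract; the choice of threshold and of the similarity measure would have to follow the paper itself.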
