Do not Rely on Relay Translations: Multilingual Parallel Direct Europarl

Kwabena Amponsah-Kaakyire, Daria Pylypenko, Cristina España-Bonet, Josef van Genabith

In: 23rd Nordic Conference on Computational Linguistics. Workshop on Modelling Translation: Translatology in the Digital Age (MoTra-2021), May 31-June 2, Virtual, Iceland. Pages 1-7. Linköping Electronic Conference Proceedings, Association for Computational Linguistics, 5/2021.


Translationese data is a scarce and valuable resource. Traditionally, the proceedings of the European Parliament have been used for studying translationese phenomena, since their metadata makes it possible to distinguish between original and translated texts. However, translations are not always direct, and we hypothesise that a pivot (also called "relay") language might alter the conclusions on translationese effects. In this work, we (i) isolate translations in the Europarl proceedings that were produced without an intermediate language from those that might have used a pivot language, and (ii) build comparable and parallel corpora with data aligned across multiple languages, which can therefore be used for both machine translation and translation studies.

