Project | Accurat

Duration: 01/01/2010 - 06/30/2012

Analysis and Evaluation of Comparable Corpora for Under-Resourced Areas of Machine Translation

The project aims at researching methods and techniques to overcome one of the central problems of machine translation (MT) – the lack of linguistic resources, such as training data, for under-resourced areas of machine translation. The main goal is to find, analyze and evaluate novel methods that exploit comparable corpora in order to compensate for the shortage of linguistic resources, and ultimately to significantly improve MT quality for under-resourced languages and narrow domains. Models generated from comparable corpora will be compared against baseline models generated from parallel corpora.

Partners

Tilde, LV (Coordinator)
University of Sheffield, UK
University of Leeds, School of Modern Languages and Cultures, Centre for Translation Studies, UK
Institute for Language and Speech Processing, GR
University of Zagreb, HR
German Research Center for Artificial Intelligence, Language Technology Lab, DE
Research Institute for AI, Romanian Academy, RO
Linguatec, DE
Zemanta, SI

Keyfacts

Involved research areas

Multilinguality and Language Technology

Website

http://www.accurat-project.eu/

Publications

Hybrid Parallel Sentence Mining from Comparable Corpora
Sabine Hunsicker; Radu Ion; Dan Stefanescu
In: Proceedings of the 16th Annual Conference of the European Association for Machine Translation. Annual Conference of the European Association for Machine Translation (EAMT-12), May 28-30, Trento, Italy, 2012.

Generating Virtual Parallel Corpus: A Compatibility Centric Method
Jia Xu; Weiwei Sun
In: MT Summit XIII. Machine Translation Summit (MT Summit-11), September 19-23, Xiamen, China, 9/2011.

Parallel Corpus Refinement as an Outlier Detection Algorithm
Kaveh Taghipour; Shahram Khadivi; Jia Xu
In: MT Summit XIII. Machine Translation Summit (MT Summit-11), September 19-23, Xiamen, China, 9/2011.

Funding Authorities

EU - European Union
