Evaluating Translations Produced by Amazon Mechanical Turk
Summary

We are investigating the use of Amazon Mechanical Turk for creating translations from English to Haitian Creole. The intention is to produce a bilingual corpus for statistical machine translation. In several experiments, we offer different amounts of money for translation tasks. The current results show no clear correlation between remuneration and translation quality. Almost all translations show significant overlap with the output of online translation tools, indicating that workers often did not translate the sentences themselves.

1 Introduction

Our group is currently developing an English↔Haitian Creole translation system intended for use in the Haiti earthquake region. One of the current tasks is the rapid production of a bilingual corpus of English↔Haitian Creole medical dialogue from the field, so that a statistical machine translation system can be trained. Some native Haitian Creole speakers have volunteered to help with translations, and we also intend to bring in professional translators to support this effort. Amazon's Mechanical Turk (AMT) is an interesting alternative here because it would be cheaper than hiring professional translators. This is particularly relevant for an English↔Haitian Creole translation system, whose commercial potential is likely limited.

One of the main concerns with using AMT for NLP tasks, particularly translation, is the quality of the resulting data and the availability of workers with knowledge of Haitian Creole. The experiments presented in this article address these concerns and evaluate translations produced via Amazon Mechanical Turk against those of professionals and unpaid volunteers. We study the overall quality of the translations produced and compare translations done in different locations...

[... middle of article ...]

...Haitian Creole appears to be reasonably well represented. These experiments will need to be confirmed with additional translations in order to obtain a larger test set. A professional translation will be used as the reference, providing a more reliable basis for automatic evaluation. It would also be interesting to run similar experiments with other language pairs, both more common and rarer ones, to allow more in-depth comparisons.
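The finding that worker output overlaps heavily with online translation tools suggests a simple automated check. The excerpt does not say how this overlap was measured, so the sketch below is only an illustration: it compares each worker translation against the output of an MT tool for the same source sentence using a character-level similarity ratio, with an arbitrary flagging threshold of 0.9 (both the measure and the threshold are assumptions, not the authors' method).

```python
# Minimal sketch: flag worker translations that closely match online MT output.
# The similarity measure (difflib ratio) and the 0.9 threshold are illustrative
# assumptions; the paper does not specify how overlap was computed.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def flag_copied(worker_translations, mt_outputs, threshold=0.9):
    """Yield indices of worker translations that overlap heavily with MT output."""
    for i, (worker, mt) in enumerate(zip(worker_translations, mt_outputs)):
        if similarity(worker, mt) >= threshold:
            yield i

# Hypothetical example data for one source sentence.
workers = ["Mwen gen tèt fè mal depi yè swa."]
mt      = ["Mwen gen tèt fè mal depi yè swa."]
print(list(flag_copied(workers, mt)))  # [0] -> likely copied from the MT tool
```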
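The conclusion mentions relying on a professional reference translation for automatic assessment. A standard choice for such assessment in machine translation is BLEU; the following sketch scores candidate translations against a single reference using NLTK's sentence-level BLEU. The example sentences and the smoothing choice are illustrative assumptions, since the excerpt does not describe the paper's actual evaluation setup.

```python
# Minimal sketch: score candidate translations against a reference with BLEU.
# Uses NLTK's sentence-level BLEU for illustration only; the paper's exact
# evaluation setup is not described in this excerpt.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical professional reference and two candidate translations.
reference = "mwen gen tèt fè mal depi yè swa".split()
candidates = [
    "mwen gen tèt fè mal depi yè".split(),        # close to the reference
    "tèt mwen ap fè m mal anpil jodi a".split(),  # further from the reference
]

smooth = SmoothingFunction().method1  # avoid zero scores on short sentences
for cand in candidates:
    score = sentence_bleu([reference], cand, smoothing_function=smooth)
    print(f"BLEU = {score:.3f}")
```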