
shadowplumber t1_j22sf1a wrote

A while ago I did some research on translation out of English into three languages: Spanish, Arabic, and Japanese. We had groups of translators translate the same six texts into their languages, and we also ran the same six texts through a group of machine translation systems for each language pair. We found that the more “distant” a language was from English (Spanish being the closest, then Arabic, with Japanese the most “distant”), the more diverse the translators’ renderings became. For example, let’s say 6 out of 20 Spanish translators translated a word differently into Spanish, but 10 out of 20 Arabic translators translated the same word differently, and maybe 14 out of 20 Japanese translators did. Things were obviously messier than this, but there were clear statistical patterns.
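To make the counting example concrete, here is a minimal Python sketch. It is not the actual method from the study. It assumes that "translated a word differently" means "differs from the most common rendering in the group", and the names `divergence_rate` and `build_renderings` are hypothetical helpers made up for illustration. The counts are the "let's say" numbers from the example, not real data.

```python
from collections import Counter

def divergence_rate(renderings):
    """Share of translators whose rendering differs from the most common one.

    Assumption: "different" means "not the majority rendering".
    """
    if not renderings:
        raise ValueError("need at least one rendering")
    counts = Counter(renderings)
    majority_count = counts.most_common(1)[0][1]
    return (len(renderings) - majority_count) / len(renderings)

def build_renderings(n_total, n_divergent):
    """Build placeholder renderings: one shared majority form plus unique variants."""
    majority = ["majority"] * (n_total - n_divergent)
    variants = [f"variant_{i}" for i in range(n_divergent)]
    return majority + variants

# Illustrative counts only (divergent translators out of 20), not study results.
illustrative = {"Spanish": 6, "Arabic": 10, "Japanese": 14}

rates = {
    lang: divergence_rate(build_renderings(20, k))
    for lang, k in illustrative.items()
}

for lang, rate in rates.items():
    print(f"{lang}: {rate:.2f}")
```

With these made-up counts the rate rises from Spanish to Arabic to Japanese, which is the shape of the pattern described. A real analysis would need many words, many texts, and a significance test.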

The crazy thing is that the groups of machine translation systems followed the same pattern. Those systems (neural networks) were trained on tons of existing translations, and so they show evidence of a pattern in translator behavior on a very large scale. It was hard to compare our results with existing research: linguistic “distance” is a slippery concept that I really only saw addressed in one large-scale study (I’ll try to find it tomorrow and post it here). Still, I feel like our work gave an empirical estimate, or at least an indication, of the relative distance of several languages from one language (not from each other, just from that one language, in this case English).
