“Poisonous and evil rubbish”. “Pregnant woman over 70 lounge”. “Slip and fall carefully”. Such semantically and syntactically erroneous sentences were not taken from a practice sheet in an English language classroom, but extracted from machine translation (MT) software and published on street signs in East Asia.
Incorrect automated translations of this kind trigger either raised eyebrows or giggles of ridicule, so often that meme pages have been established with the sole purpose of mocking humorous translation failures around the world (a culturally ignorant practice, but that’s an argument for another day). Beyond the superficial laughter, however, we should still concern ourselves with the issue of machine translation. In an increasingly globalised era, machine translation is destined to play a pivotal role in cross-cultural communication for generations to come. Today, while machines generally produce satisfactory translations for typologically related language pairs (e.g. Norwegian to Swedish), problems often emerge with linguistically distant language pairs (e.g. English to Japanese). So how are scientists working to improve the latter type?
To answer that, it is first necessary to understand how machine translation works and how it has evolved. Originally known as computer-assisted language processing, machine translation has taken multiple forms over the decades, each with a different approach to processing input and producing output. For an English sentence as simple as “The women speak with the principal”, a traditional rule-based machine translation system starts with an analysis of morphosyntax (i.e. word and sentence structure). It first identifies the subject and predicate (“the women” vs. “speak with the principal”) and other key grammatical information (the definiteness marked by “the” and the plurality of “women”). It then processes the semantics of the input by interpreting what each word means in context (is “principal” here a noun, as in “school headmaster”, or an adjective meaning “primary”?) and finally translates the interpreted input into the target language.
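For the curious, the three steps described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real system is built: the function names, the tiny hand-written lexicons and the choice of Spanish as the target language are all assumptions made for this example, and the lexicons cover only the words of the sample sentence.

```python
# Toy English lexicon (an assumption for this sketch): word -> (part of speech, lemma, is_plural).
# "principal" is left out on purpose because its part of speech depends on context.
ENGLISH = {
    "the": ("DET", "the", False),
    "women": ("NOUN", "woman", True),
    "speak": ("VERB", "speak", False),
    "with": ("PREP", "with", False),
    "reason": ("NOUN", "reason", False),
}

# Toy Spanish lexicon (also an assumption). Nouns carry a gender so articles can agree.
SPANISH_NOUNS = {("woman", True): ("mujeres", "f"), ("headmaster", False): ("director", "m")}
SPANISH_ARTICLES = {("f", True): "las", ("m", False): "el"}
SPANISH_VERBS = {("speak", True): "hablan"}  # keyed by (lemma, subject is plural)
SPANISH_OTHER = {"with": "con"}


def analyse(sentence):
    """Step 1, morphosyntax: split subject from predicate at the first verb."""
    words = sentence.lower().rstrip(".").split()
    verb_at = next(i for i, w in enumerate(words) if ENGLISH.get(w, ("",))[0] == "VERB")
    return words[:verb_at], words[verb_at:]


def interpret(words):
    """Step 2, semantics: tag each word, resolving "principal" from its context."""
    tagged = []
    for i, word in enumerate(words):
        if word == "principal":
            following = words[i + 1] if i + 1 < len(words) else None
            if following in ENGLISH and ENGLISH[following][0] == "NOUN":
                tagged.append(("ADJ", "primary", False))      # "the principal reason"
            else:
                tagged.append(("NOUN", "headmaster", False))  # "the principal"
        else:
            tagged.append(ENGLISH[word])
    return tagged


def generate(tagged, subject_plural):
    """Step 3, transfer: emit Spanish words with article and verb agreement."""
    out = []
    for i, (pos, lemma, plural) in enumerate(tagged):
        if pos == "DET":
            # The article agrees with the gender and number of the next noun.
            noun = next(t for t in tagged[i + 1:] if t[0] == "NOUN")
            gender = SPANISH_NOUNS[(noun[1], noun[2])][1]
            out.append(SPANISH_ARTICLES[(gender, noun[2])])
        elif pos == "NOUN":
            out.append(SPANISH_NOUNS[(lemma, plural)][0])
        elif pos == "VERB":
            out.append(SPANISH_VERBS[(lemma, subject_plural)])
        else:
            out.append(SPANISH_OTHER[lemma])
    return out


def translate(sentence):
    subject, predicate = analyse(sentence)
    subject_tags = interpret(subject)
    subject_plural = any(pos == "NOUN" and plural for pos, _, plural in subject_tags)
    words = generate(subject_tags + interpret(predicate), subject_plural)
    return " ".join(words).capitalize()


print(translate("The women speak with the principal"))
```

Every decision here is a hand-written rule over a hand-written dictionary, so the sketch fails on any word or construction it has not been told about, which hints at why such rules become harder to maintain as the two languages grow further apart.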
Read more: Varsity