Lost in translation: is research into species being missed because of a language barrier?

Valeria Ramírez Castañeda, a Colombian biologist, spends her time in the Amazon studying how snakes eat poisonous frogs without getting ill. Although her findings come in many shapes and sizes, in her years as a researcher, she and her colleagues have struggled to get their biological discoveries out to the wider scientific community. Because Spanish is her mother tongue, her research had to be translated into English to be published. That wasn’t always possible because of budget or time constraints, which means that some of her findings were never published.

“It’s not that I’m a bad scientist,” she says. “It’s just because of the language.”

Ramírez Castañeda is not alone. There is a plethora of research in non-English-language papers that gets lost in translation, or is never translated, creating a gap in the global community’s scientific knowledge. As the amount of scientific research grows, so does the gap. This is especially true for conservation and biodiversity. Research about native traditions and knowledge tied to biodiversity is often conducted in the domestic non-colonial language and isn’t translated.

A study published in the journal PLOS Biology found that paying more attention to non-English language research could expand the geographical coverage of biodiversity scientific evidence by 12% to 25% and the number of species covered by 5% to 32%. There is research on nine amphibian species, 217 bird species and 64 mammal species not covered in English-language studies. “We are essentially not using scientific evidence published in non-English-languages at the international level, but if we could make a better use of [it], we might be able to fill the existing gaps in the variability of current scientific evidence,” says Tatsuya Amano, a Japanese biodiversity researcher at the University of Queensland and the paper’s lead researcher.

Read more: The Guardian

Rediscovered Medieval Manuscript Offers New Twist on Arthurian Legend

Thirteenth-century manuscript fragments discovered by chance at a library in Bristol, England, have revealed an alternative version of the story of Merlin, the famed wizard of Arthurian legend. A team of scholars translated the writings, known as the Bristol Merlin, from Old French to English and traced the pages’ medieval origins, reports Alison Flood for the Guardian.

The manuscript is part of a group of texts called the Vulgate Cycle, or the Lancelot-Grail Cycle. Using handwriting analysis, the researchers determined that someone in northern or northeastern France wrote the text between 1250 and 1275. That means it was committed to parchment shortly after the Vulgate Cycle was first composed, between 1220 and 1225.

“The medieval Arthurian legends were a bit like the Marvel Universe, in that they constituted a coherent fictional world that had certain rules and a set of well-known characters who appeared and interacted with each other in multiple different stories,” Laura Chuhan Campbell, a medieval language scholar at Durham University, tells Gizmodo’s Isaac Schultz. “This fragment comes from the second volume, which documents the rise of Merlin as Arthur’s advisor, and Arthur’s turbulent early years as king.”

King Arthur first appeared in a history of Britain written in 829 or 830, notes the British Library. That text describes him as a warlord or Christian soldier. Later accounts from the 12th century added new elements to the legend, such as Merlin’s mentorship of Arthur. English writer Thomas Malory compiled one of the best-known collections of the stories, Le Morte d’Arthur, in the 15th century.

Read more: Smithsonian

African languages to get more bespoke scientific terms

There’s no original isiZulu word for dinosaur. Germs are called amagciwane, but there are no separate words for viruses or bacteria. A quark is ikhwakhi (pronounced kwa-ki); there is no term for red shift. And researchers and science communicators using the language, which is spoken by more than 14 million people in southern Africa, struggle to agree on words for evolution.

IsiZulu is one of approximately 2,000 languages spoken in Africa. Modern science has ignored the overwhelming majority of these languages, but now a team of researchers from Africa wants to change that.

A research project called Decolonise Science plans to translate 180 scientific papers from the AfricArXiv preprint server into 6 African languages: isiZulu and Northern Sotho from southern Africa; Hausa and Yoruba from West Africa; and Luganda and Amharic from East Africa.

These languages are collectively spoken by around 98 million people. Earlier this month, AfricArXiv called for submissions from authors interested in having their papers considered for translation. The deadline is 20 August.

The translated papers will span many disciplines of science, technology, engineering and mathematics. The project is being supported by the Lacuna Fund, a data-science funder for researchers in low- and middle-income countries. It was launched a year ago by philanthropic and government funders from Europe and North America, and Google.

Read more: Nature

How do machines translate linguistically distant languages?

“Poisonous and evil rubbish”. “Pregnant woman over 70 lounge”. “Slip and fall carefully”. Such semantically and syntactically erroneous sentences were not taken from a practice sheet in an English language classroom, but extracted from machine translation (MT) software and published on street signs in East Asia.

Incorrect automated translations like these trigger either raised eyebrows or giggles of ridicule, so often that meme pages have been established with the sole purpose of mocking humorous translation failures around the world (a culturally ignorant practice, but that’s an argument for another day). Beyond the superficial laughter, however, we should still concern ourselves with the issue of machine translation. In an increasingly globalised era, machine translations are destined to play a pivotal role in cross-cultural communication for generations to come. Today, while machines generally produce satisfactory translations for typologically-related language pairs (e.g. Norwegian to Swedish), problems often emerge with linguistically distant language pairs (e.g. English to Japanese). So how are scientists working to improve the latter type?

To answer that, it is first necessary to understand how machine translation functions, as well as its evolutionary path. Initially coined as computer-assisted language processing, machine translation has taken on multiple forms over the decades, each adopting a different approach to processing input and producing output. For an English sentence as simple as “The women speak with the principal”, a traditional rule-based machine translation system starts with an analysis of morphosyntax (i.e. word and sentence structure). It first recognises the subject and predicate (“the women” vs. “speak with the principal”) and other key grammatical information (the definiteness marked by “the” and the plurality of “women”). Afterwards, it processes the semantics of the input by interpreting what each individual word means in context (is “principal” here a noun as in “school headmaster”? Or an adjective meaning “primary”?) and finally translates the interpreted input into the target language.
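
The three stages described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real system is built: the choice of Spanish as the target language, the tiny lexicon, the names `analyse`, `generate` and `translate`, and the one-line rule for disambiguating “principal” are all assumptions made for the example.

```python
# Hypothetical toy lexicon: (English lemma, part of speech) -> Spanish forms.
LEXICON = {
    ("woman", "NOUN"): {"sg": "mujer", "pl": "mujeres", "gender": "f"},
    ("principal", "NOUN"): {"sg": "director", "pl": "directores", "gender": "m"},
    ("speak", "VERB"): {"sg": "habla", "pl": "hablan"},
    ("with", "PREP"): {"sg": "con", "pl": "con"},
}
ARTICLES = {("f", "sg"): "la", ("f", "pl"): "las",
            ("m", "sg"): "el", ("m", "pl"): "los"}
# English surface form -> (lemma, part of speech, grammatical number).
SURFACE_FORMS = {
    "woman": ("woman", "NOUN", "sg"),
    "women": ("woman", "NOUN", "pl"),
    "reason": ("reason", "NOUN", "sg"),
    "speak": ("speak", "VERB", None),
    "with": ("with", "PREP", None),
}
KNOWN_NOUNS = {w for w, (_, pos, _) in SURFACE_FORMS.items() if pos == "NOUN"}


def analyse(sentence):
    """Stages 1 and 2: tag each word, then resolve the ambiguous 'principal'."""
    words = sentence.lower().rstrip(".").split()
    tokens = []
    for i, word in enumerate(words):
        following = words[i + 1] if i + 1 < len(words) else None
        if word == "the":
            tokens.append({"pos": "DET"})  # definiteness marker
        elif word == "principal":
            # Assumed rule: before a noun it is the adjective "primary";
            # otherwise it is the noun "school head".
            pos = "ADJ" if following in KNOWN_NOUNS else "NOUN"
            tokens.append({"lemma": word, "pos": pos, "num": "sg"})
        else:
            lemma, pos, num = SURFACE_FORMS[word]
            tokens.append({"lemma": lemma, "pos": pos, "num": num})
    return tokens


def generate(tokens):
    """Stage 3: emit Spanish, making articles and the verb agree."""
    subject_num = next(t["num"] for t in tokens if t["pos"] == "NOUN")
    out = []
    for i, tok in enumerate(tokens):
        if tok["pos"] == "DET":
            # The article copies gender and number from the noun it introduces.
            noun = next(t for t in tokens[i + 1:] if t["pos"] == "NOUN")
            entry = LEXICON[(noun["lemma"], "NOUN")]
            out.append(ARTICLES[(entry["gender"], noun["num"])])
        else:
            # Words without their own number (verb, preposition) follow the subject.
            num = tok["num"] or subject_num
            out.append(LEXICON[(tok["lemma"], tok["pos"])][num])
    return " ".join(out).capitalize() + "."


def translate(sentence):
    return generate(analyse(sentence))


print(translate("The women speak with the principal."))
```

Every rule and dictionary entry here is written by hand, and a sentence that steps outside them simply fails. That brittleness is the familiar weakness of the rule-based approach.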

Read more: Varsity

Hawaii’s Forgotten Native-Language Newspapers Are a Treasure Trove of Climate Data

There were once more than 100 native language newspapers in circulation in Hawaii that chronicled daily life on the islands. As early as 1834, the newspapers supplied native Hawaiians with news, current affairs, opinion, and, importantly, information about extreme weather events.

In 1871, an intense hurricane struck the islands of Hawaii and Maui, causing catastrophic damage. The newspapers reported on the destruction, traced the likely path of the storm, and documented the impact on Hawaiians.

“The streaming of the wind was similar to 5,000 steam whistles set off at one time,” reported the paper Ke Au Okoa. “The rain continued from morning til night. At 11 o’clock, the waters rushed swiftly and the lowlands were flooded, sweeping everything that was in their paths. The damages were great concerning the koa trees and the grapevines.”

In 1893, a group backed by U.S. troops illegally overthrew Hawaii’s monarchical government and, shortly after, passed a law mandating all schools teach their classes in English. The Hawaiian language fell into decline, and, as a result, the native-language newspapers faded first into obscurity, then completely ceased to exist. Records of the 1871 hurricane were consigned to dusty archives and its devastating impact on the islands all but forgotten by Hawaii’s residents.

But in the early ’90s, Puakea Nogelmeier, PhD, a professor of language at the University of Hawai‘i, discovered that the archipelago’s libraries and museums had hoarded its old newspapers. Realizing their historical and cultural value, he started the painstaking process of translating and digitizing each article.

Read more: Future Human

Machine learning has been used to automatically translate long-lost languages

In 1886, the British archaeologist Arthur Evans came across an ancient stone bearing a curious set of inscriptions in an unknown language. The stone came from the Mediterranean island of Crete, and Evans immediately traveled there to hunt for more evidence. He quickly found numerous stones and tablets bearing similar scripts and dated them from around 1400 BCE.

That made the inscription one of the earliest forms of writing ever discovered. Evans argued that its linear form was clearly derived from rudely scratched line pictures belonging to the infancy of art, thereby establishing its importance in the history of linguistics.

He and others later determined that the stones and tablets were written in two different scripts. The older, called Linear A, dates from between 1800 and 1400 BCE, when the island was dominated by the Bronze Age Minoan civilization.

The other script, Linear B, is more recent, appearing only after 1400 BCE, when the island was conquered by Mycenaeans from the Greek mainland.

Evans and others tried for many years to decipher the ancient scripts, but the lost languages resisted all attempts. The problem remained unsolved until 1953, when an amateur linguist named Michael Ventris cracked the code for Linear B.

His solution was built on two decisive breakthroughs. First, Ventris conjectured that many of the repeated words in the Linear B vocabulary were names of places on the island of Crete. That turned out to be correct.

His second breakthrough was to assume that the writing recorded an early form of ancient Greek. That insight immediately allowed him to decipher the rest of the language. In the process, Ventris showed that ancient Greek first appeared in written form many centuries earlier than previously thought.

Ventris’s work was a huge achievement. But the more ancient script, Linear A, has remained one of the great outstanding problems in linguistics to this day.

It’s not hard to imagine that recent advances in machine translation might help. In just a few years, the study of linguistics has been revolutionized by the availability of huge annotated databases, and techniques for getting machines to learn from them. Consequently, machine translation from one language to another has become routine. And although they aren’t perfect, these methods have provided an entirely new way to think about language.

Read more: MIT Technology Review

Hidden meanings: Using artificial intelligence to translate ancient texts

The ancient world is full of mystery. Many mysteries, in fact.

Who built the monolithic and megalithic structures found all over the world? Why did they build them? How did they build them? What technology did they use?

And perhaps most importantly from the point of view of answering all the other questions: Where are the texts that the builders produced?

We assume that if the ancients were capable of building structures that modern humans cannot replicate even now with the latest technology, they must have been a literate civilization which recorded and stored information.

But where is it?

These are among the multitude of questions that have actively and specifically preoccupied archaeologists and historians for more than a century.

A huge amount of progress has been made as a result of the dedicated pursuit of the answers. It has spawned a multibillion-dollar global tourism industry and some relatively well-funded academic projects. A lot of museums and films can also be said to be somewhat beholden to this obsession with the ancient past.

But in terms of definitively answering those big questions, progress has been rather slow and painstaking.

The Rosetta Stone

It would, of course, help if more artifacts like the Rosetta Stone were discovered.

The Rosetta Stone, created in around 200 BC and discovered in 1799, is a black stone on which three different scripts were written – Egyptian hieroglyphics, Greek, and a more common Egyptian script called Demotic.

This stone enabled people studying ancient cultures to finally understand the Egyptian hieroglyphics which cover acres of surface area on pyramids and temples in the country.

This rests on the presumption that the three statements on the Rosetta Stone are direct and literal translations of each other, but since academics have been studying it for a long time, we can probably safely make that presumption.

Other ancient languages, however, are proving more evasive. The Indus Valley civilization, which is said to be one of the oldest ever discovered, used a language that is defying almost all attempts at translations because it has no established relationship with any other language on Earth, although it is pictorial in part.

The Sumerian language is more amenable to translation because some Sumerian people appear to have been bilingual, also speaking a contemporary language called Akkadian.

Translation work has so far been undertaken by humans, but soon, artificial intelligence systems will, inevitably, be used to not only speed up the process, but also improve accuracy – and perhaps identify similarities and patterns across many languages humans may not have the time or ability to interpret.

Read more: Robotics & Automation

‘Untranslatable’ words tell us more about English speakers than other cultures

When the word “hygge” became popular outside Denmark a few years ago, it seemed the perfect way to express the feeling of wrapping yourself up in a crocheted blanket with a cosy jumper, a cup of tea and back-to-back episodes of The Bridge. But is it really only the Danes, with their cold Scandinavian evenings, who could have come up with a word for such a specific concept? And is it only the Swedes who could have needed the verb “fika” to describe chatting over a coffee?

The internet abounds with words that lack a single-word English equivalent. To truly lack an English equivalent, a word must be a single, indivisible unit of meaning, as phrases are infinitely productive and can be created on demand by combining different words. Take, for example, the claim by Adam Jacot de Boinod in I Never Knew There Was A Word For It, that Malay has a word for the gap between the teeth that English lacks: “gigi rongak”. Well, this appears to be a phrase, and it translates literally as the perfectly cromulent English phrase “tooth gap”.

In fact, English even has a single-word technical term for a gap between the teeth: “diastema”. Okay, that’s actually a Greek word, but it’s in use in English, so it’s also an English word. Does that matter?

Where we get our words from tells us something about our history. Take, for instance, Quechua – the language spoken by people indigenous to the Andes and the South American highlands. The Quechuan word for “book” is “liwru”, which comes from the Spanish word “libro”, because Spanish colonisers introduced written forms of language to the people they conquered. In fact, English does now have a word for “hygge” – it’s “hygge”.

Read more: The Conversation

Welcome To The Era Of Big Translation

Despite the increasing ability to reach foreign customers, the lack of quality translation methods is still the most challenging aspect of global expansion. Currently, even using the most advanced software services is an expensive, complicated, and inaccurate process. The result is that too often, businesses sacrifice millions of dollars in profit because marketing to global consumers is too complex to be worthwhile.

That’s why we need an era of “Big Translation,” to leverage our existing technological tools and scale up translation capabilities to a level that actually matches global communication needs. Big Translation, a large-scale translation effort by people speaking two or more languages, would allow businesses, individuals, and even tweeters to get what they want translated easily and affordably.

The idea is banking on the fact that the world certainly isn’t lacking in translation talent. Nearly half the world speaks two or more languages–that’s 3.65 billion people with the potential to contribute to translation. Presently, a number of innovators are trying to tap into our world’s language talent. Chief among them is the groundbreaking idea that bypasses traditional translation tools and moves to a more accessible mobile app platform.

In just six years (2020), there will be 6.1 billion smartphone users. If even just a fraction of these users were to be bilingual, the sheer human translation power we would be able to tap into through mobile translation platforms would far exceed the computational capacity of any machine translation system. Translation could finally be fast and inexpensive but also guarantee the complete accuracy of human translation. Allowing smartphone users to provide quality translation in real time will help shape the on-demand marketplace–a marketplace that has already attracted more than $4.8 billion in investment. Companies like Uber, AirBnB, and now translation startups like my company, Stepes, are defining the competitive edge in the sharing economy.

Big Translation would fundamentally change how we can do business internationally. Downstream, it will affect how we access information, receive entertainment, and spend our free time. In other words, Big Translation has the potential to change the way we live our lives. In a world where anyone could get fast and easy translations, what could we accomplish?

Read more: Fast Company

Artificial intelligence goes bilingual—without a dictionary

Automatic language translation has come a long way, thanks to neural networks—computer algorithms that take inspiration from the human brain. But training such networks requires an enormous amount of data: millions of sentence-by-sentence translations to demonstrate how a human would do it. Now, two new papers show that neural networks can learn to translate with no parallel texts—a surprising advance that could make documents in many languages more accessible.

“Imagine that you give one person lots of Chinese books and lots of Arabic books—none of them overlapping—and the person has to learn to translate Chinese to Arabic. That seems impossible, right?” says the first author of one study, Mikel Artetxe, a computer scientist at the University of the Basque Country (UPV) in San Sebastián, Spain. “But we show that a computer can do that.”

Most machine learning—in which neural networks and other computer algorithms learn from experience—is “supervised.” A computer makes a guess, receives the right answer, and adjusts its process accordingly. That works well when teaching a computer to translate between, say, English and French, because many documents exist in both languages. It doesn’t work so well for rare languages, or for popular ones without many parallel texts.

The two new papers, both of which have been submitted to next year’s International Conference on Learning Representations but have not been peer reviewed, focus on another method: unsupervised machine learning. To start, each constructs bilingual dictionaries without the aid of a human teacher telling them when their guesses are right. That’s possible because languages have strong similarities in the ways words cluster around one another. The words for table and chair, for example, are frequently used together in all languages. So if a computer maps out these co-occurrences like a giant road atlas with words for cities, the maps for different languages will resemble each other, just with different names. A computer can then figure out the best way to overlay one atlas on another. Voilà! You have a bilingual dictionary.
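
The atlas idea can be illustrated with a deliberately tiny Python sketch. It is not the method used in either paper, which work with learned word embeddings and far more sophisticated alignment. Here each word is reduced to a sorted profile of how often it appears alongside other words, and the two vocabularies are matched by comparing those profiles. The corpora, the function names and the assumption that both vocabularies are the same size are all invented for the example, and the two toy corpora were constructed so that their statistics line up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sentences):
    """Count how often each unordered pair of words shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    return counts


def signatures(sentences):
    """Describe each word by its sorted, normalised co-occurrence profile.

    Sorting throws away *which* neighbours a word has and keeps only the
    shape of its neighbourhood, so profiles are comparable across languages.
    """
    counts = cooccurrence(sentences)
    total = sum(counts.values())
    vocab = sorted({word for pair in counts for word in pair})
    sigs = {}
    for w in vocab:
        row = [counts[tuple(sorted((w, v)))] / total for v in vocab if v != w]
        sigs[w] = sorted(row, reverse=True)
    return sigs


def induce_dictionary(source_sentences, target_sentences):
    """Pair each source word with the target word whose profile is closest."""
    src = signatures(source_sentences)
    tgt = signatures(target_sentences)

    def distance(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    return {s: min(tgt, key=lambda t: distance(src[s], tgt[t])) for s in src}


# Made-up, non-parallel toy corpora (content words only). The Spanish one is
# twice as long and in a different order; no sentence is aligned with another.
english = ["table chair", "chair table", "table chair",
           "chair dog", "dog cat", "cat dog"]
spanish = ["perro gato", "mesa silla", "silla perro", "gato perro",
           "silla mesa", "mesa silla", "perro gato", "silla mesa",
           "perro silla", "mesa silla", "gato perro", "mesa silla"]

print(induce_dictionary(english, spanish))
```

No sentence pairs or seed translations are supplied; the pairing comes entirely from the fact that “table” and “mesa” sit in neighbourhoods of the same shape. Real text is far noisier, which is why the actual systems need much richer representations and an iterative refinement step.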

Read more: Science

Historically, men translated the Odyssey. Here’s what happened when a woman took the job

The Odyssey is about a man. It says so right at the beginning — in Robert Fagles’s 1996 translation, for example, the poem opens with the line, “Sing to me of the man, Muse, the man of twists and turns.”

In the course of the poem, that man plots his return home after fighting the Trojan War, slaughters the suitors vying to marry his wife Penelope, and reestablishes himself as the head of his household.

But the Odyssey is also about other people: Penelope, the nymph Calypso, the witch Circe, the princess Nausicaa; Odysseus’s many shipmates who died before they could make it home; the countless slaves in Odysseus’s house, many of whom are never named.

Emily Wilson, the first woman to translate the Odyssey into English, is as concerned with these surrounding characters as she is with Odysseus himself. Written in plain, contemporary language and released earlier this month to much fanfare, her translation lays bare some of the inequalities between characters that other translations have elided. It offers not just a new version of the poem, but a new way of thinking about it in the context of gender and power relationships today. As Wilson puts it, “the question of who matters is actually central to what the text is about.”

Composed around the 8th century BC, the Odyssey is one of the oldest works of literature typically read by an American audience; for comparison, it’s almost 2,000 years older than Beowulf. While the Iliad tells the story of the Trojan War, the Odyssey picks up after the war is over, when Odysseus, the king of Ithaca, is trying to make his way home.

Both poems are traditionally attributed to the Greek poet Homer, but since they almost certainly originated as oral performances and not written texts, it’s hard to tell whether a single person composed them, or whether they are the result of many different creators and performers refining and contributing to a story over a period of time. (The introduction to Wilson’s translation includes a longer discussion of the question of who “Homer” was.)

Wilson, a professor of classical studies at the University of Pennsylvania, has also translated plays by the ancient Greek playwright Euripides and the Roman philosopher Seneca. Her translation of the Odyssey is one of many in English (though the others have been by men), including versions by Fagles, Robert Fitzgerald, Richmond Lattimore, and more. Translating the long-dead language Homer used — a variant of ancient Greek called Homeric Greek — into contemporary English is no easy task, and translators bring their own skills, opinions, and stylistic sensibilities to the text. The result is that every translation is different, almost a new poem in itself.

Read more: Vox

Found in translation: USC scientists map brain responses to stories in three different languages

New brain research by USC scientists shows that reading stories is a universal experience that may result in people feeling greater empathy for each other, regardless of cultural origins and differences.

And in what appears to be a first for neuroscience, USC researchers have found patterns of brain activation when people find meaning in stories, regardless of their language. Using functional MRI, the scientists mapped brain responses to narratives in three different languages — Americanized English, Farsi and Mandarin Chinese.

The USC study opens up the possibility that exposure to narrative storytelling can have a widespread effect on triggering better self-awareness and empathy for others, regardless of the language or origin of the person being exposed to it.

“Even given these fundamental differences in language, which can be read in a different direction or contain a completely different alphabet altogether, there is something universal about what occurs in the brain at the point when we are processing narratives,” said Morteza Dehghani, the study’s lead author and a researcher at the Brain and Creativity Institute at USC.

Dehghani is also an assistant professor of psychology at the USC Dornsife College of Letters, Arts and Sciences, and an assistant professor of computer science at the USC Viterbi School of Engineering.

The study was published in the journal Human Brain Mapping.

Read more: USC News