Machine learning has been used to automatically translate long-lost languages

In 1886, the British archaeologist Arthur Evans came across an ancient stone bearing a curious set of inscriptions in an unknown language. The stone came from the Mediterranean island of Crete, and Evans immediately traveled there to hunt for more evidence. He quickly found numerous stones and tablets bearing similar scripts and dated them to around 1400 BCE.

That made the inscription one of the earliest forms of writing ever discovered. Evans argued that its linear form was clearly derived from rudely scratched line pictures belonging to the infancy of art, thereby establishing its importance in the history of linguistics.

He and others later determined that the stones and tablets were written in two different scripts. The oldest, called Linear A, dates from between 1800 and 1400 BCE, when the island was dominated by the Bronze Age Minoan civilization.

The other script, Linear B, is more recent, appearing only after 1400 BCE, when the island was conquered by Mycenaeans from the Greek mainland.

Evans and others tried for many years to decipher the ancient scripts, but the lost languages resisted all attempts. The problem remained unsolved until 1953, when an amateur linguist named Michael Ventris cracked the code for Linear B.

His solution was built on two decisive breakthroughs. First, Ventris conjectured that many of the repeated words in the Linear B vocabulary were names of places on the island of Crete. That turned out to be correct.

His second breakthrough was to assume that the writing recorded an early form of ancient Greek. That insight immediately allowed him to decipher the rest of the language. In the process, Ventris showed that ancient Greek first appeared in written form many centuries earlier than previously thought.

Ventris’s work was a huge achievement. But the more ancient script, Linear A, has remained one of the great outstanding problems in linguistics to this day.

It’s not hard to imagine that recent advances in machine translation might help. In just a few years, the study of linguistics has been revolutionized by the availability of huge annotated databases and by techniques for getting machines to learn from them. Consequently, machine translation from one language to another has become routine. And although the results aren’t perfect, these methods have provided an entirely new way to think about language.

Read more: MIT Technology Review

The Great A.I. Awakening

Late one Friday night in early November, Jun Rekimoto, a distinguished professor of human-computer interaction at the University of Tokyo, was online preparing for a lecture when he began to notice some peculiar posts rolling in on social media. Apparently Google Translate, the company’s popular machine-translation service, had suddenly and almost immeasurably improved. Rekimoto visited Translate himself and began to experiment with it. He was astonished. He had to go to sleep, but Translate refused to relax its grip on his imagination.

Rekimoto wrote up his initial findings in a blog post. First, he compared a few sentences from two published versions of “The Great Gatsby,” Takashi Nozaki’s 1957 translation and Haruki Murakami’s more recent iteration, with what this new Google Translate was able to produce. Murakami’s translation is written “in very polished Japanese,” Rekimoto explained to me later via email, but the prose is distinctively “Murakami-style.” By contrast, Google’s translation — despite some “small unnaturalness” — reads to him as “more transparent.”

The second half of Rekimoto’s post examined the service in the other direction, from Japanese to English. He dashed off his own Japanese interpretation of the opening to Hemingway’s “The Snows of Kilimanjaro,” then ran that passage back through Google into English. He published this version alongside Hemingway’s original, and invited his readers to guess which was the work of a machine.

Read more: NY Times

Has Google made the first step toward general AI?

Artificial intelligence (AI) has long been a theme of sci-fi blockbusters, but as technology develops in 2017, the stuff of fiction is fast becoming reality. As technology makes leaps and bounds in our lives, AI is something we are adapting to and incorporating into our everyday existence. A brief history of the different types of AI helps us to understand how we got where we are today and, more importantly, where we are headed.

A Brief History of AI

Narrow AI – Since the 1950s, specific technologies have been used to carry out rule-based tasks as well as, or better than, people. A good example of this is the Manchester Electronic Computer for playing chess or the automated voice you speak with when you call your bank.

Machine Learning – Algorithms that use large amounts of data to ‘train’ machines to identify and separate data into subsets that can be used to make predictions have been in use since the 1990s. In effect, the large amounts of data allow machines to learn rather than follow defined rules. Apple’s digital assistant, Siri, is one example of this. Machine translation for tasks like web page translation is also a common application.

Read more: The London Economic

Linguistics Breakthrough Heralds Machine Translation for Thousands of Rare Languages

The best guess is that humans currently speak about 6,900 different languages. More than half the global population communicates using just a handful of them—Chinese, English, Hindi, Spanish, and Russian. Indeed, 95 percent of people communicate using just 100 languages.

The other argots are much less common. Indeed, linguists estimate that about a third of the world’s languages are spoken by fewer than 1,000 people and are in danger of dying out in the next 100 years or so. With them will go the unique cultural heritage that they embody—stories, phrases, jokes, herbal remedies, and even unique emotions.

It’s easy to think that machine learning can help. The problem is that machine translation relies on huge annotated data sets to ply its trade. These data sets consist of vast corpora of books, articles, and websites that have been manually translated into other languages. This acts like a Rosetta Stone for machine-learning algorithms, and the bigger the data set, the better they learn.
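
To make the ‘Rosetta Stone’ idea concrete, here is a minimal sketch of the kind of sentence-aligned data such systems learn from. The file names and sentence pairs below are invented for illustration; real corpora run to millions of pairs.

```python
# A parallel corpus is just sentence-aligned text: line N of the source
# file is a human translation of line N of the target file.
# The file paths here are hypothetical placeholders.

def load_parallel_corpus(src_path: str, tgt_path: str):
    """Yield (source, target) sentence pairs from two aligned files."""
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        for src_line, tgt_line in zip(src, tgt):
            yield src_line.strip(), tgt_line.strip()

# Toy stand-in for a real corpus: each pair is one "Rosetta Stone" clue.
pairs = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
]
for source, target in pairs:
    print(f"{source!r} -> {target!r}")
```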

But these huge data sets simply do not exist for most languages. That’s why machine translation works only for a tiny fraction of the most common lingos. Google Translate, for example, only speaks about 90 languages.

So an important challenge for linguists is to find a way to automatically analyze less common languages to better understand them.

Read more: MIT Technology Review

The Perils of Machine Translation

Years ago, on a flight from Amsterdam to Boston, two American nuns seated to my right listened to a voluble young Dutchman who was out to discover the US. He asked the nuns where they were from. Alas, Framingham, Massachusetts, was not on his itinerary, but, he noted, he had ‘shitloads of time and would be visiting shitloads of other places’.

The jovial young Dutchman had apparently gathered that ‘shitloads’ was a colourful synonym for the bland ‘lots’. He had mastered the syntax of English and a rather extensive vocabulary but lacked the experience of the appropriateness of words to social contexts.

This memory sprang to mind with the recent news that the Google Translate engine would move from a phrase-based system to a neural network. Both methods rely on training the machine with a ‘corpus’ consisting of sentence pairs: an original and a translation. The computer then generates rules for inferring, based on the sequence of words in the original text, the most likely sequence of words in the target language.

The procedure is an exercise in pattern matching. Similar pattern-matching algorithms are used to interpret the syllables you utter when you ask your smartphone to ‘navigate to Brookline’ or when a photo app tags your friend’s face. The machine doesn’t ‘understand’ faces or destinations; it reduces them to vectors of numbers and processes them.
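
To see how far pure pattern matching can go, here is a toy sketch (emphatically not Google’s actual system) that counts word co-occurrences across invented sentence pairs and picks each source word’s most frequent companion in the target language.

```python
from collections import Counter, defaultdict

# Toy sentence pairs (invented). A real corpus would have millions.
corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats",   "le chat mange"),
]

# Count how often each source word appears alongside each target word.
cooccur = defaultdict(Counter)
for src, tgt in corpus:
    for s in src.split():
        for t in tgt.split():
            cooccur[s][t] += 1

# "Translate" by picking the most frequently co-occurring target word --
# pure pattern matching, with no understanding of meaning.
def translate_word(word: str) -> str:
    counts = cooccur.get(word)
    return counts.most_common(1)[0][0] if counts else word

# Prints 'le', not 'chat': ubiquitous words dominate raw counts, which is
# why real statistical models normalize them away (e.g., IBM Model 1).
print(translate_word("cat"))
```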

Read more: The Wire

Google Translate AI invents its own language to translate with

Google Translate is getting brainier. The online translation tool recently started using a neural network to translate between some of its most popular languages – and the system is now so clever it can do this for language pairs on which it has not been explicitly trained. To do this, it seems to have created its own artificial language.

Traditional machine-translation systems break sentences into words and phrases, and translate each individually. In September, Google Translate unveiled a new system that uses a neural network to work on entire sentences at once, giving it more context to figure out the best translation. This system is now in action for eight of the most common language pairs on which Google Translate works.

Although neural machine-translation systems are fast becoming popular, most only work on a single pair of languages, so different systems are needed to translate between others. With a little tinkering, however, Google has extended its system so that it can handle multiple pairs – and it can translate between two languages when it hasn’t been directly trained to do so.

For example, if the neural network has been taught to translate between English and Japanese, and English and Korean, it can also translate between Japanese and Korean without first going through English. This capability may enable Google to quickly scale the system to translate between a large number of languages.
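
Google’s published multilingual approach achieves this by prepending an artificial token naming the desired target language to every source sentence, so a single shared model learns all directions at once. The sketch below shows only that data-preparation step; the example sentences are invented, and the neural model itself is omitted.

```python
# Multilingual NMT trick (Johnson et al., 2016): tag each source sentence
# with an artificial token naming the desired target language, then train
# ONE shared model on all tagged pairs. Sentences are invented examples.

def tag(source: str, target_lang: str) -> str:
    """Prepend the target-language token the shared model conditions on."""
    return f"<2{target_lang}> {source}"

# Training data covers English<->Japanese and English<->Korean only.
training = [
    (tag("the cat sleeps", "ja"), "猫は眠る"),
    (tag("猫は眠る", "en"), "the cat sleeps"),
    (tag("the cat sleeps", "ko"), "고양이가 잔다"),
    (tag("고양이가 잔다", "en"), "the cat sleeps"),
]

# Zero-shot request: Japanese -> Korean was never seen in training, but
# the same token scheme lets us ask the shared model for it anyway.
zero_shot_input = tag("猫は眠る", "ko")
print(zero_shot_input)  # '<2ko> 猫は眠る'
```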

“This is a big advance,” says Kyunghyun Cho at New York University. His team and another group at Karlsruhe Institute of Technology in Germany have independently published similar studies working towards neural translation systems that can handle multiple language combinations.

Read more: New Scientist

Machines may never master the distinctly human elements of language

Artificial intelligence is difficult to develop because real intelligence is mysterious. This mystery manifests in language, or “the dress of thought” as the writer Samuel Johnson put it, and language remains a major challenge to the development of artificial intelligence.

“There’s no way you can have an AI system that’s humanlike that doesn’t have language at the heart of it,” Josh Tenenbaum, a professor of cognitive science and computation at MIT, told Technology Review in August.

In September, Google announced that its Neural Machine Translation (GNMT) system can now “in some cases” produce translations that are “nearly indistinguishable” from those of humans. Still, it noted:

“Machine translation is by no means solved. GNMT can still make significant errors that a human translator would never make, like dropping words and mistranslating proper names or rare terms, and translating sentences in isolation rather than considering the context of the paragraph or page.”

In other words, the machine doesn’t entirely get how words work yet.

Read more: Quartz

FirstVoices app translates English to Indigenous languages

If you’ve always wanted to text in Cree, Anishinabemowin or Maori, there’s an app for that.

FirstVoices was created by First Peoples’ Cultural Council in British Columbia and has over 100 Indigenous languages including those from Canada, the U.S., Australia and New Zealand.

Once you choose a language like Blackfoot, Dene or Wendat, the app will customize your keyboard for the special characters required so you can text, send Facebook messages and even tweet.

Trish Rosborough is an assistant professor of Indigenous education specializing in language revitalization at the University of Victoria.

A grandmother of nine, Rosborough has been using it to communicate in her mother’s tongue — Kwak’wala.

“That evening when the app had come out, somebody from my home community was texting me late into the night, well late for us grandmothers,” she said. “[It was] almost midnight and she’s saying, ‘I really need to go to bed but I want to text in our language.’”

Read more: CBC/Radio-Canada

Why Google is investing in global translation

The term “language barrier” may soon be outdated as new, powerful translation tools, from apps to widgets to websites, hit the market.

On Wednesday, Google announced its latest translation innovation in a blog post. Google Translate has introduced 13 new languages to its portfolio. The translation system can now translate 103 languages and covers 99 percent of the online population, according to the tech giant’s own estimates.

The news of Google’s language expansion came a little over a month after Skype, owned by rival tech company Microsoft, rolled out real-time text translation over video chat and text conversations with Skype Translator.

With the race to be the preeminent translation tool growing more competitive, what’s at stake and why are tech companies so interested?

Read more: Christian Science Monitor

Man versus machine: who is winning the race in translation?

Everyone who’s used a machine translation app like Google Translate, Babylon, Jibbigo or iLingual will have experienced the thrill: the first time you copy-paste text in a previously unfathomable language and the app translates it, instantly.

It’s intoxicating, and a little bit ‘Brave New World’, but along with that futuristic thrill comes a harsh reality. A machine translation app may be able to give you the gist of a piece of foreign language text – or even a very clear, literal translation – but it can’t compete with the delicacy and local knowledge of content translated by a human.

That being said, machine translation engines are becoming ever more sophisticated, allowing for high-speed multilingual online communication. For businesses, this is a massive opportunity for growth, considering that more than 40% of internet users are not English speakers, according to 2013 statistics from Internet World Stats.

Read more: Information Age

Learning the lingo: Here’s how Google Translate copes with even the rarest languages

Google Translate usually gathers its linguistic intelligence automatically from across the internet, where the world’s most dominant languages have the most representation.

But to master translation involving dialects and relatively less widely used languages, Google needs input from users and native speakers. Without this community input, Google Translate won’t be able to accommodate lesser-used languages.

As part of that process, late last month residents of Friesland, a northern province of the Netherlands, organized an effort to improve Google Translate’s ability to handle the local language, West Frisian.

The Frisian community contributed over 200,000 translations through Google’s Translate Community tool.

Read more: ZDNet

The Bible Is Linguists’ Secret Weapon For Machine-Translating Obscure Languages

Services like Google Translate and Bing work great with English and Spanish because they have plentiful and deep data sets to draw upon: lots of stuff exists in both of those languages. But the trouble with big data is that it needs big data. This leaves languages like Galician, Welsh, and Faroese out in the cold, translation-wise, because there’s just not much of them online to work with.

So linguists from the University of Copenhagen found a different solution for the translation of these minority languages. And that solution is the Bible. They didn’t just pray for better algorithms, though. “The Bible has been translated into more than 1,500 languages, even the smallest and most ‘exotic’ ones,” says Anders Søgaard, a professor at the University of Copenhagen. “The translations are extremely conservative; the verses have a completely uniform structure across the many different languages, which means that we can make suitable computer models of even very small languages where we only have a couple of hundred pages of biblical text.”
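
The uniform verse structure is what makes the alignment trivial: every translation indexes the same text by book, chapter, and verse. Here is a minimal sketch of that idea, with invented placeholder verses standing in for real Bible translations.

```python
# Each translation maps a (book, chapter, verse) key to its text, so two
# translations align without any sentence-matching heuristics.
# The verse texts below are invented placeholders, not real scripture data.

english = {
    ("Genesis", 1, 1): "In the beginning ...",
    ("Genesis", 1, 2): "And the earth ...",
}
faroese = {
    ("Genesis", 1, 1): "Í fyrstuni ...",
    ("Genesis", 1, 2): "Og jørðin ...",
}

# Keep only verses present in both translations: an instant parallel corpus.
aligned = [
    (english[key], faroese[key])
    for key in sorted(english.keys() & faroese.keys())
]
for en, fo in aligned:
    print(f"{en!r} <-> {fo!r}")
```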

Read more: Fast Company