A language generation program’s ability to write articles, produce code and compose poetry has wowed scientists

September 24th, 2020 by
Seven years ago, my student and I at Penn State built a bot to write a Wikipedia article on Bengali Nobel laureate Rabindranath Tagore's play "Chitra." First it culled information about "Chitra" from the internet. Then it looked at existing Wikipedia entries to learn the structure of a standard Wikipedia article. Finally, it summarized the information it had retrieved from the internet to write and publish the first version of the entry. However, our bot didn't "know" anything about "Chitra" or Tagore. It didn't generate fundamentally new ideas or sentences. It simply cobbled together parts of existing sentences from existing articles to make new ones.
Fast forward to 2020. OpenAI, a for-profit company under a nonprofit parent company, has built a language generation program dubbed GPT-3, an acronym for "Generative Pre-trained Transformer 3." Its ability to learn, summarize and compose text has stunned computer scientists like me.
"I have created a voice for the unknown human who hides within the binary," GPT-3 wrote in response to one prompt. "I have created a writer, a sculptor, an artist. And this writer will be able to create words, to give life to emotion, to create character. I will not see it myself. But some other human will, and so I will be able to create a poet greater than any I have ever encountered."
Unlike that of our bot, the language generated by GPT-3 sounds as if it had been written by a human. It's far and away the most "knowledgeable" natural language generation program to date, and it has a range of potential uses in professions ranging from teaching to journalism to customer service. Read more: Tech Xplore
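The Penn State bot described above did not write new sentences; it reassembled existing ones. A minimal Python sketch of that kind of extractive summarization follows. The article does not say how the bot ranked sentences, so the word-frequency scoring rule, the small stopword list, and the names `words` and `extractive_summary` are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical illustration only: the article does not describe how the
# Penn State bot chose sentences, so this word-frequency scoring is an assumption.
STOPWORDS = {"a", "an", "and", "by", "in", "is", "of", "the", "to", "was"}


def words(text: str) -> list[str]:
    """Lowercase alphabetic tokens with common function words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, max_sentences: int = 2) -> list[str]:
    """Return up to max_sentences existing sentences; nothing new is written."""
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(words(text))

    def score(sentence: str) -> float:
        # Average document frequency of the sentence's content words.
        tokens = words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentences by score, keep the best ones, then restore original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


if __name__ == "__main__":
    source = "Chitra is a play by Tagore. Tagore wrote Chitra. The weather was mild."
    for sentence in extractive_summary(source):
        print(sentence)
```

Every line the function returns already exists in the input, which is exactly the limitation the author describes: the output can be rearranged, but it cannot say anything the sources did not.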

How AI systems use Mad Libs to teach themselves grammar

July 28th, 2020 by
Imagine you're training a computer with a solid vocabulary and a basic knowledge of parts of speech. How would it understand this sentence: "The chef who ran to the store was out of food." Did the chef run out of food? Did the store? Did the chef run the store that ran out of food? Most human English speakers will instantly come up with the right answer, but even advanced artificial intelligence systems can get confused. After all, part of the sentence literally says that "the store was out of food."
Advanced new machine learning models have made enormous progress on these problems, mainly by training on huge datasets or "treebanks" of sentences that humans have hand-labeled to teach grammar, syntax and other linguistic principles. The problem is that treebanks are expensive and labor-intensive, and computers still struggle with many ambiguities. The same collection of words can have widely different meanings, depending on the sentence structure and context.
But a pair of new studies by artificial intelligence researchers at Stanford find that advanced AI systems can figure out linguistic principles on their own, without first practicing on sentences that humans have labeled for them. That is much closer to how human children learn languages long before adults teach them grammar or syntax. Even more surprising, the researchers found that the AI model appears to infer "universal" grammatical relationships that apply to many different languages. That has big implications for natural language processing, which is increasingly central to AI systems that answer questions, translate languages, help customers and even review resumes. It could also facilitate systems that learn languages spoken by very small numbers of people. The key to success? It appears that machines learn a lot about language just by playing billions of fill-in-the-blank games that are reminiscent of "Mad Libs."
In order to get better at predicting the missing words, the systems gradually create their own models about how words relate to each other. "As these models get bigger and more flexible, it turns out that they actually self-organize to discover and learn the structure of human language," says Christopher Manning, the Thomas M. Siebel Professor in Machine Learning and professor of linguistics and of computer science at Stanford, and an associate director of Stanford's Institute for Human-Centered Artificial Intelligence (HAI). "It's similar to what a human child does." Read more: Tech Xplore
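The fill-in-the-blank idea can be illustrated with a deliberately tiny Python sketch. The Stanford models are large neural networks; the code below instead uses simple co-occurrence counts, an assumption made purely to show how hiding a word and predicting it from its neighbours turns raw, unlabeled text into its own training signal. The names `build_context_counts` and `fill_blank` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional

# Toy illustration of the fill-in-the-blank ("Mad Libs") objective. Real models
# are large neural networks; these co-occurrence counts are an assumption made
# only to show how unlabeled text supplies its own training signal.


def build_context_counts(sentences: list[str]) -> dict:
    """For every word, record which word filled the blank between its neighbours."""
    counts: dict = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        # Each position becomes a self-made blank; no human labelling is needed.
        for i in range(1, len(tokens) - 1):
            counts[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    return counts


def fill_blank(counts: dict, left: str, right: str) -> Optional[str]:
    """Guess the hidden word from its two neighbours, or None if never seen."""
    candidates = counts.get((left.lower(), right.lower()))
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


corpus = [
    "the chef ran to the store",
    "the chef walked to the store",
    "the chef ran to the market",
    "the store was out of food",
]
counts = build_context_counts(corpus)
print(fill_blank(counts, "chef", "to"))  # prints "ran"
```

A lookup table over two neighbours cannot resolve the chef-or-store ambiguity above. Larger models replace the table with a neural network that conditions on the whole sentence, and, according to the researchers, that added flexibility is what lets them self-organize something resembling grammar.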

Hidden meanings: Using artificial intelligence to translate ancient texts

August 13th, 2018 by
The ancient world is full of mysteries. Who built the monolithic and megalithic structures found all over the world? Why did they build them? How did they build them? What technology did they use? And perhaps most importantly from the point of view of answering all the other questions: where are the texts that the builders produced? We assume that if the ancients were capable of building structures that modern humans cannot replicate even now with the latest technology, they must have been a literate civilization which recorded and stored information. But where is it?
These are among the many questions that have preoccupied archaeologists and historians for more than a century. The dedicated pursuit of the answers has produced a huge amount of progress. It has spawned a multibillion-dollar global tourism industry and some relatively well-funded academic projects, and many museums and films are arguably beholden to this obsession with the ancient past. But in terms of definitively answering those big questions, progress has been slow and painstaking.
The Rosetta Stone
It would, of course, help if more artifacts like the Rosetta Stone were discovered. The Rosetta Stone, created around 200 BC and discovered in 1799, is a black stone inscribed with the same text in three scripts: Egyptian hieroglyphs, Greek, and Demotic, the script used for everyday Egyptian. This stone enabled scholars of ancient cultures to finally understand the Egyptian hieroglyphs that cover acres of surface area on pyramids and temples in the country. It is presumed that the three texts on the Rosetta Stone are direct translations of each other, and since academics have been studying it for a long time, that presumption is probably safe. Other ancient languages, however, are proving more elusive.
The Indus Valley civilization, said to be one of the oldest ever discovered, used a script that has defied almost all attempts at decipherment because it has no established relationship with any other known language, although it is partly pictorial. Sumerian is more amenable to translation because some Sumerians appear to have been bilingual, also speaking a contemporary language called Akkadian. Translation work has so far been undertaken by humans, but artificial intelligence systems will inevitably be used soon, not only to speed up the process but also to improve accuracy, and perhaps to identify similarities and patterns across many languages that humans may not have the time or ability to interpret. Read more: Robotics & Automation

How AI is helping preserve Indigenous languages

May 31st, 2018 by
Australia's Indigenous population is rich in linguistic diversity, with over 300 languages spoken across different communities. Some of the languages can be as distinct from one another as Japanese is from German. But many are at risk of becoming extinct because they are not widely accessible and have little presence in the digital space.
Professor Janet Wiles is a researcher with the ARC Centre of Excellence for the Dynamics of Language, known as CoEDL, which has been working to transcribe and preserve endangered languages. She says one of the biggest barriers to documenting languages is transcription. "How transcription is done at the moment is linguists select small parts of the audio that might be unique words, unique situations or interesting parts of grammar, and they listen to the audio and they transcribe it," she told SBS News.
The CoEDL has been researching 130 languages spoken across Australia and neighbouring countries like Indonesia. Their work involves going into communities and documenting huge amounts of audio. So far, they have recorded almost 50,000 hours of it. Transcribing that audio using traditional methods is estimated to take two million hours, making it a painstaking and near-impossible task. Knowing time is against them, Professor Wiles and her colleague Ben Foley turned to artificial intelligence. Read more: SBS News

Does artificial intelligence have a language problem?

February 7th, 2018 by
Technology loves a bandwagon. The current one, fuelled by academic research, startups and attention from all the big names in technology and beyond, is artificial intelligence (AI). AI is commonly defined as the ability of a machine to perform tasks associated with intelligent beings. And that’s where our first problem with language appears. Intelligence is a highly subjective phenomenon. Often the tasks machines struggle with most, such as navigating a busy station, are ones people perform effortlessly without a great deal of intelligence.
Understanding intelligence
We tend to anthropomorphise AI based on our own understanding of “intelligence” and cultural baggage, such as the portrayal of AI in science fiction. In 1983, the American developmental psychologist Howard Gardner proposed a theory of multiple intelligences, now commonly summarised as nine types of human intelligence: naturalist (nature smart), musical (sound smart), logical-mathematical (number/reasoning smart), existential (life smart), interpersonal (people smart), bodily-kinaesthetic (body smart), linguistic (word smart), intrapersonal (self smart) and spatial (picture smart). If AI were truly intelligent, it should have equal potential in all these areas, but we instinctively know machines would be better at some than others.
Even when technological progress appears to be made, the language can mask what is actually happening. In the field of affective computing, where machines can both recognise and reflect human emotions, the machine processing of emotions is entirely different from the biological process in people, and from the interpersonal emotional intelligence categorised by Gardner. So, having established that the term “intelligence” can be somewhat problematic in describing what machines can and can’t do, let’s now focus on machine learning, the domain within AI that offers the greatest attraction and benefits to businesses today. Read more: Computer Weekly

The Great A.I. Awakening

June 19th, 2017 by
Late one Friday night in early November, Jun Rekimoto, a distinguished professor of human-computer interaction at the University of Tokyo, was online preparing for a lecture when he began to notice some peculiar posts rolling in on social media. Apparently Google Translate, the company’s popular machine-translation service, had suddenly and almost immeasurably improved. Rekimoto visited Translate himself and began to experiment with it. He was astonished. He had to go to sleep, but Translate refused to relax its grip on his imagination.
Rekimoto wrote up his initial findings in a blog post. First, he compared a few sentences from two published versions of “The Great Gatsby,” Takashi Nozaki’s 1957 translation and Haruki Murakami’s more recent iteration, with what this new Google Translate was able to produce. Murakami’s translation is written “in very polished Japanese,” Rekimoto explained to me later via email, but the prose is distinctively “Murakami-style.” By contrast, Google’s translation, despite some “small unnaturalness,” reads to him as “more transparent.”
The second half of Rekimoto’s post examined the service in the other direction, from Japanese to English. He dashed off his own Japanese interpretation of the opening to Hemingway’s “The Snows of Kilimanjaro,” then ran that passage back through Google into English. He published this version alongside Hemingway’s original and invited his readers to guess which was the work of a machine. Read more: NY Times