Researchers use AI to unlock the secrets of ancient texts

The Abbey Library of St. Gall in Switzerland is home to approximately 160,000 volumes of literary and historical manuscripts dating back to the eighth century—all of which are written by hand, on parchment, in languages rarely spoken in modern times.

To preserve these historical accounts of humanity, such texts, numbering in the millions, have been kept safely stored away in libraries and monasteries all over the world. A significant portion of these collections is available to the general public through digital imagery, but experts say there is an extraordinary amount of material that has never been read, a hidden treasure trove of insight into the world’s history.

Now, researchers at the University of Notre Dame are developing an artificial neural network that reads complex ancient handwriting, drawing on measurements of human perception to improve deep learning transcription.

“We’re dealing with historical documents written in styles that have long fallen out of fashion, going back many centuries, and in languages like Latin, which are rarely ever used anymore,” said Walter Scheirer, the Dennis O. Doughty Collegiate Associate Professor in the Department of Computer Science and Engineering at Notre Dame. “You can get beautiful photos of these materials, but what we’ve set out to do is automate transcription in a way that mimics the perception of the page through the eyes of the expert reader and provides a quick, searchable reading of the text.”

In research published in the Institute of Electrical and Electronics Engineers journal Transactions on Pattern Analysis and Machine Intelligence, Scheirer outlines how his team combined traditional methods of machine learning with visual psychophysics—a method of measuring the connections between physical stimuli and mental phenomena, such as the amount of time it takes for an expert reader to recognize a specific character, gauge the quality of the handwriting or identify the use of certain abbreviations.

Scheirer’s team studied digitized Latin manuscripts that were written by scribes in the Cloister of St. Gall in the ninth century. Readers entered their manual transcriptions into a specially designed software interface. The team then measured reaction times during transcription for an understanding of which words, characters and passages were easy or difficult. Scheirer explained that including that kind of data created a network more consistent with human behavior, reduced errors and provided a more accurate, more realistic reading of the text.
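
The article does not say how the reaction times were fed into the network, so the following is only a minimal sketch of one plausible approach, not the team's published method. It assumes each training sample comes with a reader's reaction time, turns those times into per-sample weights, and uses them to scale an ordinary cross-entropy loss. The function names, the clamping bounds and the choice to weight slower (harder) samples more heavily are all assumptions made for illustration.

```python
import math


def difficulty_weights(reaction_times, floor=0.5, ceil=2.0):
    """Map reaction times (seconds) to per-sample loss weights.

    Hypothetical scheme: each time is divided by the mean time, then
    clamped to [floor, ceil] so a single outlier cannot dominate.
    """
    mean_rt = sum(reaction_times) / len(reaction_times)
    return [min(ceil, max(floor, rt / mean_rt)) for rt in reaction_times]


def weighted_cross_entropy(true_class_probs, weights):
    """Weighted average of -log(p) over samples.

    true_class_probs holds the probability the model assigned to the
    correct character for each sample.
    """
    total = sum(-math.log(p) * w for p, w in zip(true_class_probs, weights))
    return total / sum(weights)


# Three characters the reader took 1, 2 and 3 seconds to transcribe.
weights = difficulty_weights([1.0, 2.0, 3.0])
loss = weighted_cross_entropy([0.5, 0.5, 0.5], weights)
```

Under this scheme, a mistake on a character that slowed the expert down counts for more than a mistake on one read instantly. The opposite weighting is equally easy to express, and which direction helps is an empirical question the sketch does not settle.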

Read more: Tech Xplore

Hidden meanings: Using artificial intelligence to translate ancient texts

The ancient world is full of mysteries.

Who built the monolithic and megalithic structures found all over the world? Why did they build them? How did they build them? What technology did they use?

And perhaps most importantly from the point of view of answering all the other questions: Where are the texts that the builders produced?

We assume that if the ancients were capable of building structures that modern humans cannot replicate even now with the latest technology, they must have been a literate civilization which recorded and stored information.

But where is it?

These are among the many questions that have preoccupied archaeologists and historians for more than a century.

A huge amount of progress has been made as a result of the dedicated pursuit of the answers. It has spawned a multibillion-dollar global tourism industry and some relatively well-funded academic projects. A lot of museums and films can also be said to be somewhat beholden to this obsession with the ancient past.

But in terms of definitively answering those big questions, progress has been rather slow and painstaking.

The Rosetta Stone

It would, of course, help if more artifacts like the Rosetta Stone were discovered.

The Rosetta Stone, created in 196 BC and discovered in 1799, is a dark stone slab inscribed with the same decree in three scripts: Egyptian hieroglyphs, Demotic (an everyday Egyptian script) and Ancient Greek.

This stone enabled people studying ancient cultures to finally understand the Egyptian hieroglyphics which cover acres of surface area on pyramids and temples in the country.

The three texts on the Rosetta Stone are presumed to convey essentially the same decree, and after two centuries of scholarly study that presumption is well supported.

Other ancient languages, however, are proving more evasive. The Indus Valley civilization, which is said to be one of the oldest ever discovered, used a script that has so far defied attempts at decipherment, partly because it has no established relationship with any known language, although it is pictorial in part.

The Sumerian language is more amenable to translation because some Sumerian people appear to have been bilingual, also speaking a contemporary language called Akkadian.

Translation work has so far been undertaken by humans, but soon, artificial intelligence systems will, inevitably, be used to not only speed up the process, but also improve accuracy – and perhaps identify similarities and patterns across many languages humans may not have the time or ability to interpret.

Read more: Robotics & Automation