World’s largest linguistics database is getting too expensive for some researchers

February 16th, 2020 by It was 2015 when Gary Simons knew that something had to change. That was the year spare funds started to dry up at the Summer Institute of Linguistics (SIL), a Bible translation group that helped revolutionize the documentation of endangered languages in the mid–20th century. SIL’s budget had long supported Simons’s passion project: Ethnologue—or “the Ethnologue” as many researchers call it—a massive online database considered by many to be the definitive source for information on the world’s languages. Ethnologue’s users—and there are hundreds of thousands—can track how many people speak each of the world’s tongues, from Hebrew to Hausa to Hakka (9.3 million, 63.4 million, and 48.2 million, respectively). The database indicates, on a scale of one to 10, every language’s risk of extinction. It also gives a surprisingly clear answer to the squishy question, “how many languages exist?” (7111, by the latest count). For linguists, it’s a resource of reference; for students, it’s a window into the diversity of human language. But for Simons, a computational linguist who has run Ethnologue for almost 20 years, it’s been a growing heartache. To help cover its nearly $1 million in annual operating costs, Ethnologue got its first paywall in late 2015; most nonpaying visitors were turned away after several pages. Since October 2019, the paywall has taken a new form: It lets visitors access every page, but it blots out information on how many speakers a language has and where they live. Subscriptions now start at $480 per person per year. The online backlash has been harsh. Many linguists have vowed to abandon the site for other resources. “In the last few years, [Ethnologue has] gotten increasingly expensive and locked down,” says Simon Greenhill, an evolutionary linguist at the Max Planck Institute for the Science of Human History. 
“This is a very sad step.” He and other scholars are now struggling to find a cheaper or free substitute for the population figures—the data that long made Ethnologue “the only option” for researchers studying linguistic diversity, says Greenhill, who studies the relationships between languages. “I’m not fundamentally opposed to paying for data, but it’s a hard pill to swallow,” Greenhill says. For a recent paper on how geography affects language diversity, his team used data from an older version of Ethnologue that they had previously paid for; access to its most current databases would have cost several thousand dollars. He’ll be doing the same for an upcoming paper on the causes of language extinction. Simons understands why linguists are upset. The need to impose fees “is heavy on our heart,” he says. “But we can’t really do anything until we change the economic picture. If we keep coasting the way we are, it’s just going to crumble.” Read more: American Association for the Advancement of Science

How the limits of the mind shape human language

September 2nd, 2019 by When we speak, our sentences emerge as a flowing stream of sound. Unless we are really annoyed, We. Don’t. Speak. One. Word. At. A. Time. But this property of speech is not how language itself is organised. Sentences consist of words: discrete units of meaning and linguistic form that we can combine in myriad ways to make sentences. This disconnect between speech and language raises a problem. How do children, at an incredibly young age, learn the discrete units of their languages from the messy sound waves they hear? Over the past few decades, psycholinguists have shown that children are “intuitive statisticians”, able to detect patterns of frequency in sound. The sequence of sounds rktr is much rarer than intr. This means it is more likely that intr could occur inside a word (interesting, for example), while rktr is likely to span two words (dark tree). The patterns that children can be shown to subconsciously detect might help them figure out where one word begins and another ends. One of the intriguing findings of this work is that other species are also able to track how frequent certain sound combinations are, just like human children can. In fact, it turns out that we’re actually worse at picking out certain patterns of sound than other animals are. Read more: The Conversation
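
The idea that sound-sequence frequencies can hint at word boundaries can be illustrated with a small sketch. It is not the method used in the research described above; it assumes one common formalisation, the transitional probability between adjacent syllables, and places a boundary wherever that probability drops below a cutoff. The three nonsense words, the stream order "ABCACBA", and the 0.75 threshold are all invented for illustration.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate P(next | current) for every adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # how often each syllable has a successor
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, threshold=0.75):
    """Split the stream wherever the transitional probability falls below threshold."""
    if not syllables:
        return []
    tp = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tp[(prev, nxt)] < threshold:  # unpredictable transition: assume a boundary
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

# Hypothetical toy "language": three made-up words, heard as one unbroken stream.
WORDS = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
stream = [syl for w in "ABCACBA" for syl in WORDS[w]]

print(segment(stream))
# → ['bidaku', 'padoti', 'golabu', 'bidaku', 'golabu', 'padoti', 'bidaku']
```

Inside a word each syllable is always followed by the same one (probability 1.0), while across a word boundary the next syllable varies (0.5 here), so the dips line up with the word edges. Real speech is far noisier than this toy stream.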

Can language slow down time?

August 6th, 2018 by What if the language you spoke caused you to perceive time differently? Does that sound like magic realism? Close: it’s economics. Some recent research papers published in economics journals – notably a 2013 paper by Keith Chen of Yale and a 2018 paper by three Australian economists – have proposed that languages that grammatically distinguish future from present cause their speakers to plan less, save less, even care less for the environment. That sound you just heard was thousands of linguists rolling their eyes and groaning “Whorf”. Benjamin Lee Whorf was an inspector for a fire insurance company, and he saw that language could cause safety problems. People were careless around empty gasoline drums because they were “empty” – except that, in fact, they were filled with gasoline vapour, which can explode. This spurred him to study and write about language. Whorf spent time with the Hopi people of northeastern Arizona. He observed that they had no grammatical distinctions for future and past and no way to count periods of time. He looked at their cultural practices and concluded that the Hopi see time quite differently from us, and that concepts that seem obvious to us – such as “tomorrow is another day” – had no meaning for them. His publication of these ideas in 1939 set the philosophy of language on fire. From Whorf’s proposals and those of his teacher, a Yale professor named Edward Sapir, came what Whorf called the Linguistic Relativity Hypothesis, commonly known as the Sapir-Whorf hypothesis. Its mildest form is that language can affect how we think; its strongest form is that we can’t think about things our language doesn’t let us talk about. Over time, these explosive ideas – and much of Whorf’s data – were found to be mostly… empty. 
In 1983, a researcher named Ekkehart Malotki published Hopi Time, a thick volume detailing his research on the Hopi and their language, which proceeded with a long, slow burn to incinerate Whorf’s edifice of data and theory about the Hopi. And with the demise of the strong version of the Sapir-Whorf hypothesis came a mistrust of any ideas of linguistic relativity. Read more: BBC

When Genetics and Linguistics Challenge the Winners’ Version of History

March 29th, 2018 by Two conquering empires and more than 500 years of colonial rule failed to erase the cultural and genetic traces of indigenous Peruvians, a new study finds. This runs contrary to historical accounts that depict a complete devastation of northern Peru’s ancient Chachapoya people by the Inca Empire. The Chachapoyas—sometimes referred to as “Warriors of the Clouds” because they made their home in the Amazonian cloud forests—are mainly known today for what they built: fortified hilltop fortresses and intricate sarcophagi overlooking their villages from sheer, inaccessible cliff sides. The little we know about their existence before the arrival of the Spanish comes to us via an oral history passed along by the Inca to their Spanish conquerors—in other words, the winners’ version of history. Now, a study tracking the genetic and linguistic history of modern Peruvians is revealing that the Chachapoyas may have fared better than these mainstream historical accounts would have us believe. As Chiara Barbieri, a post-doctoral researcher from the Max Planck Institute for the Science of Human History, puts it: “Some of these historical documents were exaggerated and a little bit biased in favor of the Inca.” Many of these early reports stem from two historians who essentially wrote the book on the Inca Empire during the time period from 1438 to 1533: Inca Garcilaso de la Vega, the son of a conquistador and Incan princess who published chronicles on the Inca Empire in the early 17th century, and Pedro de Cieza de Leon, a Spanish conquistador from a family of Jewish converts who travelled through the area in the mid-16th century, and wrote one of the first lengthy histories of the Inca people and Spanish conquests. According to Cieza de Leon’s account, it was in the 1470s, about midway through the Inca Empire, that paramount leader Túpac Inca Yupanqui first attacked the Chachapoyas in what is today northern Peru. 
He quickly found that the Warriors of the Clouds were not the type to give up without a fight. Cieza de Leon described the first battle between Yupanqui and the Chachapoyas in the first part of his Chronicle of Peru: The Chachapoyas Indians were conquered by them, although they first, in order to defend their liberty, and to live in ease and tranquillity, fought with such fury that the Yncas fled before them. But the power of the Yncas was so great that the Chachapoyas Indians were finally forced to become servants to those Kings, who desired to extend their sway over all people. Beaten but not defeated, the Chachapoyas rebelled again during the reign of Yupanqui’s son after the latter died. Huayna Capac had to re-conquer the region, but encountered many of the difficulties his father had, according to Cieza de Leon: Among the Chachapoyas the Inca met with great resistance; insomuch that he was twice defeated by the defenders of their country and put to flight. Receiving some succour, the Inca again attacked the Chachapoyas, and routed them so completely that they sued for peace, desisting, on their parts, from all acts of war. The Inca granted peace on conditions very favourable to himself, and many of the natives were ordered to go and live in Cuzco, where their descendants still reside. De la Vega’s account, written nearly 50 years after Cieza de Leon’s in the early 17th century, tells a similar story of a decisive conquest and subsequent forced dispersal of the Chachapoyas around the Inca Empire. The Inca often used this strategy of forced dispersal, which they referred to by the Quechua word mitma, to dissuade future rebellion in the vast region they came to control. (Quechua, according to the new study, is the most widely-spoken language family of the indigenous Americas.) 
“We have some records in the Spanish history that the Inca had replaced the population completely, moving the Chachapoyas for hundreds of kilometers and replacing them with people from other parts of the empire,” Barbieri says. These and other accounts are some of the only historical notes we have of the Inca, who lacked any system of writing other than the quipu, or knot records. The quipu system of cords used different types of knots to indicate numbers, and was used for accounting and other records. “We know a lot about what the Inca did because Inca kings, or high officials, were talking to Spanish historians,” Barbieri says. “So the piece of history of this region that we know is very much biased towards what the Inca elite were telling the Spaniards. What we don’t know was what happened before that—everything that happened before the 16th century.” That is now changing, thanks to a genetic study on which Barbieri was lead author, published recently in Scientific Reports. Read more: Smithsonian

Linguistic time capsule in South America sheds light on human migration

March 26th, 2018 by Tiny Suriname, the smallest country in South America, punches far above its weight in linguistic diversity. Many people speak Dutch, but if you visit, you're also likely to hear Hindi, Javanese, a variety of indigenous languages, Portuguese, Cantonese, and possibly others. This real-world Babel, in a country of fewer than 600,000 people, is a relic of Suriname’s colonial history. The language that enables everyone to communicate is Sranan. It's a creole that serves as a linguistic time capsule, capturing Suriname’s brief tenure as a British colony before the territory was ceded to the Dutch in 1667. This time capsule status has allowed a group of researchers to use Sranan to reconstruct details about migration to the colony from England in the 1600s. Their results show how cultural artifacts could be used to trace human migration—and might one day help researchers trace the origins of enslaved people.

A living linguistic fossil

Creole languages arise in relatively extreme situations, when different groups of people find themselves in prolonged contact without a shared language—like in a young colony. People use bits of different languages to try to communicate, and over generations, these halting “pidgin” languages become fully fledged natural human languages: creoles. Like many famous creoles, including its close relative Gullah, Sranan is English-based, meaning that the bulk of its vocabulary comes from English. It also has words that can be traced to Dutch and Portuguese and a tiny percentage that can be traced to African languages. Both English and Sranan have changed markedly since the 17th century. But in one important way, Sranan is a “linguistic fossil,” said Nicole Creanza, one of the researchers involved in the Sranan study. In a phone call with Ars, she explained there was a “pulse of English influence” before the Dutch took over and most of the English speakers left. 
As a result, the English that influenced Sranan captures a very brief point in linguistic time. Read more: Ars Technica

Carrying On His Great Grandfather’s Work, A Kansas Professor Helps Keep Their Language Alive

October 22nd, 2017 by As a kid, Andrew McKenzie had an unusual affinity for languages. He took French in high school (because everyone else was taking Spanish). But that wasn't enough. "I started to teach myself different languages, like Latin and Greek and Basque and Turkish," he remembers. "I would drive into the city to a bookstore, and they’d have a section with language books. I'd say, 'I'm just going to learn this language because the book has the prettiest font.'" So it's not surprising that McKenzie ended up as a professor of linguistics at the University of Kansas. But it turns out there's another reason why he's uniquely qualified for his area of research, which involves documenting the endangered language of Oklahoma's Kiowa people. A language dies when children stop learning it naturally (as opposed to being taught at school) and when there's no documentation. But if it's been documented, a language can be revived (the best example of this is Hebrew). The Kiowa tribe is small, with only about 12,000 members, many of them spread out around the country. Most of the native speakers are in Southwest Oklahoma. “There are only a few dozen speakers, and some people would even estimate fewer," McKenzie says. "And a lot of them are in their 80s and 90s.” By one estimate, Kiowa is among 165 endangered languages in the United States; thousands of languages around the world are also in danger of extinction. Read more: KCUR 89.3

Evolutionary biology can help us understand how language works

October 10th, 2017 by As a linguist I dread the question, “what do you do?”, because when I answer “I’m a linguist” the inevitable follow-up question is: “How many languages do you speak?” That, of course, is not the point. While learning languages is a wonderful thing to do, academic linguistics is the scientific study of language. What I do in my work is to try to understand how and why languages are the way they are. Why are there so many in some places and so few in others? How did languages develop so many different ways of fulfilling the same kinds of communicative tasks? What is uniquely human about language, and how do the human mind and language shape each other? This is something of a new direction in linguistics. The old-school study of language history was more concerned with language for its own sake: understanding the structure of languages and reconstructing their genealogical relationships. One of the exciting things happening in linguistics today is that linguists are increasingly connecting with the field of evolutionary biology. Evolutionary biologists ask very similar questions about species to those my colleagues and I want to ask about languages: why they are distributed in a certain way, for example, or what explains the differences and similarities between them. These similarities in outlook allow us to apply all the modern tools of computational evolutionary biology to linguistic questions, giving us new insights into fundamental questions about the processes of language change, and through that into the nature of language in general. Read more: The Conversation

When Did Colonial America Gain Linguistic Independence?

July 5th, 2017 by When did Americans start sounding funny to English ears? By the time the Declaration of Independence was signed in 1776, carefully composed in the richly-worded language of the day, did colonial Americans—who after all were British before they decided to switch to become American—really sound all that different from their counterparts in the mother country? If you believe historical reenactments in film and television, no. Many people assume colonists spoke with the same accents their families immigrated with, which were largely British ones. Of course, sociolinguistic studies regularly show that speakers of American English seem to have a gentle inferiority complex about their own different accents, often rating British accents as higher in social status, for instance. So anglophone language attitudes being what they are, the accents of historical figures often end up British-inflected anyway, which, for audiences on both sides of the pond, seems to add an air of artistic verisimilitude to what might otherwise be a bald and unconvincing narrative. This might ultimately be a stretch for Romans and Nazis and evil villains. But is it really out of left field for the principal historical figures of colonial British America, on-screen or off-, to have sounded more or less British, with its tumbling mess of quirky regional dialects, a Scot here, a Cockney there, as well as the ever present Queen’s English? Read more: JSTOR

After Years Of Restraint, A Linguist Says ‘Yes!’ To The Exclamation Point

June 20th, 2017 by The only literary work about punctuation I'm aware of is an odd early story by Anton Chekhov called "The Exclamation Mark." After getting into an argument with a colleague about punctuation, a school inspector named Yefim Perekladin asks his wife what an exclamation point is for. She tells him it signifies delight, indignation, joy and rage. He realizes that in 40 years of writing official reports, he has never had the need to express any of those emotions. As Perekladin obsesses about the mark, it becomes an apparition that haunts his waking life, mocking him as an unfeeling machine. In desperation, he signs his name in a visitors book and puts three exclamation points after it. All of a sudden, Chekhov writes, "He felt delight and indignation, he was joyful and seethed with rage." Yefim Perekladin, c'est moi! At least, I used to be one of those people who use the exclamation point as sparingly as possible. We'll grudgingly stick one in after an interjection or a sentence like "What a jerk!" but never to punch up an ordinary sentence in an essay or email. We say we're saving them for special occasions, but they never seem to arise. Read more: NPR

Elon Musk and linguists say that AI is forcing us to confront the limits of human language

June 14th, 2017 by In analytic philosophy, any meaning can be expressed in language. In his book Expression and Meaning (1979), UC Berkeley philosopher John Searle calls this idea “the principle of expressibility, the principle that whatever can be meant can be said”. Moreover, in the Tractatus Logico-Philosophicus (1921), Ludwig Wittgenstein suggests that “the limits of my language mean the limits of my world”. Outside the hermetically sealed field of analytic philosophy, the limits of natural language when it comes to meaning-making have long been recognized in both the arts and sciences. Psychology and linguistics acknowledge that language is not a perfect medium. It is generally accepted that much of our thought is non-verbal, and at least some of it might be inexpressible in language. Notably, language often cannot express the concrete experiences engendered by contemporary art and fails to formulate the kind of abstract thought characteristic of much modern science. Language is not a flawless vehicle for conveying thought and feelings. In the field of artificial intelligence, technology can be incomprehensible even to experts. In the essay “Is Artificial Intelligence Permanently Inscrutable?” Princeton neuroscientist Aaron Bornstein discusses this problem with regard to artificial neural networks (computational models): “Nobody knows quite how they work. And that means no one can predict when they might fail.” This could harm people if, for example, doctors relied on this technology to assess whether patients might develop complications. 
Bornstein says organizations sometimes choose less efficient but more transparent tools for data analysis and “even governments are starting to show concern about the increasing influence of inscrutable neural-network oracles.” He suggests that “the requirement for interpretability can be seen as another set of constraints, preventing a model from a ‘pure’ solution that pays attention only to the input and output data it is given, and potentially reducing accuracy.” The mind is a limitation for artificial intelligence: “Interpretability could keep such models from reaching their full potential.” Since the work of such technology cannot be fully understood, it is virtually impossible to explain in language. Read more: Quartz

Language alters our experience of time

June 14th, 2017 by It turns out, Hollywood got it half right. In the film Arrival, Amy Adams plays linguist Louise Banks who is trying to decipher an alien language. She discovers the way the aliens talk about time gives them the power to see into the future – so as Banks learns their language, she also begins to see through time. As one character in the movie says: “Learning a foreign language rewires your brain.” My new study – which I worked on with linguist Emanuel Bylund – shows that bilinguals do indeed think about time differently, depending on the language context in which they are estimating the duration of events. But unlike Hollywood, bilinguals sadly can’t see into the future. However, this study does show that learning a new way to talk about time really does rewire the brain. Our findings are the first psycho-physical evidence of cognitive flexibility in bilinguals. We have known for some time that bilinguals go back and forth between their languages rapidly and often unconsciously – a phenomenon called code-switching. But different languages also embody different worldviews and different ways of organising the world around us. The way that bilinguals handle these different ways of thinking has long been a mystery to language researchers. Read more: The Conversation

Understanding Languages with Physics and Math

May 20th, 2017 by A husband-and-wife scientist duo from Poland has developed a computer model that simulates how vocabulary exchanges occur between settlers and nomads. According to their results, published in the journal Physical Review E, the nomadic groups are more likely to adopt words from settlers than the other way around. The new model provides a tool that can help sociolinguists understand how migration and intercultural interactions can influence the evolution of a language. While linguists tend to express careful skepticism about the significance of this and similar predictive computational models, these studies contribute to the growing interest in studying social behaviors with computational methods. "For this study, we developed a model using a method that has been around for 20 years or so," said Adam Lipowski, a physicist from Adam Mickiewicz University in Poznań, Poland. He is referring to the Naming Game, which simulates the exchange of information between individuals during face-to-face interactions. The model divides individuals into two groups, each with their own language. During the simulation, the group that represents the settlers stays put, while individuals from the nomadic group move around. Over time, the model shows that the nomads are more likely to pick up new words than the settlers, even when everything else, such as the vocabulary size or the population size of the groups, is kept equal. This result came as a bit of a surprise to Lipowski and his co-author, Dorota Lipowska. Read more: Inside Science
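
For readers unfamiliar with the Naming Game, here is a minimal sketch of its basic form, in which agents negotiate a name for a single object. It is not the authors' settler-and-nomad model: there are no groups, no movement, and no lattice, and the function names, population size, and step count are assumptions chosen only to show the core exchange rule.

```python
import random

def interact(speaker, hearer, rng, new_word):
    """One exchange: each agent is a set of candidate words for the same object."""
    if not speaker:
        speaker.add(new_word())          # a speaker with no word invents one
    word = rng.choice(sorted(speaker))   # sorted() keeps the choice reproducible
    if word in hearer:
        # Success: both agents drop every competing word and keep the shared one.
        speaker.clear()
        speaker.add(word)
        hearer.clear()
        hearer.add(word)
        return True
    hearer.add(word)                     # Failure: the hearer learns the new word
    return False

def simulate(n_agents=20, steps=5000, seed=0):
    """Run random pairwise exchanges and return every agent's final word set."""
    rng = random.Random(seed)
    agents = [set() for _ in range(n_agents)]
    counter = 0

    def new_word():
        nonlocal counter
        counter += 1
        return f"w{counter}"

    for _ in range(steps):
        s, h = rng.sample(range(n_agents), 2)  # two distinct agents meet
        interact(agents[s], agents[h], rng, new_word)
    return agents

if __name__ == "__main__":
    final = simulate()
    print("distinct words still in use:", len(set().union(*final)))
```

Studying settlers versus nomads would presumably require giving each agent a group, a starting vocabulary, and a position that only nomads change; those details are in the paper and are not reproduced here.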