World’s largest linguistics database is getting too expensive for some researchers

It was 2015 when Gary Simons knew that something had to change. That was the year spare funds started to dry up at the Summer Institute of Linguistics (SIL), a Bible translation group that helped revolutionize the documentation of endangered languages in the mid-20th century. SIL’s budget had long supported Simons’s passion project: Ethnologue (or “the Ethnologue,” as many researchers call it), a massive online database considered by many to be the definitive source for information on the world’s languages.

Ethnologue’s users, and there are hundreds of thousands of them, can track how many people speak each of the world’s tongues, from Hebrew to Hausa to Hakka (9.3 million, 63.4 million, and 48.2 million, respectively). The database indicates, on a scale of one to 10, every language’s risk of extinction. It also gives a surprisingly clear answer to the squishy question, “how many languages exist?” (7111, by the latest count). For linguists, it’s a standard reference; for students, it’s a window into the diversity of human language.

But for Simons, a computational linguist who has run Ethnologue for almost 20 years, it’s been a growing heartache. To help cover its nearly $1 million in annual operating costs, Ethnologue got its first paywall in late 2015; most nonpaying visitors were turned away after several pages. Since October 2019, the paywall has taken a new form: It lets visitors access every page, but it blots out information on how many speakers a language has and where they live. Subscriptions now start at $480 per person per year.

The online backlash has been harsh. Many linguists have vowed to abandon the site for other resources. “In the last few years, [Ethnologue has] gotten increasingly expensive and locked down,” says Simon Greenhill, an evolutionary linguist at the Max Planck Institute for the Science of Human History. “This is a very sad step.”

He and other scholars are now struggling to find a cheaper or free substitute for the population figures, the data that long made Ethnologue “the only option” for researchers studying linguistic diversity, says Greenhill, who studies the relationships between languages. “I’m not fundamentally opposed to paying for data, but it’s a hard pill to swallow,” Greenhill says. For a recent paper on how geography affects language diversity, his team used data from an older version of Ethnologue that they had previously paid for; access to its most current databases would have cost several thousand dollars. He’ll be doing the same for an upcoming paper on the causes of language extinction.

Simons understands why linguists are upset. The need to impose fees “is heavy on our heart,” he says. “But we can’t really do anything until we change the economic picture. If we keep coasting the way we are, it’s just going to crumble.”

Read more: American Association for the Advancement of Science