Welsh Wikipedia Gives Me Hope

August 29th, 2019
If you say, “Alexa, faint o’r gloch yw hi?” the smart speaker will not understand that you are asking for the time of day. That’s because Welsh is not one of the eight languages currently supported by Amazon’s Alexa-enabled devices. Gareth Morlais, a Welsh language and digital media specialist for the Welsh government, has argued for years that this language gap is disturbing. In a 2017 presentation, Morlais noted that the Welsh language, then ranked 172nd in the world by number of speakers, was not supported by Alexa, Twitter, or Google’s search interface. At the time, Alexa only spoke and understood two languages: English and German. “The technology actually tells you which language your family can speak at home, which is a horror story,” Morlais said. “What we need to do here is try to shape the technology so that it speaks the same language that we want to speak.”
Although Alexa still does not speak or understand Welsh, the Celtic language’s presence in tech has increased dramatically within a short period. Google announced in February that it had expanded its offerings in Docs, Sheets, Slides, and Drive to include Welsh. And Google Translate (infamous since 2009 for its Scymraeg, or scummy Welsh) has, according to the BBC, recently taken a great leap forward in terms of the accuracy and quality of its Welsh translations. Morlais and others attribute this in part to the fact that there are now more than 100,000 articles on the Welsh version of Wikipedia, known as Wicipedia. Like other language editions, Wicipedia is a separate website with its own content, not simply a translation of English Wikipedia, a distinction that matters for both users and big tech companies.
Back in 2017, Morlais observed, “There appears to be an indication that there is a link between the languages with the most Wikipedia articles or pages and the languages that are supported by the digital giants.” Google Translate and other technologies use artificial neural networks to learn from example, training themselves with language data from rich internet sources like Welsh Wikipedia. The Welsh community is not alone in using wiki-technology to promote its language. This year’s Celtic Knot conference in Cornwall, England, included several indigenous languages with their own Wikipedia editions. The original idea, as the name suggests, was to focus on Celtic languages, including Irish, Breton, Scottish Gaelic, Welsh, and Cornish (which was declared extinct merely a decade ago), as well as Scots. But as word got out about a Wikipedia minority language conference, others began to join, representing, for example, the Sámi language spoken in parts of Norway, Finland, Sweden, and Russia; the Berber family of languages spoken in Northern Africa; and the Basque and Catalan communities. (In his 2017 presentation, Morlais noted that Catalan was one of the few minority languages supported by Google search, an accomplishment he linked to the fact that Catalan already had more than 500,000 articles on its language edition of Wikipedia.) Read more: Slate

The Prospects for the Sum of All Human Knowledge in Wikipedia in Indigenous Languages

September 7th, 2017
Historically, indigenous languages have been not only minimized but also marginalized, in every domain, including cyberspace. This has limited the possibility of appreciating other communities, worldviews and traditions. Every indigenous language in the world is undergoing linguistic displacement and, consequently, disappearing. Although some remain active, their use is confined to private spaces, so they gradually lose their place in public life and fall into disuse. Wikipedia, like other new, non-commercial information technologies, can be used to open new public spaces for these languages and gradually recover the ground lost to more dominant languages. No one today can legitimately doubt that Wikipedia is the largest collaborative encyclopedia ever created and hosted in cyberspace. Its commitment, “Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge,” is certainly an ideal, and it is demonstrated by the 37 million open-access articles (under Creative Commons licenses) written in 289 languages. However, the representation of indigenous languages on the platform is very low, even though in Latin America there are 522 indigenous communities who speak 420 different languages (of the 6,000 that exist worldwide) and make up 10% of the total regional population. To date, only four of these languages have official Wikipedia versions: Quechua (19,900 articles), Náhuatl (9,940 articles), Aymara (3,830 articles) and Guaraní (3,128 articles); 29 more projects are in the Wikipedia Incubator. In October 2016, Rising Voices, with the support of the Wikimedia Foundation, developed the study “Best Practices for Creating Free Knowledge in Indigenous Languages on Wikipedia” to bring about change and see a greater number of languages, and their respective cultures and worldviews, represented on Wikipedia.
The general purpose of the study is to document and evaluate the status of indigenous languages on Wikipedia as a way of identifying the current capacity and the difficulties in creating and sustaining participation, particularly of native speakers. Read more: Global Voices

Wikipedia ‘facts’ depend on which language you read them in

December 14th, 2016
Like Facebook and Twitter, Wikipedia could have its own filter bubbles. A new website lets you uncover geographical biases in Wikipedia articles by tracking down where editors working in different languages source their information. Insert the URL of any Wikipedia page into Wikiwhere and the site’s algorithm trawls the web to find out where the references cited in the entry originate. Martin Körner at the University of Koblenz-Landau, Germany, and his colleagues made the tool to compare how Wikipedia articles about the same topic but in different languages might be influenced by different sources. In the English-language version of an article on Russia’s annexation of Crimea, for example, they found that 24 per cent of linked references came from Ukrainian news sources while nearly 20 per cent came from Russian sources. In the German version of the same article, however, the balance tipped, with Russian sources making up ten per cent of the total citations and Ukrainian sources only representing three per cent. Read more: New Scientist
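
The excerpt does not describe how Wikiwhere decides where a reference comes from, so the following is only a rough sketch of the general idea of tallying an article's references by origin. It assumes that the country can be guessed from a URL's country-code top-level domain alone, which is a simplification and not the tool's actual method. The names `CCTLD_TO_COUNTRY`, `guess_country` and `source_shares`, and the example URLs, are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical, deliberately tiny lookup table (an assumption for illustration).
# A real tool would need a full list of country-code domains and would
# probably rely on more signals than the domain suffix.
CCTLD_TO_COUNTRY = {
    "ua": "Ukraine",
    "ru": "Russia",
    "de": "Germany",
}


def guess_country(url):
    """Guess a reference's country from the last label of its hostname."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return CCTLD_TO_COUNTRY.get(tld, "unknown")


def source_shares(reference_urls):
    """Return the percentage of references attributed to each country."""
    counts = Counter(guess_country(url) for url in reference_urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {country: 100.0 * n / total for country, n in counts.items()}


# Made-up reference list, not data from the study.
example_refs = [
    "https://news.example.ua/story-1",
    "https://news.example.ua/story-2",
    "https://press.example.ru/report",
    "https://www.example.com/analysis",
]
shares = source_shares(example_refs)
```

Running `source_shares` on the reference lists of two language editions of the same article would give percentages that can be compared side by side, in the spirit of the Crimea example above. Generic domains such as `.com` land in the "unknown" bucket here, which is one reason the domain suffix alone is a weak signal.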