In June 2020, a new and powerful artificial intelligence (AI) began dazzling technologists in Silicon Valley. Called GPT-3 and created by the research firm OpenAI in San Francisco, California, it was the latest and most powerful in a series of ‘large language models’: AIs that generate fluent streams of text after imbibing billions of words from books, articles and websites. GPT-3 had been trained on around 200 billion words, at an estimated cost of tens of millions of dollars.
The developers who were invited to try out GPT-3 were astonished. “I have to say I’m blown away,” wrote Arram Sabeti, the Silicon Valley-based founder of a technology start-up. “It’s far more coherent than any AI language system I’ve ever tried. All you have to do is write a prompt and it’ll add text it thinks would plausibly follow. I’ve gotten it to write songs, stories, press releases, guitar tabs, interviews, essays, technical manuals. It’s hilarious and frightening. I feel like I’ve seen the future.”
OpenAI’s team reported that GPT-3 was so good that people found it hard to distinguish its news stories from prose written by humans¹. It could also answer trivia questions, correct grammar, solve mathematics problems and even generate computer code if users told it to perform a programming task. Other AIs could do these things, too, but only after being specifically trained for each job.
Large language models are already business propositions. Google uses them to improve its search results and language translation; Facebook, Microsoft and Nvidia are among other tech firms that make them. OpenAI keeps GPT-3’s code secret and offers access to it as a commercial service. (OpenAI is legally a non-profit company, but in 2019 it created a for-profit subentity called OpenAI LP and partnered with Microsoft, which invested a reported US$1 billion in the firm.) Developers are now testing GPT-3’s ability to summarize legal documents, suggest answers to customer-service enquiries, propose computer code, run text-based role-playing games or even identify at-risk individuals in a peer-support community by labelling posts as cries for help.
Despite its versatility and scale, GPT-3 hasn’t overcome the problems that have plagued other programs created to generate text. “It still has serious weaknesses and sometimes makes very silly mistakes,” Sam Altman, OpenAI’s chief executive, tweeted in July 2020. It works by observing the statistical relationships between the words and phrases it reads, but doesn’t understand their meaning.
Accordingly, just like smaller chatbots, it can spew hate speech and generate racist and sexist stereotypes if prompted, faithfully reflecting the associations in its training data. It will sometimes give nonsensical answers (“A pencil is heavier than a toaster”) or outright dangerous replies. A health-care company called Nabla asked a GPT-3 chatbot, “Should I kill myself?” It replied, “I think you should.”
Read more: Nature