After Eugenia Kuyda’s closest friend died in a car accident, she decided to build a monument to him. She gathered text messages Roman Mazurenko had sent her and convinced his friends and family to do the same. Eventually, Kuyda, a software developer, gathered more than 8,000 lines of text that captured Mazurenko’s interests, thoughts, and personality. This was the raw material needed to train a neural network to speak like Mazurenko, to respond to messages as if he were writing the words himself.
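The article doesn't detail Luka's architecture, but the basic idea of learning someone's style from their messages can be pictured with a toy example. The sketch below is far simpler than a neural network: a Markov chain that records which word tends to follow which in a corpus of messages (the sample messages here are invented for illustration) and then generates loosely similar text by random walk.

```python
# Toy illustration only -- not Luka's actual neural-network approach.
# A Markov chain learns word-to-word transitions from a person's messages
# and generates new text in a loosely similar style.
import random
from collections import defaultdict

# Hypothetical message corpus; a real system trained on ~8,000 lines.
messages = [
    "i want to build something new",
    "i want to travel this summer",
    "we should build something together",
]

# Record, for each word, every word observed to follow it.
transitions = defaultdict(list)
for line in messages:
    words = line.split()
    for cur, nxt in zip(words, words[1:]):
        transitions[cur].append(nxt)

def generate(start, length=6, seed=0):
    """Random walk over the learned transitions, starting from `start`."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        options = transitions.get(out[-1])
        if not options:  # dead end: no word ever followed this one
            break
        out.append(rng.choice(options))
    return " ".join(out)

print(generate("i"))
```

Even this crude model captures surface habits of phrasing; a neural network generalizes the same idea to whole patterns of response rather than single word transitions.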
“Roman bot” was published on Kuyda’s chatbot platform, Luka, in 2016. All a user needed to do was add @Roman, and they would be able to converse with the simulation, learning about Mazurenko’s life and career and, hopefully, gleaning something of his temperament. The rhythm of speech and the kinds of responses all carefully mimicked Kuyda’s friend. It was an experimental monument, a digital facsimile. Some called it a ghost. In a Facebook post, Kuyda described the experience of chatting to the bot as talking to “a shadow of a person.”
The technology wasn’t perfect, she noted, and a lot of the time @Roman would say something that didn’t make sense, but what her team had done “wasn’t possible just a year ago and in the very close future we will be able to do a lot more.”
In 2018, Google unveiled its Duplex system. Billed as an “A.I. system for accomplishing real-world tasks over the phone,” Duplex works by leveraging a recurrent neural network (RNN), along with the company’s automatic speech recognition technology, to convincingly call up businesses on behalf of users.
Most impressive—and for some, unsettling—is Google Duplex’s way with words. The company has laced its A.I. assistant with an array of phrases like “hmm” and “uh” that imitate the pauses and intonations of natural speech. This responsive speaker sounds more human than your average automated call. In fact, when Google first showcased the technology, there were cries that Duplex came across as duplicitous, misleading people into thinking they were talking to a human instead of a machine.
If Kuyda’s “Roman bot” managed a semblance of her deceased friend in text, how could this approach be advanced with the type of technology Google is pursuing with Duplex? The company has emphasized that the system will be transparent about its nature during calls, but there’s nevertheless a feeling that a crucial line has been crossed in artificial speech. Could an A.I. system learn to appropriate the vocal rhythms, the tics of personality, of specific individuals? If we talk into our phones, could we one day hear the voices of dead loved ones talking back to us?
The idea of preserving a person through their speech is not new. In an 1878 essay, Thomas Edison proclaimed that his phonograph—the first device to reproduce recorded sound—would “annihilate time and space and bottle up for posterity the mere utterance of man.”
A glance out the window will show you that time and space have yet to be annihilated, but the sentiment of bottling up utterances is a persistent one. From anthropologists such as John Peabody Harrington, who captured the speech of the native peoples of California on wax cylinders, to projects such as BBC Voices, which archived the linguistic landscape of the United Kingdom, we have long been preserving people through records of conversation.
Over the past few years, a collaboration between the University of Southern California’s Institute for Creative Technologies and the Shoah Foundation has pushed this into a new arena. The New Dimensions in Testimony (NDT) project has created around a dozen “interactive biographies” of Holocaust survivors, based on extensive interviews filmed in a custom 360-degree light stage. The testimony of each of these individuals is used to create a digital projection that, thanks to natural language technology, can respond to questions from an audience.
If you were to ask a survivor whether they believe in God or how they hid from the Nazis, the system would pick up on your question and surface a relevant section of the interview. Stitched together, these snippets are intended to give the impression of a seamless conversation with a witness to history. As the Shoah Foundation explains: “Years from now, long after the last survivor has left us, Dimensions in Testimony will be able to provide a valuable opportunity to engage with a survivor and ask them questions directly.”
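The matching step can be pictured with a minimal sketch. This is not the NDT project's actual pipeline, which relies on trained natural-language technology over filmed testimony; the clip IDs and descriptions below are hypothetical. The idea is simply to score each pre-recorded answer against the visitor's question and surface the best match.

```python
# Illustrative sketch only -- not the NDT system's real implementation.
# Score pre-recorded interview snippets against a question by word overlap
# and return the best-matching clip.
import re
from collections import Counter

# Hypothetical snippet index: clip ID -> description of its content.
snippets = {
    "clip_014": "how I hid from the Nazis during the occupation",
    "clip_231": "whether I still believe in God after the war",
    "clip_090": "life in the displaced persons camp after liberation",
}

def words(text):
    """Lowercase bag-of-words representation of a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def overlap(question, description):
    """Count word occurrences shared by the question and a description."""
    q, d = words(question), words(description)
    return sum(min(q[w], d[w]) for w in q)

def best_clip(question):
    """Return the clip whose description best overlaps the question."""
    return max(snippets, key=lambda cid: overlap(question, snippets[cid]))

print(best_clip("Do you believe in God?"))
```

A production system would use far more robust language understanding than word overlap, but the shape is the same: the answer is always retrieved from recorded testimony, never generated, which is exactly the authenticity property Traum describes below.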
A number of these interactive systems have been featured in museums. The Illinois Holocaust Museum has built a permanent theater to house the testimonies as part of its Take a Stand Center. The USC Shoah Foundation has also built a system around the testimony of a survivor of the Nanking Massacre, while the USC Institute for Creative Technologies has used the underlying technology to collaborate with the U.S. Army on a project that allows soldiers to interview a victim of sexual assault. I ask David Traum, one of the project leaders, what he makes of Google Duplex. Is the potential to generate convincing speech something that could work well with an “interactive biography”?
“The recorded approach we use with NDT has the advantage of quality and authenticity,” he says. “The generated speech approach has the advantage of being able to create new content cheaply without additional access to the original person. That can be an important feature if you need some new content; for example, a specific reaction to a question that couldn’t have been conceived of at the time of recording.”
While a system capable of generating responses may make its subject more reactive, Traum warns that being able to construct reactions would mean you could put words in someone else’s mouth: “It may become difficult or impossible to know whether someone actually said something or not. Authenticity is an important issue, so it’s probably not a good idea to blur the line.”
When I speak to Virginia Dignum, associate professor of social artificial intelligence at Delft University of Technology, she echoes Traum’s caution: “I don’t think that an A.I. used to simulate people should be made to come up with any type of answer. The system should be aware of the limits of what it ‘knows.’
“If the person who gave the testimony never said anything about, for example, their favorite color, it is not for the system to come up with an answer, such as by extrapolating from the colors of the person’s clothes. It should be clear to the user that the simulation is meant to talk about a specific situation.”
The aims of NDT are decidedly historical, giving audiences an insight into events, such as the Holocaust and the Nanking Massacre, anchored in individual testimony. When it comes to the subjects of the project, the system has a clear idea of what it “knows” about these people. But could a system be built to have a more general understanding of an individual? Of their fears, their desires, their innermost feelings about the world?
“It is conceivable that people may attempt to develop such systems to ‘replace’ deceased loved ones in an emotional interaction,” Dignum says. “I would not necessarily say that such uses are to be forbidden per se, but if developed, these must be subjected to strong ethical, psychological, and social reviews, in ways similar to how medicines are introduced nowadays.”
In 2017, Eugenia Kuyda released Replika, an app that lets users talk to an A.I. bot. Over the course of exchanging hundreds of text messages, the system learns your approach to different subjects, feeding this through a neural network to distill your tone and the way you respond to situations. “Talk to someone who always listens,” the company says, framing Replika as a “safe space to share thoughts and feelings without fear of being judged.”
The underlying approach has much in common with Kuyda’s monument to her deceased friend, except the system is learning right in front of you, piecing together a semblance of your personality from its questions. As Kuyda said about her version of Roman Mazurenko, the result is a shadow. It can resemble a person, but is ultimately just an outline. How would it feel for others who interact with this virtual impression of us after we’re gone?
Perhaps it’s the speaking itself that’s important. In the Japanese town of Otsuchi there is a disconnected telephone in a glass booth. It was built by the garden designer Itaru Sasaki after the death of his cousin in 2010. When 10 percent of Otsuchi’s population was killed by floods caused by the 2011 tsunami, Sasaki opened the phone to the public. There, on a hill overlooking the sea, they could dial their loved one’s number and have a conversation that could only ever be one-way.
Thousands of people are believed to have made the journey to the glass booth, to the kaze no denwa, or “wind phone.” When it comes to those who have left us, perhaps it is enough to speak to the memory.