
Storytelling and Code: Developing for the Amazon Echo
At this spring’s Hack Upstate event, my team built an Amazon Alexa skill. I came up with the idea of creating a murder mystery game that involved interaction with Alexa because I saw it as an opportunity to force a collision between the world I am immersed in (writing and content creation) and the world I am moving towards (programming and data analysis).
I’ve been considering how these two spheres of knowledge interact for quite some time. My education in writing was earned through years of participation in groups, through classes and through degrees, whereas my education in programming has been primarily self-taught, ad hoc and in general, isolated, with only brief forays into makeshift classrooms.
I’ve learned that these are two very different styles of thinking, with very different processes, and at times they contrast rather sharply. I used to think the closest place for them to overlap was in design and user interfaces. Guiding a user through a series of steps, as I’ve written about in other pieces, is similar in style to guiding a reader through a story. But this still didn’t really land; design isn’t quite the pool where these two merge.
I think that responsive artificial intelligence may be.
When learning the skills associated with writing, you very quickly realize how hard dialogue is to create. It can come across as stilted and false if you aren’t true to common language techniques, but if you are too liberal with speaking styles, dialogue on the page looks forced and over-stylized. Making words on the page that readers enjoy and become immersed in is a difficult task in and of itself. Creating smooth conversations between characters that translate seamlessly in the mind of the reader is a skill that distinguishes the top writers.
Both writing and programming involve a blend of art and science. And that’s where I see Artificial Intelligence. In writing the content for our Alexa skill, I was forced to truly consider the language we were using. Leaning on the science of language, I was able to quickly create lists of potential statements for person and machine to use with each other: using various verbs, writing multiple conjugations, and looking for synonyms. But it wasn’t until our first tests that I realized I had to stop and rethink my path. More than finding the correct language, I needed to find the actual language being used.
I was forced to stop and listen, to truly hear the actual words coming out of my teammates’ mouths when they spoke to Alexa. I had to add in slang, create incorrect sentences, and use words that were similar-sounding but not actually correct, as I wrote for Alexa and for humans.
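That list-building can be sketched in a few lines of Python. This is only an illustration, not the code we wrote at the hackathon: the verbs, the suspects and the overheard phrasings are all hypothetical stand-ins, and the function name is my own. It combines the "planned" statements (every verb paired with every suspect) with the looser phrasings you only discover by listening to real people.

```python
from itertools import product

# Hypothetical word lists for one game action: accusing a suspect.
VERBS = ["accuse", "blame", "arrest"]
SUSPECTS = ["the butler", "the cook"]

# Hypothetical phrasings picked up by listening during testing,
# including slang and grammatically loose sentences.
HEARD_IN_TESTING = ["it was {suspect}", "{suspect} did it", "i think {suspect} done it"]


def build_utterances(verbs, suspects, heard):
    """Return a sorted, de-duplicated list of sample statements."""
    # Planned language: every verb combined with every suspect.
    planned = {f"{verb} {suspect}" for verb, suspect in product(verbs, suspects)}
    # Observed language: each overheard template filled in for every suspect.
    observed = {
        template.format(suspect=suspect)
        for template, suspect in product(heard, suspects)
    }
    return sorted(planned | observed)


UTTERANCES = build_utterances(VERBS, SUSPECTS, HEARD_IN_TESTING)

if __name__ == "__main__":
    for line in UTTERANCES:
        print(line)
```

Keeping the overheard phrasings in their own list makes it easy to see how much of the final language came from listening rather than from planning.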
I also had to decide how formal I wanted Alexa to be. Her input from people needed to be natural for ease of use, but her responses could still have the formal language of a perfect speaker.
And I had to consider other interactions that Alexa had with my team. If, for our game, I chose an informal voice, would that change the other interactions with Alexa? Her average tone when performing pre-programmed tasks is very formal and has the ring of a computer talking to you. Could I make Alexa more relatable by changing the language she used as output? Would a change in tone be difficult for the user to adjust to quickly, or would it have the benefit of setting our game apart as a special or unique interaction with Alexa?
The best writers are able to create an immersive environment in their writing; you become consumed by the story, fall in love with the characters, see the world they live in and mourn their losses. We, as humans, are already very capable of falling into the worlds in our minds. Those of us who are avid readers understand that you may physically be sitting in a chair in a room with a book in your hand, but mentally you can be exploring other worlds. So how do we create that sort of engagement with a machine that speaks to you from the other side of the room?
I think the answer is in the skills of storytelling: creating the language and carefully considering the wording of both the inputs and the outputs. The sheer number of potential inputs is intimidating to create, but truly impressive when you look at it as a reflection of natural language.
One of the battles with creating an immersive environment with Alexa is the ‘computerized’ tone. I wonder whether or not language can overcome the currently unnatural feel of the voice. Artificial Intelligence has to appeal to humans, and vocalizations are so ingrained in our existence that until we find ways to create sound from AI that we hear as more fluid, we may have a difficult time creating the opportunity for people to ‘fall into’ an experience.
I wouldn’t be opposed to trying to make that happen, though. And I can’t wait to write more content for AI. Hearing the words I created spoken back to me was thrilling and an incredible learning experience. To be able to see how skills I have developed in two completely different industries could combine to create something new and unique was an incredible opportunity.