The NewsChat Team: Johnny Coffey, Tom Van de Weghe, Kye Kim, HR Venkatesh and Santiago Gutierrez

Hi Alexa, let’s chat about the news

Why and how we’ve built NewsChat to interact with news

9 min read · Dec 20, 2018


Don’t be surprised if you get a smart speaker for Christmas. Amazon’s Echo and Google Home are wildly popular gifts again this holiday season. They’ve dropped in price, and they’re less offensive as a present than a FitBit. According to a recent study, close to half of all U.S. consumers will own a smart speaker after the holidays. Talking to a machine used to feel a bit awkward, but the smart speaker boom is leading to a growing acceptance of voice interaction.

You can think of smart speakers as a sort of Trojan horse for voice assistants: they are steadily sneaking into our way of living. Most people use them to listen to music or to get real-time information like weather forecasts. But almost 7 in 10 people also use smart speakers to interact with news, and herein lies a big opportunity for news providers. Whether you’re a fan or not, speech-based interaction with news sources in phones, internet-enabled cars and kitchens is coming.

How to build an Alexa skill

As part of Stanford’s Exploring Computational Journalism course, we chose to work on a project exploring what it takes to build speech interaction with a news service. Our team consisted of three computer science students (Kye Kim, Santiago Gutierrez and Johnny Coffey) and two John S. Knight Fellows (HR Venkatesh and Tom Van de Weghe). Krishna Bharat, founder of Google News, was our mentor.

copyright/design: Tom Van de Weghe

The goal was to build a custom skill on Alexa, test it, and in the process understand the kind of work needed to build speech interaction with a news service. A “skill” is Amazon’s term for an app of sorts built specifically for Alexa devices. The time frame was very limited: we had only 8 weeks to build it, divided into several phases.

Exploring the field

In the first phase, the journalists researched which news organizations are already providing news on smart speakers. Following a design thinking approach, they conducted several extreme-user interviews, which suggested new directions and useful features. The journalists helped guide the team and made sure the product stayed journalistically sound.

The developers started building the Alexa skill in Python. The Python web app was responsible for periodically scraping news data from the internet and curating answers to the questions users asked Alexa.

The Python scripts ran on AWS Lambda, a serverless computing platform provided by Amazon as part of Amazon Web Services. This service runs code in response to events and automatically manages the computing resources that code requires.

We developed two main scripts: one was solely responsible for scraping, and the other controlled the flow of interactions with Alexa. Scraped data was stored in a MySQL database, an open source relational database management system that also ran on AWS (via RDS). When a question was asked of Alexa, the Python script received the request, looked up relevant articles in the MySQL database, and curated a proper response.
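To make that division of labor concrete, here’s a minimal sketch of what the scraping script could look like, assuming a hypothetical headlines table, placeholder connection details and a deliberately naive headline selector; our actual scraper and schema were more involved.

```python
# Minimal sketch of the scraping Lambda: fetch headlines and store them in MySQL.
# The table name, columns, connection details and source URLs are illustrative.
import pymysql
import requests
from bs4 import BeautifulSoup

SOURCES = {
    "nytimes": "https://www.nytimes.com/section/politics",  # example section page
}

def fetch_headlines(url):
    """Return a list of headline strings found on a section page (very naive)."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Real pages need source-specific selectors; <h3> is just a common default.
    return [h.get_text(strip=True) for h in soup.find_all("h3") if h.get_text(strip=True)]

def lambda_handler(event, context):
    conn = pymysql.connect(host="my-rds-endpoint", user="newschat",
                           password="...", database="newschat")
    try:
        with conn.cursor() as cur:
            for source, url in SOURCES.items():
                for title in fetch_headlines(url):
                    cur.execute(
                        "INSERT INTO headlines (source, title) VALUES (%s, %s)",
                        (source, title),
                    )
        conn.commit()
    finally:
        conn.close()
    return {"status": "ok"}
```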

copyright/design Tom Van de Weghe

Development process

Each developer in our team had a specific assignment:

  • One would write the website scraping code and the data-gathering infrastructure that allowed us to curate and store whatever we needed from different websites on the internet.
  • Another developer created the MySQL database and wrote the code that allowed our Alexa Skill to curate answers to questions.
  • A third developer wrote additional website scraping code and more interactions to allow our Alexa skill to curate answers to questions. The three worked together, each coding different aspects of the application at various times.

Each skill is managed through the Alexa Developer Console, where you define a new skill, specify which user phrases invoke it, and add slots for specific words to parse (e.g. “politics” in the speech is parsed as the slot “topic”, with synonyms such as “political issue”). In order to accommodate different user invocations, we added many synonyms for the topics and reactions a user might give.
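As a rough illustration, here is what a slice of that interaction model could look like, written as a Python dict that mirrors the JSON the console exports; the intent name, slot name and synonyms are examples, not our exact model.

```python
# Illustrative slice of an Alexa interaction model (mirrors the console's JSON).
# "GetTopicNewsIntent", the "topic" slot and the synonyms are examples only.
interaction_model = {
    "languageModel": {
        "invocationName": "news chat",
        "intents": [
            {
                "name": "GetTopicNewsIntent",
                "slots": [{"name": "topic", "type": "TOPIC"}],
                "samples": [
                    "what is the news in {topic}",
                    "give me {topic} news",
                    "{topic}",
                ],
            }
        ],
        "types": [
            {
                "name": "TOPIC",
                "values": [
                    {"name": {"value": "politics",
                              "synonyms": ["political issues", "political news"]}},
                    {"name": {"value": "sports",
                              "synonyms": ["sport", "sports news"]}},
                ],
            }
        ],
    }
}
```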

In the second phase, the developers added custom skill code so that Alexa could be linked to the scraped information and interact with users in the ways we wanted. This was done by writing different intent handler functions and uploading the skill code file (in Python) as a Lambda function on AWS, which links Alexa to the custom skill and returns the corresponding responses back to Alexa.
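Stripped down to its essentials, and without any SDK, such a Lambda entry point could look roughly like this; the intent name and the get_topic_news helper are hypothetical stand-ins for our real handlers.

```python
# Bare-bones sketch of the Alexa-facing Lambda. The intent name and the
# get_topic_news() helper are placeholders, not our actual skill code.
def get_topic_news(topic):
    # In the real skill this queried the MySQL database; stubbed here.
    return f"Here is the latest in {topic}."

def build_response(speech, session_attributes=None, end_session=False):
    """Wrap plain text in the response envelope Alexa expects."""
    return {
        "version": "1.0",
        "sessionAttributes": session_attributes or {},
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }

def lambda_handler(event, context):
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return build_response("Welcome to NewsChat. Which source or topic would you like?")
    if request["type"] == "IntentRequest":
        intent = request["intent"]["name"]
        if intent == "GetTopicNewsIntent":
            slot = request["intent"]["slots"].get("topic", {})
            return build_response(get_topic_news(slot.get("value", "top stories")))
    return build_response("Sorry, I didn't catch that.")
```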

The rest of the work was to handle every possible interaction flow and customize the skill code to respond to different contexts. For example, the YesIntentHandler, which responds to a user’s affirmation words such as “sure”, “yeah” or “okay”, has to understand the context in which the user said yes and respond accordingly.
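Here is a sketch of what that context handling could look like, reusing the build_response helper from the sketch above: earlier turns stash what was offered to the user in sessionAttributes, so a bare “yes” can be interpreted. The attribute names here are invented for illustration.

```python
# Context-aware handling of AMAZON.YesIntent. The "last_offer" and
# "pending_story" attribute names are illustrative, not our real ones.
def handle_yes_intent(event):
    attrs = event.get("session", {}).get("attributes", {})
    context = attrs.get("last_offer")  # set by whichever handler spoke last
    if context == "read_full_story":
        speech = "Okay, here is the full story. " + attrs.get("pending_story", "")
    elif context == "more_headlines":
        speech = "Sure, here are more headlines."
    else:
        speech = "Great! Which topic or source would you like to hear about?"
    return build_response(speech, session_attributes=attrs)
```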

Meanwhile, the journalists were writing out several potential conversations (voice commands, spoken responses) between possible users and Alexa, and adding those inputs to the Amazon developer console, which required no coding. We created this Voice User Interface (VUI):

Why we chose a news aggregator

First, we chose to scrape and present The New York Times as an Alexa skill, which did not exist at that moment. While this worked well, our team wanted the skill to offer more choice to users across the ideological spectrum. For this reason we decided to scrape AllSides.com, a site that lets readers see the news from multiple political perspectives. This, we hoped, could create an interesting conversation.

However, this too had a drawback: AllSides.com offered only a limited number of stories, and the resulting speech interaction on Alexa was frustrating. It was pretty clear that the skill had to offer both breadth and depth.

Our team also considered scraping Bing News, but in the end decided to go with NewsAPI.org, a news API (with a Python client library) that gave us access to thousands of news sites and blogs. This gave us much more choice. We settled on scraping 31 of them, and for our final demo (to save time and server space) we whittled that down to five: The New York Times, The Washington Post, BBC, Fox News and The Wall Street Journal.
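For those curious, fetching headlines this way can be as short as the sketch below, which uses the newsapi-python client; the API key is a placeholder and the exact source identifiers may differ from the ones we used.

```python
# Minimal sketch of pulling top headlines from a few outlets via NewsAPI.org.
# The API key is a placeholder and the source IDs are approximate.
from newsapi import NewsApiClient

newsapi = NewsApiClient(api_key="YOUR_NEWSAPI_KEY")

response = newsapi.get_top_headlines(
    sources="the-new-york-times,the-washington-post,bbc-news,"
            "fox-news,the-wall-street-journal",
    page_size=20,
)

for article in response.get("articles", []):
    print(article["source"]["name"], "-", article["title"])
```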

Four types of interactions

The biggest challenge for us was how to construct the interactions. Since each user is unique, there are nearly limitless ways in which they might interact with a speech interaction service. Our team considered four types of interactions.

copyright/design Tom Van de Weghe
  1. The first was the source of the news. For example, a user may ask: “What’s the latest in The New York Times?”
  2. A second type involved the topic they’re interested in. For example: “Alexa, what’s the news in politics, sports or entertainment?”
  3. A third type of interaction was geography: “What’s the latest in California?”
  4. Finally, a fourth choice could be the political side: “What’s the conservative opinion on this?”

For the purposes of a demo, our team decided to focus on the first two types: source and topic. But the interaction itself was far trickier to pull off. The team imagined and anticipated the kinds of conversations humans would have and coded those into the developer console. Would a user say “What’s in politics?” or “Give me political news” or just “Politics”?

Watch this video to learn how our demo sounds:

(Important note: we decided not to publish the skill on Alexa because, once the project was finished, we as students couldn’t support the maintenance costs.)

Some learnings we want to share

  1. We were surprised by how accessible app building has become, given the extensive libraries Google and Amazon have built to let people create apps (or skills) like this on their own. That’s why our team was able to create an app like this within a couple of weeks.
  2. We could just as easily have built any other sort of question-and-answer service. Our code worked well in Alexa because we let Amazon do the voice parsing and recognition, and then simply structured our “skill” the same way we would have if we had been building an app or a chat bot.
  3. Alexa is powerful for voice recognition, but the app/skill itself needs to be just as useful and effective in other mediums. For now, developer resources for speech interaction are limited, and the field is only starting to build broader developer community support.

Up next: gamification of news?

But probably the key learning is that there’s an enormous opportunity for newsrooms to create more content for conversational interaction, which could help build engagement with, and trust in, the news.

During our development process, we soon realized that it’s hard to use existing content for natural-sounding speech interaction. The dialogues with Alexa often seemed too formal, too stiff. The synthetic voice of Alexa didn’t help, of course. The news that Amazon will soon introduce new, more natural-sounding Alexa voices through neural text-to-speech (like a newscaster voice) is very promising. Listen here to how Alexa will read the news.

But to build smoother conversations about news, we believe journalists should also consider rewriting or creating content specifically for smart speakers, providing a pleasant, valuable and interactive user experience with news. How can we do this? Recently, story games (voice games) have been created for smart speakers. (A good example is the BBC’s Inspection Chamber, described as “conversational radio”.)

design/copyright : Tom van de Weghe

News services could learn from the developers of these interactive story games, we believe, by creating multiple story lines for different queries. This would let people play with the news and interact with machines in the most natural way: through language. A well-developed story graph with different nodes, built on good storytelling rules, could also increase user retention.

One thing is certain: with this speech interaction project, we’ve only been scratching the surface of what’s possible in the future. Conversational interaction by voice will strongly impact news consumption and will force newsrooms to rethink their content production process.

Voice is indeed the next disruption. But how do we create engagement with a machine that speaks? This question will be a huge challenge for anybody who is developing artificial intelligence. Do you have ideas or comments? We would love to hear from you: tom.vandeweghe@stanford.edu or Twitter Tom Van de Weghe

_____________________________________________________________

“Just imagine…

You’re in your car. Alone, as usual, on your daily morning commute.

And instead of listening to those endless radio commercials with a little bit of news in between, you want to use your voice to get your own, personalized update on the news.

So, imagine you can talk to a smart newscaster who gives you a lot of the answers. Just like you’re talking to someone next to you.

A natural speech interaction with and about the news. Step by step helping you to navigate through the news that sometimes seems so confusing these days.

Helping you interact more with news, and hopefully helping you trust the news better.

And the only thing you need, is your voice and a smart speaking device.

Well, what I just described didn’t exist yet.

Until we, our team, developed it.

And we call it: NewsChat.”

(presentation of NewsChat, text by Tom Van de Weghe, with inputs from H R Venkatesh)


Tom Van de Weghe

John S. Knight Fellow Stanford University | Research AI, deepfake & Blockchain | Foreign correspondent VRT | Former Bureau Chief USA & China | author | speaker