Tech Robot: a voice-generated Alexa flash briefing skill

For our first project, we wanted to automate as much as possible of the process of running an Alexa flash briefing, with custom news and a generated voice, and we wanted to do it in one week.

We call it Tech Robot. Here is the skill for US Alexa users (there are also UK, India, Canada, and Australia versions):

The overall architecture/pipeline looks as follows:

Our Alexa skill backend architecture

Everything starts with a cron job that runs every day at 1 AM. It connects to an RSS feed and collects news from external sources. Once the news has been collected, the job parses it to make sure the text goes cleanly into the next phase.
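As a rough illustration of this collect-and-parse step, here is a minimal Python sketch. It is not our production code: the function names (`clean_text`, `extract_news`), the choice of the `title` and `description` fields, and the item limit are all assumptions made for the example. It strips HTML markup and extra whitespace so that only plain sentences reach the voice model.

```python
import html
import re
import xml.etree.ElementTree as ET

TAG_RE = re.compile(r"<[^>]+>")


def clean_text(raw: str) -> str:
    """Decode HTML entities, drop HTML tags, and collapse whitespace."""
    text = TAG_RE.sub(" ", html.unescape(raw))
    return re.sub(r"\s+", " ", text).strip()


def extract_news(rss_xml: str, limit: int = 5) -> list[str]:
    """Return up to `limit` plain-text news strings from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    news = []
    for item in root.iter("item"):
        title = clean_text(item.findtext("title", default=""))
        summary = clean_text(item.findtext("description", default=""))
        # Skip incomplete items so nothing empty reaches the next phase.
        if title and summary:
            news.append(f"{title}. {summary}")
        if len(news) == limit:
            break
    return news
```

A crontab entry such as `0 1 * * * python3 /path/to/pipeline.py` (the script path is hypothetical) would match the 1 AM schedule described above.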

If everything looks good at this point, we feed the text into an implementation of Google’s Tacotron. (We can’t wait for an implementation of Tacotron 2; judging by the generated voices, it sounds beautiful compared to our current quality.)

Tacotron then generates multiple .mp3 files, one per news item. These are merged into a single file, with the jingle played between consecutive news items.
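One way to do this kind of merge is with ffmpeg's concat demuxer, sketched below. This is an assumption for illustration rather than a description of our exact tooling: the helper names and file names are hypothetical, ffmpeg must be installed, and `-c copy` only works cleanly when every input shares the same codec settings.

```python
import subprocess
from pathlib import Path


def interleave(segments: list[str], jingle: str) -> list[str]:
    """Place the jingle between consecutive news segments (not before or after)."""
    ordered = []
    for index, segment in enumerate(segments):
        if index > 0:
            ordered.append(jingle)
        ordered.append(segment)
    return ordered


def concat_list(paths: list[str]) -> str:
    """Build the text expected by ffmpeg's concat demuxer, one entry per line.

    Paths containing single quotes would need escaping; that is not handled here.
    """
    return "".join(f"file '{path}'\n" for path in paths)


def build_episode(segments: list[str], jingle: str, output: str,
                  list_file: str = "episode.txt") -> None:
    """Write the concat list and ask ffmpeg to join the files without re-encoding."""
    Path(list_file).write_text(concat_list(interleave(segments, jingle)))
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_file, "-c", "copy", output],
        check=True,
    )
```

Calling `build_episode(["news1.mp3", "news2.mp3"], "jingle.mp3", "episode.mp3")` would produce news1, jingle, news2 in that order.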

The bot pipeline then uploads this .mp3 file to S3 and creates a Markdown post that is in turn used in the RSS feed. Finally, it pushes these changes to GitHub, as you can see here:
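The publishing step could look roughly like the sketch below. The front-matter layout, the function names, and the bucket and key values are assumptions for the example, not our actual post format. The upload uses boto3's `upload_file`, which needs boto3 installed and AWS credentials configured; the import is kept inside the function so the Markdown helper works without it.

```python
from datetime import date


def s3_url(bucket: str, key: str) -> str:
    """Public virtual-hosted-style URL for an object (assumes it is publicly readable)."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"


def render_post(title: str, day: date, audio_url: str, items: list[str]) -> str:
    """Render a Markdown post with a small front-matter header (hypothetical layout).

    Titles containing double quotes would need escaping; that is not handled here.
    """
    lines = [
        "---",
        f'title: "{title}"',
        f"date: {day.isoformat()}",
        f"audio: {audio_url}",
        "---",
        "",
    ]
    lines += [f"- {item}" for item in items]
    return "\n".join(lines) + "\n"


def upload_episode(mp3_path: str, bucket: str, key: str) -> str:
    """Upload the episode to S3 and return its URL."""
    import boto3  # third-party dependency, imported lazily

    boto3.client("s3").upload_file(
        mp3_path, bucket, key, ExtraArgs={"ContentType": "audio/mpeg"}
    )
    return s3_url(bucket, key)
```

The rendered post can then be written into the repository and committed, so that the feed picks up the new episode on the next push.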

One interesting aspect we observed during this project was how the voice evolved as we trained it with Tacotron. Here are several samples from different points in training; the last one is the actual .mp3 file for today's news:

First sample

Second sample

Third sample

Final episode

Try out our skill, give us feedback, and share it with your friends. We had a lot of fun putting this together and will definitely build more Alexa skills in the future.