Watson Tutorial #1: Speech to Text + AlchemyLanguage Sentiment Analysis in Python


Introduction

Speech and emotion are building blocks of how we relate to each other. When a machine understands speech and emotion, its interaction with us becomes more human. For this very reason, speech recognition and sentiment analysis are two of my favorite machine learning capabilities.

Soon after joining IBM Watson, I was excited to find out that Watson Developer Cloud offers both of these services. Naturally, I jumped at the opportunity to build something on top of them.

The rest of this post will walk you through how I combined Watson’s Speech to Text and AlchemyLanguage using Python. Since the intended audience for this tutorial is developers, I’ll be showing as much code as possible. With that said, if you still prefer to go straight to the code, it’s here.

For more of a step-by-step tutorial, read on!

Edit: Watson AlchemyLanguage has since been replaced by Natural Language Understanding. The capabilities and usage remain very similar.

Step 0: What You’ll Need

  1. Basic Python programming skills
  2. Python (both 2.7 and 3.x will work) development environment
  3. IBM Bluemix account for Watson API credentials (Step 1)
  4. Watson Speech to Text Service and its credentials (Step 2 and Step 3)
  5. Watson AlchemyLanguage and its credentials (Step 4)
  6. Watson Developer Cloud Python SDK (Step 5)

Step 1: Create Bluemix Account

Before we start coding, let’s get the credentials squared away. To use the Watson services, you need to create the services and their credentials on IBM Bluemix. Bluemix is IBM’s PaaS offering that lets you deploy and manage your cloud applications.

If you prefer deploying your applications somewhere else, that’s not a problem. You can use all the Watson services via our RESTful API. However, the Bluemix platform does give you a very easy way to integrate your deployed apps with all your Watson services. Either way, go ahead and sign up for a Bluemix account to get your credentials.

If you’d like to get a quick tutorial on how to get a simple web application running on Bluemix, check this out.

Step 2: Create the Watson Speech to Text Service

Next, you need to create a Speech to Text service on Bluemix. My preferred way of doing this is through the Cloud Foundry CLI. You can find installation instructions at the repository’s download section. In case you’re wondering, the Cloud Foundry CLI is simply a command line interface that lets you talk with Bluemix directly.

Once you have the Cloud Foundry CLI, use it to log in to Bluemix.
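Logging in looks roughly like this (the API endpoint below is the US South region endpoint from that era; yours may differ, so check your Bluemix account settings):

```shell
# Point the CLI at Bluemix and authenticate interactively
cf login -a https://api.ng.bluemix.net
```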

After logging in, you can create the Speech to Text service from the command line in a single command. Here, I named my service speech-to-text-standard.
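The command looks like this. The service name speech_to_text and plan standard are the Bluemix catalog names as I recall them; run `cf marketplace` to confirm if they’ve changed:

```shell
# Create a Speech to Text instance named speech-to-text-standard
cf create-service speech_to_text standard speech-to-text-standard
```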

Step 3: Create Speech to Text Credentials

Now that we have a service, we need to create the credentials attached to this service. The easiest way to do this is through the Bluemix dashboard. Once you’ve logged into Bluemix, go to your dashboard and scroll to the bottom of the page. You should see the service you just created under the Services section.

Find your Speech to Text service. For me, it’s called speech-to-text-standard.

You’ll see that you currently have no credentials, but that’s easily fixed by clicking “Add Credentials”.

Here are your credentials!

Remember your username and password; you’ll need them later. You now have the ability to convert speech to text, whoo!

Step 4: Register for AlchemyLanguage API Key

In order to perform sentiment analysis on text, you’ll need the Watson AlchemyLanguage service. Since you’re already at the dashboard, let’s try another way to create a service.

First, find the Catalog tab at the top of your dashboard. Scroll down to the Watson services, then select AlchemyAPI.

This is an alternative to the Cloud Foundry CLI for creating Watson services. Since this is a prototype, you can put the service in the dev space and leave it unbound. Bluemix lets you create different spaces once your deployment flow becomes more complex (you might need spaces such as dev, staging, and production). Leaving the service unbound means it isn’t tied to a specific application.

Once the service is created, you should be able to get your API key by going to Service Credentials on the left.

Step 5: Python Setup

Once you have your credentials, it’s almost time to code. Again, here’s the repository if you’d prefer reading the code over this how-to guide.

If you’re new to Python, I recommend setting up virtualenv and virtualenvwrapper. These tools combined let you easily set up sandboxed Python environments with everything you need including pip. If you’d rather use something else, that’s fine, but make sure you have pip installed properly.

Now, install the Watson Developer Cloud SDK using pip.
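Both the SDK and python-dotenv (which we’ll use in Step 6 for credentials) install with one command. The package name watson-developer-cloud is the PyPI name for the SDK:

```shell
# Install the Watson SDK and dotenv support into your (virtual) environment
pip install watson-developer-cloud python-dotenv
```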

Step 6: Code

The Watson SDK makes it very easy to interact with both services. Go ahead and clone the speech-sentiment-python repository.

Quick note: I use python-dotenv to manage my credentials. Here’s what my .env file looks like:
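With placeholder values, it looks like this. The variable names match the os.environ.get() calls in run.py; the values come from the credentials you created in Steps 3 and 4:

```
BLUEMIX_USERNAME=<your-speech-to-text-username>
BLUEMIX_PASSWORD=<your-speech-to-text-password>
ALCHEMY_API_KEY=<your-alchemy-api-key>
```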

The only change you need to make to the cloned repository is adding a .env file to your folder, similar to the one above. When you do python run.py you should expect a prompt that says “Please say something nice into the microphone”. The script will then listen for your voice and start recording. Say something nice please! :) The recording will stop after a significant pause. After that, you should start seeing results in your terminal similar to this.

Note: The threshold that activates recording and the time delay that stops it are both adjustable variables in recorder.py. If you’re in a noisy environment, you might need to adjust your computer’s microphone settings and the parameters in recorder.py.

But running a script is boring; let’s actually understand what’s going on in run.py.

First, we import the necessary libraries. Notice dotenv and watson_developer_cloud.

import os
import json
from os.path import join, dirname
from dotenv import load_dotenv
from watson_developer_cloud import SpeechToTextV1 as SpeechToText
from watson_developer_cloud import AlchemyLanguageV1 as AlchemyLanguage

Recorder is a module built on PyAudio that’ll record your voice into a .wav file. I won’t go into the specifics of how it works in this post, but the code is quite readable. Feel free to take a look at recorder.py.

from speech_sentiment_python.recorder import Recorder

At this point, we need to perform three tasks in this order:

  1. Record the voice and save it into a .wav file
  2. Transcribe the audio file into text via Watson Speech to Text service
  3. Get the sentiment score of the transcribed text via Watson AlchemyLanguage

record_to_file(), transcribe_audio(), and get_text_sentiment() will accomplish these tasks.

Recorder("speech.wav") followed by record_to_file() saves your voice into speech.wav in the same folder as run.py. Of course, you can change "speech.wav" to a path pointing wherever you’d like.

recorder = Recorder("speech.wav")
recorder.record_to_file()

transcribe_audio() sends the .wav file to your Watson Speech to Text API and gets back the transcription. Notice how the Python SDK lets us do this with essentially a single recognize() call.

def transcribe_audio(path_to_audio_file):
    username = os.environ.get("BLUEMIX_USERNAME")
    password = os.environ.get("BLUEMIX_PASSWORD")
    speech_to_text = SpeechToText(username=username, password=password)

    with open(join(dirname(__file__), path_to_audio_file), 'rb') as audio_file:
        return speech_to_text.recognize(audio_file, content_type='audio/wav')
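For reference, recognize() returns JSON that nests the transcript a few levels deep, under results → alternatives → transcript. Here’s a small helper of my own (not part of run.py) that pulls the text out, shown against a mocked response in that shape:

```python
def extract_transcript(result):
    """Join the top alternative's transcript from each result chunk."""
    return "".join(
        chunk["alternatives"][0]["transcript"]
        for chunk in result.get("results", [])
    )

# Mocked response in the shape Speech to Text returns
sample = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"transcript": "hello world ", "confidence": 0.94}]}
    ],
}

print(extract_transcript(sample))
```

Joining over all chunks matters for longer recordings, where the service splits the transcript into several result objects.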

get_text_sentiment() takes the transcribed text and uses the AlchemyLanguage API to figure out its sentiment score.

def get_text_sentiment(text):
    alchemy_api_key = os.environ.get("ALCHEMY_API_KEY")

    alchemy_language = AlchemyLanguage(api_key=alchemy_api_key)
    result = alchemy_language.sentiment(text=text)
    if result['docSentiment']['type'] == 'neutral':
        return 'neutral', 0
    return result['docSentiment']['type'], result['docSentiment']['score']

And that’s it, you’re done! You can now record your speech and have Watson tell you how he feels about it.

Step 7: What’s Next

So what’s next? As I mentioned in the beginning, I believe these two services combined make for a very engaging interaction. In my upcoming post, I’ll explore ways to combine this module with hardware to create a unique experience. Specifically, I want to build a candy machine that’ll dispense candy depending on the sentiment of your speech. Stay tuned for the Watson Polite Candy Machine!

I hope you enjoyed this tutorial. If you have any questions feel free to reach out at joshzheng@us.ibm.com or connect with me on LinkedIn.