Transform Voice of the Customer into Insights Using Google Cloud APIs in 30 Minutes

Last weekend I spent a few hours trying out Google Cloud APIs to see what insights can be extracted from unstructured data such as customer service calls.

My concept is to transform voice into quantified insights, then aggregate that data back into a 360-degree customer view. This should help us recreate customer journeys and drive further marketing actions at the customer level.

Google and AWS provide out-of-the-box solutions that convert voice recordings to text through a simple API. This is a short working note on how to work with Google Cloud APIs and how much those APIs cost.

Use Google Cloud APIs for voice recognition and text analysis

Google Cloud recently added a sandbox feature that lets you try the Speech-to-Text and Cloud Natural Language APIs on their website, so you don’t have to write any code to try them out.

If you prefer to work with the APIs directly, let’s go through the following steps in 30 minutes and have some fun.

Set up your Google Cloud SDK and service account

  • Get a free Google Cloud account (you might need a credit card to activate it).
  • Set up a project and enable the Speech-to-Text API and the Cloud Natural Language API. (quick start link)
  • You should now have your Google Application Credentials in JSON format. You should see "type": "service_account" in the file. Keep it on your local machine only and do not upload it to GitHub :)
  • Google's documentation suggests setting the environment variable GOOGLE_APPLICATION_CREDENTIALS in your .bash_profile file. I ran into authorization issues on my local machine, so I also installed the Google Cloud SDK.
  • Go through the SDK documentation and initialize the SDK.
  • Update the application_default_credentials.json file under /users/yourusername/.config/gcloud with your service account credentials.
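Before calling any API, it can save time to confirm that the credentials file from the steps above is the one Node will pick up. The following is a minimal sketch, not part of Google's tooling: the helper names `checkServiceAccountKey` and `checkCredentialsFromEnv` are made up for this note, and the list of required fields is an assumption based on what a typical service account key contains.

```javascript
const fs = require('fs');

// Returns a list of problems found in a parsed key file (empty list = looks fine).
// The fields checked here are an assumption about a typical service account key.
function checkServiceAccountKey(key) {
  if (!key || typeof key !== 'object') {
    return ['key file did not parse to a JSON object'];
  }
  const problems = [];
  if (key.type !== 'service_account') {
    problems.push(`expected "type": "service_account", got ${JSON.stringify(key.type)}`);
  }
  for (const field of ['project_id', 'private_key', 'client_email']) {
    if (!key[field]) {
      problems.push(`missing field: ${field}`);
    }
  }
  return problems;
}

// Looks up GOOGLE_APPLICATION_CREDENTIALS and checks the file it points to.
function checkCredentialsFromEnv(env = process.env) {
  const keyPath = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!keyPath) {
    return ['GOOGLE_APPLICATION_CREDENTIALS is not set'];
  }
  if (!fs.existsSync(keyPath)) {
    return [`file not found: ${keyPath}`];
  }
  try {
    return checkServiceAccountKey(JSON.parse(fs.readFileSync(keyPath, 'utf8')));
  } catch (err) {
    return [`could not parse ${keyPath}: ${err.message}`];
  }
}

const found = checkCredentialsFromEnv();
console.log(found.length === 0 ? 'Credentials look OK' : found.join('\n'));
```

This only checks the shape of the file locally; it does not prove that the key is still valid or that the APIs are enabled for the project.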

Run Google’s sample code and make a transcription request

  • Clone this repo. (Feel free to switch to another language. I used the JavaScript version for my local test.)
  • Run npm install
  • Run node quickstart.js
  • You should be able to make an audio transcription request and get the text back.

Send transcripts back to Google’s Cloud Natural Language API

  • Enable the Cloud Natural Language API for your project in the Google Cloud Console.
  • Run npm install --save @google-cloud/language
  • Add a few lines of code to your previous file. Your quickstart.js should look like this.
// Client libraries for Speech-to-Text and Cloud Natural Language
const speech = require('@google-cloud/speech');
const language = require('@google-cloud/language');
const fs = require('fs');

const speechClient = new speech.SpeechClient();
const nlpClient = new language.LanguageServiceClient();

// Read the local audio file and base64-encode it for the request
const fileName = './resources/commercial_mono.wav';
const file = fs.readFileSync(fileName);
const audioBytes = file.toString('base64');

const audio = {
  content: audioBytes,
};
const config = {
  encoding: 'LINEAR16',
  sampleRateHertz: 8000,
  languageCode: 'en-US',
};
const request = {
  audio: audio,
  config: config,
};

let transcription = '';

// Step 1: transcribe the audio, then pass the text on to the NLP step
speechClient
  .recognize(request)
  .then(data => {
    const response = data[0];
    transcription = response.results
      .map(result => result.alternatives[0].transcript)
      .join('\n');
    console.log(`Transcription: ${transcription}`);
    runSomeNLP(transcription);
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

// Step 2: run document-level sentiment analysis on the transcript
function runSomeNLP(transcription) {
  const document = {
    content: transcription,
    type: 'PLAIN_TEXT',
  };

  nlpClient
    .analyzeSentiment({document: document})
    .then(results => {
      const sentiment = results[0].documentSentiment;
      console.log(
        `Running NLP for Text: ${document.content}\n` +
          `Sentiment score: ${sentiment.score}\n` +
          `Sentiment magnitude: ${sentiment.magnitude}`
      );
    })
    .catch(err => {
      console.error('ERROR:', err);
    });
}
You should get text with punctuation and an overall sentiment score.
Console Output ->
Hi, I'd like to buy a Chrome Cast and I was wondering whether you could help me with that.
Hulu which color would you like? We have blue black and breath
Let's get the black one.
Okay, Chris, would you like the new concur fault remodel or the regular Comcast?
regular Chrome Cast design
Okay. Sure. Would you like to ship it regular or Express?
Express, please.
Terrific. It's on the way. Thank you very much. Thank you.
Sentiment score: 0.10000000149011612
Sentiment magnitude: 2.4000000953674316
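
The two numbers above are easier to use once they are mapped to a label. The score runs from -1 (negative) to 1 (positive), and the magnitude is a non-negative number that grows with the amount of emotional content in the text. The sketch below is only one way to read them: the function name `describeSentiment` and both cutoffs (0.25 and 1.0) are assumptions chosen for illustration, not values defined by Google, and they should be tuned on your own calls.

```javascript
// Map a (score, magnitude) pair to a rough label.
// Both cutoffs are illustrative assumptions, not values defined by the API.
function describeSentiment(score, magnitude, {scoreCutoff = 0.25, magnitudeCutoff = 1.0} = {}) {
  if (score >= scoreCutoff) {
    return 'positive';
  }
  if (score <= -scoreCutoff) {
    return 'negative';
  }
  // Score near zero: a high magnitude suggests positive and negative parts
  // cancel out ("mixed"); a low magnitude suggests little emotion ("neutral").
  return magnitude >= magnitudeCutoff ? 'mixed' : 'neutral';
}

// Values taken from the console output above.
console.log(describeSentiment(0.10000000149011612, 2.4000000953674316)); // prints "mixed"
```

Because magnitude accumulates over the whole document, a long call will tend to have a higher magnitude than a short one, so a fixed cutoff is a rough guide at best.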

What’s next?

LanguageServiceClient() provides the following APIs for analyzing unstructured text. Since Google includes punctuation in the transcript, it is not hard to break the transcript down and run sentiment analysis at the sentence level.


  • Entity Analysis: Identify entities and label by types such as person, organization, location, events, products and media.


  • Syntax Analysis: Extract tokens and sentences, identify parts of speech (PoS) and create dependency parse trees for each sentence.


  • Entity Sentiment Analysis: Understand the sentiment for entities identified in a block of text.


  • Content Classification: Identify content categories that apply to a block of text.
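
For the sentence-level sentiment mentioned above, the analyzeSentiment response already carries a `sentences` array next to `documentSentiment`, where each entry has `text.content` and its own `sentiment`. The following is a minimal sketch of flattening that array; the helper names `sentenceSentiments` and `mostNegative` are made up for this note, and the mock response is hand-written with invented scores so the snippet runs without credentials.

```javascript
// Flatten an analyzeSentiment response into one row per sentence.
function sentenceSentiments(response) {
  return (response.sentences || []).map(sentence => ({
    text: sentence.text.content,
    score: sentence.sentiment.score,
    magnitude: sentence.sentiment.magnitude,
  }));
}

// Return the row with the lowest score, or null if there are no rows.
function mostNegative(rows) {
  return rows.reduce(
    (worst, row) => (worst === null || row.score < worst.score ? row : worst),
    null
  );
}

// Hand-made mock in the shape of results[0]; the scores are invented, not real API output.
const mockResponse = {
  documentSentiment: {score: 0.3, magnitude: 1.0},
  sentences: [
    {text: {content: "Let's get the black one."}, sentiment: {score: 0.2, magnitude: 0.2}},
    {text: {content: 'Express, please.'}, sentiment: {score: 0.0, magnitude: 0.0}},
    {text: {content: 'Thank you very much.'}, sentiment: {score: 0.8, magnitude: 0.8}},
  ],
};

const rows = sentenceSentiments(mockResponse);
console.log(rows.length); // prints 3
console.log(mostNegative(rows).text); // prints "Express, please."
```

With the real client from the listing earlier, the same helper would be applied to `results[0]` inside the `.then` callback of `nlpClient.analyzeSentiment({document: document})`.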

Price and Cost

The Google Cloud Speech-to-Text API costs about $1.44 USD per hour of audio ($0.006 USD per 15 seconds of recording).

The Google Cloud Natural Language API charges by character count. If customers and agents speak about 150 words per minute, one minute of recording contains about 675 characters (assuming an average word length of 4.5 letters), so one hour is roughly 40,500 characters and the NLP APIs cost between $0.02 and $0.08 per hour.

For a small company that receives 100 calls a month (10 minutes per call), the Speech-to-Text API costs around $24. Running the transcripts through the sentiment analysis API costs only about $1 to $2.
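
The arithmetic above can be reproduced with a small calculator. This is a rough sketch under stated assumptions: the speech price and the 675 characters per minute come from this note, while billing each call rounded up to 15-second increments, billing text in 1,000-character units per document, and the per-unit prices of $0.0005 to $0.002 (back-computed from the $0.02 to $0.08 per hour range) are assumptions. Free tiers are ignored, so check the current pricing page before relying on any of it.

```javascript
const SPEECH_PRICE_PER_INCREMENT = 0.006; // USD per 15 seconds, from this note
const INCREMENT_SECONDS = 15;
const CHARS_PER_MINUTE = 150 * 4.5; // 150 words/min * 4.5 letters/word = 675
const CHARS_PER_UNIT = 1000; // assumed billing unit for the NLP APIs

// Speech-to-Text cost, assuming each call is rounded up to 15-second increments.
function speechCost(calls, secondsPerCall) {
  const incrementsPerCall = Math.ceil(secondsPerCall / INCREMENT_SECONDS);
  return calls * incrementsPerCall * SPEECH_PRICE_PER_INCREMENT;
}

// NLP cost, assuming one document per call, rounded up to 1,000-character units.
function nlpCost(calls, minutesPerCall, pricePerUnit) {
  const charsPerCall = minutesPerCall * CHARS_PER_MINUTE;
  const unitsPerCall = Math.ceil(charsPerCall / CHARS_PER_UNIT);
  return calls * unitsPerCall * pricePerUnit;
}

// 100 calls a month, 10 minutes each.
console.log(speechCost(100, 600).toFixed(2)); // prints "24.00"
console.log(nlpCost(100, 10, 0.0005).toFixed(2)); // cheapest assumed feature
console.log(nlpCost(100, 10, 0.002).toFixed(2)); // most expensive assumed feature
```

Under these assumptions the transcription step dominates the bill, and the text analysis stays in the low single dollars per month even when several features are run on each transcript.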

Google Natural Language API Price (1000 Units)

Bottom Line

The Google Cloud Speech-to-Text API is definitely a great tool for converting voice to text for further classification and analysis. As always, double-check with your legal and compliance team before using any real customer data.

Reference: Google Cloud AI products