Phong Vu
Aug 29, 2018 · 21 min read

Wouldn’t it be useful to make communication content, in the form of audio, searchable for what is said in it, to analyze it, and to extract actionable insights that help us quickly understand and easily navigate to critical moments within a conversation?

In this blog, I will walk through the necessary steps to build a Web app that can analyze call recordings and voicemail content to extract text and actionable insights. It is about automating the process of deriving meaning from vast quantities of content, which would be impossible with purely human involvement but is possible by using the artificial intelligence and applied machine learning technology available on the market, such as IBM Watson, Google Cloud Platform or the Haven OnDemand platform. Once we’re finished we’ll have an app which will:

  1. Transcribe speech-to-text from call recordings and voicemail messages.
  2. Classify call content into a predefined set of classification categories.
  3. Index call content with extracted metadata to enable advanced search.
  4. Allow users to search for any spoken word or phrases from call recordings and voicemail messages.
  5. Allow users to fetch call recordings and voicemail by a caller’s number, a callee’s number or by an extension number.
  6. Allow users to list call recordings and voicemail under the same categories.
  7. Allow users to search for call recordings and voicemail with positive, negative or neutral sentiment.
  8. Play back a call recording or a voicemail with stylish text synchronization.
  9. Allow users to interact with a call recording or a voicemail from the transcript by clicking on any word from the text to fast-forward or rewind to the selected word.
  10. Highlight positive and negative human opinions in the speech.
  11. Extract meaningful entities from a call recording or from a voicemail and allow users to see summaries of famous people, famous places and famous companies mentioned in the transcript, and to easily navigate to related information on Wikipedia.
  12. Allow users to easily reply to a voicemail message by click-to-dial.

This demo application is built using Node.js and the Express Web application framework. For convenience, I will use the Node.js SDKs provided by RingCentral, IBM Watson, Google Cloud Platform and the Haven OnDemand platform to access their services. You can easily build this Web app backend in any programming language you like with the client libraries provided by the service providers mentioned above.

Project’s source code

Code snippets in this blog are just for illustration purposes. They are shortened and incomplete. In order to follow the course of the application development, you may want to download the entire project source code from our GitHub repository.

If you want to build your own app using the source code, remember to use your own service access credentials for the RingCentral platform, IBM Watson, Google Cloud Platform and the Haven OnDemand platform, and replace them in the .env configuration file.

Prepare for the content

Before you start, make sure that you have some call recordings and voicemail content with good speech quality. If you don’t have real content, which is the best data to play with, you can make a few phone calls to RingCentral phone numbers under your RingCentral account, pick up the call and record the conversation. You should also make a few calls and just leave a voice message instead of picking up the call. By the way, if you are not a RingCentral customer, you can sign up for a free developer account and download the RingCentral soft-phone app for generating content.

Now let’s have a look at the RingCentral account’s call log database, read call recording and voicemail logs, then collect a few essential pieces of call metadata to create our own dataset. In this demo, I am interested in the date and time when the call was made, the duration of the call, the phone number or name of the caller, the phone number or name of the callee, and, most importantly, the URI of the binary content. Because I am interested in both call recordings and voicemail, I also want to specify the type of the content as ‘CR’ for call recording and ‘VM’ for voicemail.

Dataset with call metadata

To make it possible for a supervisor to analyze company-wide voice communication as well, I allow a user with the admin role to access the company call log. This means that the supervisor can read the call recordings and voicemail of any extension (any user) under the same account. That is why I want to add the extension number (extNum) and the full name (fullName) of each user to the dataset, as sketched below. Finally, I save the dataset into a local database (in this project I use SQLite).
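To make the shape of this dataset concrete, here is a minimal sketch of a single record as a JavaScript object. The field names loosely mirror the columns created later in createUserTable(), and the example values are purely hypothetical.

var item = {
  rec_id: 'RmRfSUQ',              // RingCentral record id (made-up value)
  date: 1535500800000,            // when the call was made, as a millisecond timestamp
  type: 'CR',                     // 'CR' = call recording, 'VM' = voicemail
  extNum: '101',                  // extension number of the user
  fullName: 'Jane Doe',           // full name of the user
  fromRecipient: '+14155550100',  // caller's number
  toRecipient: '+16505550101',    // callee's number
  recordingUrl: 'https://media.ringcentral.com/...', // URI of the binary content
  duration: 125                   // call duration in seconds
}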

Up to this point, we have some structured data with the metadata retrieved from the RingCentral call log database. After saving the dataset to our SQLite database, we can search for calls made on a certain date, calls from the same caller’s number, and so on.

Dataset with extension (user) information and call metadata

The most valuable information in a voice communication is still hidden in the audio binary content, which is the dialogue of a call recording or the monologue of a voicemail. The data in the waveform is human data: unstructured data which can be understood only by listening to it. Now, without listening to every single call recording or voicemail, a time-consuming task which would take us hours or days to complete, how can we find out what was said in the conversation? How can we identify calls with happy or unhappy conversations? How do we know if our customers left a message complaining about our product or asking us to call them back?

Extract text from speech communication content

The information hidden in waveform content can be uncovered with some level of artificial intelligence. And the first necessary step to tackle is to transcribe the speech using speech-to-text technology. This sounds complicated and requires a lot of engineering work, right? Fortunately, speech recognition technology is mature and can be easily accessed as an on-demand service from many leading providers such as IBM Watson, Google Cloud Platform, AWS etc.

There are pros and cons to consider when choosing a speech recognition service from the wide range of providers on the market. For instance, in this demo, I chose Watson Speech-to-Text instead of Google Cloud Speech-to-Text simply because Watson returns the transcript with the timestamp of each spoken word, which can be used to play back a call recording or a voicemail with stylish text synchronization and to let users interact with the binary content. Otherwise, it would be great to use Google Cloud Speech-to-Text for its features of providing transcripts with punctuation and automatic language detection, which are quite critical for data analytics (please note that at the time this blog is written, I don’t see that the Google Cloud Speech-to-Text API supports timestamp extraction, but it may support that in the future).

To use the Watson Speech-to-Text API, we specify the query parameters for the result we expect as follows:

var params = {
  model: 'en-US_NarrowbandModel',
  audio: bufferStream,
  content_type: 'audio/mp3',
  timestamps: true,
  interim_results: false,
  profanity_filter: false,
  smart_formatting: true,
  speaker_labels: true
};

Here, bufferStream is the audio data stream read from the binary content URI in the RingCentral call log. Then we call the API as shown below:

var watson = require('watson-developer-cloud');
var speechToText = new watson.SpeechToTextV1({
  username: process.env.WATSON_USERNAME,
  password: process.env.WATSON_PWD,
  url: 'https://stream.watsonplatform.net/speech-to-text/api/'
});
speechToText.recognize(params, function(err, res) {
  if (err)
    console.log(err);
  else
    console.log(JSON.stringify(res));
});
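For reference, one simple way to obtain bufferStream is to download the binary content into a Buffer and wrap it in a Node.js PassThrough stream. This is only a sketch; it assumes audioBuffer already holds the MP3 bytes fetched from the recording’s content URI via the RingCentral API.

var stream = require('stream');

// Assumption: audioBuffer is a Buffer containing the MP3 bytes downloaded
// from the recording's content URI. Watson's recognize() accepts a readable
// stream, so we wrap the Buffer in a PassThrough stream.
var bufferStream = new stream.PassThrough();
bufferStream.end(audioBuffer);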

Upon receiving the response from the Watson Speech-to-Text API, we parse the result and iterate through the alternatives array to build an array that keeps all the spoken words and the start_time timestamp of each word.

Because we specified the speaker_labels parameter, the Watson Speech-to-Text API result will contain an array of speaker labels. To create a conversation flow identified by the returned speaker labels, we need to match each speaker label with the transcript. This is not straightforward, as there is no word associated with a speaker label. Instead, we have to match the start_time timestamps from the speaker_labels array with the start_time timestamps from the alternatives arrays to create a new array containing the spoken words of each speaker, roughly as sketched below.
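Here is a minimal sketch of that matching step. It assumes the standard Watson response structure, where results[].alternatives[0].timestamps is an array of [word, startTime, endTime] triples and speaker_labels is an array of objects with from, to and speaker fields; the conversation structure it builds (sentence and timestamp arrays per speaker turn) matches the one rendered in the EJS page later on.

// Sketch: group transcribed words into speaker turns by matching start times.
function buildConversation(res) {
  // Flatten all words with their start_time timestamps.
  var words = [];
  res.results.forEach(function(result) {
    (result.alternatives[0].timestamps || []).forEach(function(t) {
      words.push({ word: t[0], start: t[1] });
    });
  });
  var labels = res.speaker_labels || [];
  var conversation = [];
  words.forEach(function(w) {
    // Find the speaker label whose 'from' time matches this word's start time.
    var label = labels.find(function(l) { return l.from === w.start; });
    var speaker = label ? label.speaker :
      (conversation.length ? conversation[conversation.length - 1].speaker : 0);
    var last = conversation[conversation.length - 1];
    if (!last || last.speaker !== speaker) {
      // Speaker changed: start a new turn.
      conversation.push({ speaker: speaker, sentence: [w.word], timestamp: [w.start] });
    } else {
      last.sentence.push(w.word);
      last.timestamp.push(w.start);
    }
  });
  return conversation;
}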

As the Watson Speech-to-Text API does not return punctuated transcripts, we need some mechanism to break a large chunk of text into sentences or paragraphs. In this demo project, for voicemail transcripts, I simply rely on the transcript of each alternative in the results array. For call recording transcripts, I treat a change of speaker id as the start of a new sentence. This approach is good enough as long as the speech is fluent and punctuated. For a real application, I recommend you use a third-party service or a better algorithm to create accurate punctuation for the transcript. Or perhaps drop the stylish text synchronization feature and use the Google Cloud Speech-to-Text service, which supports punctuation.

We’ve solved the first problem, a vitally important step to transform audio content into the text content required for data analytics. We also generated some new metadata, such as the timestamp of every spoken word and the speaker labels. Our dataset is getting richer now with a few more useful data fields. In fact, we can save the dataset to our database and be able to search for any spoken word or phrase from the “audio” content. We can also implement a user interface to display the conversation separated by speaker and display stylish text synchronized with playback of the audio content. I will discuss in more detail how to implement that feature later in this blog.

The next step is to apply data analytics to extract actionable insights from the content and further transform unstructured data into structured data so that we can operate on it. This is a very critical step, as you need to ask yourself: what information do you need? The answer will depend on the nature of the content and on how you expect to operate on the data. Let’s say you are planning to analyze your customers’ feedback about your products; you may want to use sentiment analysis to analyze how your customers think and talk about the product they purchased, what they like or dislike, and how strong their opinions are.

In this project, I want to categorize the content and extract keywords from it so that I can enhance the search engine, allowing users to search for content in a similar category and to rank the search results. I also want to highlight meaningful entities such as people, places, companies or phone numbers if they are mentioned in a call recording or a voicemail message. One of my favorite data analytics features is using sentiment analysis to measure human opinions and classify the content based on polarized sentiment as positive or negative.

IBM Watson includes the Natural Language Understanding API, which can be used to identify actionable insights from a document. The Watson NLU API is capable of extracting insights such as keywords with confidence scores, meaningful entities, key concepts of the content etc. It can also classify the content into predefined categories. Alternatively, the Google Cloud Natural Language or Haven OnDemand Text Analysis services could do the same thing. It’s hard to say which platform is better in terms of quality, performance and price. You can always try them out and choose the one which works well for your data. In this project, I chose Watson NLU for extracting keywords from the content and categorizing the content. My choice is based neither on quality nor on pricing, but on convenience, because I can specify several features and make just one API call to get the result, while with Google Cloud Platform or the Haven OnDemand platform I must make separate API calls for different features.

Let’s call the Watson NLU API to extract keywords from the content and classify the content with predefined categories to enrich our dataset.

// nlu is an instance of Watson NaturalLanguageUnderstandingV1, created with
// the watson-developer-cloud SDK similarly to speechToText above.
var params = {
  'text': text,
  'features': {
    'categories': {},
    'keywords': {
      'limit': 100
    }
  }
};
nlu.analyze(params, function(err, response) {
  if (err)
    console.log('error:', err);
  else
    console.log(response);
});

The response of a successful API call above will contain an array of categories classified for the provided content, and an array of extracted keywords with confidence scores.
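As a sketch, assuming the standard Watson NLU response fields keywords[].text, keywords[].relevance, categories[].label and categories[].score, the response can be mapped to the keywords and categories fields of our dataset like this:

// Sketch: map the NLU response to the dataset fields (stored as JSON text).
var keywords = response.keywords.map(function(k) {
  return { text: k.text, relevance: k.relevance };
});
var categories = response.categories.map(function(c) {
  return { label: c.label, score: c.score };
});
item.keywords = JSON.stringify(keywords);     // 'item' is the dataset record being built
item.categories = JSON.stringify(categories);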

Alternatively, for categorization, you can use Google Cloud Natural Language API to classify the content.

// language_client is an instance of the Google Cloud Natural Language client:
// const language = require('@google-cloud/language');
// const language_client = new language.LanguageServiceClient();
const document = {
  content: text,
  type: 'PLAIN_TEXT',
};
language_client.classifyText({document: document})
  .then(results => {
    const classification = results[0];
    console.log(JSON.stringify(classification.categories));
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

Remember that when classifying your content using Watson NLU or Google Cloud Natural Language, your content will be classified against each provider’s own predefined set of categories. Thus, the results from each will be different!

Let’s move on to analyzing the sentiment of the content. Both Watson NLU and Google Cloud Platform provide a sentiment analysis API. However, they are different, and both give fairly simple results. For Watson sentiment analysis, you must define a set of target words (max 20 targets). This is suitable for content that you know might contain subjects whose sentiment you want to analyze. For instance, if the content is customer feedback about your products and you have a list of product names, you can specify your product names in the targets array and then add the sentiment keyword to the features list in the query parameters. Let’s have a look at how it analyzes the following sample sentence:

var text = "The bananas were fresh and sweet. But the grapes were rotten and bitter.";
var params = {
  'text': text,
  'features': {
    'sentiment': {'targets': ['bananas', 'grapes']}
  }
};
nlu.analyze(params, function(err, response) {
  if (err)
    console.log('error:', err);
  else
    console.log(JSON.stringify(response));
});

If sentiment is found for the specified targets, the result will contain an array of targets, with each object containing the sentiment information, as in the example response below:

{
  ...
  "sentiment": {
    "targets": [
      {
        "text": "bananas",
        "score": 0.740724,
        "label": "positive"
      },
      {
        "text": "grapes",
        "score": -0.616188,
        "label": "negative"
      }
    ],
    "document": {
      "score": 0.0875428,
      "label": "positive"
    }
  },
  ...
}

If you want to use the Google Cloud sentiment analysis API, you don’t need to predefine a list of target words. The API analyzes the sentiment of each sentence in the content. This is one of the reasons why I mentioned earlier that the recognized text should be accurately punctuated. If sentiment is found in sentences, the result will be an array of sentences, with each object containing the data in the example response below:

"sentences": [
{
"text": {
"content": "The bananas were fresh and sweet.",
"beginOffset": -1
},
"sentiment": {
"magnitude":0.8999999761581421,
"score":0.8999999761581421
}
},{
"text": {
"content": "But the grapes were rotten and bitter.",
"beginOffset": -1
},
"sentiment": {
"magnitude": 0.20000000298023224,
"score": -0.20000000298023224
}
}
]

You can compare the pros and cons of each API’s capabilities and results, then decide which API you want to use. As for me, I need more than just the overall polarized sentiment of the content and the sentiment score of each predefined target (with Watson NLU) or of each sentence (with the Google Cloud sentiment analysis API). That is why I am considering the Haven OnDemand platform, whose Sentiment Analysis API gives me more useful information. Let’s have a look at how it analyzes the same sample sentence above:

var hod = require('havenondemand');
var hodClient = new hod.HODClient(process.env.HOD_APIKEY, "v2");
var request = {'text': text};
hodClient.get('analyzesentiment', request, false,
  function(err, resp, body) {
    if (!err) {
      console.log(resp);
    }
  });
// RESPONSE
{
  "sentiment_analysis": [
    {
      "positive": [
        {
          "sentiment": "fresh and sweet",
          "topic": "The bananas",
          "score": 0.9203650635837769,
          "original_text": "The bananas were fresh and sweet",
          "original_length": 32,
          "normalized_text": "The bananas were fresh and sweet",
          "normalized_length": 32,
          "offset": 0
        }
      ],
      "negative": [
        {
          "sentiment": "rotten and bitter",
          "topic": "the grapes",
          "score": -0.8532963042204732,
          "original_text": "But the grapes were rotten and bitter",
          "original_length": 37,
          "normalized_text": "But the grapes were rotten and bitter",
          "normalized_length": 37,
          "offset": 34
        }
      ],
      "aggregate": {
        "sentiment": "slightly positive",
        "score": 0.03353437968165185
      }
    }
  ]
}

As you can see, this API gives me much more insight. Besides the polarized scores, I can capture the topic and the sentiment phrase in a sentence, together with the original text. Thus, I can more easily find out how people talk about a topic.

Let’s consider what information we want to include in our dataset and how we will use it later. First, I want to add the aggregate sentiment label and score to the dataset. With the sentiment label in the database, I can search for content with positive, negative or neutral sentiment, and with the sentiment score, I can set thresholds to limit search results or to rank results by high or low score. Second, I want to find the highest positive sentiment score and the lowest negative sentiment score, compare them with predefined thresholds (one for positive and one for negative), and add them to the dataset so that I can display alerts if any statement in the content has very positive or very negative sentiment. Finally, I want to add the positive and negative sentiment objects, which contain the sentiment, the topic, the score and the original text. With this detailed information, I can highlight positive and negative statements when displaying the text content. A sketch of how these fields could be derived follows.
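Here is a minimal sketch of deriving those dataset fields from the Haven OnDemand response shown above. The threshold values and the helper name are my own assumptions; the field names follow the columns created later in createUserTable().

// Hypothetical alert thresholds (assumptions, tune to your own data)
var POSITIVE_ALERT_THRESHOLD = 0.7;
var NEGATIVE_ALERT_THRESHOLD = -0.7;

// 'analysis' is one element of the sentiment_analysis array from Haven OnDemand.
function buildSentimentFields(analysis) {
  var hi = 0, low = 0;
  analysis.positive.forEach(function(p) { if (p.score > hi) hi = p.score; });
  analysis.negative.forEach(function(n) { if (n.score < low) low = n.score; });
  return {
    sentiment_label: analysis.aggregate.sentiment,   // e.g. "slightly positive"
    sentiment_score: analysis.aggregate.score,
    sentiment_score_hi: hi,                          // highest positive score found
    sentiment_score_low: low,                        // lowest negative score found
    positiveAlert: hi >= POSITIVE_ALERT_THRESHOLD,
    negativeAlert: low <= NEGATIVE_ALERT_THRESHOLD,
    // keep the detailed objects so positive/negative statements can be highlighted
    sentiment: JSON.stringify({
      positive: analysis.positive,
      negative: analysis.negative
    })
  };
}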

Dataset with sentiment information

Let’s further extract meaningful entities from the content. Watson NLU, Google Cloud Natural Language and Haven OnDemand Text Analysis all support entity extraction. As with our earlier consideration of choosing a sentiment analysis API, we should consider which one is more suitable for our use case. I will let you run your own tests with your real content and make your own judgement. For now, I choose the Haven OnDemand Entity Extraction API over the others because of its wide range of entity types and its references to Wikipedia as an information source.

var entityType = ['people_eng', 'places_eng', 'companies_eng', 'number_phone_us'];
var request = {
  'text': transcript,
  'entity_type': entityType,
  'show_alternatives': false
};
hodClient.get('extractentities', request, false,
  function(err, response, body) {
    if (!err) {
      console.log(response);
    }
  });

In the code snippet above, I specify the entity types to extract famous people, places, companies and U.S.-formatted phone numbers, then call the API to extract those entities from the transcript. On success, I add the response containing the list of identified entities to my dataset. The detailed information differs for each type of entity. For example, a person entity object contains a quick profile of that person, such as the person’s name, a list of professions, the date of birth, an image and a link to the person’s Wikipedia page. A place entity object contains essential information about that place, such as the name of the location, the longitude and latitude, the type of the place (e.g. city or country) and so on. You can learn more about entities information from here.
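One small but useful step is collecting the extracted phone numbers for the click-to-dial feature in the next section. Here is a sketch; it assumes the Haven OnDemand response exposes each entity’s type and normalized_text fields, so treat the exact field names as an assumption.

// Sketch: collect US phone number entities for click-to-dial later.
var phoneNumbers = [];
(response.entities || []).forEach(function(entity) {
  if (entity.type === 'number_phone_us') {
    phoneNumbers.push(entity.normalized_text);
  }
});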

Create actionable items

This demo project is about analyzing voice communication content, which includes voicemail messages. Some voicemails contain a message that is just “for your information”; others contain critical requests for action. For example, a voicemail might contain a call back request. Let’s implement a simple technique to detect a call back request and make it easier for a user to reply to such a voicemail.

var callActionDictionary = ['my number is', 'my cell phone is', 'my cell number is', 'my phone number is', 'call me', 'call me back', 'give me a call', 'reach me at'];
var callHighlight = transcript;
for (var term of callActionDictionary){
  var regExp = new RegExp("\\b" + term + "\\b", "ig");
  if (callHighlight.match(regExp) != null){
    var text = '<span class="call_highlight">';
    text += term + "</span>";
    callHighlight = callHighlight.replace(regExp, text);
  }
}
for (var number of phoneNumbers){
  var regExp = new RegExp("\\b" + number + "\\b", "ig");
  if (callHighlight.match(regExp) != null){
    var call = '<a href="rcmobile://call?number=' + number + '">';
    call += number + '</a>';
    callHighlight = callHighlight.replace(regExp, call);
  }
}

First, I define a simple dictionary of call back request phrases. Then I detect whether those phrases appear in the voicemail transcript. If a phrase is found, I highlight it by wrapping the call_highlight CSS class around it. Finally, I go through the list of phone numbers returned from the Entity Extraction API and enable click-to-dial on each number. The reason I implement click-to-dial manually is that I want to display the callHighlight text dynamically (using jQuery to show or hide it) and force the browser to launch the RingCentral soft-phone to make a phone call. One extra thing to consider: what if there is a call back request but no phone number is detected? In that case, we can presume that the caller expects a call back on the same number he or she was calling from, so we can use the “from number” extracted from the call metadata discussed earlier, as sketched below.
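A minimal sketch of that fallback, assuming a hypothetical foundCallbackRequest flag set while matching the dictionary above and a record object holding the call metadata (fromRecipient matches the dataset column name):

// Sketch: if a call back request was detected but no phone number was found
// in the transcript, offer click-to-dial on the caller's own number instead.
if (foundCallbackRequest && phoneNumbers.length === 0) {
  var fallback = '<a href="rcmobile://call?number=' + record.fromRecipient + '">';
  fallback += record.fromRecipient + '</a>';
  callHighlight += ' ' + fallback;
}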

It’s time to finalize our dataset with the rest of the content metadata.

Dataset with content metadata

We are done with building the structured dataset and the process of metadata generation. Below is some code that creates a SQLite database and a user table with the defined columns. We use the unique extensionId of a user as the name of the user’s table.

function createUserTable(extensionId) {
  let db = new sqlite3.Database(USERS_DATABASE)
  var query = 'CREATE TABLE '+ extensionId +' (id DOUBLE PRIMARY KEY, rec_id VARCHAR(16) NOT NULL, date INT(11) NOT NULL, type VARCHAR(12) NOT NULL, extensionNum VARCHAR(6) NOT NULL, fullName VARCHAR(32) NOT NULL, fromRecipient VARCHAR(12) NOT NULL, toRecipient VARCHAR(12) NOT NULL, recordingUrl VARCHAR(256) NOT NULL, duration INT DEFAULT 0, processed BOOLEAN NOT NULL, wordswithoffsets TEXT NOT NULL, transcript TEXT NOT NULL, conversations TEXT NOT NULL, sentiment TEXT NOT NULL, sentiment_label VARCHAR(8) NOT NULL, sentiment_score DOUBLE NOT NULL, sentiment_score_hi DOUBLE NOT NULL, sentiment_score_low DOUBLE NOT NULL, actions TEXT NOT NULL, keywords TEXT NOT NULL, entities TEXT NOT NULL, categories TEXT NOT NULL)'
  db.run(query, function(err, result) {
    if (err){
      console.error(err.message);
    }else{
      console.log("table created");
    }
  });
}

Now let’s learn how to read the RingCentral call log database. I will skip the steps of logging in to a RingCentral account and getting authenticated, but if you are interested, please read this tutorial or study the login part of this project’s code.

The RingCentral call log can be accessed programmatically using the call-log API. An account user with the admin role can read the call log of any extension under the account. A user with the standard user role can only read his or her own call log. This is why, after a user logs in to a RingCentral account, I read the user information and detect the user’s role to decide whether the user can read all extensions’ call logs or just his or her own extension’s call log.

platform.get('/account/~/extension/~/')
  .then(function(response) {
    var jsonObj = response.json();
    if (jsonObj.permissions.admin.enabled){
      engine.getAccountExtensions(userIndex)
    }else{
      var item = {}
      var extensionList = []
      item['id'] = jsonObj.id
      item['extNum'] = jsonObj.extensionNumber.toString()
      item['fullName'] = jsonObj.contact.firstName + " "
      item['fullName'] += jsonObj.contact.lastName
      extensionList.push(item)
      ...
    }

To read call information from the call log, I let a user choose a time range within which calls were made. Then I iterate through the extension list to read the call log of each extension, detect whether a call has a voicemail message or a call recording, parse and extract the metadata, and add it to the database.

var endpoint = '/account/~/extension/'+ ext.id +'/call-log'
var params = {
  view: "Detailed",
  dateFrom: req.body.dateFrom,
  dateTo: req.body.dateTo,
  showBlocked: true,
  type: "Voice",
  perPage: 1000
}
platform.get(endpoint, params)
  .then(function(resp){
    var json = resp.json()
    if (json.records.length == 0){
      console.log("EMPTY")
    }else {
      let db = new sqlite3.Database(USERS_DATABASE);
      async.each(json.records,
        function(record, callback0){
          var item = {}
          if (record.hasOwnProperty("message") &&
              record.message.type == "VoiceMail"){
            // extract voice mail metadata
          }else if (record.hasOwnProperty("recording")){
            // extract call recording metadata
          }else {
            // call does not have CR nor Voicemail message
            return
          }
          var query = "INSERT into "+extId +" VALUES ("...")";
          db.run(query, function(err, result) {
            if (err){
              console.error(err.message);
            }else{
              callback0(null, result)
            }
          })
        },
        ...

At this point, we have call recordings and voicemails with metadata stored in the local database, so I can read the data and display it on a dashboard as shown in the picture below:

Now I let the user manually click the Transcribe button on each item in the list to start the data analytics for that call recording or voicemail message. Of course, you can automate this step if you want to; after reading the call log, you can automatically call the function that transcribes the call recording or voicemail binary content.

After the binary content is transcribed and analyzed, I add the new metadata to the database and update the content list with the transcript, a positive or negative sentiment indicator and sentiment alerts. I also enable the Open button so that the user can click it to open and see the detailed analytics of that call item.

In the search bar on the dashboard, I add several options such as the Field (all, transcript, keywords, from, to, extension or categories), the Type (call recording or voicemail) and the Sentiment (all, neutral, positive or negative) dropdown lists. I also add positive and negative sliders so users can specify advanced searches. For example, I can select the Transcript field, type the word “account” into the search text field, then select the Type as Voicemail and the Sentiment as Positive. I can also set the positive score slider to 0.600 and click the Search button. This will search the database for voicemail messages which contain the word “account” and have an overall positive sentiment with a sentiment score greater than 0.600; a sketch of the underlying query is shown below.
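As a rough sketch, such a search could translate into a SQL query against the user’s table (the column names match createUserTable() above, while the exact query building in the project may differ):

// Sketch: search a user's table for positive voicemails mentioning "account"
// with an overall sentiment score of at least 0.6.
var query = "SELECT * FROM " + extensionId +
            " WHERE type = 'VM'" +
            " AND transcript LIKE '%account%'" +
            " AND sentiment_label = 'positive'" +
            " AND sentiment_score >= 0.6";
db.all(query, function(err, rows) {
  if (err)
    console.error(err.message);
  else
    console.log("Found " + rows.length + " matching items");
});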

Now let’s have a look at the detailed analysis view. On the details view, I display the conversation on the left-hand side, and on the right-hand side I display the analytics information based on what the user chooses from the menu bar. In the screenshot shown below, the left-hand side shows the transcript with speaker labels and the text displayed in different colors: text in green has been read, the single word in yellow is the word currently being spoken, and text in gray is unread. The right-hand side shows the transcript with positive and negative sentiment highlighted in green and red, respectively.

Display stylish text synchronized while playing back the audio

To implement the stylish text synchronization, first, I add an audio player to the html page.

<audio id="audio_player" controls controlsList="nodownload">
  <source src='<%= results['recordingUrl'] %>' type="audio/mpeg">
  Your browser does not support the audio element.
</audio>

Then implement the following JavaScript code:

var aPlayer = null
var index = 0
var mIndex = 1
var wwoArr = []   // array of { word, offset } parsed from the database
var wordElm = null

function initializeAudioPlayer(){
  wwoArr = JSON.parse(window.results.wordswithoffsets)
  wordElm = document.getElementById("word0");
  aPlayer = document.getElementById("audio_player");
  aPlayer.addEventListener("timeupdate", seektimeupdate, false);
  aPlayer.addEventListener('loadeddata', audioLoaded, false);
  aPlayer.addEventListener('seeked', seekEnded, false);
}
function audioLoaded() {
  mIndex = 0;
}
function seekEnded() {
  var pos = aPlayer.currentTime;
  resetReadWords(pos);
  var id = "word" + mIndex;
  wordElm = document.getElementById(id);
}
function seektimeupdate() {
  var pos = aPlayer.currentTime;
  if (mIndex < wwoArr.length) {
    var check = wwoArr[mIndex].offset;
    // Mark every word whose offset has passed as read; the current word gets the "word" class.
    while (pos >= check) {
      wordElm.setAttribute("class", "readtext");
      wordElm = document.getElementById("word" + mIndex);
      wordElm.setAttribute("class", "word");
      mIndex++;
      if (mIndex >= wwoArr.length) break;
      check = wwoArr[mIndex].offset;
    }
  }
}
function resetReadWords(value) {
  // Reset all words before the current index to unread, then re-mark as read
  // every word whose offset is before the new playback position.
  var elm;
  for (var i = 0; i < mIndex; i++) {
    elm = document.getElementById("word" + i);
    elm.setAttribute("class", "unreadtext");
  }
  mIndex = 0;
  var pos = wwoArr[mIndex].offset;
  while (pos < value && mIndex < wwoArr.length - 1) {
    elm = document.getElementById("word" + mIndex);
    elm.setAttribute("class", "readtext");
    mIndex++;
    pos = wwoArr[mIndex].offset;
  }
}

Interact with the audio player and instant search function

You can search for any spoken word in the transcript by entering a word into the search text field and clicking the Search button. If the word is found in the transcript, the media player instantly fast-forwards or rewinds to that word and continues to play the audio content from that moment. You can also click on any word in the transcript to fast forward or rewind to that selected moment. And from the right-hand side, you can read the sentiment and click the Goto link to jump instantly to the beginning of that sentence in the transcript. To implement this feature, I add an onclick event to every word, assign the jumpTo() function and pass along the offset timestamp of that word. Inside the jumpTo() function, I simply set the offset timestamp as the audio player’s currentTime. To search for a spoken word and fast forward or rewind to that moment, I take the word from the search field and look for it in wwoArr (the word-with-offset array); if the word is found in the array, I read the offset timestamp and call the jumpTo() function with that timestamp.

// EJS page
<% for (var n = 0; n < conv[i].sentence.length; n++) { %>
  <% var wId = "word" + index %>
  <span onclick="jumpTo(<%= conv[i].timestamp[n] %>)" class="unread" id="<%= wId %>"><%= conv[i].sentence[n] %></span>
  <% index += 1 %>
<% } %>

// JavaScript code
function searchForText(){
  var searchWord = $("#search").val()
  // Search forward from the current playback position first...
  for (var i = mIndex; i < wwoArr.length; i++){
    var word = wwoArr[i].word
    if (word == searchWord){
      var timeStamp = wwoArr[i].offset
      jumpTo(timeStamp)
      break
    }
  }
  // ...and wrap around to the beginning if the word was not found.
  if (i >= wwoArr.length){
    for (var i = 0; i < wwoArr.length; i++){
      var word = wwoArr[i].word
      if (word == searchWord){
        var timeStamp = wwoArr[i].offset
        jumpTo(timeStamp)
        break
      }
    }
  }
}
function jumpTo(timeStamp) {
  aPlayer.pause();
  resetReadWords(timeStamp);
  var id = "word" + mIndex;
  wordElm = document.getElementById(id);
  aPlayer.currentTime = timeStamp;
  aPlayer.play();
}

Displaying analytics results

From the menu bar on the right-hand side, you can choose to display the sentiment analysis, the meaningful entities, the transcript with keywords highlighted, actionable items (click-to-dial in this demo) or just the plain text content.

You can change the positive and negative thresholds from the sliders to adjust the sentiment score for displaying sentiment analysis.

All the features above are implemented on the front-end using JavaScript to process the transcript and the metadata extracted from the audio content.

Congratulations! Now you should be able to build and further develop this project with more features if you want to. For example, you may want to extract the concepts of the content using a concept extraction service, or implement an advanced feature for finding similar content based on the set of keywords found in each item.

Node.js / JavaScript: https://github.com/ringcentral-tutorials/voice-communication-analytics-nodejs-demo

Learn more about our Developer Program here: https://developer.ringcentral.com/

