Building an AI Chat-Bot With Node.js and the Web Speech API

Sajid Ansari
Published in The Startup · 7 min read · Jul 16, 2020

Short Summary :- Using voice commands has become pretty ubiquitous nowadays, as more mobile phone users rely on voice assistants such as Siri and Cortana, and as devices such as Amazon Echo and Google Home invade our living rooms. These systems are built with speech recognition software that allows their users to issue voice commands. Now, our web browsers are becoming familiar with the Web Speech API, which allows users to integrate voice data into web apps.

In this tutorial, we are going to build a simple AI chat-bot. To build it, we will take three major steps:

  1. Use the Web Speech API’s SpeechRecognition interface to listen to your voice from a microphone.
  2. Send your message to a Dialogflow agent (the natural language processing platform) as a text string.
  3. Once the AI from the agent returns the reply text, use the SpeechSynthesis interface to give it a synthetic voice.

Confused about what the Web Speech API and Dialogflow are? Don’t worry, I will introduce them step by step while building our project.

Setting Up our Node.js application

To begin, use Node.js to set up a web app framework. Create your app directory and set up your app’s structure like this:

filestructure
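A plausible layout, inferred from the files referenced throughout this tutorial (keeping the static assets in a public/ folder is an assumption):

```
ai-chatbot/
├── app.js           # Express + Socket.IO server
├── config.env       # environment variables
├── package.json
└── public/
    ├── index.html
    ├── script.js
    └── css/
        └── style.css
```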

Now, run the command to initialize our Node.js app:

$ npm init -y

The -y flag accepts the default settings; otherwise you can configure the app manually without the flag. This also creates a package.json file that contains the basic info of our app.

Now, let’s install all of the dependencies needed to build this app:

$ npm install express socket.io uuid dotenv @google-cloud/dialogflow colors
$ npm install -g nodemon

After installing, our package.json file looks like this:

package.json
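A sketch of what package.json might look like after the installs — the version numbers are illustrative, and the dev script wrapping nodemon is an assumption based on the npm run dev command used later:

```json
{
  "name": "ai-chatbot",
  "version": "1.0.0",
  "main": "app.js",
  "scripts": {
    "start": "node app.js",
    "dev": "nodemon app.js"
  },
  "dependencies": {
    "@google-cloud/dialogflow": "^2.0.0",
    "colors": "^1.4.0",
    "dotenv": "^8.2.0",
    "express": "^4.17.1",
    "socket.io": "^2.3.0",
    "uuid": "^8.3.0"
  }
}
```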

NOTE: Version 2.0.0 renames dialogflow to @google-cloud/dialogflow on npm, along with introducing TypeScript types.

Some of the dependencies might be unknown to you, so let me explain a few:

  1. Socket.io :- Socket.IO is a JavaScript library for realtime web applications. It enables realtime, bi-directional communication between web clients and servers. It has two parts: a client-side library that runs in the browser, and a server-side library for Node.js.
  2. UUID :- A UUID (Universal Unique Identifier) is a 128-bit number used to uniquely identify some object or entity on the Internet.
  3. @google-cloud/dialogflow :- Dialogflow is a natural language understanding platform used to design and integrate a conversational user interface into mobile apps, web applications, devices, bots, interactive voice response systems, and so on.

The next step is to instantiate Express and start listening for connections in the app.js file.

app.js
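A minimal sketch of what app.js might contain at this stage, assuming the static assets live in a public/ folder:

```javascript
// app.js — minimal Express + Socket.IO server (a sketch; folder names are assumptions)
const express = require('express');
const http = require('http');
const dotenv = require('dotenv');

dotenv.config({ path: './config.env' });

const app = express();
const server = http.createServer(app);
const io = require('socket.io')(server);

// Serve index.html, script.js and the CSS from the public/ folder
app.use(express.static('public'));

const port = process.env.PORT || 5000;
server.listen(port, () => console.log(`Server running at localhost:${port}`));
```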

Our config.env file will have the following environment variables:

config.env
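At this stage the file might look like this (the values are illustrative):

```
PORT=5000
NODE_ENV=development
```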

Let’s run our server using the npm run dev command. Our server is now listening at localhost:5000.

server running at port 5000

In the next step, we are going to cover all of our front-end code with the Web Speech API.

Receiving Speech With The SpeechRecognition Interface

The Web Speech API has a main controller interface, named SpeechRecognition, to receive the user’s speech from a microphone and understand what they’re saying.

Creating our user interface

The UI of this app is simple: just a button to trigger voice recognition. Let’s set up our index.html file and include our front-end JavaScript file (script.js) and Socket.IO, which we will use later to enable the real-time communication:

index.html
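A minimal index.html along those lines — the button’s class name is an assumption, and the /socket.io/socket.io.js client script is served automatically by the Socket.IO server:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>AI Chatbot</title>
  <link rel="stylesheet" href="css/style.css" />
</head>
<body>
  <button class="talk-btn">Talk</button>
  <script src="/socket.io/socket.io.js"></script>
  <script src="script.js"></script>
</body>
</html>
```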

After applying some CSS, our interface looks like this.

Overview.png

The CSS code can be found in this repository.

Capturing Voice With JavaScript

In script.js, invoke an instance of SpeechRecognition, the controller interface of the Web Speech API for voice recognition:

script.js
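A sketch of that invocation — note that Chromium-based browsers still expose the interface under a webkit prefix:

```javascript
// script.js — fall back to the prefixed constructor where needed
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';
```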

Then, capture the DOM reference for the button UI, and listen for the click event to initiate speech recognition:

script.js
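Something like the following, assuming the button carries the class talk-btn:

```javascript
const btn = document.querySelector('.talk-btn');
btn.addEventListener('click', () => {
  // Prompts for microphone permission on first use, then starts listening
  recognition.start();
});
```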

Once speech recognition has started, use the result event to retrieve what was said as text:

script.js
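A sketch of the result handler:

```javascript
recognition.addEventListener('result', (event) => {
  // results[0] is the first SpeechRecognitionResult; [0] picks its best alternative
  const text = event.results[0][0].transcript;
  console.log('You said:', text);
});
```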

The onresult property of the SpeechRecognition interface represents an event handler that will run when the speech recognition service returns a result.

The result will contain a SpeechRecognitionResultList object, and we can retrieve the text from that array.

Now, let’s use Socket.IO to pass the result to our server code.

Real-Time Communication with Socket.io

Socket.IO is a library for real-time web applications. It enables real-time bidirectional communication between web clients and servers. We are going to use it to pass the result from the browser to the Node.js code, and then pass the response back to the browser.

Let’s instantiate Socket.IO in script.js:

script.js
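Instantiation is one line — called with no URL argument, the client connects back to the host that served the page:

```javascript
const socket = io();
```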

Then insert this code where we are listening to the result event from SpeechRecognition:

script.js
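The emit might look like this — the 'chat message' event name is an assumption; any string works as long as the server listens for the same one:

```javascript
recognition.addEventListener('result', (event) => {
  const text = event.results[0][0].transcript;
  socket.emit('chat message', text); // send the transcript to the server
});
```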

Now, let’s go back to the Node.js code to receive this text and use AI to reply to the user.

Integrating AI with our application

A number of different services and platforms allow for the integration of an app with an AI system via speech-to-text and natural language processing. These include Microsoft’s LUIS, IBM’s Watson and Wit.ai.

To keep things simple for this tutorial, we will make use of Dialogflow, as it provides a free developer account and allows for the easy setup of a small-talk system via its Node.js library and web interface. If you haven’t heard about Dialogflow before, here’s an excerpt from Wikipedia:

Dialogflow (formerly API.ai, Speaktoit) is a Google-owned developer of human–computer interaction technologies based on natural language conversations. The company is best known for creating the Assistant (by Speaktoit), a virtual buddy for Android, iOS, and Windows Phone smartphones that performs tasks and answers users’ questions in a natural language.

Setting up Dialogflow

  1. To set up Dialogflow, you’ll need to create a Dialogflow account.
  2. After creating an account, you need to create an “agent”. The Getting Started guide illustrates all the relevant details.
  3. Rather than opting for the complete customization method and creating entities and intents, you can just click Small Talk in the left menu.
  4. You can then toggle the switch to enable the service.
  5. To use the API with our Node.js application, you’ll need to go to the ‘General Settings’ page (click the cog icon beside your agent’s name in the menu) and retrieve your Project ID.

Before you begin

  1. Enable the Dialogflow API.
  2. Set up authentication with a service account so you can access the API from your local workstation.
  3. Go to the create service account key page.
  4. Download your JSON credentials file and put it in your root directory.

Add the following environment variables to the config.env file.

config.env
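The added variables might look like this — the file name and project ID are placeholders for your own values:

```
GOOGLE_APPLICATION_CREDENTIALS=./your-service-account-key.json
PROJECT_ID=your-dialogflow-project-id
```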

Using the Dialogflow Node.js SDK

To connect your Node.js app to Dialogflow via the latter’s Node.js SDK, you need to go back to your app.js file and initialize @google-cloud/dialogflow.

app.js
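A sketch of the initialization, assuming GOOGLE_APPLICATION_CREDENTIALS is set in config.env so the client can find the service-account key:

```javascript
const dialogflow = require('@google-cloud/dialogflow');
const { v4: uuidv4 } = require('uuid');

// Picks up credentials from GOOGLE_APPLICATION_CREDENTIALS automatically
const sessionClient = new dialogflow.SessionsClient();
const sessionId = uuidv4(); // identifies this conversation session
```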

We are now using the server-side Socket.IO to receive the result from the browser. Once the connection is established and the message has been received, you can use the Dialogflow API to fetch a reply to the user’s message.

app.js
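A sketch of that flow — the 'chat message' and 'bot reply' event names are assumptions and just need to match the front-end code:

```javascript
io.on('connection', (socket) => {
  socket.on('chat message', async (text) => {
    // Session path ties the query to our agent and conversation session
    const sessionPath = sessionClient.projectAgentSessionPath(
      process.env.PROJECT_ID,
      sessionId
    );
    const request = {
      session: sessionPath,
      queryInput: { text: { text, languageCode: 'en-US' } },
    };
    const responses = await sessionClient.detectIntent(request);
    const reply = responses[0].queryResult.fulfillmentText;
    socket.emit('bot reply', reply); // push the agent's answer to the browser
  });
});
```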

Once Dialogflow generates the result, you can use Socket.IO’s socket.emit() function to push it to the browser.

Giving The AI A Voice With The SpeechSynthesis Interface

Let’s go back to script.js once again to finish off the app.

Create a function to generate a synthetic voice. This time, we are using the SpeechSynthesis controller interface of the Web Speech API.

The function takes a string as an argument and enables the browser to speak the text:

script.js
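A sketch of that function:

```javascript
function synthVoice(text) {
  const synth = window.speechSynthesis;
  const utterance = new SpeechSynthesisUtterance();
  utterance.text = text;
  synth.speak(utterance);
}
```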

In the function, first, we create a reference to the API entry point, window.speechSynthesis. You might notice that there is no prefixed property this time: This API is more widely supported than SpeechRecognition, and all browsers that support it have already dropped the prefix for SpeechSynthesis.

In the next step, we create a fresh SpeechSynthesisUtterance() instance via its constructor and set the text to be synthesized. You also have the option to set additional properties, including voice, which selects from the set of voices supported by the browser and the OS.

Finally, we use SpeechSynthesis.speak() to let it speak!

Now we can use Socket.IO to retrieve the response from the server. Once this message is retrieved, call the following function.

script.js
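Something like this, matching whatever event name the server emits (here assumed to be 'bot reply'):

```javascript
socket.on('bot reply', (replyText) => {
  if (replyText === '') replyText = '(No answer...)'; // speak a fallback on empty replies
  synthVoice(replyText);
});
```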

We are done! Let’s try a chit-chat with our AI bot!

finalResult

The entire source code for this tutorial can be found on GitHub.

Ok guys, we are at the end of this tutorial. I hope you enjoyed it. More tutorials like this one are on the way. Happy coding!
