How to write a modular Alexa clone in 40 lines of JavaScript
For a while, I had been looking around for a decent speech-to-text library that is easy to set up, easy to use, and free. After searching for a while, I gave up and started working on another project. That changed yesterday, when I stumbled upon this awesome tutorial by Wes Bos.
This article will help you write a basic virtual assistant that works in Chrome. For this example, you will have to serve your files from a server. I will use a basic preact app for this purpose, since it makes it easy to set up a server and to tweak the UI afterward without getting a headache. A demo of what will be created here can be found on robinjs.party.
Update: It is not necessary to use a framework like preact; I chose to do so only so I wouldn’t have to manually set up a project, bundler, and Babel. If you don’t want to use it, just skip the next paragraph and dive right into ‘Create the heart of the assistant’. Thanks for the feedback!
Create a new preact app
Let’s set up a basic preact app with the create-preact-app command, which makes it possible to scaffold a new app pretty quickly.
We will create a new file in the source folder called `assistant.js`, together with a new directory named `skills` in which we will place our skills. You can find all the code from this article in the robinjs-website repository.
Create the heart of the assistant
Let’s start out by defining the heart of our virtual assistant, which accepts a custom configuration. This makes it possible to create your own assistant with another name and speaks the language of your choice.
The virtual assistant should be able to convert an input to an answer. After the answer is obtained, the assistant presents it to the user. I call these two processes `process` and `say`. The assistant starts by processing the input and then says the output. For now, we will make our assistant log its replies and always provide the user with the default answer `reply`, which we have specified in our configuration.
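As a sketch under these assumptions (the config shape with `name`, `language`, `reply`, and `skills` is my own naming; the article’s actual code lives in the robinjs-website repository), the heart could look like this:

```javascript
// assistant.js — a minimal sketch of the assistant's heart.
// The config shape (name, language, reply, skills) is an assumption
// based on the text, not the article's exact code.
class Assistant {
  constructor(config) {
    this.config = config;
    this.skills = config.skills || [];
  }

  // Convert an input sentence to an answer. For now we ignore the
  // input and always return the default reply from the config.
  async process(input) {
    return this.config.reply;
  }

  // Present the answer to the user — just logging for now.
  say(answer) {
    console.log(answer);
  }
}
```

In a real module you would `export default Assistant` at the bottom of `assistant.js` and import it where needed.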
Awesome! We can now provide sentences to our assistant and it will output its answers to the console. However, our assistant is still useless. Let’s fix this by adding some skills that can generate dynamic replies.
Create some skills
A skill is a basic set of two functions: one that determines, based on the input, whether the skill should be triggered, and one that converts the actual input to the answer. These two functions, which I call `trigger` and `resolve`, are the only ones necessary to create a skill.
In this example, the trigger function will be synchronous while the resolve function will be asynchronous. One might choose to make the trigger function asynchronous as well; however, we won’t use that kind of trigger for now.
Let’s create a skills folder with two files, `time.js` and `whatsup.js`. These two basic skills will be used by our assistant.
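A sketch of what those two files might contain, following the `trigger`/`resolve` shape described above. The exact phrases the triggers match on, and the replies, are my own invention:

```javascript
// skills/time.js — replies with the current time.
// trigger is synchronous, resolve returns a Promise, matching the
// shape described above. The matched phrase is an assumption.
const time = {
  trigger: input => input.includes('time'),
  resolve: input =>
    Promise.resolve(`It is ${new Date().toLocaleTimeString()}.`),
};

// skills/whatsup.js — a canned answer to small talk.
// The regex tolerates both "what's up" and "whats up", since speech
// transcripts don't always include apostrophes.
const whatsup = {
  trigger: input => /what.?s up/i.test(input),
  resolve: input =>
    Promise.resolve('Not much, just hanging around in your browser.'),
};
```

Each file would `export default` its skill object so the assistant can import them.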
The final step of creating the processor is to find the correct skill and call its resolve function. We can do this by updating the implementation of the process function like the snippet below.
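A sketch of that update, with the rest of the class from before repeated so the snippet stands on its own (the class shape is still my assumed naming, not the article’s exact code):

```javascript
// The heart of the assistant, now with skill lookup in process().
class Assistant {
  constructor(config) {
    this.config = config;
    this.skills = config.skills || [];
  }

  // Find the first skill whose trigger fires for this input and let
  // it resolve the answer; otherwise fall back to the default reply.
  async process(input) {
    const skill = this.skills.find(skill => skill.trigger(input));
    return skill ? skill.resolve(input) : this.config.reply;
  }

  say(answer) {
    console.log(answer);
  }
}
```

Because `resolve` returns a Promise and `process` is async, callers can simply `await` the answer whether it came from a skill or from the default reply.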
Congratulations! You have finished the heart of your own (text-based) virtual assistant. You can now use the code above to create a new instance and load up your skills.
Listen and speak
Time for the last part of the implementation of the virtual assistant. This is where we make the assistant speak its answers to us and listen to what we have to say. Let’s start with the easy part: text-to-speech.
We will implement the `say` function and make it use the browser’s built-in SpeechSynthesis API to start speaking. From now on, our code will only run in the browser, since we will use variables attached to the `window` object. We use the regular expression `/[&\/\\#,+()$~%.'"*?<>{}]/g` to filter out any characters that the assistant should not pronounce.
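A sketch of `say` on top of the SpeechSynthesis API. This only runs in the browser; the `sanitize` helper is my own naming:

```javascript
// Strip characters the assistant should not try to pronounce.
const sanitize = text => text.replace(/[&\/\\#,+()$~%.'"*?<>{}]/g, '');

// say() hands the cleaned answer to the browser's speech synthesis.
// window.speechSynthesis and SpeechSynthesisUtterance only exist in
// the browser, so calling this function requires a browser context.
function say(answer) {
  const utterance = new SpeechSynthesisUtterance(sanitize(answer));
  utterance.lang = 'en-US'; // or take the language from your config
  window.speechSynthesis.speak(utterance);
}
```

Without the sanitize step, some voices spell out punctuation like `*` or `#` aloud, which is why the filtering happens before the utterance is created.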
Now that our assistant can speak, it still has to listen to what we have to say. To do so, we have to create a `webkitSpeechRecognition` instance, set its language to the one we specified in our config, make sure it restarts after it has heard a sentence, and connect it to our process function.
This might sound like a lot, but it’s not that much at all. We can plug the setup code right into the constructor when we create a new assistant. Note that we now have a callback that receives a recognition instance, which we should convert to a string and pass to our process function. I have also added a simple start function to start the recognizer.
If you’d like to know more about the recognition variable, you can print the instance to the console to explore its structure. For now, I have already written some code to convert the instance to a transcript: the sentence the recognizer has heard.
Once we have the sentence the recognizer has identified, we check whether the first word is the name of our assistant. Only if this is the case do we pass the rest of the sentence to the process function. If the sentence does not start with the name of our assistant, we simply ignore it.
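Sketching those steps under the same assumptions as before (the helper names are mine; `webkitSpeechRecognition` is Chrome’s prefixed SpeechRecognition and exists only in the browser):

```javascript
// Turn a SpeechRecognition result event into a lowercase transcript.
// This helper is pure, so it also works outside the browser.
const toTranscript = event =>
  Array.from(event.results)
    .map(result => result[0].transcript)
    .join('')
    .trim()
    .toLowerCase();

// Wire the recognizer to an assistant — roughly what could live in
// the constructor. Browser only.
function listen(assistant) {
  const recognition = new webkitSpeechRecognition();
  recognition.lang = assistant.config.language;
  recognition.addEventListener('result', async event => {
    const [name, ...rest] = toTranscript(event).split(' ');
    // Only react when the sentence starts with the assistant's name.
    if (name === assistant.config.name.toLowerCase()) {
      assistant.say(await assistant.process(rest.join(' ')));
    }
  });
  // Keep listening: restart after every recognized sentence.
  recognition.addEventListener('end', () => recognition.start());
  assistant.start = () => recognition.start();
}
```

The restart on the `end` event is what keeps the assistant listening continuously; without it, recognition stops after the first sentence.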
Instantiate a new assistant
That’s it! You can start using your own assistant and keep adding different skills over time. Since every skill has the same API, you would even be able to distribute and install skills via npm.
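Putting it together could look like the following. In the app these pieces would be imports (`import Assistant from './assistant'` and so on); here a minimal stand-in is inlined so the example runs on its own, and the name and replies are my own choices:

```javascript
// A minimal inlined stand-in for the Assistant class and a skill,
// so this wiring example is self-contained.
class Assistant {
  constructor(config) {
    this.config = config;
    this.skills = config.skills || [];
  }
  async process(input) {
    const skill = this.skills.find(s => s.trigger(input));
    return skill ? skill.resolve(input) : this.config.reply;
  }
  say(answer) {
    console.log(answer);
  }
}

const time = {
  trigger: input => input.includes('time'),
  resolve: () => Promise.resolve(`It is ${new Date().toLocaleTimeString()}.`),
};

const robin = new Assistant({
  name: 'Robin',
  language: 'en-US',
  reply: "Sorry, I don't know that one yet.",
  skills: [time],
});

// In the browser you would call robin.start() to begin listening;
// in this text-only demo we feed it a sentence directly.
robin.process('what time is it').then(answer => robin.say(answer));
```

Swapping in a new skill is just a matter of adding another `trigger`/`resolve` object to the `skills` array, which is what makes npm-distributed skills plausible.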
Source code
You can review all the code of this article in the robinjs-website repository. The example I have created here is also hosted on robinjs.party so you can run the demo right away.
Conclusion
This might be an extremely basic version of a virtual assistant, but because of the custom skills, there is really a lot you can do with it. I hope you liked the tutorial and have fun building your own version of Alexa!
If you have questions or you like this project, let me know!
Happy Coding 🎉