Video chat app with real-time translation

After Skype announced its real-time translation feature, I decided it would be great if this kind of functionality could be embedded into any app built with Voximplant. We already offer our developers built-in real-time speech-to-text [powered by Google], and our cloud application engine [VoxEngine] allows real-time processing of recognition results. So it wasn’t hard to add some JavaScript code that translates the results on the fly [using the Google Translate API] and sends the translation to the client side for display. What’s more, Voximplant offers text-to-speech, which can easily replace the visual output with voice [speaking the translation aloud]; we will make this optional in our app.

Architecture

We will connect participants directly in peer-to-peer mode for the best audio and video quality. At the same time, each participant will have an active call to the server that streams audio for recognition and translation. If we generate speech in our VoxEngine scenario, this same call is used to play it back.
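The translated text still has to reach the other participant. A plausible path, and an assumption here, is: client → gatekeeper → conference → the other participant’s gatekeeper → client. The gatekeeper and conference scenarios are configured in the setup steps. The conference’s relay role can be modeled in plain JavaScript. This is not VoxEngine code: `ConferenceProxy`, `join` and `relay` are hypothetical names used only to illustrate the data path.

```javascript
// Plain-JS model of the conference relay; NOT VoxEngine API.
// ConferenceProxy, join and relay are hypothetical names for illustration.
class ConferenceProxy {
  constructor() {
    // participant id -> callback that delivers a message to that participant's gatekeeper
    this.participants = new Map();
  }

  // Register a participant; this model allows at most two.
  join(id, deliver) {
    if (!this.participants.has(id) && this.participants.size >= 2) {
      throw new Error("This model supports exactly two participants");
    }
    this.participants.set(id, deliver);
  }

  // Forward a message to every participant except the sender
  // (with two participants, that is exactly one peer).
  relay(fromId, message) {
    for (const [id, deliver] of this.participants) {
      if (id !== fromId) {
        deliver({ from: fromId, ...message });
      }
    }
  }
}

const received = [];
const conf = new ConferenceProxy();
conf.join("test1", (msg) => received.push({ to: "test1", ...msg }));
conf.join("test2", (msg) => received.push({ to: "test2", ...msg }));

// test1's gatekeeper sends a translated phrase; only test2 receives it.
conf.relay("test1", { original: "Hello", translated: "Hola" });
console.log(received); // one entry, addressed to test2
```

Keeping the conference as a dumb relay means each gatekeeper only needs to know about its own client and the conference, not about the other participant.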

The Google Translate API will be called using httpRequest in the VoxEngine scenario after we receive a recognition result from the ASR (automatic speech recognition) module.
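Here is a minimal sketch of that step, assuming the Google Cloud Translation API v2 REST endpoint and an API key. The HTTP call and the delivery to the client are passed in as functions. Both `httpGet` and `sendToClient` are hypothetical stand-ins, because the exact VoxEngine `httpRequest` signature is not shown here. Passing them in also keeps the logic testable outside VoxEngine.

```javascript
// Sketch of the translation step in a gatekeeper-style scenario.
// Assumptions: Google Cloud Translation API v2 (REST) with an API key;
// httpGet(url, onBody) and sendToClient(message) are hypothetical stand-ins
// for VoxEngine's HTTP request helper and for the channel to the client.
const TRANSLATE_ENDPOINT = "https://translation.googleapis.com/language/translate/v2";

// Build a GET URL with URL-encoded query parameters.
function buildTranslateUrl(text, source, target, apiKey) {
  const params = { key: apiKey, q: text, source: source, target: target, format: "text" };
  const query = Object.keys(params)
    .map((name) => encodeURIComponent(name) + "=" + encodeURIComponent(params[name]))
    .join("&");
  return TRANSLATE_ENDPOINT + "?" + query;
}

// Extract the first translation from a v2 response of the form
// {"data": {"translations": [{"translatedText": "..."}]}}
function parseTranslateResponse(body) {
  const json = JSON.parse(body);
  const translations = json && json.data && json.data.translations;
  if (!Array.isArray(translations) || translations.length === 0) {
    throw new Error("Unexpected Translate API response: " + body);
  }
  return translations[0].translatedText;
}

// Called with each recognition result: translate it, then forward both texts.
function translatePhrase(phrase, opts, httpGet, sendToClient) {
  const url = buildTranslateUrl(phrase, opts.source, opts.target, opts.apiKey);
  httpGet(url, (body) => {
    sendToClient({ original: phrase, translated: parseTranslateResponse(body) });
  });
}
```

`format=text` is passed because recognition results are plain text, while the v2 API treats input as HTML by default. A side benefit of calling the API from the scenario rather than from the browser is that the API key stays on the server.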

Backend Setup

The setup is done in the Voximplant Control Panel:

  1. In Applications, create a new app and name it babelfish (yes, we are building our own babel fish 8-)
  2. In Users, create two users with the usernames test1 and test2 and assign them to our babelfish app [their full logins will look like test1@babelfish.accountname.voximplant.com]
  3. In Scenarios, let’s write the VoxEngine scenarios that will process calls from/to the SDK. We will need three of them: gatekeeper, conference and p2p. The gatekeeper scenario connects outbound calls from the SDK with the conference and handles speech recognition and translation. It looks as follows:

  4. The conference scenario is a proxy that transfers data between the two participants:

  5. And the last and simplest scenario, which connects P2P audio and video calls:

  6. After all scenarios are created, we have to connect them to the application using Application Rules, which tell the platform which scenario should process a call when it arrives. We will need three app rules:

Application Rules

The InboundFromSDK rule forwards a call from the SDK to our gatekeeper scenario for processing. You need to specify the Pattern and move the gatekeeper scenario to the Assigned list, as shown in the image below:

The FwdToConf rule forwards a call from the gatekeeper scenario to our conference scenario (when callConference is called). Pattern: conf_[a-zA-Z0-9]+

The P2P rule forwards a call from the SDK to our p2p scenario. We want this rule to process all calls whose number doesn’t match the previous rules’ patterns. Pattern: .*

Please note that the order of app rules is important: rules are checked from top to bottom, so the catch-all P2P rule must come last. You can use drag and drop to change the order; don’t forget to save your settings afterwards.
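To see why the order matters, here is a small model of rule matching. It assumes that rules are tried from top to bottom, that the first match wins, and that a pattern has to match the whole number. The InboundFromSDK pattern (`translate_.*`) is hypothetical, since the actual pattern is not given in the text.

```javascript
// Model of ordered application-rule matching; not a Voximplant API.
// Assumptions: top-to-bottom order, first match wins, whole-number match.
const rules = [
  // Hypothetical pattern: the real InboundFromSDK pattern is not given in the text.
  { name: "InboundFromSDK", pattern: "translate_.*", scenario: "gatekeeper" },
  { name: "FwdToConf", pattern: "conf_[a-zA-Z0-9]+", scenario: "conference" },
  { name: "P2P", pattern: ".*", scenario: "p2p" },
];

// Return the scenario of the first rule whose pattern matches the whole number.
function selectScenario(orderedRules, number) {
  for (const rule of orderedRules) {
    const fullMatch = new RegExp("^(?:" + rule.pattern + ")$");
    if (fullMatch.test(number)) {
      return rule.scenario;
    }
  }
  return null; // no rule matched
}

console.log(selectScenario(rules, "conf_room1")); // conference
console.log(selectScenario(rules, "test2"));      // p2p

// Moving the catch-all rule to the top makes it swallow every call.
const catchAllFirst = [rules[2], rules[0], rules[1]];
console.log(selectScenario(catchAllFirst, "conf_room1")); // p2p
```

The last call shows the failure mode the note above warns about. If `.*` comes first, the gatekeeper’s `conf_...` calls never reach the conference scenario.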

Client Application

Now that the backend is set up, we can build our client app with the Web SDK to complete the whole thing.

Usually I use something like React + TypeScript + Webpack to build web apps, but to keep the example simple I won’t use them here, just good old jQuery and vanilla JS :)
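On the client side, the translation-specific work is mostly displaying text that arrives from the scenario. The sketch below assumes that each translation arrives as a JSON string with a `translated` field. That message format and the `createTranscript` helper are hypothetical illustrations, not taken from the demo app’s source.

```javascript
// Hypothetical client-side transcript. Assumes the scenario sends each
// translation as a JSON string like {"original": "...", "translated": "..."}.
function createTranscript(maxLines) {
  const lines = [];
  return {
    // Parse one incoming message; ignore anything that is not valid JSON
    // with a string `translated` field. Returns true if a line was added.
    handleMessage(raw) {
      let msg;
      try {
        msg = JSON.parse(raw);
      } catch (e) {
        return false;
      }
      if (!msg || typeof msg.translated !== "string") {
        return false;
      }
      lines.push(msg.translated);
      if (lines.length > maxLines) {
        lines.shift(); // keep only the most recent lines on screen
      }
      return true;
    },
    // Text for a subtitle element, e.g. $("#subtitles").text(transcript.text())
    text() {
      return lines.join("\n");
    },
  };
}
```

Rendering with jQuery’s `.text()` rather than `.html()` keeps recognized speech from being interpreted as markup.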


You can find the source code for the demo client app at https://github.com/voximplant/babelfish-client-app

If you want to try it in action, just open https://demos02.voximplant.com/babelfish/ in Chrome or Firefox.