Alexey Aylarov at APIdays Amsterdam
First of all, we can all agree that CPaaS has been rather successful so far. The market is growing, and companies are trying to find their competitive advantages. That success comes down to a number of factors.
First, the rise of CPaaS is largely due to the flourishing of the “new enterprise.” When companies like Uber and Lyft became unicorns, everyone suddenly realized that startups could go to the next level by using cloud communication platforms. Once the market understood this, demand for CPaaS grew, because cloud platforms let you adopt out-of-the-box solutions very quickly and turn a profit.
Second, it should be remembered that CPaaS platforms have always been targeted at developers, and every modern startup has developers on hand who can easily use CPaaS.
Third, clouds are clouds, which means service availability worldwide, scalability, and increased capacity on demand, all without headaches.
And finally, most platforms offer pay-as-you-go pricing: you pay only for what you use.
New in the industry
The first thing to mention here is serverless, which has taken the convenience of CPaaS to a new level. Serverless doesn’t mean that there are no servers at all, only that there are none on the client side. In terms of computing resources, this is the same as pay-as-you-go, because fees are charged according to the load placed on the computing provider. Another benefit of serverless is that customers have direct access to the platform’s runtime, which leads to fewer delays and increased reliability.
Another trend is visual flow editors. This is a step towards the business audience, which (most often) cannot code but can assemble the logic of a bot or call center in a visual editor. Approaches to implementation vary slightly (see Smartcalls from Voximplant, Studio from Twilio, FlowBuilder from MessageBird, etc.), but the essence is similar: instead of writing code, the customer arranges visual blocks and the connections between them. Some of these editors, our Smartcalls for example, still let you use code as an advanced feature, but that is a different story.
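To make "blocks and the connections between them" concrete, here is a minimal sketch of how such a flow could be stored and executed. It is not how Smartcalls, Studio, or FlowBuilder work internally; the `Block` type, the three block kinds, and `run_flow` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One visual block: an action plus named connections to other blocks."""
    action: str                      # "say", "ask" or "hangup" (hypothetical block kinds)
    text: str = ""
    links: dict = field(default_factory=dict)  # caller answer (or "default") -> next block id


def run_flow(blocks, start, answers, max_steps=100):
    """Walk the flow from `start`, consuming caller answers, and return what the bot says."""
    spoken = []
    pending = iter(answers)
    current = start
    for _ in range(max_steps):       # guard against flows that loop forever
        if current is None:
            break
        block = blocks[current]
        if block.action == "hangup":
            break
        spoken.append(block.text)
        if block.action == "ask":
            answer = next(pending, "")
            # Follow the connection for this answer, or fall back to the default one.
            current = block.links.get(answer, block.links.get("default"))
        else:                        # "say" simply continues along its default connection
            current = block.links.get("default")
    return spoken


# What the user would draw in the editor, expressed as data.
flow = {
    "greet": Block("say", "Hello!", {"default": "menu"}),
    "menu": Block("ask", "Sales or support?",
                  {"sales": "sales", "support": "support", "default": "bye"}),
    "sales": Block("say", "Connecting you to sales.", {"default": "bye"}),
    "support": Block("say", "Connecting you to support.", {"default": "bye"}),
    "bye": Block("hangup"),
}

print(run_flow(flow, "greet", ["support"]))
```

The point of the design is that the business user edits only the data (the `flow` dictionary), while the interpreter stays the same for every scenario.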
Finally, there is the cloud IDE. Of course, it can hardly be compared to IntelliJ IDEA for now, but it compares easily to VS Code. If a CPaaS gives developers a powerful tool for working with code, they will most likely be very satisfied. A cloud IDE with a convenient debugger, smart autocomplete, code highlighting, custom styles, tabs, etc. that works quickly in the web interface earns the platform extra points for flexibility.
But the story wouldn’t be complete…
…if it weren’t for AI. Machine learning gives new degrees of freedom to communication platforms, namely:
Speech recognition and synthesis can be developed in-house, but that is very time-consuming. Instead, you can turn to big players like Google or Amazon: their models already recognize human speech very well and can imitate it too (nod to WaveNet).
Natural Language Understanding/Processing is now the hottest topic in the world of communications. If a business solution is based on NLU, speech synthesis usually comes with it: a person answers a call, their speech is transcribed, the text is passed to the bot, and the bot selects the text of a reply, which in turn must be synthesized into speech. This process can be automated with Google Dialogflow, IBM Watson, and Amazon Lex, to name just a few.
When a call center operator talks to a customer, it is possible to analyze the speech in the background and give the operator additional information so that no time is wasted. For example, a customer may ask where the nearest ATM is. The system recognizes the question and displays the answer on the operator’s screen, so the operator can simply read it out instead of asking the customer to wait.
Almost everyone is interested in emotion analysis, but it is the most difficult area of CPaaS at the moment, because speech varies across cultures and there are many ways to say the same thing. Nowadays, many companies analyze emotions using text. Some solutions exist, but they cannot be called successful, because analyzing the text alone will not go far: emotions lie not only in what is said, but in how it is said. Therefore, convincing real-time emotion analysis is a question for the (near) future.
Everybody knows about noise reduction: when you talk on the phone, a trained model “removes” background noise so that the other person hears only you. Sometimes the speaker’s voice suffers, because the models cannot always distinguish the frequencies of the background from those of the voice, but on the whole it works fairly well. Similarly, modern smartphones blur the background with the help of AI. This approach will also be in demand for video calls: imagine not having to look for the perfect background, because AI can blur out any environment. Though why “imagine”? Skype already has this feature.
Analyzing a live video stream or recorded video helps you understand what is in the frame. This is a very resource-intensive task, so it is best handled by those who have a lot of computing power: Google, Microsoft, and other major players.
Data analysis covers more than just classification and segmentation. Imagine you had tens of thousands of call recordings that needed to be transcribed and then searched. It is much more effective if AI goes through these recordings and sorts them into groups (such as sales calls and warranty calls), which can then reveal whether the operator followed standard customer service protocols (how the person behaved, what emotions were expressed, etc.). With the help of machine learning, you can extract as much information as you like from such a data set.
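As a rough illustration of sorting transcribed calls into groups, here is a deliberately simple sketch. It uses keyword matching as a stand-in for a trained classifier, so it shows only the shape of the task, not a machine learning method. The category names, keyword sets, and function names are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical keyword sets; a trained model would replace this lookup table.
CATEGORY_KEYWORDS = {
    "sales": {"price", "buy", "order", "discount"},
    "warranty": {"warranty", "repair", "broken", "replace"},
}


def categorize(transcript):
    """Return the category whose keywords occur most often, or "other" if none match."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    scores = {name: len(words & keywords) for name, keywords in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def group_calls(transcripts):
    """Map each category to the list of call ids assigned to it."""
    groups = defaultdict(list)
    for call_id, text in transcripts.items():
        groups[categorize(text)].append(call_id)
    return dict(groups)


calls = {
    "c1": "I want to buy two licenses, is there a discount?",
    "c2": "My headset is broken, does the warranty cover repair?",
    "c3": "What are your opening hours?",
}
print(group_calls(calls))
```

Once calls are grouped like this, each group can be checked against its own script or quality checklist, which is where the analysis described above would begin.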
Detecting the answering machine
It’s a special case, but it’s also a good example: we implemented answering machine detection in our platform. The platform is now able to recognize answering machines in Russian. We trained the model on many calls, and it can distinguish a live person from a recorded message. Conventional detection methods (for example, those based on the audio signal) are not very effective, but AI has helped us achieve accuracy of up to 99%, while recognition takes only two seconds.
Machine learning requires a lot of resources: not just computational power, but also people with special skills, such as data scientists who create and configure training models and know what data is needed and when. Such people are not easy to find, and their work is expensive. They are in great demand among the major players, and competing with Google for talent is possible but hard. So instead of competing with the giants, it is better to collaborate with them. Most CPaaS players piggyback on the achievements of large companies. On the one hand, this means that the larger partner manages the costs of the other players and can set or change the rates for speech recognition and synthesis (remember WaveNet from Google). On the other hand, if you use the larger player’s solutions and they suddenly decide to change the rates, you have to do the same, which may not please your users. There is also the concern of sending data to this giant, which for some businesses raises privacy and security issues. However, it is always possible not to depend on a single partner and to use solutions with similar functionality from several of them. All in all, such cooperation is convenient and beneficial for CPaaS players.
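Not depending on a single partner can be as simple as trying several providers in order. The sketch below assumes each provider is wrapped in a function that takes audio bytes and returns text. The wrappers here are stubs, and `recognize_with_fallback`, `flaky_provider`, and `backup_provider` are hypothetical names, not part of any real SDK.

```python
class AllProvidersFailed(Exception):
    """Raised when every configured provider returned an error."""


def recognize_with_fallback(audio, providers):
    """Try each (name, recognize) pair in order; return (provider name, text) from the first success."""
    errors = []
    for name, recognize in providers:
        try:
            return name, recognize(audio)
        except Exception as exc:     # any provider failure triggers the next attempt
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real speech recognition services.
def flaky_provider(audio):
    raise TimeoutError("no response")


def backup_provider(audio):
    return "hello world"


result = recognize_with_fallback(
    b"\x00\x01",
    [("primary", flaky_provider), ("backup", backup_provider)],
)
print(result)
```

The same ordering could be driven by price instead of availability, which softens the effect of a sudden rate change by one partner.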
Instead of concluding
New technologies on the horizon that will affect communications in the same way that WebRTC did in its day are 5G and AV1.
5G is designed to bring the principle of “always online” to life. This is the ultimate goal, but it will not happen overnight. With the advent of this technology, CPaaS will have even more opportunities, because even those who have not used mobile data before will start to do so as part of the IoT. The communication infrastructure will change, and the usual telecommunications businesses will change with it.
The AV1 video codec will also be useful for CPaaS: it is royalty-free, so you will not have to worry about licenses. A free codec that is more efficient than H.265 and available to everyone will also change the world of communications.
The future is unfolding before our eyes, and Voximplant is not just watching what is going on, but also participating in the process.