How to Build up Voice Apps with Frameworks

Oct 30, 2020

And which one to choose for your big voice-first project. Originally published on the Just AI blog.

Why do developers hate frameworks? Busting a myth

Some guy once said that real developers don’t use frameworks; real developers build them. That line may sound cool and motivating, but in reality frameworks are indispensable. They give developers a clear path from an idea to working code without needing to know every ins and out of the underlying stack.

An all-too-common myth is that frameworks limit what can be done and leave no room for control. In fact, frameworks were never meant to be a one-size-fits-all solution: they can be customized and extended to fit the specific parameters of each use case. Most frameworks are built on common principles, so there’s no need to reinvent the wheel. It’s a great deal simpler to start with an existing framework, make sure it really matches your requirements and goals, and customize it fully for your needs than to start from scratch. That power of quick and simple customization is the primary benefit of open-source frameworks.

This applies to every industry, conversational AI included. To many developers, building a voice-first project from scratch seems effortless: you get a query, you return an answer, and presto, you have a dialogue! Once they dig deeper, they find it is far from the smooth sailing they imagined. Traps and pitfalls are everywhere, and voice, they discover, is just as complicated as the other systems they are used to.

Consider this: channels, speech recognition, speech synthesis, and NLU each need their own connector, and without a framework you have to implement every one of them separately. Needless to say, that drives costs up considerably. A simple website bot can be handled by in-house Python developers, who can pick any Python chatbot builder and get a bot running with little effort. But that approach breaks the golden rule of development: think ahead. Plan how the project will grow during the first stage and weigh multiple outcomes, so that you save time when it comes to scaling.
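To make the “separate connectors” point concrete, here is a minimal Python sketch of one voice turn passing through speech recognition, NLU, dialogue logic, and speech synthesis. Everything in it is hypothetical: the VoicePipeline class and the fake_* functions are stubs standing in for real services, and the channel is simply whatever calls handle_turn. It is not the architecture of any particular framework.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical connector signatures: in a real project each one would wrap
# a separate service (ASR, NLU engine, dialogue manager, TTS).
SpeechToText = Callable[[bytes], str]
Understand = Callable[[str], str]
Respond = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    """Chains the connectors a voice app needs for a single conversational turn."""
    stt: SpeechToText
    nlu: Understand
    dialogue: Respond
    tts: TextToSpeech

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.stt(audio)         # speech recognition
        intent = self.nlu(text)        # natural language understanding
        reply = self.dialogue(intent)  # dialogue logic chooses a response
        return self.tts(reply)         # speech synthesis


# Stub connectors standing in for real services.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" is already a transcript


def fake_nlu(text: str) -> str:
    return "greeting" if "hello" in text.lower() else "fallback"


def fake_dialogue(intent: str) -> str:
    return {"greeting": "Hi there!"}.get(intent, "Sorry, I didn't get that.")


def fake_tts(reply: str) -> bytes:
    return reply.encode("utf-8")  # pretend the encoded text is synthesized audio


# The channel (a phone line, a smart speaker, a mobile app) is whatever
# delivers audio to handle_turn and plays back the bytes it returns.
pipeline = VoicePipeline(fake_stt, fake_nlu, fake_dialogue, fake_tts)
print(pipeline.handle_turn(b"Hello, assistant"))  # b'Hi there!'
```

Even with trivial stubs, that is four integrations plus a channel that you would have to build, test, and maintain yourself when starting from scratch.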

Frameworks are great; just make sure you pick one that is well supported and has a big community. That way, if you ever run into a problem, you are far more likely to get help. When something goes wrong, chances are someone has already faced the same problem, so it may already be solved, or plenty of other users will be keen to find a solution. Either way, your issue becomes much easier to fix.

Why we’ve built our own framework

This is one of the reasons JAICF (Just AI Conversational Framework) was built. The more people take part in its development, the better the tool and the products built with it become. Frankly speaking, in exchange for using the framework, we hope to get feedback that will help us improve it.

When we published JAICF on GitHub, we wanted to show all AI enthusiasts, developers, and companies that voice tech holds great promise for the future and can be applied in any niche. We are sharing a free toolkit that can be used to quickly test a hypothesis and to build even mission-critical AI-powered solutions.

We are genuinely excited to share our knowledge and expertise: we want people to educate themselves and learn what conversational AI really is. Not at the ‘no-code builder’ level, though; we want them fully immersed in the subject.


Another idea behind the development was that sooner or later most mobile apps will have voice control. That’s why we chose Kotlin, a programming language widely used by Android developers. Kotlin supports a context-oriented programming style, which makes it a great fit for systems where context is crucial, and conversational software is exactly that kind of system. It is easy to learn, especially if you are coming from Java or Swift. We wanted to provide a handy tool that lets developers quickly put an idea into practice and test it.
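To illustrate why context matters so much in conversational software, here is a minimal sketch. It is written in Python for brevity, so it is neither Kotlin nor JAICF’s API; the Context class and the respond function are hypothetical. The same follow-up question, “What about tomorrow?”, gets a different answer depending on what was said earlier in the conversation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Context:
    """Per-user conversation state that survives between turns."""
    topic: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def respond(utterance: str, ctx: Context) -> str:
    """Answer an utterance, reading and updating the conversation context."""
    text = utterance.lower()
    if "weather" in text:
        ctx.topic = "weather"
        ctx.slots["day"] = "today"
        return "Today it's sunny."  # hypothetical canned forecast
    if "tomorrow" in text and ctx.topic == "weather":
        # Same words, but the earlier turn tells us what "tomorrow" refers to.
        ctx.slots["day"] = "tomorrow"
        return "Tomorrow it will rain."
    if "tomorrow" in text:
        return "Tomorrow? What would you like to know about it?"
    return "Sorry, I didn't get that."


fresh = Context()
print(respond("What about tomorrow?", fresh))    # Tomorrow? What would you like to know about it?

ongoing = Context()
respond("What's the weather like?", ongoing)
print(respond("What about tomorrow?", ongoing))  # Tomorrow it will rain.
```

In a conversational framework this per-user context is typically stored and passed around for you; in the sketch it is simply an explicit object handed to every call.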

How to choose the right framework

Before starting, define who is going to work on the project, which programming languages you plan to use, which language the application architecture will be built in, which products you need to plug in, and what performance requirements apply. All of these details are crucially important.

Take your technical staff into account. Say you are planning to build a chatbot, most of your developers write Python, their work is closely tied to data science, and you have your own models that you plan to use. In this case, you don’t have to worry about optimization; you can simply pick a Python framework.

Another example: if you have a lot of JavaScript or front-end web developers, you can find frameworks suited to their skills that will simplify their work. If you are building a domain-specific application that needs regular adjustments and optimizations, a Kotlin-based framework is a perfect fit.

Another argument for a Kotlin-based framework is that Android is the leading operating system for smart devices. So if you need to integrate a voice assistant into an Android app and you have Android developers in-house, just give them the task; they should have no difficulty creating a voice interface. JAICF is well suited to smart devices and Android Things, as well as other open-source tech, because you won’t need a Python developer’s help to write the dialogue logic. And JAICF is the first and only Kotlin-based conversational framework to date.

The programming language a framework is built in imposes its own restrictions, because the language’s characteristics carry over to the framework. For example, Java and Kotlin are statically typed: type errors are caught at compile time, and the compiler has more information to optimize with. Python is dynamically typed and higher-level, and its reference implementation, CPython, has a global interpreter lock (GIL) that prevents threads from executing Python bytecode in parallel unless you resort to workarounds such as multiprocessing.

Working within an advanced framework is easy because almost any component you need is available. In conversational AI frameworks, the most complex and most essential component is the NLU engine, so make sure you pick a framework created by a company that has its own NLU core. Then check the project on GitHub to see when the last commit was made, to make sure the framework is actively supported and has an active community.
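What an NLU engine does can be shown with a deliberately toy example: it maps a free-form utterance to an intent plus a confidence score, and falls back when nothing matches well enough. Real engines such as Rasa or Dialogflow use trained machine learning models; this sketch swaps in plain word overlap (Jaccard similarity) only to show the input/output contract. The intents, example phrases, and the 0.3 threshold are made up for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical training data: a few example phrases per intent.
TRAINING: Dict[str, List[str]] = {
    "order_pizza": ["i want a pizza", "order a pizza please", "get me a large pizza"],
    "check_status": ["where is my order", "order status", "is my pizza on the way"],
}


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def classify(utterance: str, threshold: float = 0.3) -> Tuple[str, float]:
    """Return (intent, score), or ("fallback", score) if nothing is similar enough."""
    words = tokens(utterance)
    best_intent, best_score = "fallback", 0.0
    for intent, examples in TRAINING.items():
        for example in examples:
            ex_words = tokens(example)
            union = words | ex_words
            # Jaccard similarity: shared words divided by all distinct words.
            score = len(words & ex_words) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return "fallback", best_score
    return best_intent, best_score


print(classify("I want a pizza"))  # ('order_pizza', 1.0)
print(classify("play some jazz"))  # ('fallback', 0.0)
```

Word overlap breaks down quickly on paraphrases and typos, which is exactly why a mature NLU core is the hardest part to build yourself.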

Top Conversational AI frameworks

The best thing about conversational AI frameworks is that you can create a wide range of voice-first use cases, from a voice assistant in a mobile app to voice-first games for smart displays or smart TVs.

There are dozens of voice tech frameworks of different levels to address various challenges. The most advanced ones that have big and active communities are:

Rasa Open Source

An open-source machine learning framework to automate text- and voice-based conversations. Rasa is built on Python and has a built-in NLU, so you can use it either as an end-to-end solution or as an NLU server.

With Rasa, you can build contextual assistants on Facebook Messenger, Slack, Google Hangouts, Webex Teams, Microsoft Bot Framework, Rocket.Chat, Mattermost, Telegram, Twilio, and your own custom conversational channels, or build voice assistants such as Alexa Skills and Google Home Actions.

Jovo

The Jovo Framework is built on TypeScript. It allows you to build voice experiences that work across devices and platforms, including Amazon Alexa, Google Assistant, mobile phones, Raspberry Pi, and more.

BotPress

BotPress is an open-source conversational AI platform built on TypeScript, designed as a flexible platform for enterprises to automate conversations and workflows. Although it has some nice features such as advanced permissions, security, and data compliance, the open-source edition seems to lack some features that are useful for enterprises (number of admins, roles, multilanguage support, white-label widgets and interface, etc.). It is also aimed mainly at text bots rather than voice: there doesn’t seem to be any voice markup, and the available channels are mostly text-based.

DeepPavlov

DeepPavlov is an open-source framework built on Python. It helps you build multi-skill chatbots with NLU, multi-state support, contexts, and more. You can easily connect other DeepPavlov models to the agent for annotation and evaluation. It doesn’t seem to have any channel support at the moment; it can be used for pretty much anything, but it requires a lot of customization and additional work because there are no ready-made connectors.

JAICF (Just AI Conversational Framework)

JAICF is fully customizable; the framework works with popular voice and text channels such as Amazon Alexa, Google Actions, Slack, Facebook Messenger, and more.

JAICF is also compatible with any NLU engine, whether Dialogflow or Rasa, and its integrations with third-party libraries add ready-to-use NLU modules. Thanks to its flexibility, developers can deploy JAICF in the environment of their choice and scale it easily using Kotlin, Java, or third-party libraries.
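Engine-agnostic NLU usually comes down to the adapter pattern: the dialogue logic talks to one small interface, and each NLU engine gets an adapter that implements it. The Python sketch below assumes hypothetical names (NluEngine, KeywordEngine, RemoteEngineStub, handle); it is not JAICF’s actual Kotlin API, and a real Dialogflow or Rasa adapter would call those services instead of the stub logic shown here.

```python
import re
from typing import Dict, Protocol


class NluEngine(Protocol):
    """Hypothetical contract every NLU adapter implements."""

    def recognize(self, text: str) -> str: ...


class KeywordEngine:
    """Stand-in for a simple local engine that maps keywords to intents."""

    def __init__(self, keywords: Dict[str, str]) -> None:
        self.keywords = keywords

    def recognize(self, text: str) -> str:
        lowered = text.lower()
        for keyword, intent in self.keywords.items():
            if keyword in lowered:
                return intent
        return "fallback"


class RemoteEngineStub:
    """Stand-in for a cloud engine; a real adapter would call the service's API here."""

    def recognize(self, text: str) -> str:
        words = re.findall(r"[a-z]+", text.lower())
        return "greeting" if words and words[0] in {"hi", "hello"} else "fallback"


REPLIES = {"greeting": "Hello! How can I help?", "fallback": "Could you rephrase that?"}


def handle(text: str, nlu: NluEngine) -> str:
    """Scenario logic: depends only on the NluEngine contract, not on a vendor SDK."""
    intent = nlu.recognize(text)
    return REPLIES.get(intent, REPLIES["fallback"])


# Swapping engines changes one argument; the scenario code stays the same.
print(handle("Hello there", KeywordEngine({"hello": "greeting"})))  # Hello! How can I help?
print(handle("Hi!", RemoteEngineStub()))                            # Hello! How can I help?
```

Because handle never imports an engine directly, replacing one NLU provider with another does not touch the dialogue logic at all.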

Wrapping it up

Frameworks are awesome: they save time and give you a free hand to create a unique, custom solution, and, most importantly, you don’t have to be a top-level codehead to get started. Frameworks are construction kits that contain all the pieces you may need to put even the most audacious idea into practice.

Most of these frameworks are open-source, and their licenses generally allow free use, even for commercial projects. Many of them, like JAICF, Rasa, or Jovo, let you create even mission-critical solutions, either for free or with additional payment for some features, integrations, or channels. That means you can build even enterprise projects with a low investment.

Frameworks implement the dialogue logic without limiting what you develop: a skill for a smart device, a phone app, a simple bot, a game, whatever you want. You could even use these tools to create scenarios for a movie!

Conversational AI is a relatively new technology, but it’s a rewarding one because its future is very promising: chatbots, virtual assistants, and voice navigation are here to stay. New solutions appear every day, and you can be the one who brings something to the table! In conversational AI the idea is crucial, and you can find one simply by analyzing everyday needs, user experience, and clients’ expectations.