First experience with Yandex Dialogs and Yandex Alice

Just a beginning

Jurijs Yuri Bormanis
Reactive Hub

--

This is an article from 2018 with minor grammar adjustments made in 2024. Parts of the information in this article are outdated.

Originally written for a post on habr.ru, this is a slightly modified and extended English version that might also be helpful for newcomers to voice interfaces.

Hi, I’m Alice!

My journey with voice assistants began in early 2017, when I started playing with Echo devices featuring Amazon’s Alexa assistant while working on an experimental project at the company I worked for at the time. Back then, Alexa developer tools had been available to the public for about a year and a half, but they still couldn’t be compared with what Google and Amazon offer today. As soon as I opened the Yandex Dialogs dashboard, I remembered the strange Alexa interface they used to have and how much everything has changed since then.

I’m here to share my experience and impressions of developing voice skills and chatbots in Russian for Yandex Alice (Yandex Dialogs), drawing on the background I have with Amazon Alexa and Amazon Lex.

First. What is Yandex?

In China, there’s Baidu (which holds about 80% of China’s search engine market), and in Russia, there’s Yandex. Yandex is a Russian tech company that provides various internet services, including a search engine with over 50% market share in Russia (Google accounts for most of the rest). Last year, the company released its voice assistant, Alice, and starting this year, developers can create skills for it.

The simplest diagram of how a voice assistant works.

The key differences among all platforms lie in the middle part — the platform that recognizes speech, processes input requests, and generates output for the user. The complexity of a developer’s work depends on how simple or complicated the platform is and what it can handle without requiring webhooks or the developer’s code.
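
To make that middle part concrete, here’s a minimal sketch of the webhook side in Python with Flask. The JSON field names follow the Yandex Dialogs protocol as I understood it at the time (an incoming “request” object carrying the utterance, an outgoing “response” object carrying the text to speak); treat them as illustrative rather than as current documentation.

    # Minimal Yandex Dialogs webhook sketch (Flask).
    # Field names reflect the protocol as of this writing; verify
    # them against the current documentation before relying on them.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route("/webhook", methods=["POST"])
    def webhook():
        req = request.get_json()
        utterance = req["request"].get("original_utterance", "")
        return jsonify({
            "version": req["version"],
            "session": req["session"],  # echoed back in early protocol versions
            "response": {
                "text": "You said: " + utterance,
                "end_session": False,
            },
        })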

My first impression upon opening Yandex Dialogs was that the interface was too simple. I wondered, where are all the tools and how do I control it? It turned out that’s all Yandex offers at the moment.

Targets or Intents

The first thing I expected to see in Dialogs was a section labeled “Intents” or “Targets”, each containing sample utterances. Creating such a collection of utterances helps developers understand user expectations. We could create intents like this:

“OrderFood” with these sample utterances:
“fried chicken delivery for tonight”
“I want pizza, make an order”
“all meat pizza with extra cheese on top”

“Help” with these sample utterances:
“I’m stuck”
“help”
“how to use this”

With these samples, we can understand the user and what they’re expecting. For example, if a user says, “I want chicken and fries, make an order,” this indicates that our target (“intent”) is “OrderFood”, so we work within the scope of that intent. Ideally, the platform would learn and understand related phrases. Currently, Yandex doesn’t support “intents,” but I am quite certain that this feature will be added next year.
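
Since Dialogs has no intent layer yet, this matching currently has to live in the developer’s own code. Here is a naive keyword-based sketch, reusing the intent names and sample phrases from the examples above; a real skill would need something much fuzzier.

    # Naive intent matching done in our own code, since the platform
    # doesn't provide intents yet. Keyword lists mirror the sample
    # utterances above and are placeholders.
    INTENT_KEYWORDS = {
        "OrderFood": ["order", "delivery", "pizza", "chicken"],
        "Help": ["help", "stuck", "how to use"],
    }

    def match_intent(utterance):
        text = utterance.lower()
        for intent, keywords in INTENT_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                return intent
        return "Unknown"

    print(match_intent("I want chicken and fries, make an order"))  # OrderFood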

Entities or Slots

The next important elements are entities (Dialogflow) or slots (Amazon), which are keywords or short phrases. Fortunately, Yandex Dialogs recognizes four types of entities: dates (including “tomorrow,” “next year,” etc.), numbers, geolocations (currently, I actively use cities and countries), and names. This significantly simplifies the developer’s life and provides more flexibility.
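
In the webhook request, these built-in entities arrive already parsed. Here’s a sketch of pulling them out of the incoming JSON; the nlu.entities structure and the YANDEX.* type names are how I recall the protocol, so verify them against the current docs.

    # Extract built-in entities of a given type from a request.
    # The nlu.entities structure and the YANDEX.* type names are my
    # recollection of the protocol; verify against current docs.
    def extract_entities(req, entity_type):
        entities = req["request"].get("nlu", {}).get("entities", [])
        return [e["value"] for e in entities if e["type"] == entity_type]

    # e.g. all geolocations or numbers mentioned in the utterance:
    # cities = extract_entities(req, "YANDEX.GEO")
    # numbers = extract_entities(req, "YANDEX.NUMBER")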

The major platforms, by contrast, offer tools to create your own “entities” (“slots”), as well as a wide list of built-in entities, such as airports or types of food (e.g., AMAZON.Airports and AMAZON.Food).

The Alexa Skills Kit interface. Just a year ago it was a plain text field for adding intents and slots; today it’s a rich interface with a wide range of tools, and it’s frequently updated. It won’t be long before we see similar updates in Yandex.

Currently, I’m developing a skill where slots containing seasons, lists of different types of sports, and event names would be very helpful. Unfortunately, I have to manually search for these phrases within the code.
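
In practice, that manual search looks something like a hand-rolled custom slot: the word lists live in code, and the webhook scans the utterance itself. A sketch (the lists are placeholders):

    # Hand-rolled "custom slots": without platform support, the word
    # lists live in code and we scan the utterance ourselves.
    CUSTOM_SLOTS = {
        "season": ["winter", "spring", "summer", "autumn"],
        "sport": ["hockey", "biathlon", "figure skating"],
    }

    def fill_slots(utterance):
        text = utterance.lower()
        return {
            slot: [value for value in values if value in text]
            for slot, values in CUSTOM_SLOTS.items()
        }

    print(fill_slots("show me figure skating events this winter"))
    # {'season': ['winter'], 'sport': ['figure skating']}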

Testing

It’s quite disappointing. Testing consists solely of a simple chat and a window displaying the JSON requests and responses. There’s only keyboard input, with no support for voice input or output. Additionally, there’s no provision for external testing or any other testing tools.

Russian language

It’s worth mentioning that creating a voice skill or chatbot in Russian is slightly more complex than doing the same for an English audience. While in English you say “to France,” “from France,” and “with France,” in Russian the ending of the word for “France” changes in all three cases (you can refer to a Russian declension chart for more details). However, small helper functions can solve such problems.
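
One such helper might compare word stems instead of exact strings, so that the declined forms “Франции,” “Францию,” and “Франция” all map to one canonical value. Below is a crude sketch; a real skill would use a morphology library such as pymorphy2, and the country list is just a placeholder.

    # Crude declension helper: match a declined Russian word against a
    # canonical form by comparing stems (the word minus its ending).
    # A real skill would use a morphology library such as pymorphy2.
    COUNTRIES = ["франция", "германия", "испания"]

    def normalize_country(word):
        word = word.lower()
        for country in COUNTRIES:
            stem = country[:-2]  # drop the inflected ending
            if word.startswith(stem):
                return country
        return None

    print(normalize_country("Францию"))  # -> "франция"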

So

What’s good:

  • Integrated entities (currently four of them).
  • Webhook (optional?).
  • Support: I sent some requests over the weekend and received a response on Monday.

What we need (remember, Dialogs is only a few months old):

  • Creating our own entities.
  • Integration with more entities (airports, restaurants, etc.).
  • Intents with sample utterances.
  • Command line tools (CLI).
  • Testing with voice input and output.
  • Everything about testing (beta tests, adding users, etc.).

Competitors

As in the search engine market, there’s only one major competitor: Google with Dialogflow. Currently, Dialogflow supports Russian only for one-way speech-to-text, giving Yandex the lead in this respect.

It’s worth noting that mastering one platform can make it much easier to pick up other platforms.

Waiting for the updates

While developing a skill over the past few weeks, I noticed several changes in the Dialogs interface; they just need to keep up the pace. Alice and Yandex Dialogs are still young compared to the rest of the voice device market.

--