Chatbots, from Data

INNAAS creates chatbots from data with the help of AppQuality

“The Italians do not like chatbots” was the title of a Wired article a few months ago. Citing a study by Amdocs, it highlighted users' strong dissatisfaction with chatbots, mainly due to problems of understanding (75%) and a limited ability to answer multiple questions at the same time (47%).

The signs are already evident: several large US retail brands, among the first to invest in this technology, have gone back to more traditional telephone or email support.

To be effective, a chatbot must therefore be able to identify the topics on which customers need answers and to understand their needs, regardless of the linguistic structures its users adopt. To achieve this, the chatbot goes through a learning process: the client and the developer identify the topics and their keywords, estimating what users will ask and how they will phrase their requests.

But precisely because these estimates are produced by a closed group, they are generally not exhaustive, and once the chatbot is public the learning phase goes live. This is when all the topics that had not been planned for, but which users find useful and look for, are discovered. A new training process then begins: new keywords and answers are added while the product is live. As a result, in its first interactions the chatbot often offers a poor user experience, which can lead to frustration, annoyance and even abandonment among early adopters. These users are usually the most loyal customers, won through long and expensive customer-relationship work that risks being undone by a product too immature for the market. An incompletely trained chatbot is likely to erode customer retention, hence the need to bring the learning phase forward so that the chatbot is genuinely ready and competent as soon as it is released.

How can a chatbot reach the market without turning its first users into trainers? And how can this be done in time, without prolonging development?

This need gave rise to the project that AppQuality designed together with INNAAS to support chatbot development, making it more effective, faster and more complete before public release. We started from the problem (a learning phase that happens downstream, once the product is on the market) to identify which development processes could be revised to improve a chatbot's learning phase in particular. It immediately emerged that there was no single, specific aspect to work on: crowd-testing had to be integrated at different stages of development, from identifying topics to handling lexical forms.

First, it became clear that the activity of identifying the topics to which the chatbot should promptly respond had to be strengthened. This phase is usually entrusted to customer care analysis and the developer's personal experience, which together produce a series of FAQs based on the questions they expect to receive. However, especially for a new product, it is difficult for its creator, who knows the tool in depth, to step into the shoes of an external customer who approaches the service for the first time and therefore needs information.

It was therefore necessary to identify what customers would ask, and to do so we had to understand who the users of this chatbot, and therefore of the service offered, would be. The client's marketing department shared an estimate of its customers' demographic distribution, cultural level and technological compatibility. From its crowd of about 100,000 testers, AppQuality then identified a cluster of 25 testers replicating the typical customer and asked them to interact with the chatbot, which was still at an embryonic stage. Specifically, in this first test, testers were asked to imagine having seen an advertisement for a new bank and wanting information about its services. The objective was to understand what users might want to ask their bank.
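
The article does not say how the 25 testers were chosen from the crowd. As a purely hypothetical illustration, matching a tester panel to a marketing estimate of customer segments can be sketched as quota sampling: each segment gets a share of the panel proportional to its estimated share of customers. The segment labels, the `quota_sample` function and the largest-remainder rounding are assumptions for this sketch, not AppQuality's actual method.

```python
import random
from collections import defaultdict


def quota_sample(testers, target_shares, panel_size, seed=0):
    """Hypothetical quota sampling.

    testers: list of (tester_id, segment) pairs.
    target_shares: segment -> estimated share of customers (should sum to 1).
    Returns tester ids whose per-segment counts are proportional to the shares.
    """
    rng = random.Random(seed)
    by_segment = defaultdict(list)
    for tester_id, segment in testers:
        by_segment[segment].append(tester_id)

    # Floor each quota, then give leftover seats to the largest remainders.
    raw = {seg: share * panel_size for seg, share in target_shares.items()}
    quotas = {seg: int(value) for seg, value in raw.items()}
    leftover = panel_size - sum(quotas.values())
    for seg in sorted(raw, key=lambda s: raw[s] - quotas[s], reverse=True)[:leftover]:
        quotas[seg] += 1

    # Draw each segment's quota at random (capped by how many testers exist).
    panel = []
    for seg, n in quotas.items():
        pool = by_segment.get(seg, [])
        panel.extend(rng.sample(pool, min(n, len(pool))))
    return panel
```

For example, with two made-up segments estimated at 60% and 40% of customers and a panel of 25, the sketch draws 15 and 10 testers respectively.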

The bank and the developer had already tried to anticipate the requests the chatbot should answer, identifying 187 topics based on the experience of the customer care and marketing departments. The test output showed that this list was still immature and limited: testers surfaced as many as 132 new FAQs, plus 13 already-identified processes for which users asked for details that had not yet been prepared. In percentage terms, that is +78% topics identified with the support of crowd-testing in three days of testing!
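
The article does not show how the +78% was computed. Our assumption is that it counts both the 132 new FAQs and the 13 expanded topics against the original 187; a quick check in Python confirms that this reading matches the quoted figure, while counting only the new FAQs would give about 71%.

```python
# Figures reported in the text.
baseline_topics = 187   # topics drafted by customer care and marketing
new_faqs = 132          # new FAQs surfaced by the crowd
expanded_topics = 13    # existing topics that needed more detail

# Assumed reading: both new and expanded topics count toward the increase.
increase = (new_faqs + expanded_topics) / baseline_topics
print(f"{increase:.1%}")   # 77.5%, rounded to +78% in the text

# For comparison: new FAQs only.
only_new = new_faqs / baseline_topics
print(f"{only_new:.1%}")   # 70.6%
```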

Such a significant increase in topics was not expected, and it required considerable effort from the developer and the bank to integrate the new questions and write answers for each of the 132 topics. Had this activity taken place after the chatbot's release, as it usually does, it would have been more complex to manage, requiring continuous support and interaction at a much higher cost in time and resources.

Once the answers to the questions identified by the testers had been added, the focus shifted to correctly identifying the topic. It is not enough to know that many people ask about credit cards: a better user experience depends on correctly recognizing the questions on that topic, minimizing cases where the chatbot gives no answer or misunderstands the request.

Once the topics had been identified and standard questions structured for each of them, the second phase of training focused on identifying more lexical forms and logical constructs for each question. Some people use more colloquial language (“hello, how do I get a credit card?”), others are more concise (“credit card request”), and others more verbose (“if I open an account with you, in how many months can I have a non-rechargeable credit card?”). Clearly, the chatbot must be flexible in handling different ways of phrasing a request, yet still effective at identifying the macro-topic: in this example, the process of activating a credit card.

AppQuality then took the set of predefined questions and asked its crowd, previously selected to resemble the bank's customers, to rewrite them, each tester in their own style. This phase required a large number of contributions to cover as many styles, lexical forms and synonyms as possible. In less than a week, 50 testers stress-tested the chatbot and produced more than 8,000 different interactions, of which 5,000 were subsequently validated and integrated into the chatbot's training.
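
The article does not describe how the 8,000 interactions were validated down to 5,000. One plausible first step, sketched here as a hypothetical illustration only, is to normalize each paraphrase (case, accents, punctuation, whitespace) and drop exact duplicates within a topic before human review; the function names and the (topic, paraphrase) input format are assumptions.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def dedupe_paraphrases(submissions):
    """Hypothetical pre-validation step.

    submissions: list of (topic, paraphrase) pairs written by testers.
    Keeps the first paraphrase of each normalized form per topic and
    discards submissions that are empty after normalization.
    """
    seen = set()
    unique = []
    for topic, paraphrase in submissions:
        key = (topic, normalize(paraphrase))
        if key[1] and key not in seen:
            seen.add(key)
            unique.append((topic, paraphrase))
    return unique
```

This only removes trivial duplicates; judging whether a paraphrase really belongs to its topic would still be a human validation task.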

This made it possible to build, for each topic, an information branch of 25 different logical structures on average, allowing the chatbot to compare an incoming request against these structures, correctly identify it and select the corresponding answer.
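
As a rough, hypothetical sketch of "comparing the request with these structures", the snippet below matches a request against example phrasings grouped by topic and picks the most similar one. It swaps in token-overlap (Jaccard) similarity purely to stay self-contained; a production chatbot would more likely use a trained intent classifier. The topic names and the 0.3 threshold are arbitrary assumptions. Below the threshold it returns no topic, so the bot can ask the user to rephrase instead of answering the wrong question.

```python
import re


def tokens(text):
    """Lowercased word tokens as a set."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Share of tokens two sets have in common (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def match_topic(request, examples, threshold=0.3):
    """examples: topic -> list of example phrasings for that topic.

    Returns (topic, score) for the most similar phrasing, or
    (None, score) when nothing reaches the threshold.
    """
    req = tokens(request)
    best_topic, best_score = None, 0.0
    for topic, phrasings in examples.items():
        for phrasing in phrasings:
            score = jaccard(req, tokens(phrasing))
            if score > best_score:
                best_topic, best_score = topic, score
    if best_score < threshold:
        return None, best_score
    return best_topic, best_score


# Hypothetical example phrasings, reusing the styles quoted in the text.
examples = {
    "credit_card_activation": [
        "hello, how do I get a credit card?",
        "credit card request",
        "if I open the account with you in how many months can I have a credit card?",
    ],
    "weather": ["what is the weather today", "will it rain tomorrow"],
}

print(match_topic("I would like to request a credit card", examples))
# ('credit_card_activation', 0.375)
```

Adding more crowd-written phrasings per topic widens what each topic can match, which is the effect the text attributes to the 25 structures per topic.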

Clearly, the learning phase of a chatbot can never be declared finished: it is a continuous balance between covering the most probable questions and handling the specific cases that may emerge. A chatbot must therefore work within limits to be effective; the support of the crowd makes it possible, on the one hand, to push these limits further and, on the other, to ensure a greater ability to respond at the same level of competence.

Therefore, by bringing the crowd into the development process from the chatbot's embryonic version onward, the client succeeded in:

  1. identifying, with greater completeness and reasonable accuracy, almost all the questions the chatbot must be prepared to answer;
  2. making the chatbot's understanding of a question more independent of its structure, its lexical form and each person's use of synonyms.

How do you train a chatbot? And how do you balance the need for complete answers with the need to go to market as soon as possible?

Integration, collaboration and crowd. The flexibility with which AppQuality manages its community allowed it to run a very specific project, one that at first glance seems far from the classic concept of testing but is in reality very close to the goal of every test: releasing a quality product on the market. Access to such a large number of testers made it possible to identify a cluster that fully overlaps with the bank's typical customer and to mobilize large numbers in a short time, sharing valuable output with the developer within a few days.

Can a user experience analysis of a chatbot be done?

Yes. It is a variant of classic UX (User Experience) that focuses not on the graphical interface but on the textual one, where the experience comes mainly from the content rather than the container. This holds in general, but some chatbots also have a supporting graphical interface, built to speed up interaction, such as that of Milan Airports.

The bank chatbot was also able to provide other information, such as the weather. Our tests highlighted several design problems with this feature, which displayed a synthetic visual result showing temperature and humidity. Testers criticized it because the image was very basic and out of line with the overall style, and the icon did not make clear whether it showed the current weather or the forecast; moreover, the humidity reading was not very useful, and many would have preferred the probability of rain.

What did we learn?

Crowd-testing can also be used as a training tool for a chatbot, but to be effective it must be integrated into the development process, the objectives of each stage must be precisely defined, and the feedback shared by testers must be analyzed in depth. Testing a chatbot at the end of development is certainly feasible, but not very profitable, because it would involve significant rework and reintegration of information and logical structures that by then are considered final.

Besides improving final quality, integrating the crowd also shortens time to release: development can focus on the technical side instead of managing information, which is delegated to the crowd that provides the output to work on. Reaching 8,000 interactions in 3 days is simply not feasible for a company that wanted to carry out a similar activity in-house. Even if it tried, it would need a support structure to store and rationalize this information and an appropriate management policy for individual users. The advantage of turning to AppQuality lies in outsourcing a complex, time-consuming and extremely costly information-management process, and obtaining clear, organized and structured output that can be effectively integrated into business processes.

During a bout, Muhammad Ali applied a move he used to practice in training.


INNAAS is a leading startup in AI-Driven Agents (AIDA), which let businesses interact with their customers in a smarter way. Want to know more? Just an email away:


Post originally published by our friends @ for more information about crowd-testing service