Calculating a Conversational AI Model’s Success Rate

Mustafa Durmuş
Published in albert-health
5 min read · Sep 14, 2022

Most machine learning models return a confidence value with each prediction, indicating how sure the model is of its output. Conversational AI systems are built on such models, but this value alone is not sufficient when the user interacts with the model through an interface, does so by talking, or needs more than one exchange with the model to complete an action.

Let’s see how we at Albert came up with a solution to this problem.

What is Conversational AI?

Conversational artificial intelligence (AI) is a system that lets people communicate in a digital environment as if they were texting or talking to a human being. Such systems include chatbots and virtual agents, through which people can carry out real-life tasks such as getting information on a subject or dictating actions.

Albert is a voice-based conversational AI system that includes artificial intelligence models trained for various chronic diseases. Albert helps patients take the right dose at the right time and makes life easier for caregivers. With Albert, users can manage their chronic diseases by creating reminders, saving health parameters (blood pressure, pulse, glucose level, etc.), or asking questions about them.

How does it work?

Conversational AI systems generally consist of 5 steps:

  1. Speech to Text
  2. Intent Classification
  3. Slot Filling
  4. Dialogue Management
  5. Text to Speech

The NLP Service developed by our team communicates with Albert’s mobile application and performs the steps listed above.

Albert’s NLP Service Diagram

As soon as the user starts talking to Albert, our mobile application sends the speech data to the NLP Service over a socket connection. The service converts the user’s speech into text using its built-in speech recognition model. The resulting text is passed to the natural language processing model, which performs intent classification and named entity recognition (slot filling). Albert’s response is then determined and returned to the mobile application, which completes the process.

Calculating Success Rate

Albert calculates its conversational AI success rate using 4 different metrics.

App Confidence

  • Although it has nothing to do with artificial intelligence or natural language processing, this metric is part of the formula because the system that the AI model runs on must be stable and offer the user a reliable interface.
  • App confidence is calculated from the mobile application’s log data. Socket start and close logs are stored for each user. If the app crashes, no socket close log is stored. For example, if there are 100 socket start logs and 90 close logs, the success rate is 90%.
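
The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `app_confidence` and its arguments are hypothetical, and the counts are the ones from the example rather than real log data.

```python
def app_confidence(start_logs: int, close_logs: int) -> float:
    """Share of socket sessions that closed cleanly (hypothetical helper)."""
    if start_logs <= 0:
        raise ValueError("need at least one socket start log")
    # A crash leaves a start log without a matching close log,
    # so close logs should never outnumber start logs.
    return min(close_logs, start_logs) / start_logs


print(app_confidence(100, 90))  # 0.9
```

Capping the close count at the start count is a defensive choice in this sketch, so that a logging glitch cannot push the metric above 1.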

Speech to Text Confidence

  • The speech-to-text model attaches a confidence value to the recognizer output that estimates how likely the transcription is to be correct.
  • Our model can report a confidence for the entire transcript as well as for individual words. We use the whole-transcript confidence because we are calculating an overall success rate.

Intent Classification Confidence

  • When searching for a matching intent, the intent classification model scores each potential match with an intent detection confidence.
  • Albert uses both Dialogflow and RASA for different solutions within the product. Each one selects an intent from the available intent list and returns a confidence value alongside it.
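
Both confidence metrics are reported per utterance, so they need to be rolled up into a single number before they can be combined with the other metrics. The article does not say how this is done; the sketch below assumes a plain average, and `average_confidence` is a hypothetical helper name.

```python
from statistics import mean


def average_confidence(confidences: list[float]) -> float:
    """Average per-utterance confidences into one score between 0 and 1.

    Assumption: a simple mean; the real aggregation may differ.
    """
    if not confidences:
        raise ValueError("no confidence values to average")
    return mean(confidences)


# Illustrative values only, not real model output.
print(average_confidence([0.5, 1.0]))  # 0.75
```

The same helper could be applied separately to the speech-to-text confidences and to the intent classification confidences.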

Intent Completion Rate

Some of Albert’s intents require parameters from users. For example:

  • registering a drug: the drug name and reminder time are needed.
  • saving a health parameter value: the unit name and value are needed.

Let’s see a real example with a session.

| User Request          | Session  | EOC   | Albert Response           |
| --------------------- | -------- | ----- | ------------------------- |
| Albert record my drug | c42a6d44 | False | Please tell me the drug.  |
| Aspirin               | c42a6d44 | False | When should I remind you? |
| 6 pm                  | c42a6d44 | True  | Your drug has been saved. |

The most important parameter to consider here is EOC, which stands for end-of-conversation. EOC remains false while some parameters are still required to complete the intent, and is set to true once all parameters are provided. After the data is grouped by the session column, this example contains one session and it ends with EOC set to true, so the intent completion rate is 100%.

Let’s see another example with 3 different sessions.

| User Request        | Session  | EOC   | Albert Response           |
| ------------------- | -------- | ----- | ------------------------- |
| Record my pulse     | 56033eba | False | What is that value?       |
| set up a reminder   | 3396af18 | False | Can you tell me the unit? |
| pre glucose level   | 3396af18 | False | When should I remind you? |
| 2 times a day       | 3396af18 | True  | Your reminder is created. |
| good morning albert | 2e1db27a | True  | Good morning!             |

There are 3 separate sessions. The first user wants to save a pulse rate but ends the session without giving the value, so the intent is not completed and EOC stays False. The second user wants to set up a reminder and provides all the necessary parameters. The last one just says good morning; EOC is true because this intent doesn’t need any parameters. EOC is true in 2 of the 3 sessions, so the intent completion rate for this data is about 66.7%.
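
The grouping step for the three-session example can be sketched as follows. The rows are copied from the table above; the function name `intent_completion_rate` is hypothetical, and the sketch assumes a session counts as completed if EOC becomes true in at least one of its rows.

```python
# (session id, EOC flag) for each row of the three-session example, in order.
rows = [
    ("56033eba", False),
    ("3396af18", False),
    ("3396af18", False),
    ("3396af18", True),
    ("2e1db27a", True),
]


def intent_completion_rate(rows):
    """Fraction of sessions in which EOC became true at least once."""
    completed = {}
    for session, eoc in rows:
        # Group by session id and remember whether any row reached EOC.
        completed[session] = completed.get(session, False) or eoc
    if not completed:
        raise ValueError("no sessions to score")
    return sum(completed.values()) / len(completed)


rate = intent_completion_rate(rows)
print(f"{rate:.1%}")  # 66.7%
```

Two of the three sessions reach EOC, which gives the two-thirds rate described above.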

Formula

Albert weights these 4 metrics according to their importance. The final formula is:

Success Rate = App Confidence * 0.1 + Speech to Text Confidence * 0.3 + Intent Classification Confidence * 0.3 + Intent Completion Rate * 0.3

Weights vary depending on the situation. For example, if the user texts Albert instead of speaking, the speech-to-text term is removed from the formula and its weight is distributed among the other metrics.
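
A minimal sketch of the weighted formula, including the text-only case, is shown below. The weights come from the formula above; everything else is an assumption made for illustration: the names `DEFAULT_WEIGHTS` and `success_rate`, the metric values, and the choice to redistribute a missing metric's weight in proportion to the remaining weights (the exact redistribution rule is not stated here).

```python
DEFAULT_WEIGHTS = {
    "app": 0.1,
    "speech_to_text": 0.3,
    "intent_classification": 0.3,
    "intent_completion": 0.3,
}


def success_rate(metrics, weights=DEFAULT_WEIGHTS):
    """Weighted sum over the metrics that are present (each between 0 and 1).

    Assumption: the weight of any missing metric is spread over the
    remaining ones in proportion to their own weights.
    """
    if not metrics:
        raise ValueError("at least one metric is required")
    used = {name: weights[name] for name in metrics}
    total = sum(used.values())
    return sum(metrics[name] * w / total for name, w in used.items())


# Illustrative metric values, not real measurements.
voice = success_rate({
    "app": 0.9,
    "speech_to_text": 0.8,
    "intent_classification": 0.9,
    "intent_completion": 0.7,
})
text_only = success_rate({
    "app": 0.9,
    "intent_classification": 0.9,
    "intent_completion": 0.7,
})
print(f"{voice:.3f}")      # 0.810
print(f"{text_only:.3f}")  # 0.814
```

With proportional redistribution, the text-only weights become 1/7, 3/7, and 3/7, so the relative importance of the remaining metrics is unchanged.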

We measure the success rate of our AI models with the formula built from these 4 metrics, together with the feedback we receive from our customers and our team’s experience. Of course, these metrics can be adapted to the needs of any industry that uses conversational AI.

Thanks for reading, see you in another article.
