Why Alexa, Cortana, and ChatBots Won’t Work For Enterprises: 7 Insights to Shape Conversational AI

Conversational interfaces are the new rage, and chatbots are their incarnation. Bots are gaining users and screen time so rapidly that it is hard to imagine a world without them. It is reminiscent of the dot-com days, when there was a site for everything even if the purpose wasn’t clear.

A chatbot is a computer program that interacts with a user through text or audio. Many chatbots use machine learning to pick up and mimic regular human conversation patterns when reacting to spoken or written prompts.

The last decade saw several conversational assistants fighting hard to conquer the world. Think of Amazon’s Alexa, Microsoft’s Cortana, Apple’s Siri, Samsung’s S Voice, and Google Home. These tech giants have brought talking devices into our homes, and now businesses want a piece of the action too.

Not surprisingly, hundreds of vendors have mushroomed overnight to build a “chatbot” for everything in an organization. However, an enterprise-grade virtual assistant is expected to do more than play a song, add items to a shopping cart, and read the news. Having worked with large enterprise clients on virtual assistants, we have learned how chatbots, and even sophisticated general natural language solutions [Cortana, Lex etc.], fall massively short of a business user’s needs and enterprise complexity.

Here are some key challenges that conversational interfaces need to overcome to be deployable and usable in a large enterprise setup:

1. A customizable NLP for every enterprise, tuned to its unique workflows and tribal language:

No two enterprises are the same, even if they operate in the same industry, sell competing products, and serve similar clients. The deviations are surprising even in highly standardized areas like sales, customer service, lead generation, P&L, analytics, or just the fiscal calendar. A simple question like “Sales last week” can mean vastly different things across companies, depending on how sales are attributed and how a week is defined.
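To make the “Sales last week” point concrete, here is a minimal sketch of how the same phrase resolves to different date ranges under two week conventions. The function name `last_week_range` and the two example companies are assumptions made for illustration; real fiscal calendars are often more involved than a simple choice of starting weekday.

```python
from datetime import date, timedelta


def last_week_range(today: date, week_start: int) -> tuple[date, date]:
    """Return (first_day, last_day) of the most recently completed week.

    week_start follows date.weekday(): 0 = Monday ... 6 = Sunday.
    """
    # Days elapsed since the current week began under this convention.
    days_into_week = (today.weekday() - week_start) % 7
    current_week_start = today - timedelta(days=days_into_week)
    first_day = current_week_start - timedelta(days=7)
    last_day = current_week_start - timedelta(days=1)
    return first_day, last_day


today = date(2024, 5, 15)  # a Wednesday

# Hypothetical company A: weeks run Monday to Sunday.
monday_company = last_week_range(today, week_start=0)
# Hypothetical company B: weeks run Sunday to Saturday.
sunday_company = last_week_range(today, week_start=6)

print(monday_company)  # 2024-05-06 through 2024-05-12
print(sunday_company)  # 2024-05-05 through 2024-05-11
```

The two ranges differ by a day at each end, which is enough to change a sales total. Attribution rules would add a second layer of per-company configuration on top of this.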

Each company has its unique business lingo, processes, and structures. We realized that the key strength of an ideal NLP engine for business is not just breadth [understanding millions of different questions or words] but, more importantly, depth [recognizing the many ways the same concept can be expressed], along with the ability to understand non-standard terms.
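One simple way to picture “depth” is a per-company lexicon in which many surface phrases map to one canonical concept. The sketch below is only an illustration under that assumption: the lexicon entries, concept names, and the `find_concepts` helper are all hypothetical, and a production engine would need far more than phrase lookup.

```python
import re

# Hypothetical lexicon for one company: each phrase maps to a canonical concept.
LEXICON = {
    "net sales": "net_sales",
    "revenue": "net_sales",
    "top line": "net_sales",
    "next step": "next_best_action",
    "nba": "next_best_action",
}


def find_concepts(utterance: str, lexicon: dict[str, str]) -> list[str]:
    """Return the canonical concepts mentioned in an utterance, without repeats."""
    text = utterance.lower()
    found: list[str] = []
    for phrase, concept in lexicon.items():
        # Word boundaries keep "nba" from matching inside a longer word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text) and concept not in found:
            found.append(concept)
    return found


print(find_concepts("What's the NBA for this account?", LEXICON))
print(find_concepts("Show me top line and revenue for Q2", LEXICON))
```

Because the lexicon is plain data, an administrator could add a new shorthand by adding one entry, without retraining anything.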

In addition, the NLP engine should be able to handle specific workflows conversationally, which goes beyond conventional NLP requirements. Think of a sales process where a rep can ask for the contextual next step either by asking for it explicitly or by using an acronym or shorthand. Unless Alexa, Google Home, or Cortana start allowing customization of their training models [which generally doesn’t work] or let developers tinker directly with the internals of their natural language engines, they cannot handle such enterprise requirements.

Like hundreds of chatbots, these applications are mostly prototypes of what conversational interfaces can do for organizations. A true solution needs to have a deep and customizable NLP.

2. IT should manage and evolve linguistic models without vendor intervention:

New users and new data are the norm in enterprises. Teams expand and shrink all the time, which brings new users, new terminology, and new ways of consuming information. Similarly, data loads are commonplace in the enterprise ecosystem. Data feeds regularly bring in new metadata such as new products, customers, metrics, territories, and calculations. The last thing IT wants is to hear users complain about a broken system or a system that fails to work as expected.

Hence, a common ask from IT is comprehensive admin capabilities to monitor, fix, and expand conversational capabilities. A black-box approach, or a solution built on third-party cloud NLP, doesn’t work in this case. The bottom line is that any serious conversational application requires an open, extensible NLP that the enterprise’s own administrators can manage.

3. Data security is paramount from the core to the edge of the enterprise:

Data security is a known concern in the enterprise, yet most chatbot makers don’t realize that the questions users ask through the conversational layer also fall under the security purview. With data breaches on the rise, companies must remain vigilant in safeguarding their assets. Failure to stay ahead of data threats will inevitably result in breaches, financial losses, and tarnished reputations.

Consider reps using customer names, contract numbers, metrics, and so on in their queries to an assistant. If those questions are processed in the public cloud to determine intent, sensitive information travels outside the corporate firewall or private cloud, which amounts to a data breach. Most organizations ask their sales reps to VPN in for web and mobile access to applications holding sensitive data, so it is only fair to expect conversational assistants to adhere to the same standards. Hosting the assistant on a private cloud is therefore paramount for any organization, and vendors relying on third-party cloud NLP services cannot meet this requirement.

4. Contextual disambiguation is table stakes:

One benefit of a form-based interface is that most data is presented in a pre-defined format, so users don’t run into duplicate issues. However, many enterprises have customers with the same name in the same or different territories, cities with the same name in different states, employees with the same name in different departments, and so on. When using a conversational interface, the user will rarely spell out every detail needed to identify the entity being asked about. The NLP should be smart enough to use the context and ancillary information of the conversation to pick the right entity, or set of entities, that fulfills the ask.
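As a rough illustration of picking an entity from context, the sketch below scores same-named customers by how many attributes agree with what is already known about the conversation. The `Customer` record, the `disambiguate` helper, and the context keys are hypothetical; a real system would draw on richer signals and on the user’s access rights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    territory: str
    city: str


def disambiguate(name: str, candidates: list[Customer], context: dict[str, str]) -> list[Customer]:
    """Return the same-named candidates that best agree with the conversation context."""
    matches = [c for c in candidates if c.name.lower() == name.lower()]
    if not matches:
        return []

    def score(c: Customer) -> int:
        # One point for each context attribute that agrees with the candidate.
        return sum(1 for field in ("territory", "city") if context.get(field) == getattr(c, field))

    best = max(score(c) for c in matches)
    # Several survivors mean the context was not enough; the app should then ask.
    return [c for c in matches if score(c) == best]


customers = [
    Customer("Acme Corp", "East", "Springfield"),
    Customer("Acme Corp", "West", "Portland"),
    Customer("Globex", "West", "Portland"),
]

# The rep's territory is known from their profile, so one "Acme Corp" remains.
print(disambiguate("acme corp", customers, {"territory": "West"}))
# With no context, both remain and a clarifying question is needed.
print(disambiguate("acme corp", customers, {}))
```

Returning a list rather than a single record keeps the decision about when to ask for clarification in the hands of the application.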

In computational linguistics, word-sense disambiguation (WSD) is an open problem of natural language processing and ontology. WSD means identifying which sense of a word is used in a sentence when the word has multiple meanings. Typically, chatbots either cannot handle this level of complexity or require an order of magnitude more effort to build custom logic for it. Third-party NLP offers limited options for building security-aware contextual disambiguation, which restricts what the interface can do.

5. Data is never clean [and complete]:

We worked with a client who had 16K customer names with many duplicate variations. They could not clean the data completely because it is sourced from multiple internal and external systems, and customer names are often entered manually, so typos, abbreviations, and incomplete names find their way into the repository.

Your data gets dirty in a variety of ways. Here are a few examples:

  • Duplicate data: A single event is recorded and entered twice into your dataset.
  • Missing data: Fields that should contain values, but do not.
  • Invalid data: Information not entered correctly, or not maintained.
  • Bad data: Typos, transpositions, variations in spelling, or formatting (say hello to Unicode!).
  • Inappropriate data: Data entered in the wrong field.

For an efficient conversational experience, the NLP engine should act as a data janitor and be able to handle these variations. It shouldn’t ask the user for too many clarifications. This is a complex flow, and it can often make or break user adoption.
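A minimal sketch of the “data janitor” idea, assuming customer names are the dirty field: normalize names, then use fuzzy string matching from Python’s standard `difflib` module to tolerate typos. The `normalize` and `match_customer` helpers, the suffix list, and the 0.8 cutoff are illustrative assumptions, not a recommendation for any particular dataset.

```python
import difflib
import re

# Assumed list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def match_customer(query: str, known_names: list[str], cutoff: float = 0.8) -> list[str]:
    """Return stored names whose normalized form is close to the query."""
    by_norm: dict[str, list[str]] = {}
    for name in known_names:
        by_norm.setdefault(normalize(name), []).append(name)
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    close = difflib.get_close_matches(normalize(query), list(by_norm), n=3, cutoff=cutoff)
    return [name for key in close for name in by_norm[key]]


known = ["Acme Corp.", "ACME Corporation", "Globex LLC", "Initech, Inc."]

# A typo in the query still resolves to both stored variants of the same customer.
print(match_customer("acmee corp", known))
# An unknown name returns nothing instead of a wrong guess.
print(match_customer("Umbrella", known))
```

Grouping stored variants under one normalized key means the user is not asked to choose between “Acme Corp.” and “ACME Corporation” when both refer to the same customer.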

6. User experience is everything:

According to a study conducted by PwC, 17% of consumers look up to a chatbot as an advisor while many others see this relationship unfolding in the form of a teacher, manager or a friend.

Conversational interfaces are empowering and overwhelming at the same time. The ability to ask any question, unbound by the elements on a form, is liberating for the user. However, it sets high expectations for the application, which is now expected to behave like a human. That is a tall order, and those expectations need to be managed both inside the application and outside it.

The NLP should be smart enough to detect intents and take corrective actions. This ability is severely limited when developers depend on third-party NLP services where creating new intents or modifying basic intents isn’t possible. In addition, human intervention when things are going wrong, or when a user needs some handholding, is paramount for user training and experience. It is not easy to build these aspects when you aren’t working with a homegrown AI.

7. Co-exist with other apps [only conversational isn’t enough]:

Any large organization has a plethora of applications [CRM, BI, dashboards, reports etc.] and plenty of data. In an enterprise, a conversational app isn’t an end-all application, so it needs to co-exist with the other applications that users access. That brings its own set of challenges: data should be consistent between the conversational app and other applications, logic and business rules should match, and report formats should be consistent. These lead to ancillary requirements such as accessing data and logic directly in the underlying apps to avoid duplication, replicating report or screen formats, matching the data and formats of other applications, and sometimes even delivering links to the desired reports or dashboards in upstream products.

Laying the Foundations for Enterprise AI

When you set out to build an enterprise-grade conversational interface that is meant to perform effectively and efficiently over the long term, growing and flexing as requirements evolve, it is no longer only about the technology but also about how it is designed, delivered, and supported. This is particularly true if you are driving for simplicity, scalability, repeatability, and standardization, and this is where suppliers with a broad set of capabilities, experience, and the right mindset can come into their own.

We hope the discussion in this article will be useful to you as you continue to evolve your own Enterprise AI agenda and build initiatives that are agile, secure, stable, and easy to manage.