How to make a Cloud-based chatbot secure?

In this article, I will show you how to address the data security risks associated with Cloud-based chatbots. Operating in the Cloud, they offer significant benefits in accuracy and performance over local solutions, but this comes at a cost to the security of sensitive information. How do you give them the necessary degree of data security? Let’s take a look at possible methods and some code!

Igor Fiodorow
gft-engineering
6 min read · Sep 25, 2019


Chatbots have enjoyed a rise in popularity in various kinds of operations where an exchange of information is needed. They enable us to execute operations triggered by verbal commands, both written and spoken. Implementing such a solution offers one crucial advantage over conventional applications: there is no longer a need to create a frontend (for example, the app’s GUI), as long as we have a client app for conversations. This can be covered by Skype, Microsoft Teams, Facebook Messenger or even a plain command line. Thanks to that, when creating our app, we can focus strictly on functionality triggered by words, rather than by the user’s interaction with a frontend.

Hack Yeah 2019 — a massive hackathon where we delivered our secure chatbot challenge

A few words about understanding words

If we omit the frontend, we need to implement a module that processes the incoming textual command into a so-called intent, which represents the user’s intention. For example, the intention behind the phrase “What’s the weather today?” is to learn about the current weather, and the processing engine should be trained beforehand to recognise differently phrased sentences carrying the same semantic content. Take a look at the following screenshot:

Such services belong to the family of NLP (Natural Language Processing) tools. Usually, these are based on neural networks that learn from instances of particular words in different kinds of intents and contexts. Many vendors offer their solutions in this area — Microsoft developed LUIS (as shown in the screenshot above), Google has DialogFlow (a more robust tool, which also covers the logic of the chatbot itself, including the course of a conversation), while Amazon offers Lex.

To be more specific, the understanding of natural language for the purpose of detecting an intent and its entities is called NLU: Natural Language Understanding, which is a part of the much broader field of NLP techniques for the processing and generation of text and voice.
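To make this concrete, here is a minimal C# sketch of asking such a service for an intent, assuming the LUIS v2 prediction REST endpoint; the region, application ID and key shown are placeholders, not values from a real app.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Minimal sketch: send a raw utterance to a LUIS app over the v2 prediction
// REST endpoint and return the JSON response, which contains the top-scoring
// intent and any detected entities. All identifiers below are placeholders.
public static class LuisPredictionSketch
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task<string> PredictAsync(string utterance)
    {
        const string region = "westeurope";            // placeholder region
        const string appId = "<your-luis-app-id>";     // placeholder app ID
        const string key = "<your-subscription-key>";  // placeholder key

        var url = $"https://{region}.api.cognitive.microsoft.com/luis/v2.0/apps/{appId}" +
                  $"?subscription-key={key}&verbose=true&q={Uri.EscapeDataString(utterance)}";

        // The JSON answer includes "topScoringIntent" (name + score) and "entities".
        return await Http.GetStringAsync(url);
    }
}
```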

There exists a multitude of NLP engines (most of them written in Python) that allow for local deployment, thus providing more data security. On the other hand, local solutions tend to be more difficult to develop to a level that ensures proper accuracy and speed of reaction. How, then, can we combine the convenience of Cloud-based chatbots with top-notch standards of data security? Let’s take a look.

A chatbot that respects security

When writing a chatbot that tells you the weather, there is little risk of exposing important information: a city location is not “sensitive data”, after all. However, if you want to develop a bot that works for banks, public institutions or companies that handle sensitive data, you are obliged to keep this data (which may pertain to clients, especially when you have to comply with GDPR and similar regulations) as hermetically sealed as possible. For example, a chatbot that generates credit requests and uses a public Cloud NLP engine should not send sensitive client information such as a social security number, ID number or even a surname to that engine. Such data must be encrypted, stored in the context of a conversation, or parametrised. Deploying such a solution is not trivial: a client may write all their sensitive information in one sentence and send it to the bot, which, without detecting its sensitive nature, will forward everything to a public NLP engine; the engine will then extract the meaning, and that amounts to a data breach.
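To illustrate what “parametrised” can mean in practice, here is a rough C# sketch (not production-grade detection, and not code from any of the projects mentioned here): obviously sensitive fragments are replaced with aliases before the text goes to a public NLP engine, while the original values stay in a local store inside the bot.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative masker: detect obviously sensitive fragments in the user's
// message, replace them with aliases, and keep the originals locally so they
// never reach the public NLP engine. The patterns are simplistic placeholders.
public class SensitiveDataMasker
{
    private static readonly Dictionary<string, Regex> Patterns = new Dictionary<string, Regex>
    {
        { "NATIONAL_ID", new Regex(@"\b\d{11}\b") },                  // e.g. an 11-digit national ID
        { "CARD_NUMBER", new Regex(@"\b(?:\d{4}[ -]?){3}\d{4}\b") }   // a card-number-like sequence
    };

    // Returns the masked text; 'vault' receives alias -> original value pairs
    // that stay inside the bot's own state and are never sent to the Cloud.
    public string Mask(string text, IDictionary<string, string> vault)
    {
        var masked = text;
        foreach (var entry in Patterns)
        {
            var index = 0;
            masked = entry.Value.Replace(masked, match =>
            {
                var alias = $"{{{entry.Key}_{index++}}}";
                vault[alias] = match.Value;
                return alias;
            });
        }
        return masked;
    }
}
```

The masked text is what the NLP engine sees; the alias-to-value map remains in the bot’s session state and is only used when the backend operation is finally executed.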

Understanding the importance of this process and the interesting programming challenge behind it, we prepared a task for participants of the Warsaw-based Hack Yeah 2019, Europe’s biggest on-site hackathon, which gathers more than 3000 enthusiasts from all over the world. The winning team from CodeHeroes devised a solution that detected sensitive data and, having replaced it with aliases/parameter names, sent the message to a public NLP engine for processing. We were very happy to award them a prize of 10 000 PLN. Congratulations, guys!

Marcin Kowalski, Igor Fiodorow & Maciej Nowicki of GFT Poland — the team behind our chatbot challenge

An alternative take is to implement context objects within the chatbot’s logic itself, where all data is assigned to the current user in a given session; on that basis the user is authenticated, authorised and allowed to generate individual messages. Such a solution is offered by Microsoft’s Bot Framework, which can be connected with Active Directory and uses OAuth 2.0 authorisation, so logging in can take place directly from the client app, e.g. Skype or Microsoft Teams, as seen below:

Obtaining information about the user from Active Directory, the bot is able to generate documents and execute tasks assigned to a given role, without that information first being processed by the NLP engine. As a result, all sensitive data is stored in the bot’s context object instead of being sent in messages.
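As a rough sketch of that idea (assuming the Bot Framework v3 C# SDK; UserProfile and the Active Directory lookup below are hypothetical stand-ins), user details are attached to the context once and reused, instead of travelling through the conversation text:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Bot.Builder.Dialogs;

// Illustrative profile object kept in the bot's state, never in message text.
[Serializable]
public class UserProfile
{
    public string DisplayName { get; set; }
    public string Role { get; set; }
}

public static class UserContextSketch
{
    public static async Task<UserProfile> EnsureProfileAsync(IDialogContext context)
    {
        // Reuse the profile if it is already attached to this user's context.
        if (context.UserData.TryGetValue("UserProfile", out UserProfile profile))
        {
            return profile;
        }

        // Hypothetical lookup: resolve the signed-in user against Active Directory.
        profile = await LoadProfileFromActiveDirectoryAsync(context.Activity.From.Id);
        context.UserData.SetValue("UserProfile", profile);
        return profile;
    }

    // Stub for illustration only; a real bot would query Active Directory / Graph here.
    private static Task<UserProfile> LoadProfileFromActiveDirectoryAsync(string userId) =>
        Task.FromResult(new UserProfile { DisplayName = userId, Role = "DevOps" });
}
```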

A code example — Microsoft LUIS

Let’s take a look at one method of implementing a secure chatbot. We create a main dialog which will send the user’s query to Microsoft LUIS and execute the logic once the intent is recognised. We point the dialog class at our registered LUIS endpoint by passing the application ID and subscription key in an attribute on the class: [LuisModel("", "")]
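A minimal sketch of such a dialog class, assuming the Bot Framework v3 C# SDK; RootDialog and the "None" handler are illustrative names, and the two empty strings in the attribute are where the LUIS application ID and subscription key of the registered endpoint would go.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Bot.Builder.Dialogs;
using Microsoft.Bot.Builder.Luis;
using Microsoft.Bot.Builder.Luis.Models;

// The LuisModel attribute binds the dialog to a registered LUIS endpoint:
// first argument is the LUIS application ID, second is the subscription key.
[Serializable]
[LuisModel("", "")]
public class RootDialog : LuisDialog<object>
{
    // Fallback for utterances LUIS could not match to any trained intent.
    [LuisIntent("None")]
    public async Task None(IDialogContext context, LuisResult result)
    {
        await context.PostAsync("Sorry, I didn't understand that.");
        context.Wait(MessageReceived);
    }
}
```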

Once the message has been sent, LUIS returns the intent it considers the best match for the user’s query. When that intent aligns with one of the annotated intents (see below), a function in the backend is executed in the context of the given user.
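Continuing the RootDialog sketch above, a backend function tied to an intent might look like this; "GenerateReport" is a placeholder for an intent name configured in LUIS, not one from the original project.

```csharp
// Inside the RootDialog class sketched above.
// "GenerateReport" is a placeholder intent name configured in the LUIS app.
[LuisIntent("GenerateReport")]
public async Task GenerateReport(IDialogContext context, LuisResult result)
{
    // result.Query holds the original utterance; detected entities sit in result.Entities.
    await context.PostAsync("Generating the report you asked for...");

    // The actual backend logic for the signed-in user would run here.
    context.Wait(MessageReceived);
}
```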

Before any function for a given intent is executed, the bot checks whether the user has a security token assigned to the context. If such a token exists, the function is called (it is passed as the second parameter of context.Call()). Should it be missing, the user must first be authorised:
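A minimal sketch of this guard, assuming Bot Framework v3 types; SecuredActionDialog, SignInDialog and the "AuthToken" key are illustrative names (SignInDialog only stands in for the real OAuth 2.0 sign-in flow), and ConnectionName corresponds to the OAuth connection configured for the bot in Azure:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Bot.Builder.Dialogs;

[Serializable]
public class SecuredActionDialog : IDialog<object>
{
    private const string ConnectionName = "ActiveDirectoryConnection"; // placeholder

    public async Task StartAsync(IDialogContext context)
    {
        if (context.PrivateConversationData.TryGetValue("AuthToken", out string token))
        {
            // Token already attached to the context: execute the intent's logic.
            await ExecuteIntentAsync(context, token);
        }
        else
        {
            // Token missing: authorise first; the intent's logic runs afterwards
            // as the resume callback, i.e. the second parameter of context.Call().
            context.Call(new SignInDialog(ConnectionName), AfterSignInAsync);
        }
    }

    private async Task AfterSignInAsync(IDialogContext context, IAwaitable<string> result)
    {
        var token = await result;
        context.PrivateConversationData.SetValue("AuthToken", token);
        await ExecuteIntentAsync(context, token);
    }

    private async Task ExecuteIntentAsync(IDialogContext context, string token)
    {
        // The backend work for the recognised intent happens here, using the
        // user's token instead of any data typed into the conversation.
        await context.PostAsync("You are authorised; executing your request.");
        context.Done<object>(null);
    }

    // Hypothetical stand-in: a real implementation would drive the OAuth 2.0
    // sign-in bound to ConnectionName and return the resulting token.
    [Serializable]
    private class SignInDialog : IDialog<string>
    {
        private readonly string connectionName;
        public SignInDialog(string connectionName) { this.connectionName = connectionName; }

        public async Task StartAsync(IDialogContext context)
        {
            await context.PostAsync($"Please sign in (connection: {connectionName}).");
            context.Done("dummy-token"); // placeholder token for the sketch
        }
    }
}
```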

In the code above, the ConnectionName parameter identifies the secure connection between the chatbot’s logic and Azure Active Directory, where authorisation via OAuth 2.0 takes place. The client sees this as follows:

Secure Chatbots save time and effort

Chatbots are a tool with fantastic technical and marketing potential, so it is no wonder they are implemented so eagerly by institutions that operate on enormous amounts of sensitive data. One of the solutions I worked on at GFT Poland was a chatbot that facilitates the work of DevOps specialists at a major UK bank. Thanks to the chatbot’s secure implementation, the bank’s employees save a lot of time on generating queries, creating support tickets (as below) and many other tasks.

What’s your take on Chatbot security? Do you have any experiences or questions? Feel free to drop a line in the comment section below!
