Privacy and Data Security of Chatbots

Barbara Ondrisek
Oct 27, 2016 · 8 min read


Why you shouldn’t talk to your chat bot about everything

At the Privacy Week conference in Vienna I gave a talk on “Privacy and Data Security of Chatbots”. I wanted to point out that you should consider what data you share with a bot and in which steps that data is processed in the background when you talk to one: How secure are the messenger apps themselves, the connection the data is transferred over, the NLP and machine learning tools involved and, last but not least, the backend of the bot itself and its database?

Chatbots and Mica

A chatbot is a service that enables you to interact with a service or company through a conversational interface: a computer program running inside another program (the messenger) that can have an (intelligent) conversation with one or more human users. Chatbots are also referred to as “virtual assistants” or filed under “conversational commerce”.

When Facebook announced at their F8 conference in mid-April 2016 that they were opening up the Messenger platform to bots, I was eager to try the API. So I created one of the very first chatbots on Facebook, and definitely Austria’s first Facebook Messenger and Skype chatbot: Mica, the Hipster Cat Bot.

If you’re interested in learning more about the basics of chatbots, I’ve already written a few articles on the topic: Mica, the Hipster Cat Bot — Four Month After The Launch or Why emoji fit perfectly for chatbots.

However, messenger apps are widely used, and the success of bots poses a question: What about the data security and privacy of messenger apps and their chatbots?

EFF’s Secure Messaging Scorecard

We typically share very personal data when we talk to each other over messenger apps. Messaging is a private and intimate thing, and messenger app providers are expected to keep their users’ data private.

The conversation between a user and a chatbot owner is likewise expected not to be shared publicly without the user’s explicit consent. But how secure are the platforms themselves?

The EFF writes:

In the face of widespread Internet surveillance, we need a secure and practical means of talking to each other from our phones and computers.
Many companies offer “secure messaging” products — but are these systems actually secure? The Electronic Frontier Foundation decided to find out [..] and created the Secure Messaging Scorecard.
Version 1.0 of our scorecard evaluated apps and tools based on a set of seven specific criteria ranging from whether messages were encrypted in transit to whether or not the code had been recently audited. Though all of those criteria are necessary for a tool to be secure, they can’t guarantee it; security is hard, and some aspects of it are hard to measure.

The scorecard dates from November 2014 and shows a security score for different platforms. Here you’ll see an extract of the analysis for different messenger apps:

EFF Secure Messaging Scorecard

As you can see on this scorecard, most messenger programs encrypt messages in transit, but some messengers, such as Kik or Skype, hadn’t even had their code audited recently.

Some messengers open up their source code to independent review. Most of the messengers analyzed by the EFF offer no way to verify a contact’s identity (only Signal and WhatsApp provide this feature).

Some messenger apps, such as WhatsApp and Signal, are end-to-end encrypted, meaning that the platform’s servers are not “reading” the conversation. Some messengers, such as Telegram, Skype, Facebook and Kik, provide an API for bots. With bots, however, the platform provider and the bot provider usually see the conversation unencrypted and hence have complete access to it.
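To make this concrete, here is a minimal sketch of what a Messenger-style bot webhook looks like on the bot’s side (Flask is assumed, and the payload shape follows Facebook’s documented format of the time; treat the details as illustrative). The point is simply that the message text arrives at the bot backend in plaintext:

```python
# Minimal sketch of a Messenger-style bot webhook (assumes Flask).
# Illustrative only: the bot backend receives the user's message in
# plaintext, even though the transport itself is HTTPS.
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def webhook():
    payload = request.get_json()
    # The platform delivers the message text in the clear:
    for entry in payload.get("entry", []):
        for event in entry.get("messaging", []):
            text = event.get("message", {}).get("text")
            sender_id = event.get("sender", {}).get("id")
            if text:
                # Both the platform and this backend can read `text`.
                print(f"User {sender_id} wrote: {text}")
    return "ok", 200

if __name__ == "__main__":
    app.run(port=5000)
```

Whatever the bot does with that text afterwards (store it, forward it to analytics, send it to an NLP service) is entirely up to the bot developer.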

The only messenger that would receive an A grade from the EFF is Signal, but widely used apps such as Skype (roughly 300 million monthly active users) and Kik would get very bad grades.

In 2016 Viber also added end-to-end encryption to their service, but only for one-to-one and group conversations in which all participants are using the latest Viber version. Similar criticism hit Allo, the new AI-based messaging app from Google, which ships with end-to-end encryption turned off by default. Security by default should be the way to go here, but Allo’s server-side NLP features would not work that way.

Meanwhile the competition for the next major chatbot platform has started: Facebook, Skype, Kik and others are racing to become the dominant ecosystem for bots. Every bot platform tries to offer easy integration of bots and a great user experience for the bots’ users.

The paradox is that in messenger apps the majority of conversations are private and personal exchanges between two people, and bots are now entering this domain.

Your personal data

With bots entering the domain of personal and private communication, we see a transfer of data control: from the user to the messenger app provider.

Ceiling Cat is watching you!

Facebook has often been criticized for changes to its privacy policy, and the same goes for Messenger, its chat app. You can order an Uber ride through Messenger, buy goods or plane tickets, and pay directly in Messenger (in beta in the US): all very personal data the user exposes to Facebook. All transactions are logged by the platform’s servers, which monitor and log the communication between the user and the bot.

WeChat Pay offers a lot of different services.

The same is already happening in China with WeChat and QQ, where people integrate the messenger app far more deeply into their personal lives: through micro-payments to friends, or by paying their electricity bills or rent within WeChat.

WeChat Pay offers a lot of different services and has become a single medium for all kinds of transactions; Messenger wants to become exactly that for the West.

Cloud-based AI Tools

Data in the cloud

Personal data is worth a lot to companies like Facebook or Google, and messenger platforms were not initially created with a focus on privacy.

Chatbots may also analyze data with external tools for natural language processing and intent detection. The data is usually not encrypted end-to-end when it is sent to tools such as wit.ai, api.ai or IBM Watson, although it might be sent via HTTPS. These cloud-based APIs process users’ input for intelligent analysis and could analyze everything you write, which is especially critical when handling sensitive data such as financial account information or passwords.
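As an illustration, this is roughly what such a call looks like, using wit.ai’s message endpoint as an example (the token is a placeholder, and this is a sketch rather than production code). HTTPS protects the message in transit, but the provider still receives the raw text:

```python
# Sketch: forwarding a user's message to a cloud NLP API (wit.ai's
# /message endpoint as an example; the token is a placeholder).
import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # placeholder, not a real token

def extract_intent(user_message: str) -> dict:
    # HTTPS protects the message in transit, but wit.ai's servers
    # still receive and process the raw text.
    resp = requests.get(
        "https://api.wit.ai/message",
        params={"v": "20161027", "q": user_message},
        headers={"Authorization": f"Bearer {WIT_TOKEN}"},
    )
    resp.raise_for_status()
    return resp.json()  # intents/entities derived from the plaintext

print(extract_intent("Transfer 100 euros to my savings account"))
```

Notice that a message like the one above would hand financial details to a third party as a side effect of intent detection.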

What do bots know?

Usually bots don’t know much about their users initially. Typically it is something like the name, a screen name and maybe some additional data.

Overview of User data shared with a bot platform

*) In Facebook Messenger a bot is basically a Facebook app connected to a Facebook page, and it has no access to the user’s Facebook profile. But there are workarounds and tricks to match the page-scoped user IDs to Facebook profiles, and then you expose all your data to the bot. Once this connection is made, the bot could also “know” further details about the person, such as likes, age, hometown or the username.

**) The GPS location has to be shared explicitly on request.
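Even without any workarounds, a Messenger bot can query the User Profile API with nothing more than the page-scoped user ID it receives in every message. A sketch of what that looks like (field names follow the 2016-era Graph API; the access token is a placeholder):

```python
# Sketch: querying the Messenger User Profile API with a page-scoped
# user ID (PSID). Field names follow the 2016-era Graph API; the
# access token is a placeholder.
import requests

PAGE_ACCESS_TOKEN = "YOUR_PAGE_ACCESS_TOKEN"  # placeholder

def fetch_user_profile(psid: str) -> dict:
    resp = requests.get(
        f"https://graph.facebook.com/v2.6/{psid}",
        params={
            "fields": "first_name,last_name,profile_pic,locale,timezone,gender",
            "access_token": PAGE_ACCESS_TOKEN,
        },
    )
    resp.raise_for_status()
    return resp.json()
```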

Personality Profiles

And this is only what a bot receives through the APIs of the messenger platform. Think of all the data you actively send to the bot. It is very easy to create character studies based on the text you send to a program.

A cat with a personality: Grumpy Cat

For instance, there is sentiment analysis: the process of computationally identifying and categorizing the mood expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative or neutral.
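Running such an analysis over chat messages takes only a few lines. Here is a sketch using NLTK’s off-the-shelf VADER analyzer (one possible tool among many; any bot backend could run something like this on every incoming message):

```python
# Sketch: sentiment analysis on chat messages with NLTK's VADER
# analyzer (one off-the-shelf option among many).
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

for msg in ["I love this bot!", "This is useless, I hate it."]:
    scores = sia.polarity_scores(msg)
    # `compound` ranges from -1 (most negative) to +1 (most positive)
    print(msg, "->", scores["compound"])
```

Aggregate those scores over weeks of conversation and you have the raw material for a mood profile of the user.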

The same concerns apply to other tools used in bots, such as speech-to-text converters, image recognition services, linguistic analysis tools and others.

Analysis queue of lucida.ai

However, consider also who else might be listening to your conversation, and I don’t only mean the bot developer or project managers: currently every conversation between bot and user passes through the messenger app’s servers, so Facebook or Google also listens to everything you say to a bot!

Bots usually also store contextual data, such as a geo location or a conversation state (which data is needed for which step when communicating with a bot?). This could also be a telephone number or other private data, and no one knows whether the data is encrypted before it gets saved to a database.
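Encrypting such data before it is written to the database is cheap, as this sketch with the Python cryptography package shows (key handling is reduced to a placeholder here); whether a given bot actually does anything like this is invisible to the user:

```python
# Sketch: encrypting a sensitive field before it is stored, using the
# `cryptography` package. In practice the key would come from a secret
# store, not be generated ad hoc.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # placeholder for proper key management
fernet = Fernet(key)

phone_number = "+43 660 1234567"
token = fernet.encrypt(phone_number.encode())
# `token`, not the plaintext, is what should land in the database.
print(token)
print(fernet.decrypt(token).decode())  # recoverable only with the key
```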

Emotional reactions to conversations with bots

People, especially teenagers and seniors, tend to text with bots a lot. You design your bot for one primary use case, but people simply start to chat with it. Studies show that seniors tend to chat with Siri when they are lonely; the same happens with bots that are capable of holding a conversation.

Users also tend to text with bots as if no one were listening. When Joseph Weizenbaum was studying ELIZA, he noticed that one test subject felt ashamed when he entered the room and said: “Sorry, but I’m currently talking to ELIZA!”

Another interesting aspect is that people react emotionally to bots: they love them and tell the bot so, or they hate them and start using foul language. Based on the data you receive, you can create personality profiles of bot users. So be careful what you write to a bot and what data you expose on these platforms.

Chatting with Mica, the Hipster Cat Bot

About the author

Dr. Barbara Ondrisek, aka “Bot Mother”, aka “Ms. Robot”, is an enthusiastic software developer with 15+ years of experience. She has worked mostly as a freelancer on web projects (lately George / Erste Bank) and on building apps.

With Mica, the Hipster Cat Bot, a chatbot that helps you discover hip places, she created one of the first chatbots on the Facebook Messenger platform worldwide. Mica was the very first bot on Messenger and Skype in Austria, which led to a listing as a testimonial in the official Skype FAQs.

Together with other experts in their fields she founded the Chatbots Agency.

Further reading

This article was also published on VentureBeat
