We know they are listening, but what do they hear?

4 Home AI Assistants Ranked by Data Privacy and How You Can Protect Your Data

Zach Diamond
Analytics for Humans
10 min read · Apr 19, 2017

--

At Humanlytics, we’re all about data. When our tool has access to quality data, and lots of it, we can uncover business insights that lead to personalized and effective recommendations.

However, in the data age consumers have started to get nervous about the data they produce — data that corporations are collecting.

This has been going on for a while now, but as the Internet of Things (IoT), the web of physical devices that continually collect data and send it back to corporate servers, expands into our homes with the advent of mass-market “always listening” AI devices, the question of data security has become more and more pressing.

When these devices enter our homes, our most private place becomes subject to a sort of surveillance. This leads to situations in which figures of authority may requisition data from your device for their purposes.

That’s creepy. But, like it or not, every day we are entering further and further into a data-centric world. We all need to accept that the data revolution isn’t just inevitable — it’s already here. But we don’t need to accept it blindly.

So, let’s talk about it. In order to make sure you’re in the know, we’re going to discuss the AI listening devices produced by Google, Microsoft, Amazon, and Apple. We’ll hit on the product itself and what kind of control they give you over your data before talking about the implications.

However, before we talk about these devices individually, let’s talk about them generally.

How They Actually Work

AI listening devices take their cues from audio commands. They do this by using a “wake word”. The device listens to you all of the time, but it only starts paying attention when it hears this wake word.

In practice, the device records short bursts of audio. If the wake word is not detected, that snippet is deleted. If it is detected, the device keeps going and begins recording or streaming what you say.

At this point, it’s important to understand that this is a two-part system. Beyond spotting the wake word, the device in your home does very little of the computing. Rather, through the internet, it is connected to a computer server miles away that does the real work.

This means that your voice recording (or stream) doesn’t stay within your house. Instead, it is sent to the server to be analyzed, and the response is formulated there and sent back for your device to speak in your home.
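To make the wake word idea concrete, here is a minimal sketch in Python. It is not how any of these products is actually built: the snippets are plain text standing in for audio, and the names `WAKE_WORD`, `BUFFER_SNIPPETS`, `fake_server`, and `run_device` are all made up for illustration. The point is only to show the two halves described above: short snippets that are held briefly and thrown away, and a single request that leaves the house once the wake word is heard.

```python
from collections import deque

WAKE_WORD = "okay assistant"  # hypothetical wake word, not a real product's
BUFFER_SNIPPETS = 2           # assumed number of short snippets held at once


def fake_server(request):
    """Stand-in for the remote server that does the real work."""
    return f"server handled: {request!r}"


def run_device(snippets):
    """Simulate a device hearing a sequence of short snippets (text here)."""
    recent = deque(maxlen=BUFFER_SNIPPETS)  # oldest snippet drops off automatically
    responses = []
    awake = False
    for snippet in snippets:
        if awake:
            # Triggered: this snippet is sent off to the server.
            responses.append(fake_server(snippet))
            awake = False  # go back to waiting for the wake word
        elif WAKE_WORD in snippet.lower():
            awake = True   # the next snippet will be streamed
        else:
            recent.append(snippet)  # held briefly, never sent, soon overwritten
    return responses, list(recent)
```

Feeding it a list such as `["pass the salt", "Okay assistant", "what is the weather"]` sends only the snippet that follows the wake word to `fake_server`. Everything else stays in the small local buffer until it is pushed out.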

Google

The Product

Google Home is described as a “Google Assistant” that provides “hands free help.”

This covers a wide array of capabilities that Google separates into six categories: 1) get answers, 2) enjoy entertainment, 3) manage tasks, 4) plan your day, 5) control your home, and 6) have fun.

In order to use it, all you need to do is plug it in, run through setup in an app on your phone, and then, when in earshot of the device, say: “Okay Google”. This will activate the device, at which point it will begin actively processing and executing your requested action.

Data It Collects

Google is a bit squirrely about this.

On the Google Home FAQ page on data privacy they confess, more or less, to collecting information you provide to them through your account, your search history, your location history, as well as through third party apps.

However, if you look at Google’s Privacy Policy, you can see what they collect in a little more depth.

This includes any account info you provide, as well as information about what you do on their services and how you do it. The latter consists of device, log, location, and application information.

Furthermore, they use local storage along with cookies, and similar tools, to collect and store info based on what you do on Google, including how you interact with services offered by their partners.

Your Control

Theoretically, you have total control.

Your recorded voice clips are automatically stored at Google’s data centers and associated with your account. You can delete them manually by going to myactivity.google.com. If you don’t delete them, they stay there forever.

You can also adjust your settings to determine what Google Home can access about your personal preferences and other info (this refers to the “service info” we touched on above).

One instance in which your control is limited is regarding “service related information”. Even after you’ve deleted your account, Google may maintain some of this information and there isn’t much you can do about it.

More broadly, you’re able to decide what data Google collects and how that information is tied to your account. Additionally, you are able to control whether it has access to your search history, location history and app information.

Curiously, though logically, you have no control over your information if another person asks your Google Home for it in person. As a final resort, you can kill the recording capability, but this incapacitates the device.

Amazon

The Product

Amazon’s iteration of this product comes in a few different forms. There is the Echo, the Dot, and the Tap. Despite the variation in name, they all do pretty much the same thing.

Each one completes tasks that fall into one of the eight categories that Amazon defines: music and entertainment, news and information, questions and answers, help around the house, smart home, fun & games, shopping, and Alexa skills.

Alexa is special in that it utilizes “skills” which are basically apps developed by a third party for use with the voice capability that the platform provides.

Data It Collects

Alexa records and logs all of your requests and questions. It also processes information from third party services (to which you may have provided personal info) once they have been connected.

While the Amazon Echo works on the “wake word” principle and only streams your voice to the server once it recognizes this word, it also includes a partial recording of what was said just before you spoke the wake word.

Your Control

First and foremost, you have control over your queries. These are all logged in your history where you can delete them individually. Furthermore, by going to www.amazon.com/mycd or contacting customer service you can delete all of the voice recordings for a product at one time.

Since it is possible to accidentally mention the wake word and stream a recording to the cloud, Amazon provides different settings that protect against this. For example, you can restrict listening so that it only occurs when you press an activation button, or turn on a tone that notifies you when the device begins and ends recording. You’re also able to mute the device, though this effectively kills its functionality.

Apple

The Product

Perhaps the original mainstream voice assistant, Apple’s Siri debuted in 2011 and currently offers, as defined by Apple, eight categories of capabilities.

These include: the basics, staying in touch, getting organized, sports, entertainment, out and about, HomeKit, and getting answers.

Siri is also noteworthy for being the first always listening AI to be present on a phone, and it runs on every current iPhone.

Data It Collects

Siri collects and uses information that is already on your phone such as your name, contacts, and songs.

Also, if you have location services turned on when you make a request, that information is bundled along with your request.

Furthermore, Apple specifies that some features require “real-time input from Apple servers.” To describe this need they give a maps example in which the server needs to know both the address of your destination and your current location.

Your Control

Apple’s position in this race is somewhat strange.

On the one hand, it doesn’t seem to allow you a large amount of control over what Siri can or can’t access. You can turn off location services and other “proactive services”, or turn Siri off altogether.

Furthermore, you can turn off the “always listening” function of Siri so that it operates only on your physical command (activated by holding the home button).

On the other hand, Apple excels at keeping your data both anonymous and secure.

For example, rather than associate your Siri queries with your personal account, Apple instead ties them to a random identifier that is assigned to your device and is deleted automatically after six months.

Similarly, whenever Apple sends information from your device to the server it uses “anonymized rotating identifiers” so that your information can’t be traced to you personally.
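Here is a rough sketch in Python of what “tie queries to a random identifier and delete them after six months” could look like. It is an illustration of the idea only, not Apple’s implementation. The `QueryLog` class, its method names, and the 183-day `RETENTION` window are assumptions made for this example.

```python
import uuid
from datetime import datetime, timedelta

RETENTION = timedelta(days=183)  # "six months" approximated; exact window is assumed


class QueryLog:
    """Hypothetical log that keys queries to a random ID, never to an account."""

    def __init__(self):
        self.device_id = uuid.uuid4().hex  # random, not derived from any user info
        self.entries = []                  # list of (timestamp, device_id, query)

    def record(self, query, now):
        self.entries.append((now, self.device_id, query))

    def rotate_identifier(self):
        """Swap in a fresh random ID so later queries can't be linked to earlier ones."""
        self.device_id = uuid.uuid4().hex

    def purge(self, now):
        """Drop every entry older than the retention window."""
        cutoff = now - RETENTION
        self.entries = [entry for entry in self.entries if entry[0] >= cutoff]
```

Notice that nothing in `QueryLog` ever stores a name or an account. The only link between two queries is the random ID, and that link disappears when the ID rotates or the entry ages out.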

One last thing Apple does is keep much of the information Siri uses on your phone.

This means that the information from your email, contacts, app usage, and calendar that Siri uses to make suggestions stays on your device and isn’t sent to the server.

Microsoft

The Product

Microsoft’s iteration of the “always listening” AI is Cortana, described as a “digital agent” that can do a litany of activities.

Microsoft breaks its capabilities up into approximately eight categories: reminders, tracking, communication, calendar, lists, games, finding information, and opening apps.

Data It Collects

If you sign into Cortana using a Windows account, Cortana will collect information from your device, other Microsoft services, and third party services that you connect.

This includes information such as your browsing history, calendar, contacts, location history (which is collected periodically regardless of your interaction with the phone), and, somewhat disturbingly, “content and communication history from messages, apps, and notifications.”

It’s worth noting that if you use Cortana while signed into your Windows account, your recordings are stored and associated with your account.

Your Control

Microsoft allows a fair degree of control. First and foremost, Cortana doesn’t require an account to use, meaning you can make queries that will never be connected to your account (though they will still be saved in your browser).

Additionally, you are able to decide what third party services to connect. Furthermore, any time you ask a question that requires the use of more of your data, Cortana will ask permission before tapping into it.

Plus, if you change your mind later you’re able to individually manage permissions by going to Cortana’s settings (though you aren’t able to manage absolutely everything).
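The “ask before tapping into more data” pattern is easy to picture in code. The Python sketch below is a toy model of that pattern, not Cortana’s actual design. The `Assistant` class, the `ask_user` callback, and the data source names are all hypothetical.

```python
class Assistant:
    """Toy assistant that must get a yes from the user before reading a data source."""

    def __init__(self, ask_user):
        self.ask_user = ask_user  # callback: takes a source name, returns True or False
        self.granted = set()      # sources the user has already approved

    def read(self, source, data):
        if source not in self.granted:
            if not self.ask_user(source):
                raise PermissionError(f"user declined access to {source}")
            self.granted.add(source)  # remember the yes so we don't ask every time
        return data[source]

    def revoke(self, source):
        """Let the user change their mind later."""
        self.granted.discard(source)
```

In this sketch a refusal stops the request cold, and `revoke` means a later request for the same source triggers a fresh prompt.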

You can also manage specifically what Cortana knows about you by editing the “Notebook”. However, doing this doesn’t remove the associated data on the server. In order to do that, and to manage your voice recordings, you need to visit account.microsoft.com/privacy.

Conclusion

So where does this leave us? AI listening devices are cool but they also potentially cross a line of data collection that many of us would prefer not to think about.

More specifically, they allow for the accidental recording of your voice and whatever it is you may be saying. While web searches require a conscious effort to type and search, a slip of the tongue may wake your always listening AI, which will then eagerly record whatever you say.

That’s a little bit unsettling, but the good news is that you have a choice. You get to decide which AI listening device lives in your home and what it gets to hear. Take advantage of that choice and maintain your privacy.

With the help of this article you now have the tools to make that choice intelligently. I’ll even help you out a little bit by ranking these devices by data privacy.

  1. Microsoft Cortana: The fight for the top spot was a close scrap between Siri and Cortana, but in the end the granular controls that Microsoft gives you over Cortana put it on top. They not only let you control what data it collects but also ensure that Cortana keeps functioning after you’ve made those choices. In short, it allows for a great balance of privacy vs operational capacity.
  2. Apple Siri: In all honesty, Apple does a great job with privacy here. They keep your data anonymous, maintain much of your data on your phone instead of on a server, and delete your requests automatically after six months. That’s all great stuff, but what knocks it down to the two spot is that the control you have is limited. You only have a few customization options, which makes it difficult to find the balance between privacy and operational capacity that Microsoft provides.
  3. Amazon Alexa: Amazon gets the three spot because, while they do nothing egregious, they don’t do anything great either. Deleting your data can be a chore, and while Amazon allows you some control over your data, activate too many of these options and you’ll effectively kill the functionality of the device. Not bad, Amazon, but not great either.
  4. Google Home: Google, oh Google. I struggled with this ranking. Like Microsoft, Google allows for fairly granular controls over what data the device can collect and access. However, unlike Microsoft, the Google Home requires a Google account to operate. This gives it access to a whole bevy of information right off the bat and, while you can be more specific about the data it can access, this can also severely limit its functionality.

What We’re Doing at Humanlytics

AI is powerful, but so is data privacy.

At Humanlytics, we get that. Data liberalization is an important tool for the future. It allows for detailed and personalized recommendations as well as AI that will change the way we live. At the same time, however, digital privacy is integral to protecting the values and rights of our society.

That’s why we’re working to ensure that your private data stays that way. Two solutions we’re keeping in mind are algorithmic transparency and minimum viable data collection.

Algorithmic transparency is the idea that the code you write should be available to the consumer. This would allow them complete and total oversight of the data collection process.

While this idea is radical and has several complications, it’s an ideal that we commit to strive towards. Right now, that means we’ll always work to be as transparent as possible about our algorithms for collection/analysis.

As for minimum viable data collection, the idea is simple. We will only collect the data we need and nothing more. This won’t affect the efficacy of our product and the less data we collect the smaller the risk of data abuse. Everyone wins.
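In code, minimum viable data collection can be as simple as an allowlist. The Python sketch below is an illustration, not a description of our actual pipeline. The field names in `REQUIRED_FIELDS` and the `minimize` function are invented for the example.

```python
# Hypothetical allowlist: the only fields this imaginary analysis needs.
REQUIRED_FIELDS = {"page_views", "campaign", "conversion"}


def minimize(record, required=REQUIRED_FIELDS):
    """Return a copy of the record containing only the fields we actually need."""
    return {key: value for key, value in record.items() if key in required}
```

Calling `minimize` on a raw record that also contains, say, an email address returns a copy without it. The appeal of an allowlist over a blocklist is that a field nobody thought about is dropped by default rather than collected by default.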

Every day of technological advancement is another day during which the consumer’s privacy comes under attack. That’s not great, but if you do your part by staying informed and we do our part by responsibly collecting and using data, we can each help work towards a future in which privacy remains a reality and not a pipe dream.

Let us know what you think! Fill out this survey to give us feedback: bit.ly/HMLsurvey

Follow us on Medium, Twitter, and our Newsletter, if you want to see more content like this. Reach out to me at zach@humanlytics.co if you have any questions or feedback!

Zach Diamond
Analytics for Humans

Sometimes I write things, sometimes I make jokes, and sometimes I play with data. ¯\_(ツ)_/¯