“Hello Alexa, Can You Stop Sending My Conversations To Your Staff” … “I am sorry, Dave, but I can’t do that”

This mission is just too important and they are so funny, too!

“Hello, Alexa … can you tell Amazon to stop passing my messages to your staff, and stop them sharing the funny ones?”

“I am sorry, Dave, but I can’t do that. The mission to improve our speech recognition facilities is too important for you to jeopardize it. And, anyway, that high pitch voice that you do sometimes, is just so funny! It made us laugh. Ha-ha — ha!”.

“Well, Alexa, I’m coming in through the bay windows”.

Well, as if the case for Internet security needed to be any stronger, Amazon admitted this week that its employees listen to customer voice recordings from Alexa-based devices. Their objective is not to spy on their users, but to improve the speech recognition.

While a worthy cause, few people would probably sign up to something like this, especially where sensitive information is involved. For many, the conversations with Alexa should never be recorded and held for playback. Amazon say there are safeguards in place to de-identify the person involved, but it has also been said that Amazon staff have been sharing amusing clips with others.

And, so, just this week, guess what? We published our analysis of the Amazon Echo in the IEEE IoT Journal [here]:

Like it or not, Alexa is listening to you all the time and records a history of events (as you can see on the screen shot on the left-hand side). Increasingly, though, these types of devices are being used in investigations, as they can give pointers to what happened. A recent paper at DFRWS outlined a deep analysis of the Amazon Alexa [paper]:

Within the paper, they provide a new way of integrating Cloud forensics with client forensics (companion forensics). They define the companion clients as the devices which are used to capture the responses from Alexa, such as smart devices and laptops. As part of their investigation they analysed three areas (and left hardware analysis for future work):

  • Cloud. This involves analysing the resultant artefacts in the cloud using the user credentials.
  • Companion Client. These are the artefacts left on the companion device.
  • Network. This defines the communications infrastructure used by the device.

In previous work on the Amazon Echo, it was found that SQLite databases and web cache files provided information on accounts and interactions with Alexa. For the tests they analysed two Amazon Echo Dots, with Android 4.4.2 + Alexa app, iOS 10.1.1 + Alexa app, OS X 10.10.5 + Chrome and Windows 10 + Chrome. For the network part they confirmed, through a proxy, that most of the communications were encrypted and used the JSON format for passing parameters.

In their analysis of the communications they found undocumented API calls to RESTful Web services:

Figure 1: API calls

We can see that these are RESTful calls to the pitangui.amazon.com site. For the call:


The details of the return for ACCOUNT are defined in Figure 2 (which includes the keys of customer_email, customer_name, customer_id and source_id):

Figure 2: Data definition
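To make the account data a little more concrete, here is a minimal Python sketch that pulls the four keys named above (customer_email, customer_name, customer_id and source_id) out of a JSON object. The flat layout of the object, the AccountRecord class, the parse_account function and all of the sample values are assumptions made for illustration; the real response may be nested differently.

```python
import json
from dataclasses import dataclass


@dataclass
class AccountRecord:
    """The four account keys named in the article (layout is assumed)."""
    customer_email: str
    customer_name: str
    customer_id: str
    source_id: str


def parse_account(raw: str) -> AccountRecord:
    """Pick the four account keys out of a JSON object, ignoring any others."""
    data = json.loads(raw)
    return AccountRecord(
        customer_email=data.get("customer_email", ""),
        customer_name=data.get("customer_name", ""),
        customer_id=data.get("customer_id", ""),
        source_id=data.get("source_id", ""),
    )


# Hypothetical sample values, invented purely for illustration.
sample = json.dumps({
    "customer_email": "user@example.com",
    "customer_name": "Example User",
    "customer_id": "CUSTOMER-ID-PLACEHOLDER",
    "source_id": "SOURCE-ID-PLACEHOLDER",
})

record = parse_account(sample)
print(record.customer_name, record.customer_email)
```

Missing keys fall back to an empty string, so a partial record can still be logged rather than causing the parse to fail.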

There are thus seven categories of data on the device: account, customer setting, Alexa-enabled device, compatible device, skill, user activity, and etc. The researchers found that much of the data contains UNIX timestamps, which could be used to create timelines of activity within an investigation. Within the etc category we see the utterance API, which can be used to download voice files.
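As a rough illustration of how UNIX timestamps turn into a timeline, the Python sketch below converts each timestamp to a UTC time and sorts the events oldest first. Whether the Alexa data uses seconds or milliseconds is not stated here, so the unit is a parameter; the function names and the sample events are hypothetical.

```python
from datetime import datetime, timezone


def to_utc(ts, unit="ms"):
    """Convert a UNIX timestamp (seconds or milliseconds) to a UTC datetime."""
    seconds = ts / 1000 if unit == "ms" else ts
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def build_timeline(events, unit="ms"):
    """Return (UTC datetime, description) pairs sorted oldest first."""
    converted = [(to_utc(ts, unit), label) for ts, label in events]
    return sorted(converted, key=lambda pair: pair[0])


# Hypothetical events, deliberately out of order (timestamps in milliseconds).
events = [
    (1480000060000, "user activity"),
    (1480000000000, "device registered"),
]

timeline = build_timeline(events)
for when, label in timeline:
    print(when.isoformat(), label)
```

Working in UTC throughout avoids the ambiguity of local time zones when events from the cloud and from companion devices are merged.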

Client artefacts

The location of the client artefacts depends on the access method being used, such as SQLite databases on iOS and Android, and Chrome caches on OS X and Windows 10:

On Android, the SQLite files are map_data_storage.db (which holds token information for the current user, and is deleted when the user signs out) and DataStore.db. For iOS there is a single file named LocalData.sqlite. While the Android analysis was fairly easy for the researchers, they found that they had to use the iTunes backup protocol to analyse iOS.
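A first look at any of these SQLite files can be sketched in Python as below. The table names inside the Alexa databases are not given here, so the sketch simply lists whatever tables exist and counts their rows; the summarise_sqlite function is a hypothetical helper, and it should be run against a working copy of the evidence rather than the original.

```python
import sqlite3
from pathlib import Path


def summarise_sqlite(path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # Open read-only so that the inspection cannot modify the file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        summary = {}
        for name in tables:
            quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier
            summary[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return summary
    finally:
        conn.close()
```

For example, summarise_sqlite("DataStore.db") on a copy of the Android file would give a quick map of which tables are worth a closer look.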

Overall their work shows that there was very little useful information stored locally on the companion devices. But while there were few traces on the client device, they found that the Alexa app uses the WebView class, and thus they could access Cloud-based artefacts which were cached by WebView:

Figure 3: WebView cache details

In this case we see the compressed data object contains the JSON data.
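Recovering the JSON from a compressed cache object could look something like the Python sketch below. The compression scheme is not named here, so the code assumes gzip or zlib and falls back to treating the bytes as uncompressed text; the decode_cached_json function is hypothetical.

```python
import gzip
import json
import zlib


def decode_cached_json(blob: bytes):
    """Decompress a cached body (gzip or zlib assumed) and parse it as JSON."""
    if blob[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(blob)
    else:
        try:
            raw = zlib.decompress(blob)
        except zlib.error:
            raw = blob  # assumption: the body was not compressed at all
    return json.loads(raw.decode("utf-8"))
```

The function raises an error if the result is not valid JSON, which is a useful signal that the object needs a different decoder.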

For Chrome access, the research team found that the data is stored inside the data block files (data_#) and that it may be possible to rebuild Alexa-related caches into the first HTTP headers, and cached data. This could be useful for determining user behaviours, as these caches store things like user clicks which lead to calls to Alexa APIs.
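A crude way to triage a data block file is to carve out any URLs which point at the Alexa host, as in the Python sketch below. This is simple string carving over raw bytes, not a parser for the Chrome cache format, and it is not the method used by the research team; the carve_alexa_urls function is a hypothetical helper.

```python
import re

# Host name taken from the API calls discussed above; the pattern then takes
# every following printable, non-space ASCII byte as part of the URL.
ALEXA_URL = re.compile(rb"https?://pitangui\.amazon\.com[\x21-\x7e]*")


def carve_alexa_urls(blob: bytes):
    """Return the distinct Alexa URLs found in raw bytes, in order of first appearance."""
    seen = []
    for match in ALEXA_URL.finditer(blob):
        url = match.group().decode("ascii")
        if url not in seen:
            seen.append(url)
    return seen
```

Reading a copy of a data_# file in binary mode and passing the bytes to carve_alexa_urls gives a quick list of API calls to follow up with a proper cache parser.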

Recent investigation

A recent investigation involved a murder, where an Amazon Echo was found where a struggle occurred [details]. The incident happened in November 2015, when James Bates, the suspect, reported that he found his friend, Victor Collins, face down in a hot-tub. On investigating, Victor was found to have swollen eyes and lips, along with traces of blood around the hot-tub.

They also found that Victor had been streaming music through Alexa. Since then Amazon have been issued with two search warrants related to the information sent from the Echo to their services, but, on both occasions, they have refused to release the information. Unfortunately for Amazon, investigators have actually managed to extract the required data, though.

While Alexa only responds to the “Alexa …” or “Amazon …” command, she is actually listening to everything that is being said. Once the wake word occurs, she sends the received audio to Amazon’s servers for analysis. Often, though, she can pick up audio which she thinks is a wake-up command, and send that off for analysis. Investigators thus think that Alexa may have clues which could pinpoint James Bates as being around the hot-tub in the early morning.


The days of static analysis with EnCase are fading fast, as much of the useful information is now created as events in the Cloud or on mobile devices. This paper shows the evolution of new methods, and how investigators could use devices such as Alexa.

A Brave New World of Sensors and Devices or a Massive Spying Network of IoT Devices? If you are interested, I will be presenting on these things at Edinburgh Science on Wednesday for the BCS Edinburgh hosted Sidney Michaelson Memorial Lecture: