YAiM — Yet Another iMessage Mining

TLDR: Yes, I analyzed my iMessage conversations with my best friend using Python. But I promise there’s more to it than that.

Guillermina Sutter Schneider
Analytics Vidhya
6 min read · Jan 26, 2019

Before I get to the coding part, here’s a short background story about how I ended up trying to master the messy art of analyzing a conversation on iMessage using Python…

I’ve always been the type of person who doesn’t like giving presents. It doesn’t matter if it’s Christmas, your birthday, or if you just had a baby; I just won’t get anything for you. For the record, this doesn’t mean I don’t like/love you. Another thing I dislike is receiving presents. I sometimes feel like I don’t need them or they just add to the clutter… But I ain’t no stupid girl. If you come with a bag of Haribo’s mind-blowing Goldbären or a good chocolate, I’ll be walking on air.

Why am I telling you this? My best friend turned a quarter of a century old in December and I felt I should give him a birthday present, but I didn’t know what exactly. I flip-flopped between designing something for him in Illustrator and running to the closest bookstore to get the first book I saw. Then it occurred to me that if I managed to combine some of my Python and data warehousing skills, I could come up with some sort of analysis of our conversations on iMessage.

And so it began.

How the heck did you…

…get access to your iMessage texts?!

This took me almost a couple of hours to figure out after reading many articles and blogposts. You can find the most useful pieces here and here.

First off, I synced my texts across all my Apple devices. This is how I did it. After doing so, a file named chat.db was automatically created on my Mac. That file is where Apple keeps all your iMessage data. This is the path to the file: ~/Library/Messages/chat.db.

One of the easiest ways I found to access the data in chat.db was by using DB Browser for SQLite (DB4S). DB4S lets you add, delete, and browse records in a database, among other things. You can download it here. It’s safe, I promise.

chat.db loaded into DB Browser for SQLite

Below is a brief step-by-step guide if you want to do the same and retrieve the data using DB4S.

  1. Run DB4S
  2. Click on Open Database and load chat.db
  3. Click on Browse Data and select message from the drop-down menu.
  4. The text column contains a preview of all the texts you’ve sent and received. Take a look at that column, identify the texts you want to analyze, and get the handle_id from the sixth column from left to right. Remember that number. You’ll use it later.
  5. Click on Execute SQL and run the following code:

  6. Once you run it, you’ll see the output below. Export it as a csv and… voilà!
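If you would rather skip the GUI, the same export can be sketched in Python with the standard sqlite3 module. This is only a sketch under assumptions, not necessarily the query I ran in DB4S: it assumes the message table has the columns text, is_from_me, date, and handle_id, and the function name export_conversation, the file names, and the handle_id value are made up for illustration. Work on a copy of chat.db rather than the original.

```python
import csv
import sqlite3

# Assumed query: the three columns the rest of the post relies on,
# restricted to the handle_id noted in step 4.
QUERY = """
    SELECT text, is_from_me, date AS Date
    FROM message
    WHERE handle_id = ?
    ORDER BY date
"""


def export_conversation(db_path, handle_id, csv_path):
    """Write one conversation from a chat.db copy to a csv; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(QUERY, (handle_id,)).fetchall()
    finally:
        conn.close()

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "is_from_me", "Date"])  # header row
        writer.writerows(rows)
    return len(rows)


# Hypothetical usage (42 and both file names are placeholders):
# export_conversation("chat_copy.db", 42, "messages.csv")
```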

…read the .csv in Python?!

I imported all the necessary packages in my JupyterLab and read the csv.
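A minimal sketch of this step, assuming the export has the three columns text, is_from_me, and Date. The rows below are made up so the snippet runs on its own, and messages.csv is a hypothetical file name.

```python
import io

import pandas as pd

# Stand-in for the exported file; these rows are invented for illustration.
sample_csv = io.StringIO(
    "text,is_from_me,Date\n"
    "hola,1,570000000000000000\n"
    "que tal,0,570000060000000000\n"
)

# With a real export this would be pd.read_csv("messages.csv").
df = pd.read_csv(sample_csv)
print(df.head())
```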

Wow! That’s a preview of the first four rows of the dataframe. text includes all the messages that were exchanged (yes, that’s Spanish), is_from_me indicates whether I (1) or the other person (0) sent the message, and Date gives information on when the message was sent. As you may notice, the dates aren’t in a human-readable format. I certainly had to change that.

…deal with those dates?!

I would say this was what took the longest. I had to go over a fair number of blogs, articles, and software documentation to come up with an easy way of formatting the Date column. You can find the articles that helped me solve this here, here, here, and here.

Long story short: Apple uses Mac Absolute Time (MacTime), which counts from 1/1/2001 instead of the 1/1/1970 starting point that Unix time uses. On top of that, the values in my chat.db were stored in nanoseconds rather than seconds. After a good four hours of reading and writing code that made no sense at all, I ended up dividing the original MacTime dates by 1,000,000,000 to convert them to seconds. I also added 978,307,200 seconds (31 years), because that is the gap between 1/1/1970 and 1/1/2001; without it, the converted dates come out 31 years too early. Here’s the step-by-step:
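The arithmetic above can be sketched like this. It is a minimal version under the assumption that the Date column holds integer nanoseconds since 1/1/2001; the helper name mac_to_datetime is made up, and the resulting timestamps are in UTC, not local time.

```python
import pandas as pd

MAC_EPOCH_OFFSET = 978_307_200          # seconds from 1970-01-01 to 2001-01-01 (UTC)
NANOSECONDS_PER_SECOND = 1_000_000_000


def mac_to_datetime(mac_ns):
    """Convert a Series of nanoseconds since 2001-01-01 into pandas datetimes (UTC)."""
    # Step 1: nanoseconds -> whole seconds (sub-second precision is dropped).
    seconds_since_2001 = mac_ns // NANOSECONDS_PER_SECOND
    # Step 2: shift to the Unix epoch by adding the 31-year gap.
    unix_seconds = seconds_since_2001 + MAC_EPOCH_OFFSET
    # Step 3: let pandas interpret the result as seconds since 1970-01-01.
    return pd.to_datetime(unix_seconds, unit="s")


raw = pd.Series([0, 570_000_000_000_000_000])
print(mac_to_datetime(raw))
# 0 maps to 2001-01-01 00:00:00, the second value to 2019-01-24 05:20:00
```

On a real dataframe this would be applied as df["Date"] = mac_to_datetime(df["Date"]).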

…start analyzing the texts?!

Before starting to analyze the texts I had to make some adjustments to the dataframe: identify senders (G & L), drop unnecessary rows and columns, etc.

I went ahead and created a column for the number of words in each message, and another one for the number of characters in each message. I also changed the words to lowercase to avoid double-counting.
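A rough sketch of these adjustments, using a tiny made-up dataframe. The column names sender, n_words, and n_chars are my own choices for illustration, and dropping rows with no text is just one example of an "unnecessary row".

```python
import pandas as pd

# Made-up sample with the same columns as the export.
df = pd.DataFrame({
    "text": ["Hola amigo", None, "que tal", "Hola"],
    "is_from_me": [1, 0, 0, 1],
})

# Drop rows that carry no text before counting anything.
df = df.dropna(subset=["text"]).reset_index(drop=True)

# Identify the senders: is_from_me == 1 is G, 0 is L.
df["sender"] = df["is_from_me"].map({1: "G", 0: "L"})

# Lowercase so that "Hola" and "hola" are counted as the same word.
df["text"] = df["text"].str.lower()

# Words are split on whitespace; characters include spaces.
df["n_words"] = df["text"].str.split().str.len()
df["n_chars"] = df["text"].str.len()

print(df)
```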

With the dataframe ready, I started doing some exploratory data analysis. I grouped the dataframe by the is_from_me column to see which of the senders used more characters and words. This is how I coded it:
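A minimal sketch of that grouping, assuming word and character counts already sit in columns called n_words and n_chars (hypothetical names) and using invented numbers:

```python
import pandas as pd

# Made-up counts; is_from_me is 1 for me and 0 for the other person.
df = pd.DataFrame({
    "is_from_me": [1, 0, 0, 1],
    "n_words": [2, 2, 1, 3],
    "n_chars": [10, 7, 4, 12],
})

# Total words and characters per sender.
totals = df.groupby("is_from_me")[["n_words", "n_chars"]].sum()
print(totals)
```

Swapping .sum() for .mean() would give the average message length per sender instead of the totals.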

On which days have the most messages been sent?
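One way to answer this, sketched on made-up timestamps and assuming the Date column has already been converted to datetimes. I read "days" as calendar dates here; the comment shows the weekday variant.

```python
import pandas as pd

# Invented timestamps for illustration.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-01 09:00", "2019-01-01 21:30", "2019-01-02 08:15",
        "2019-01-01 23:59", "2019-01-03 12:00", "2019-01-03 12:05",
    ]),
})

# Count messages per calendar day; value_counts sorts the busiest day first.
per_day = df["Date"].dt.date.value_counts()
print(per_day.head())

# For weekdays instead of dates: df["Date"].dt.day_name().value_counts()
```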

What are the most common messages sent?
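A short sketch for this question, assuming the text column is already lowercased. The messages below are invented.

```python
import pandas as pd

# Invented messages for illustration.
df = pd.DataFrame({"text": ["jaja", "hola", "jaja", "ok", "jaja", "ok"]})

# Count identical messages and keep the ten most frequent ones.
most_common = df["text"].value_counts().head(10)
print(most_common)
```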

I then created a pair of functions that take any word as input and return the number of times each of the senders said it. The first function, iWantToCountWords, asks the user to type in a word. The second one, count, counts and plots the number of times each sender said the word.
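The counting half of that idea can be sketched as below. This is not the original iWantToCountWords/count pair: the function name count_word and the sender column are assumptions, the word is passed as an argument instead of typed in, and the plot is left as a comment so the snippet runs without matplotlib.

```python
import re

import pandas as pd


def count_word(df, word):
    """Return how many times each sender used `word` (whole word, case-insensitive)."""
    # \b keeps "hola" from matching inside longer words such as "holanda".
    pattern = r"\b" + re.escape(word.lower()) + r"\b"
    occurrences = df["text"].str.lower().str.count(pattern)
    return occurrences.groupby(df["sender"]).sum()


# Invented messages for illustration.
df = pd.DataFrame({
    "sender": ["G", "L", "G", "L"],
    "text": ["hola hola", "Hola, que tal", "todo bien", "holanda"],
})

counts = count_word(df, "hola")
print(counts)
# counts.plot.bar() would draw the chart (requires matplotlib).
```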

But what if one wanted to count and plot a word within a certain timeframe? First, I defined a function that takes a word, a start date, and an end date as inputs. That function is iWantToCountWordsInTime, which returns a second function, countDate, that does the counting and plotting. For the second function I used a mask to filter the g and l dataframes by the selected dates.
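The mask idea can be sketched as follows. To keep it short, this version filters one dataframe with a sender column instead of separate g and l dataframes, and the function name count_word_between is made up; it is a sketch of the technique, not the original countDate.

```python
import re

import pandas as pd


def count_word_between(df, word, start, end):
    """Count `word` per sender for messages with start <= Date <= end."""
    # Boolean mask: True for rows inside the selected timeframe.
    # A bare end date means midnight, so later messages that day are excluded.
    mask = (df["Date"] >= pd.Timestamp(start)) & (df["Date"] <= pd.Timestamp(end))
    window = df.loc[mask]

    pattern = r"\b" + re.escape(word.lower()) + r"\b"
    occurrences = window["text"].str.lower().str.count(pattern)
    return occurrences.groupby(window["sender"]).sum()


# Invented messages for illustration.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-12-01", "2018-12-15", "2019-01-05", "2019-01-20"]),
    "sender": ["G", "L", "G", "L"],
    "text": ["hola", "hola hola", "hola", "chau"],
})

in_window = count_word_between(df, "hola", "2018-12-10", "2019-01-10")
print(in_window)
```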

There is plenty of scope to do further and deeper analysis on datasets like this one. One could look at conversation dynamics, run a network analysis of who replies to whom if the conversation includes more than two people, explore the sentiment of messages, identify topics, etc.

For now, if you ever decide to embark on the adventure of analyzing your iMessage data, I hope this tutorial helps you understand how to retrieve the data and deal with those messy MacTime dates.

Happy hacking!

Guillermina Sutter Schneider
Analytics Vidhya

Argentine data scientist living in Berlin. I like numbers, dataviz, and Germany. Not necessarily in that order.