Show me what you Google, I tell you who you are
Analyzing my own Google Search History from 2009–2016
About a year ago, I largely stopped using Google and replaced it with DuckDuckGo, a search engine that doesn’t track you and, as a nice bonus, always shows you the first answer on StackOverflow without navigating to SO.
Before I quit Google and many of its services, however, I downloaded all my data using their Takeout service. Let’s not get hung up on whether you should or shouldn’t use Google services; I just decided I’d move to some other services for now.
Data Importing and Cleaning
Yesterday, I found some time and started playing around with Python to analyze my search data. Each file you receive is a JSON file containing a number of search queries. I had about 80k queries spread over 30 files. Let’s look at how they are structured:
So, to import them into Python, my new language of choice, you can do this:
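A minimal sketch of such an import, using only the standard library. It assumes each file holds a top-level `event` list whose entries carry `query_text` and a `timestamp_usec` string (microseconds since the Unix epoch); these field names are an assumption about the export layout, so check them against your own files. `parse_events` and `load_searches` are names made up for this example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def parse_events(data):
    """Turn one decoded export file into a list of (query, timestamp) tuples."""
    searches = []
    for event in data.get("event", []):
        query = event["query"]
        # Assumed layout: microseconds since the Unix epoch, stored as a string.
        usec = int(query["id"][0]["timestamp_usec"])
        timestamp = datetime.fromtimestamp(usec / 1_000_000, tz=timezone.utc)
        searches.append((query["query_text"], timestamp))
    return searches


def load_searches(folder):
    """Read every *.json file in `folder` and return all searches, oldest first."""
    searches = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            searches.extend(parse_events(json.load(handle)))
    return sorted(searches, key=lambda item: item[1])


# Hypothetical sample in the assumed format, to show the shape of the result.
sample = {
    "event": [
        {
            "query": {
                "id": [{"timestamp_usec": "1262304000000000"}],
                "query_text": "euro dollar",
            }
        }
    ]
}
searches = parse_events(sample)
print(searches)
```

With real data you would call `load_searches` on the folder that holds the exported JSON files instead of parsing the sample.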
Now you have a nice list of searches as tuples of ('query', 'timestamp').
Search Times Analysis
Let’s look at the development of my search frequency.
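One way to get there is to count searches per month. The sketch below assumes the searches are available as (query, timestamp) tuples with `datetime` timestamps; the function name `searches_per_month` and the sample data are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def searches_per_month(searches):
    """Count searches per (year, month), returned in chronological order."""
    counts = Counter((ts.year, ts.month) for _, ts in searches)
    return sorted(counts.items())


# Hypothetical sample data standing in for the real list of searches.
searches = [
    ("euro dollar", datetime(2009, 3, 2, 9, 15)),
    ("weather cologne", datetime(2009, 3, 20, 18, 5)),
    ("jquery mobile", datetime(2012, 7, 1, 11, 0)),
]
monthly = searches_per_month(searches)
for (year, month), count in monthly:
    print(f"{year}-{month:02d}: {count}")
```

Feeding the month labels and counts into a bar-chart function, for example `pyplot.bar` from matplotlib, turns this into a volume-style chart.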
The result is a bar graph, similar to those that show the trading volume in stock charts. As you can see, from 2009 to 2016 I searched more and more. However, about two thirds of the way through, there was a big dip. I don’t quite remember why, but I am curious to find out the reason.
Day of Week Analysis
Let’s see how much I Googled throughout the week in relation to the different years.
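A minimal sketch of that grouping: one row per year, one column per weekday (Monday first), plus a variant that turns each row into shares so years with very different totals can be compared. It assumes (query, timestamp) tuples with `datetime` timestamps; the function names and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def weekday_counts_by_year(searches):
    """Return {year: [count for Monday, ..., count for Sunday]}."""
    table = defaultdict(lambda: [0] * 7)
    for _, ts in searches:
        table[ts.year][ts.weekday()] += 1  # weekday(): Monday is 0, Sunday is 6
    return dict(table)


def weekday_shares_by_year(searches):
    """Same table, but each year's row is normalised to sum to 1."""
    shares = {}
    for year, counts in weekday_counts_by_year(searches).items():
        total = sum(counts)
        shares[year] = [count / total for count in counts]
    return shares


# Hypothetical sample: three searches in 2015, one in 2016.
searches = [
    ("add reminder", datetime(2015, 1, 5, 8, 0)),    # a Monday
    ("ing diba", datetime(2015, 1, 12, 9, 30)),      # a Monday
    ("uni köln", datetime(2015, 1, 6, 14, 0)),       # a Tuesday
    ("adf mobile", datetime(2016, 1, 4, 10, 0)),     # a Monday
]
counts = weekday_counts_by_year(searches)
shares = weekday_shares_by_year(searches)
print(counts)
```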
The result is quite amusing. If my googling rate correlates with my curiosity about the world, I sure seem to lose interest in it over the course of the week. I didn’t know I was a Monday person, but apparently I am!
Time of Day
To generate this for the time of day, follow the same procedure as for the weekdays, but instead of placing the data in a 2D array of year by day of week, place it in an array of year by hour of day. The result is again interesting:
Obviously, I search more during the day. But what is going on in 2016? I must have really been hitting the keyboard at night, even though all the other data showed that I stopped using Google in mid-2016, leading to a reduction of about 50% compared to 2015. But here, I am going strong at night. The reason is simple: I lived in Australia for 4 months, but Google didn’t record my actual local time. Instead, they either apply the time at my “home address” or use some other way of pinning me to GMT+1.
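The year by hour table and the timezone effect can be sketched together. The code below assumes timezone-aware timestamps and uses fixed offsets: GMT+1 for home and, purely as a hypothetical stand-in for Australia, UTC+10. The function and constant names are made up for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def hour_counts_by_year(searches, tz):
    """Return {year: [count for hour 0, ..., count for hour 23]} as seen in `tz`."""
    table = defaultdict(lambda: [0] * 24)
    for _, ts in searches:
        local = ts.astimezone(tz)  # needs a timezone-aware timestamp
        table[local.year][local.hour] += 1
    return dict(table)


HOME = timezone(timedelta(hours=1))   # GMT+1, as in the text
AWAY = timezone(timedelta(hours=10))  # hypothetical offset for eastern Australia

# One hypothetical search made at midnight UTC.
searches = [("weather", datetime(2016, 3, 1, 0, 0, tzinfo=timezone.utc))]

home_view = hour_counts_by_year(searches, HOME)
away_view = hour_counts_by_year(searches, AWAY)
print(home_view[2016].index(1), away_view[2016].index(1))
```

The same search lands at 1 a.m. when viewed in GMT+1 but at 10 a.m. in the assumed local offset, which is the kind of shift that makes daytime searches abroad look like night-time activity.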
Part 2: Terms
Looking at the terms I searched for, there are some clear winners: addresses, weather, exchange rates and technology. There are also some other things, like my previous bank (their website was rather hard to navigate), and the usual suspects.
('euro dollar', 210)
('Erinnerung hinzufügen', 43)
('Im Langen Bruch, Köln', 40)
('ing diba', 36)
('stargate universe', 35)
('jquery mobile', 34)
('opitz consulting', 33)
('(Aktueller Standort) -> Im Langen Bruch, Köln', 32)
('add reminder', 30)
('uni köln', 29)
('(Current Location) -> Im Langen Bruch, Köln', 28)
('weather cologne', 26)
('pascal brokmeier', 25)
('adf mobile', 25)
('xda transformer', 24)
But full queries only tell part of the story. It’s more interesting to look at single words, not whole terms. So, if we split the queries up by word and then look at the word frequencies, the result is more telling.
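Splitting by word and counting can be sketched with `collections.Counter`. This is a minimal version that lowercases each query and splits on whitespace; the function name `word_frequencies` and the sample queries are assumptions for illustration, and real data would probably also need punctuation handling.

```python
from collections import Counter


def word_frequencies(queries):
    """Count how often each lowercased, whitespace-separated word occurs."""
    words = Counter()
    for query in queries:
        words.update(query.lower().split())
    return words


# Hypothetical sample queries standing in for the real search list.
queries = ["euro dollar", "weather cologne", "Euro yen", "uni köln"]
frequencies = word_frequencies(queries)
top = frequencies.most_common(1)
print(top)
```

With a list of (query, timestamp) tuples called `searches`, the call would be `word_frequencies(query for query, _ in searches)`.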
Again, locations, street names, and some trash. Why did I say it’s telling? Because of the nice “wordcloud” tool. I know, I know. A word cloud is nothing special and has nothing to do with science. But it’s pretty, and I was tired at 11:30 and wanted to hit the bed.
There you have it. If you ever wondered how websites track your interests (I hope you stopped wondering about that at least 3 years ago), it’s pretty clear. Even if I wasn’t logged into Google, a combination of about 10 searches plus my location would probably be enough for them to reattach my unique ID. The terms above are averaged over 6 years. Of course, you can get a much higher level of detail about a person if you get daily updates: what they are looking to eat, whether they are researching a new gym, laptop, place to live, flight prices, …
If you want to do the same for yourself, set up a Python environment and use this file to get started. It includes all the code from above. Just place it in the same folder as your
And if you liked reading this, consider diving into my latest post about reinforcement learning.