Traces of Data and Dogs

Photo by James Barker on Unsplash

In today’s world, everything is connected to everything else, from our receipts at the grocery store to the photos we post on social media. Most of the time, this data is collected without our knowledge. Sometimes it’s used for advertising and marketing, such as curating your recommended shopping list, and other times it’s used for research, both serious and for fun. For example, scraping Tweets can help researchers understand what’s popular and trending in society and line it up with events occurring elsewhere. But sometimes it’s just used for fun, such as knowing where your cat lives.

People love patterns. We like making them, designing them, shifting with them, learning about them and sometimes changing them. The internet does this frequently with all of the data it has. To show an example, I’ve decided to look into whether or not “dog” people go outside more than other people.

A hypothesis that doesn’t rely on technology might tell you that the answer to this question is yes. Dogs have to be walked and let outside to go to the bathroom, so one could infer that their owners are more willing to go outside and are therefore more active and outdoorsy. They enjoy this time spent outdoors, with or without their pup, and so enjoy time outside in general.

To look at what the internet might have to say, we first begin by understanding what data is generated about a person over the span of a week. To answer this question, we can use multiple pieces of information including, but not limited to:

  • Social Media, photos & geolocation tags
  • Google Maps searches
  • Google Chrome (or any browser) search
  • Netflix and TV logins and traces
  • Clicks on an outdoor site
  • Pet microchips

All of these pieces of information show pieces of our lives and can help us answer a bit more about what we are like. Let’s dive into the different pieces of information we have on ourselves.

Social Media- (Dog) photos & location:

Photo by Dayne Topkin on Unsplash

Typically, people love to upload photos of their pet onto Instagram and Facebook, tagging them in all of their adventures. It has reached the point where people create accounts just for their dog. With all of that comes more data! Instagram collects captured content, such as photos and videos, along with data that links users to those photos and geolocation data. The main motive behind this cache of data is to help shape your feed and curate your ads. The geolocation comes as a latitude and longitude, along with a street address and a place name.

"location": { "latitude": 37.778720183610183, "longitude": -122.3962783813477, "id": "520640", "street_address": "", "name": "Le Truc" }
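To make the record above a little more concrete, here is a minimal Python sketch that reads it with the standard `json` module. The outer braces and the `extract_place` helper are my own additions for illustration; this is not Instagram’s API, just one way to pull the place name and coordinates out of a record shaped like this one.

```python
import json

# The sample record from above, wrapped in outer braces (an assumption)
# and written with plain ASCII quotes so that it parses as valid JSON.
raw = """
{"location": {"latitude": 37.778720183610183,
              "longitude": -122.3962783813477,
              "id": "520640",
              "street_address": "",
              "name": "Le Truc"}}
"""

def extract_place(record_text):
    """Return (name, latitude, longitude) from a location record."""
    location = json.loads(record_text)["location"]
    return location["name"], location["latitude"], location["longitude"]

name, lat, lon = extract_place(raw)
print(f"{name}: ({lat:.4f}, {lon:.4f})")
```

Once every photo is reduced to a name and a pair of coordinates like this, it becomes easy to count how often someone tags a park versus a restaurant.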

Sources: https://www.identityguard.com/news-insights/need-know-instagrams-privacy-policy/

https://www.quora.com/Do-Instagram-photos-include-location-data

Google Maps searches:

Photo by henry perks on Unsplash

Google Maps is one of the greatest inventions of our time. We rarely have to hold open a paper map on the road and the days of ending up on the wrong side of town are (mostly) gone! Not much surprise on this one, but Google Maps tracks your location. Besides helping determine where you are on the road, this has a few other benefits. Your phone sends data anonymously to Google that tells it how quickly cars are moving, which lets it estimate traffic. When many phones are using the app, the traffic predictions become more reliable because Maps can look at the average speed of cars.
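As a rough illustration of the averaging idea, the sketch below groups anonymous speed readings by road and takes the mean. The road names, the speeds, and the `average_speeds` function are all made up for this example; Google’s real traffic model is certainly more involved than this.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical anonymous pings: (road segment, speed in mph).
pings = [
    ("I-5 N", 58.0), ("I-5 N", 22.0), ("I-5 N", 25.0), ("I-5 N", 19.0),
    ("Pine St", 24.0), ("Pine St", 26.0),
]

def average_speeds(pings):
    """Group speed readings by road segment and average each group."""
    by_segment = defaultdict(list)
    for segment, speed in pings:
        by_segment[segment].append(speed)
    return {segment: mean(speeds) for segment, speeds in by_segment.items()}

speeds = average_speeds(pings)
for segment, speed in speeds.items():
    print(f"{segment}: {speed:.1f} mph from {sum(1 for s, _ in pings if s == segment)} phones")
```

With only one or two phones on a road, a single fast or slow driver skews the average, which is why more phones make the estimate more trustworthy.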

Source: https://www.quora.com/How-does-Google-Map-work-and-gather-data

Google Chrome search history:

Photo by Benjamin Dada on Unsplash

Google is a powerful search engine that is great at tailoring results to help you figure out what you are looking for. Short text records, called logs, are created by key applications as they process users’ requests. Typically, the logs that are accumulated are simple, noting the date and time at which a specific URL is requested. This information is used to show the popularity of a site or when it is least and most busy. Algorithms are also written to determine which sites are the most helpful to users.
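To show how much a simple log can reveal, here is a small Python sketch that counts requests per URL and per hour. The log format (a timestamp followed by a URL) and the sample lines are assumptions of mine, not Google’s actual log layout.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log lines: an ISO timestamp, a space, then the requested URL.
log_lines = [
    "2019-03-02T09:15:04 /dog-parks",
    "2019-03-02T09:47:31 /dog-parks",
    "2019-03-02T14:02:10 /hiking-trails",
    "2019-03-02T09:58:55 /hiking-trails",
    "2019-03-03T21:30:00 /dog-parks",
]

def summarize(lines):
    """Count requests per URL and per hour of the day."""
    url_hits = Counter()
    hour_hits = Counter()
    for line in lines:
        timestamp, url = line.split(maxsplit=1)
        url_hits[url] += 1
        hour_hits[datetime.fromisoformat(timestamp).hour] += 1
    return url_hits, hour_hits

url_hits, hour_hits = summarize(log_lines)
most_popular_url, _ = url_hits.most_common(1)[0]
busiest_hour, _ = hour_hits.most_common(1)[0]
print("Most requested:", most_popular_url)
print("Busiest hour:", busiest_hour)
```

Even with nothing but a date, a time, and a URL, you can already see which pages are popular and when people visit them.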

Source: https://www.google.com/search/howsearchworks/algorithms/

Netflix & TV logins:

Photo by Charles 🇵🇭 on Unsplash

Netflix tracks your viewing history as well as the dates and times the account was accessed from any profile. With that comes the IP address, locations, and the type of device used. You also get a chance to see what devices are logged in and can boot someone off if they’ve hit the streaming limit.

Source: https://qz.com/632779/how-to-pinpoint-precisely-where-and-when-someone-used-your-netflix-account/

Tracking clicks from sites:

Photo by Fancycrave on Unsplash

As many people know, sites are able to track clicks to help determine where their user base is located, who their users are, and other details such as what type of device they’re using. All of this can be done with a few simple lines of code via Google Analytics. It can give you a breakdown of your user demographics down to the creepy bits, such as what else they’re interested in and why they clicked on your page.

Pet Microchips:

Photo by: American Kennel Club on akc.org

Pet microchips are designed to help you keep track of your pet, but most are passive RFID tags with no GPS. They work when they’re scanned at a vet’s office or pet store, where staff can look up information about you and your pet so you can be reunited.

Source: Pet Chip Registry: https://www.rfid-usa.org/

With all of this data, what can we do with it?

The Google Maps data would allow us to see where people are going and what they’re doing while they’re there. For example, we may see that a person might go on a lot of hikes and also frequently visits a (dog) park. Meanwhile, the search history could be useful in understanding what type of interests a user has and whether they relate to being outdoors or pets. Similarly, tracking clicks could show if a user goes to a lot of outdoor-based websites, including clothing stores, NPS, or any parks & recreation site. The microchip data could be joined to a larger dataset to determine what type of pet people have and see what activities they do. As a cross-reference, we could take our Netflix data to see who uses Netflix the most (profile IDs) and where they’re located (using location and IP address) and if that shows a correlation with other online behavior.

Let’s take the pet microchips data and the Google Maps data.

After loading in your datasets, you’ll want to see how many rows and columns you are working with.

From there, you can find common variables to do a join on.
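A minimal sketch of these first two steps with pandas might look like the following. The tiny inline CSVs stand in for the real files, and every column name and value here is invented for the example, so swap in your own file paths and columns.

```python
import io

import pandas as pd

# Tiny stand-in datasets; all column names and values are made up.
pets_csv = io.StringIO(
    "Pet ID,Type,State,Zip Code\n"
    "1,Dog,WA,98103\n"
    "2,Cat,CA,94107\n"
    "3,Dog,NY,10001\n"
)
maps_csv = io.StringIO(
    "User ID,State,Latitude,Longitude\n"
    "a,WA,47.66,-122.34\n"
    "b,CA,37.77,-122.39\n"
    "c,TX,30.27,-97.74\n"
    "d,CA,34.05,-118.24\n"
)

pets = pd.read_csv(pets_csv)
maps = pd.read_csv(maps_csv)

# .shape is (rows, columns) for each dataset.
print("pets:", pets.shape)
print("maps:", maps.shape)

# Columns that appear in both datasets are candidates to join on.
common_columns = sorted(set(pets.columns) & set(maps.columns))
print("common columns:", common_columns)
```

With real files you would pass a path to `pd.read_csv` instead of an in-memory buffer; the shape check and the column comparison stay the same.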

You can join the data with an inner, outer, left, or right join. Each one keeps a different set of rows, so the size of the result changes with the type you pick, and it’s important to compare your options before settling on a single join.

For this dataset, it looks like an outer join on “State” is the best one.
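To see how the join type changes the result, the sketch below merges two small made-up tables on “State” with each option and compares the row counts. The data is invented, and only the shared “State” column mirrors the join described here.

```python
import pandas as pd

# Made-up stand-ins for the pet and map datasets.
pets = pd.DataFrame({
    "State": ["WA", "CA", "NY"],
    "Type": ["Dog", "Cat", "Dog"],
})
maps = pd.DataFrame({
    "State": ["WA", "CA", "CA", "TX"],
    "Place": ["Dog park", "Trailhead", "Beach", "Cafe"],
})

# Row count produced by each join type on the shared "State" column.
row_counts = {
    how: len(pets.merge(maps, on="State", how=how))
    for how in ("inner", "left", "right", "outer")
}
print(row_counts)

# The outer join keeps every state from both tables; indicator=True adds
# a "_merge" column showing which side each row came from.
joined = pets.merge(maps, on="State", how="outer", indicator=True)
print(joined)
```

The outer join keeps states that appear in only one dataset and fills the missing side with empty values, which is useful when you don’t want to lose rows before exploring.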

Now, we can begin to see what types of inferences we can make. Because most of this data is non-numerical, it will be a bit more difficult to plot anything, but hopefully, we can still find some neat patterns.

We can plot out the longitude and latitude to understand where most of our users with pets live.
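One way to draw such a plot is a simple scatter with matplotlib, sketched below. The coordinates are mock values I picked for illustration, and the “Latitude” and “Longitude” column names are assumptions about how the joined data is labeled.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Mock coordinates for a handful of pet owners (illustrative only).
users = pd.DataFrame({
    "Latitude": [47.66, 37.77, 34.05, 40.71, 41.88],
    "Longitude": [-122.34, -122.39, -118.24, -74.01, -87.63],
})

# Longitude on the x-axis and latitude on the y-axis gives a rough map.
fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(users["Longitude"], users["Latitude"], alpha=0.6)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Where users with pets are located (mock data)")
```

In a notebook the figure shows up on its own; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.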

We seem to have more pet people living on the coasts than in the Midwest.

Next, we can look at Zip Code as a way to determine where someone is or has been. We can clean up the column before putting it into a plot.
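Cleaning the column could look something like the sketch below. The messy sample values and the “Zip Code” column name are assumptions, and so is the choice to zero-pad short values, which guesses that a leading zero was lost when the file was read as numbers.

```python
import pandas as pd

# Made-up examples of the messiness a zip code column often has.
pets = pd.DataFrame({
    "Zip Code": ["98103", "98103-1234", " 98115 ", "2134", None, "unknown"],
})

def clean_zip(series):
    """Reduce values to 5-digit zip codes; anything else becomes missing."""
    base = series.str.strip().str.split("-").str[0]
    # Assumption: a short numeric value lost its leading zero.
    base = base.str.zfill(5)
    is_valid = base.str.match(r"^\d{5}$", na=False)
    return base.where(is_valid)

pets["Zip Clean"] = clean_zip(pets["Zip Code"])

# Count pets per cleaned zip code, ready to feed into a bar plot.
counts = pets["Zip Clean"].dropna().value_counts()
print(counts)
```

From here, `counts.plot(kind="bar")` would give a quick bar chart of pets per zip code.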

Another user who did a similar study, but only for Seattle pet licenses, came up with this graph:

In this second plot, the zip codes with higher counts are also the wealthier areas of Seattle, which may say something about who owns dogs.

While this data isn’t the greatest, a bit more reference data and further analysis can reveal patterns. Other questions you could ask are whether pet ownership shifts between higher- and lower-income areas, or whether one state owns more pets than another.

After all of this, you may think that the internet is a creepy place. And in some ways, it definitely is a weird place. If you’re feeling watched, don’t fret! There are now plenty of ways to limit this data collection, including the following:

  • Turn off your location on your devices
  • Stop accepting all cookies on your devices
  • Use DuckDuckGo rather than Google. It states that it doesn’t track your searches

Always remember that once something is on the internet, it’s very difficult to take off!

Resources:

Pet data from: https://catalog.data.gov/dataset/lost-found-adoptable-pets-19b8e/resource/f161442a-aac3-4110-989d-d6a2fd04206f

Map data was a mockup using https://mockaroo.com/
