A Privacy Primer: Part 1 — Theory

WARNING: Lots of text ahead! This is the part covering conceptual stuff, so it’s all reading. You have been warned.

Over the last couple of weeks I’ve had a few friends ask me about steps they can take to improve their privacy, specifically on social media. Considering the events going on all around us, I think it’s finally become apparent to lots of people that maintaining privacy isn’t about hiding things because they’re illegal or because they’re wrong. Privacy is important because information is powerful, and someone with enough information about you can use that information against you.

I’ve decided to split this into two parts.

  • Part 1 (this) will be an introduction to some privacy concepts and the technology behind the Web. I’ll go over the ways in which information is gathered and how that information is generally used for those who might not know the extent of this information gathering. I’ll also give some examples to help you understand why you might want to protect your information.
  • Part 2-x will be dedicated to more practical information, like changing browser settings and social media privacy settings. Later parts may also include more technical explanations of some topics that may be covered only briefly early on.

So, where to begin? What if you’re not a techno-nerd but still want to take some action to better protect your information? Thankfully, there are some easy steps anyone can take to reduce the amount of information companies and other entities collect about you.

Disclaimer: These are very simple explanations. If you would like to find more details, see the reference section at the bottom of the article. Also, Google is your friend.

The Basics

Before I get to the instructions on what settings you can change to reduce data collection, I think it would be a good idea to go over some of the basic principles behind how data is collected, what kind of data companies collect, and how that data is aggregated to learn about you and try to make predictions about you.

I will try to keep the nerd-speak to a minimum, but there are a few concepts I’ll need to go over and briefly explain. Knowing these kinds of things will give you a better understanding of data in general. My goal is to improve your awareness of how revealing seemingly unrelated pieces of information can be when they’re put together, so that you might start thinking twice before entering personal data into any website or form that asks for it.

Web Surfing 101

So, there are a few things that we should understand before moving forward. The technology behind the Internet is incredibly complex, but thankfully, you don’t have to know much about all of that to understand the following terms and concepts.

First, let’s give you a simple idea of what actually happens when you navigate to a website or access almost anything on the Web. The Web is built on something called the “client-server” model. This means that there are two types of entities: clients and servers. These aren’t really technical terms, so it works the way you probably think it does. A Client (that’s you) sends a request to the Server (let’s say, twitter.com) asking for something, maybe your home page. The Server checks to see if it has that information and if you’re authorized to access it (for example, whether you’ve logged in). If your request meets these requirements, the Server sends the requested information back to the Client.
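If it helps to see that exchange written down, here is a minimal sketch in Python. It is a toy model, not how Twitter or any real server is built, and every name in it (PAGES, PRIVATE, LOGGED_IN, handle_request) is made up for illustration.

```python
# A toy model of the client-server exchange described above.
# Every name here is hypothetical and exists only for this example.

PAGES = {"/home": "Your home timeline", "/about": "About this site"}
PRIVATE = {"/home"}    # pages that require a login
LOGGED_IN = {"alice"}  # users the server currently considers logged in


def handle_request(user, path):
    """Return a (status, body) pair the way a very simple server might."""
    if path not in PAGES:
        return 404, "Not found"            # the server doesn't have it
    if path in PRIVATE and user not in LOGGED_IN:
        return 401, "Please log in first"  # you're not authorized
    return 200, PAGES[path]                # both checks passed


print(handle_request("alice", "/home"))  # logged in, gets the page
print(handle_request("bob", "/home"))    # not logged in, gets turned away
```

The two checks in handle_request mirror the two questions the Server asks: do I have this, and are you allowed to see it?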

Now you know the boring details of how your tweets make it from Twitter’s servers to your phone! So, what’s important about this? A few things:

  • Everything you do on the web involves sending requests or posting information to systems that other people own. This might not surprise you, but it’s important to remember because it means you don’t control your data after it leaves your device.
  • Any data you enter into a website, whether it be public posts, private messages, or anything else, can be seen and analyzed by the owners of the server, and their terms of service usually give them broad rights to use that information.
  • A server needs to know who asked for what so it knows where to send the response; this means it keeps track of you (your IP address). IP addresses don’t necessarily identify you, but they can be pretty revealing nonetheless.
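To make that last point a bit more concrete, here is a rough sketch of the kind of record a server could keep for each request. The names (ACCESS_LOG, record, pages_by_ip) and the IP address are invented for this example; real servers log in their own formats.

```python
from datetime import datetime, timezone

# A hypothetical in-memory request log: one entry per request received.
ACCESS_LOG = []


def record(ip, path):
    """Note who asked for what, and when."""
    ACCESS_LOG.append({"ip": ip, "path": path, "time": datetime.now(timezone.utc)})


def pages_by_ip(ip):
    """Everything a single IP address has asked for, in order."""
    return [entry["path"] for entry in ACCESS_LOG if entry["ip"] == ip]


# 203.0.113.7 is an address reserved for documentation, used here as a stand-in.
record("203.0.113.7", "/search?q=phone+cases")
record("203.0.113.7", "/cart")
print(pages_by_ip("203.0.113.7"))
```

Even without a name attached, a list like this starts to say something about whoever sits behind that address.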

Everyone Loves Cookies, Right?

Remember how I mentioned that servers check whether you have access (whether you’re authorized) to view certain things like your account information or private pages? When you log in to a website with a username and password, you are identifying yourself (the username) and providing something that proves you are who you claim to be (your password). This is why you can’t view your feeds or email before you’ve logged in; this makes it so only you can see that information.

What if a server asked you to verify your username and password every time you tried to change to a different private page? That would get really annoying pretty quickly, right? The way servers typically avoid having to do this is by using something called “cookies”.

Servers generate something called a session ID whenever a client connects to them. This is an easy way to track sessions of communication between specific clients and the server. When you log in to a website, it sets a session ID for you that has been granted access and is linked to your user account. Cookies are a way for the website to store that session ID locally on your computer, so when you navigate between different pages you just have to send that cookie along with your request to authenticate yourself.
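Here is a small sketch of that idea using Python’s standard library. It assumes the session is created at login and kept in a plain dictionary; the names SESSIONS, log_in, and who_is_this are hypothetical, and a real site would add expiry, security flags, and much more.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session ID -> username, kept on the server (hypothetical store)


def log_in(username):
    """Create a session and return the cookie text the server would send back."""
    session_id = secrets.token_hex(16)  # a long, hard-to-guess random ID
    SESSIONS[session_id] = username
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    return cookie["session_id"].OutputString()  # looks like "session_id=..."


def who_is_this(cookie_header):
    """Look up the user from the cookie text the browser sends with a request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return SESSIONS.get(morsel.value) if morsel else None


header = log_in("alice")
print(who_is_this(header))                # the server recognizes alice
print(who_is_this("session_id=made-up"))  # unknown ID, so nobody is recognized
```

Notice that the password is only needed once, inside log_in. After that, the random ID in the cookie is the only thing standing in for you.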

Cookies: Great, In Moderation

Alright, so cookies sound pretty useful, right? They are, but they also have some serious privacy implications. Cookies are generally sent along with requests to the servers that set them. This means the site can keep track of every page you request, what things you search for, how long you spent looking at a page, which links you clicked, and more. This kind of information is then used for things like targeted advertising or to suggest other pages you might be interested in. This can make your experience much more enjoyable, but that information can disclose a lot about your interests, ideas, and beliefs.

Like I said, cookies are generally only sent to the websites that set them, so only they can track what you’re doing. But, here’s the thing: most websites these days have content on them that comes from a whole bunch of other places. Things like images and advertisements on web pages are usually loaded from other servers.

A good example to use is ads. Let’s say you logged into your Amazon account to order a new case for your phone when you first got on your computer. Amazon sets a cookie on your machine so you can freely navigate their pages and make your purchase, and when you’re done you move along and surf to a few other pages.

That cookie Amazon set? It’s still on your system, and if you make any other requests to one of Amazon’s servers, it’ll get sent along and added to the log of your session. Ever notice ads for Amazon stuff on other pages that aren’t Amazon? Whenever you load up one of those pages, your system has to make a request to Amazon, and the cookie is sent along. The request includes what page the request originated from, meaning Amazon knows which page you were viewing when the ad loaded. This lets Amazon track pages you visit that aren’t theirs. This is why, when you come back to Amazon a few hours later, it can make suggestions based on what you’ve been doing recently.
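A rough sketch of what the ad server’s side of this could look like is below. It is a simplified model built on my own assumptions: the names SEEN_ON and serve_ad, the cookie value, and the example URLs are all invented, and browsers today often trim the originating-page header or block cookies like this, so treat it as the classic version of the mechanism.

```python
from collections import defaultdict

# What a hypothetical ad server might remember:
# cookie value -> list of pages where its ads were loaded.
SEEN_ON = defaultdict(list)


def serve_ad(request_headers):
    """Record which page embedded the ad, keyed by the visitor's cookie."""
    cookie_id = request_headers.get("Cookie")
    page = request_headers.get("Referer")  # the page the request came from
    if cookie_id and page:
        SEEN_ON[cookie_id].append(page)
    return "<img src='ad.png'>"


# The same browser loads the ad from two unrelated sites.
serve_ad({"Cookie": "id=abc123", "Referer": "https://news.example/phone-reviews"})
serve_ad({"Cookie": "id=abc123", "Referer": "https://blog.example/hiking-gear"})
print(SEEN_ON["id=abc123"])
```

Neither of those two sites belongs to the ad server, yet it ends up with a list of both visits tied to one cookie.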

The Value Of Information (Nothing is Free)

Alright, so now we have some of the fundamentals on how the Web works and how websites can track your browsing habits. But why do they do this? As you probably know, it mainly comes down to advertising. If they can show you ads based on your interests, you’re probably more likely to click on them, right? Ads are the primary source of income for most “free” services. The more successful their ads, the more money they make. Sites will give you access to their services in exchange for your information, because it’s this information that really makes them their money.

What kind of information do sites generally collect? The list is pretty long, but here are a few key data points advertisers are interested in:

  • What you search for
  • How long you spend looking at certain pages
  • What times of day you do certain things online
  • What you purchase
  • Your age and location
  • Who you know
  • How much money you make

It isn’t hard to understand how this data can be used to market things directly to you. This is only some of the information that can be collected, and it’s collected over time. Most services retain the right to sell all or part of the data they collect about you to third parties. You agree to this when you click Agree on those long, dense legal blocks of text you have to get through to use the service. Usually, these third parties are advertising agencies. But they don’t have to be.

Data Aggregation (Putting The Puzzle Together)

Think about everything you’ve done on Facebook since you created your account. Think about the things you’ve searched for, the people you’ve talked to, the profiles you’ve looked at of both friends and strangers, the feeds you’ve followed and the ones you’ve blocked. All of it has been recorded. If you’re in your twenties you may have had a Facebook account for almost 10 years now. Those are some pivotal years, with lots of big changes. All of that’s been recorded.

Now think about everything you’ve bought on Amazon. Think about the bank accounts you’ve linked to it. What about that Quran you bought for that class three years ago? Remember those five baby books you bought when you found out you were pregnant last year? Free Amazon Prime for students if you enter your student email address is great, but you’ve disclosed where you go to school.

Let’s leave it at just these two massively popular services. Besides the examples I just gave, there’s a whole lot more that is collected. This information isn’t pored over by analysts hunched over stacks of paper. It’s processed and analyzed by powerful computers using advanced statistical analysis and other super complicated sounding things.

What if I was able to get all of that information? What could I learn about you? Maybe I could learn the age of your children. Maybe I could find out about the affair you’ve been having. But what if what I thought looked like an affair was actually you traveling for business and visiting a close friend? That’s the thing: I could try to make deductions about what the data means, but there could be pieces of important information I’m missing. As powerful as these data-crunching machines are, they’re still only machines working with limited information.
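To show how little it takes to start putting the puzzle together, here is a deliberately crude sketch. The records, the email address, and the guessing rules are all made up for this example; real systems use far more data and far more sophisticated statistics.

```python
# Made-up records from two hypothetical services, both keyed by email address.
social = {"pat@example.edu": {"follows": ["parenting tips"], "joined": 2008}}
shop = {"pat@example.edu": {"purchases": ["baby book", "baby book", "phone case"]}}


def build_profile(email):
    """Merge what each service knows, then make some naive guesses from it."""
    profile = {"email": email}
    profile.update(social.get(email, {}))
    profile.update(shop.get(email, {}))

    guesses = []
    if email.endswith(".edu"):
        guesses.append("probably a student")
    if profile.get("purchases", []).count("baby book") >= 2:
        # Could just as easily be gifts for a friend: the data can't tell.
        guesses.append("probably expecting a child")
    profile["guesses"] = guesses
    return profile


print(build_profile("pat@example.edu")["guesses"])
```

The shared email address is what lets the two sets of records be joined, and the guesses are exactly that: guesses, made with whatever pieces happen to be available.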

Final Thoughts

Every time you hop on the Internet you generate loads and loads of information. That information is valuable to different groups and entities for different purposes.

Now you have some basic background on how that information is collected and the potential impact it can have when it’s collected long enough. Reducing the amount of data collected about us can help us take control of our data.

Up Next

In the next part of this series, I’ll go over some more practical examples and show you how to enable privacy-conscious settings on your browser, smartphone, and some popular social media sites/apps.

Further Reading

Like what you read? Give Enrique Castillo a round of applause.
