Termite Part Three: Classifying Censorship, Data Deletion, and Cookie Policies

A data science solution to privacy policies that nobody actually reads

Michael Steckler
Oct 2, 2020

In this post, I detail how we quantitatively assess the public accessibility of privacy policies. I also dive into the decisions we made for the first three of our six privacy categories (Censorship and Suspension, Deletion and Retention, and Cookie Policies and Other Forms of Tracking) and explain why these policies should matter to internet users.

Categories

Dilemma

When trying to develop a product to elevate people’s privacy awareness, we had to make tough decisions. The tricky thing about privacy is that it encompasses many topics, some intersecting, others seemingly disparate.

Moreover, privacy values differ across individuals, firms, cultures, and jurisdictions. In the following section, I explain how Termite addresses these values, the dilemmas we faced, and the tough decisions we inevitably had to make for our product. In general, our conundrums fell into two types: those related to our assumptions and those related to our technical capabilities.

The first conundrum stems from assumptions about our target user and their knowledge base. We had to ask ourselves: should we design our product for the average internet user, who we assume has little to no knowledge about internet privacy; for data scientists, technologists, academics, and privacy experts; or for someone in between? What does the average internet user care about, and does that differ significantly from what experts care about? Moreover, there is a disconnect between people’s values and how they act on them. How do we factor these differences into our product so that it encourages better user behavior without wasting our time on futile features? Relatedly, we had to strike a healthy balance between overanalyzing and maintaining simplicity.

There was also the question of transparency. Typically, honesty is the best policy. However, if a website is transparent about a practice you find particularly troublesome, is it better for the website to be vague and the user to be blissfully ignorant, or for the user to be uncomfortably knowledgeable? On this dilemma, we sided with transparency and gave kudos to policies that were transparent about their processes.

Decisions

To resolve some of these other uncertainties, we devised an exploratory market research survey. Our initial survey results helped us distill the privacy topics that matter to most people. We complemented the survey with interviews with privacy experts Morgan Ames and Jared Maslin to fine-tune our category parameters. In the end, we were building an MVP under real constraints, so we had to narrow our product’s scope to categories that were both useful to the end user and feasible for us to address. Below, we describe our approach to public accessibility and the first three of the six categories we decided to cover. Both conceptually and technically, we separated public accessibility from our six privacy categories, because public accessibility depends on the length and types of words used in the document rather than on what the policy actually says and means. It is therefore not included as a weight in our primary scoring mechanism and is instead graded separately.
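To make the separation concrete, here is a minimal sketch of how a grade could keep readability out of the weighted privacy score. The post does not give Termite’s weights or scales, so everything numeric here is an assumption: per-category scores on a 0 to 1 scale and equal weights are hypothetical, and `PolicyGrade` and `grade_policy` are illustrative names, not Termite’s code. The six category names come from this post and its sequel.

```python
from dataclasses import dataclass

# The six privacy categories covered by Termite (Parts Three and Four).
CATEGORIES = (
    "Censorship and Suspension",
    "Deletion and Retention",
    "Cookie Policies and Other Forms of Tracking",
    "Data Collection and Usage",
    "Information Sharing and Selling",
    "Policy Changes",
)


@dataclass
class PolicyGrade:
    privacy_score: float         # weighted average of the six category scores
    public_accessibility: float  # Flesch Reading Ease, reported on its own


def grade_policy(category_scores, flesch_score, weights=None):
    """Combine the category scores; readability never enters the weighted sum."""
    missing = set(CATEGORIES) - set(category_scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    if weights is None:
        # Hypothetical: equal weights, since the real weights are not published here.
        weights = {c: 1.0 for c in CATEGORIES}
    total = sum(weights[c] for c in CATEGORIES)
    privacy = sum(weights[c] * category_scores[c] for c in CATEGORIES) / total
    return PolicyGrade(privacy_score=privacy, public_accessibility=flesch_score)


if __name__ == "__main__":
    scores = {c: 0.5 for c in CATEGORIES}
    print(grade_policy(scores, 31.9299))
```

Because the Flesch score is passed straight through, two policies with identical content but different wording get the same privacy score and differ only in their accessibility grade.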

0. Public Accessibility

For the most part, people don’t read their online contracts and policies because they are not reader-friendly. As such, we wanted our tool to gauge how readable a web host’s privacy policy is. There are various ways to measure document readability; to measure the public accessibility of a privacy policy, we chose the Flesch Reading Ease score, which the U.S. Department of Defense uses as its standard test of readability for its documents and forms. Florida requires that life insurance policies have a Flesch Reading Ease score of 45 or greater, and the average adult can read documents that score between 70 and 80. Interestingly, the privacy policy of my graduate degree program’s learning management system received a public accessibility (Flesch Reading Ease) score of 31.9299, well below even Florida’s threshold for life insurance policies.
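The Flesch Reading Ease score is 206.835 − 1.015 × (words ÷ sentences) − 84.6 × (syllables ÷ words), so long sentences and long words both pull the score down. The post does not say which implementation Termite used, so the sketch below is only an illustration of the formula: the sentence splitter and the vowel-group syllable counter are rough heuristics of my own, and dedicated readability libraries will give somewhat different numbers.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of consecutive vowels (y included)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "make") usually adds no syllable;
    # "-le" endings (as in "table") usually do, so they are left alone.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    plain = "The cat sat on the mat."
    legalese = ("Notwithstanding the aforementioned provisions, "
                "personal information may be disclosed.")
    print(round(flesch_reading_ease(plain), 3))     # short, one-syllable words
    print(round(flesch_reading_ease(legalese), 3))  # long, polysyllabic words
```

The legalese sentence scores far lower than the plain one, which is the pattern a low score like 31.9299 reflects.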

Image by author

1. Censorship and Suspension

When it comes to control and ownership of content, data, and services, we had to make the distinction between the rights that firms have versus those that the end user has.

When we first analyzed censorship, our instinct was that “all censorship is wrong,” because we were thinking only of authoritarian regimes. However, this US-centric bias is limiting and too simplistic; it paints the topic of censorship in black-and-white terms.

Indeed, authoritarian regimes have abused technology to violate rights such as freedom of thought and access to information. However, at the other end of the political spectrum are democratic nations fighting an uphill battle against an unrivaled, unregulated, monstrous beast, diseased with misinformation, disinformation, and so-called “fake news.” The current reality of our social media platforms is that they breed, incubate, and inculcate hate speech, extremism, and terrorism.

In reality, censorship can be used for the greater good. Platforms such as Facebook are in dire need of better quality control, content moderation, and accountability mechanisms. Left unfettered, hate speech, extremism, and intolerance can continue to spread like wildfire. In the future, we hope to see more prominent fact-checking labels, similar to Twitter’s, as well as labels for content posted or shared by bots.

With this in mind, we evaluated censorship and suspension policies as follows:

Image by author

2. Deletion and Retention

Deletion refers to the right and ability of the user to request that their data be deleted, and to have it actually deleted. That final clause is extremely important: some policies explicitly state that they keep some of your data even after you request that all of it be deleted.

Retention, by contrast, refers to a practice of the firm: how long it holds onto your data. Some services retain data indefinitely, while others limit retention by deleting logs of your data on a regular schedule, such as every 90 days.
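To show what a limited retention period means in practice, here is a minimal sketch of a rolling 90-day purge, the example window mentioned above. The log-entry shape (a dict with a timezone-aware `"timestamp"`) and the `purge_expired` helper are hypothetical; real services implement retention in many different ways, and a policy’s wording is what Termite evaluates, not the code behind it.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention window, matching the 90-day example in the text.
RETENTION_WINDOW = timedelta(days=90)


def purge_expired(logs, now, window=RETENTION_WINDOW):
    """Return only the log entries younger than the retention window.

    Each entry is assumed to be a dict with a timezone-aware "timestamp".
    Anything older than `now - window` is dropped, i.e. deleted.
    """
    cutoff = now - window
    return [entry for entry in logs if entry["timestamp"] >= cutoff]


if __name__ == "__main__":
    now = datetime(2020, 10, 2, tzinfo=timezone.utc)
    logs = [
        {"event": "login", "timestamp": now - timedelta(days=10)},
        {"event": "search", "timestamp": now - timedelta(days=120)},
    ]
    kept = purge_expired(logs, now)
    print([entry["event"] for entry in kept])
```

A service with unlimited retention simply never runs a step like this, which is why the length of the window (or its absence) is worth reading for in a policy.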

We evaluated data deletion and retention policies as follows:

Image by author

3. Cookie Policies and Other Forms of Tracking

Cookies are small pieces of data sent from a website and stored on the user’s computer by the user’s web browser while the user is browsing. Cookie policies govern how a site uses cookies to track users; to collect, store, share, and sell their data; and to optimize their experience, including through targeted advertising. Seemingly delicious, cookies can be used to eat away at your privacy. Websites can also employ web beacons (also called page tags, pixels, pixel tags, and clear GIFs) to track user behavior, check that a user has accessed some content, and inform web analytics. Beacon-style tracking can also use standardized interfaces such as the W3C Beacon API, which lets web developers report user activity to a server without delaying page navigation. So… what’s the difference? Beacons are slightly less invasive than cookies, but for the end user they are not really any better or worse: whatever they gain in reduced invasiveness, they lose in public understanding.
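To make the two mechanisms concrete, the sketch below uses Python’s standard `http.cookies` and `urllib.parse` modules. It shows the `Set-Cookie` header a site might send on a first visit, how the site reads the same identifier back from the `Cookie` header the browser returns on later requests, and the kind of URL a tracking pixel might load. The cookie name `visitor_id`, the one-year lifetime, the tracker domain `tracker.example.com`, and the `vid`/`page` query parameters are all hypothetical, chosen for illustration.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode

ONE_YEAR_SECONDS = 60 * 60 * 24 * 365


def set_tracking_cookie(visitor_id: str) -> str:
    """Build the Set-Cookie header a site might send on a first visit."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id
    cookie["visitor_id"]["max-age"] = ONE_YEAR_SECONDS  # browser keeps it for a year
    cookie["visitor_id"]["path"] = "/"                  # sent back with every page on the site
    return cookie.output()


def read_tracking_cookie(cookie_header: str):
    """Parse the Cookie header the browser sends back on later requests."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel else None


def beacon_url(visitor_id: str, page: str) -> str:
    """URL of a hypothetical 1x1 pixel; loading it reports the page view."""
    query = urlencode({"vid": visitor_id, "page": page})
    return f"https://tracker.example.com/pixel.gif?{query}"


if __name__ == "__main__":
    print(set_tracking_cookie("abc123"))
    print(read_tracking_cookie("visitor_id=abc123"))
    print(beacon_url("abc123", "/pricing"))
```

The cookie lets the site recognize the same browser across visits; the pixel needs no cookie of its own, because simply fetching the image tells the tracker which page was viewed, which is part of why beacons are harder for users to notice.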

These tracking technologies are often used to customize your user experience and to serve you targeted advertising. It is worth noting that some people actually prefer targeted ads if they are going to see ads at all. Moreover, many wonderful features would not be possible without tracking technologies, and some services cannot function at all without some form of tracking.

For our coding, categorization, and scoring purposes, this proved to be a serious dilemma. We at Termite believe that designing products and services with privacy first means that your product or service should work without relying on cookies and other tracking technologies. In many cases, when a service does not give you the choice to opt out of tracking, it is not doing enough to support and respect your privacy values. As such, we made the tough decision to favor services whose policies stipulate that they don’t track you, minimize their tracking, or let you disable tracking over services that do track you, even when that tracking truly serves to optimize their service.

Thus, we evaluated cookie and tracking policies as follows:

Image by author

“Termite Part Four: Assessing Data Collection, Information Sharing, and Changes to Policies” covers the remaining three of our six privacy categories: Data Collection and Usage, Information Sharing and Selling, and Policy Changes.


Michael Steckler

Data Scientist, Tech Policy Consultant, and Educator. My views do not reflect the views of any organization I am affiliated with.