Gender and Oppression Online: A Discussion about Marquette’s Tweeting Habits

Credit: Christina Lu

TL;DR — see the live project here

In recent years, there’s been a staggering rise in digital media consumption. From 2005 to 2016, the share of American adults using social media skyrocketed from just 5% to 69%, a near fourteen-fold increase. By the end of 2017, there are expected to be more than 2.51 billion social media users worldwide, and with a third of our total waking hours now devoted to our cell phones, ignoring our digital selves is increasingly becoming a form of self-neglect.

Culturally, we’ve come to value our online identity as an important social and legal extension of our own personhood. As early as 1991, courts began to recognize the unwanted distribution of nude photos as a form of sexual harassment and sexual violence. Earlier this month, a Dutch man was sentenced to 11 years in prison for cyberbullying young women, and awaits a separate trial in Canada following Amanda Todd’s suicide. And as Jo Fuertes-Knight commented in a 2014 Vice documentary,

“the rise of social media…and a blurring of our real and online selves has transformed how we interact…as we spend less time with loved ones and more time staring at screens, it’s the definition of intimacy that’s become even more ambiguous”

Our cyber identities are increasingly tied to our physical, emotional, and existential selves. Our social technologies shape “who” we spend our time with, and more importantly, how. Who and what we choose to be online reflects a great deal about our own identity, particularly with respect to gender.

I want to explore how online comments and interactions between gender identities take shape.

More specifically, I want to look at online conversations on Marquette University’s campus to see if patterns of behavior diverge based on gender identities. Since so much information and data is widely distributed and available online, can this so-called gender identity be measured and analyzed?

To tackle this question, I’ve been looking at tweets on Marquette University’s campus over the past month. I used Twitter’s API and some open-source software libraries to collect tweets, analyze their sentiment (a measure of how positive or negative a word or sentence is), and parse topics from sentences. I built a web application to aggregate, analyze, and visualize this data using a remote database (you can see more of the technical bits/source code here).

The Project

To explain this project, I think it’s important to understand how I group data together, how I measure sentiment, and how I distinguish “male” and “female” users.


To collect tweets on campus, I have a server listening to a service provided by Twitter that allows me to collect tweets bounded by coordinates. I restricted the tweets to the area below:

Code snippet of Marquette’s “campus”
Area of tweets I listen for
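
For readers who want a feel for what “bounded by coordinates” means, here is a minimal sketch of a bounding-box check. The coordinates are placeholders in the general Milwaukee area, not the actual box used in this project, and `campusBox` and `isInsideBox` are hypothetical names made up for illustration.

```javascript
// Hypothetical bounding box: placeholder coordinates, not the project's real values.
const campusBox = {
  west: -87.94, // minimum longitude
  south: 43.03, // minimum latitude
  east: -87.92, // maximum longitude
  north: 43.04, // maximum latitude
};

// Returns true when a [longitude, latitude] point falls inside the box.
function isInsideBox(box, [lng, lat]) {
  return (
    lng >= box.west &&
    lng <= box.east &&
    lat >= box.south &&
    lat <= box.north
  );
}

console.log(isInsideBox(campusBox, [-87.93, 43.035])); // a point inside the box
console.log(isInsideBox(campusBox, [-87.62, 41.88])); // a point far outside it
```

A tweet without coordinates can’t be tested this way at all, which is one reason a location-bounded sample stays fairly small.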

Sentiment and Gender Analysis

To catalogue gender, sentiment, and topic analyses of tweets, I use two software libraries: compromise and sentiment. Rather than explain the whole process, let’s look at some code!

Let’s say we have a tweet with the following text:

Beware of false knowledge; it is more dangerous than ignorance. — George Bernard Shaw

To figure out the sentiment of the text, as well as any topics mentioned, all I need is about 5 lines of JavaScript:

The first two lines import the software libraries I want to use, and the last two lines use these libraries to store some analysis of the text to two variables: textSentiment and topics.

The text sentiment variable will hold the following details after running the above code:

Each positive or negative word will be assigned an integer from -5 to 5, inclusive. The overall sentiment of the tweet is the aggregate of all of these positive or negative words. So for the above tweet, the total sentiment of the words “ignorance”, “dangerous”, and “false” is -5.
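
To make the arithmetic concrete, here is a small self-contained sketch of word-list scoring. It does not use the sentiment library itself: the three-word `lexicon` and the `scoreText` helper are assumptions for illustration, with scores chosen so they add up to the -5 total described above.

```javascript
// Tiny illustrative lexicon. These scores are assumptions that match the
// -5 total above; a real word list has thousands of scored entries.
const lexicon = { false: -1, dangerous: -2, ignorance: -2 };

// Sums the scores of every known word and records which words counted.
function scoreText(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  const result = { score: 0, positive: [], negative: [] };
  for (const word of words) {
    // Skip words that are not in the lexicon (they count as neutral).
    if (!Object.prototype.hasOwnProperty.call(lexicon, word)) continue;
    const value = lexicon[word];
    result.score += value;
    if (value > 0) result.positive.push(word);
    if (value < 0) result.negative.push(word);
  }
  return result;
}

const quote = "Beware of false knowledge; it is more dangerous than ignorance.";
console.log(scoreText(quote));
```

Every word outside the list is treated as neutral, so the total depends entirely on which words the lexicon happens to know.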

The topics variable will look like:

The software libraries I’m using are only smart enough to decipher predictably “male” or “female” names, so I actually ended up requiring a gender binary system to filter and analyze tweets. I simply don’t have the expertise or resources to be able to analyze other complex topics like race, trans/intersex identities, class, or religious affiliations.

As such, my analysis will exclude most social and contextual factors.

But I think there’s value to analyzing the data even in this narrow way, as a gender binary system frames this discussion in a way not too different from the cultural standard we have today — the assumption of a gender binary pervades our thoughts, beliefs, and discussions. Recognizing if and how this model of gender works in the digital world helps us to better understand how gender functions in our culture overall, especially as we continue to invest more of our time in this digital space.

Since I’m restricting my search to Marquette’s campus and Twitter’s platform, I don’t expect to be analyzing tens of thousands of data points in just a month’s time, so I don’t want to narrow my search so far as to restrict the size of the available data.

And because the ironically-named software library I’m using, compromise, can only guess two genders, I’ve been analyzing tweets with users that have predictably “male” or “female” names.

Exploring the Data

My first goal is to get an idea of what the data looks like. After aggregating tweets over a month, I plotted the timestamp and sentiment of every tweet I collected (~9,000 tweets). I separated the data into two series: a blue series to signify the sentiment of male users, and a pink series to signify the sentiment of female users:

I decided to use unambiguous and stereotypical gender colors in the design aesthetic of the application (I wanted to really emphasize that this analysis isn’t intersectional at all: I assume only two genders, and I don’t analyze race, class, economic, religious, political, or even relational contexts beyond predictably “male” or “female” users).

I think there’s already quite an interesting shape to this data. The overall sentiment of users looks oscillatory, almost sinusoidal day-to-day.

I think that’s kinda cool.

Female tweets also appear to be more banded; the tails and heads of the bands in the graph above are mostly blue, and maybe that says something about the consistency of female sentiment on Twitter?

I dunno.

So let’s look at another picture. What if I were to ask, on average, what does sentiment look like for a user during the day? At 8 PM on a Sunday, for example, what’s the average sentiment of male- and female-authored tweets? To answer that, I split the data by gender, then by hour. I averaged the sentiment of the tweets in each group, then graphed the results.
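
The split-then-average step can be sketched roughly as follows. The records are invented, and the field names `gender`, `createdAt`, and `sentiment` are hypothetical, not the project’s actual schema. Hours are taken in UTC to keep the example deterministic; a real analysis would presumably convert to local campus time first.

```javascript
// Invented sample records; the field names are assumptions for illustration.
const tweets = [
  { gender: "male", createdAt: "2017-04-02T20:05:00Z", sentiment: 2 },
  { gender: "male", createdAt: "2017-04-09T20:40:00Z", sentiment: -1 },
  { gender: "female", createdAt: "2017-04-02T20:15:00Z", sentiment: 3 },
  { gender: "female", createdAt: "2017-04-02T09:00:00Z", sentiment: 1 },
];

// Average sentiment per gender per hour of day (0-23, UTC).
function averageSentimentByHour(records) {
  const sums = {}; // sums[gender][hour] = { total, count }
  for (const { gender, createdAt, sentiment } of records) {
    const hour = new Date(createdAt).getUTCHours();
    sums[gender] = sums[gender] || {};
    sums[gender][hour] = sums[gender][hour] || { total: 0, count: 0 };
    sums[gender][hour].total += sentiment;
    sums[gender][hour].count += 1;
  }

  // Turn each { total, count } pair into a plain average.
  const averages = {};
  for (const [gender, hours] of Object.entries(sums)) {
    averages[gender] = {};
    for (const [hour, { total, count }] of Object.entries(hours)) {
      averages[gender][hour] = total / count;
    }
  }
  return averages;
}

const hourlyAverages = averageSentimentByHour(tweets);
console.log(hourlyAverages);
```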

Here, the daily sentiment graph might further suggest a female user’s sentiment tends to exist in a tighter range; that is, sentimental variance appears smaller for women. While that might be a fun little tidbit (and goes against the cultural stereotype of women as more emotional), it doesn’t offer a whole lot of insight into the treatment of gender identities online.

And maybe you’re thinking, Twitter users aren’t all equal. Some users are on Twitter more often than others, and some have more followers than others. How should we account for that?

A user with 5 million followers probably influences digital and social conversations more than a user with 5,000 followers. So if we weight a tweet by the user’s follower count (i.e. the more followers a user has, the more significant the sentiment of the tweet becomes), what does the average sentiment of tweets look like? Does this change our understanding of the picture above?

Let’s take a look.

The data is exactly the same as above, except the sentiment of each tweet is multiplied by the number of followers a user has.
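
Here is one way that weighting could look in code. This is a sketch under an assumption: it normalizes by the total follower count (a standard weighted mean), which may not be exactly how the graph here was computed. The records and the helper names `plainAverage` and `followerWeightedAverage` are made up for illustration.

```javascript
// Invented sample records, not project data.
const sampleTweets = [
  { sentiment: 2, followers: 100 },
  { sentiment: -1, followers: 300 },
];

// Unweighted mean: every tweet counts the same.
function plainAverage(records) {
  if (records.length === 0) return 0;
  const total = records.reduce((sum, t) => sum + t.sentiment, 0);
  return total / records.length;
}

// Weighted mean: each tweet's sentiment is multiplied by its author's
// follower count, then divided by the total follower count (an assumption).
function followerWeightedAverage(records) {
  const totalFollowers = records.reduce((sum, t) => sum + t.followers, 0);
  if (totalFollowers === 0) return 0;
  const weightedSum = records.reduce(
    (sum, t) => sum + t.sentiment * t.followers,
    0
  );
  return weightedSum / totalFollowers;
}

console.log(plainAverage(sampleTweets)); // 0.5
console.log(followerWeightedAverage(sampleTweets)); // -0.25
```

In this toy example the negative tweet comes from the account with more followers, so the weighted average flips from positive to negative.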

Here, sentiment looks a little different than above. It looks to me like female users are more positive than male users when taking into consideration their social clout. Female users’ average sentiment also oscillates considerably less than male users’ average sentiment.

But this is the average. What about totals? Are there equal representations of male and female users from Marquette’s campus on Twitter? If I were to ask how many times female and male users tweet during the day, what number would I get?

Oh. Well that’s not exactly an equal representation.

It looks like only at midnight are men and women tweeting in equal volumes. At every other hour of the day, men “out-tweet” their female counterparts by a ratio of almost 2:1.

That’s staggering to me.

Roughly 70% of all tweets on campus are by male users? The number could be a little high, but that’s a substantial difference. And if it’s true, then there are over twice as many tweets by male users as by female users.

Moreover, even if we were to assume that male and female users were equally sexist on Twitter, women would be on the receiving end of sexism much more often than men; the idea behind this is the Petrie Multiplier, a mathematical idea illustrating how negative relations in an unequal group “multiplies up to the detriment of the minority group”. The effect scales with the square of the ratio between the two groups, so with a ratio of roughly 2:1, female users from Marquette’s campus on Twitter are roughly four times as likely as male users to receive sexist remarks online, even if we assume both groups are equally sexist.
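
The arithmetic behind that multiplier can be sketched in a few lines. This is a simplified model, and it assumes every person directs the same number of remarks at the other group, spread evenly. The `petrieRatio` helper is a hypothetical name. Note too that my numbers are tweet volumes, not headcounts, so plugging them in is only a rough proxy.

```javascript
// Ratio of remarks received per woman to remarks received per man,
// assuming each person sends `remarksPerPerson` remarks to the other group.
function petrieRatio(menCount, womenCount, remarksPerPerson = 1) {
  const receivedPerWoman = (menCount * remarksPerPerson) / womenCount;
  const receivedPerMan = (womenCount * remarksPerPerson) / menCount;
  return receivedPerWoman / receivedPerMan; // equals (menCount / womenCount) ** 2
}

console.log(petrieRatio(2, 1)); // 4: a 2:1 imbalance gives a fourfold gap
console.log(petrieRatio(70, 30).toFixed(2)); // a 70/30 split gives a larger gap
```

The per-person remark rate cancels out entirely, which is the point: the gap comes from the imbalance alone, not from either group behaving worse.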

To get more granular results, I filtered the data further by sentence topic. For those tweets that mentioned a person with a guessable gender in the text, I averaged the sentiment and counted the positive and negative words used in the tweet. And this is where it got quite interesting…

Male and female users refer to male subjects with roughly the same average sentiment (0.7 and 0.8, respectively). Female users, however, refer to female subjects more positively than male users do (1.5 vs. 1.0).

Since male users refer to female subjects less positively than female users do, perhaps this can help explain why there are more male users online.

But more importantly, how exactly do users talk about gender identities? To get a better sense of this, I totaled the usage of positive and negative words used when users mention a male or female subject. I then split the data by male and female users, and visualized the results with some cute little UI buttons 🤓.
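
The tallying step might look something like the sketch below. The records are invented, and the field names `userGender`, `subjectGender`, and `negativeWords` are hypothetical, as are the `tallyNegativeWords` and `topWords` helpers.

```javascript
// Invented sample records, not project data.
const mentions = [
  { userGender: "male", subjectGender: "female", negativeWords: ["wrong", "bad"] },
  { userGender: "male", subjectGender: "female", negativeWords: ["wrong"] },
  { userGender: "female", subjectGender: "male", negativeWords: ["stop"] },
];

// Counts negative words for each (user gender, subject gender) pairing.
function tallyNegativeWords(records) {
  const tallies = new Map(); // "user>subject" -> Map(word -> count)
  for (const { userGender, subjectGender, negativeWords } of records) {
    const key = `${userGender}>${subjectGender}`;
    if (!tallies.has(key)) tallies.set(key, new Map());
    const counts = tallies.get(key);
    for (const word of negativeWords) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return tallies;
}

// Returns the n most frequent words, breaking ties alphabetically.
function topWords(counts, n) {
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : 1))
    .slice(0, n)
    .map(([word]) => word);
}

const tallies = tallyNegativeWords(mentions);
console.log(topWords(tallies.get("male>female"), 2));
```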

When male users referred to male subjects, the top negative words used were:

Male users referencing male subjects

When male users referred to female subjects, the top negative words used were:

Male users referencing female subjects

When female users referred to male subjects, the top negative words used were:

Female users referencing male subjects

Finally, when female users referred to female subjects, the top negative words used were:

Female users referencing female subjects

Drawing Conclusions on Oppression and Gender

Let’s go back to the previous question:

What do our online interactions say about the attitudes we hold of gender?

The data would suggest there’s a noticeable difference in the representation of male and female users, as well as in the attitudes held about gender. Male users refer to female subjects less positively than female users do, and male users tweet at over twice the volume of female users.

Not only is it mathematically arguable that female users on Marquette’s campus face more sexism online than their male counterparts, but it might also be possible to see many of those interactions as perpetuating violence, marginalization, and cultural imperialism, as understood by Iris Young’s discussions on oppression.

In the pictures above, many of the negative words used by male users in the context of female subjects (like “cut”, “shoot”, “killing”, “fire”, and “dead”) are extremely violent. And even though I’m not able to programmatically contextualize these interactions (that would require more robust natural language processing), the language used by male and female users looks quite different. Some of the negative words used by women to describe male subjects include “stop”, “stopped”, “delay”, “nasty”, and “wrong”. Doesn’t that seem like responsive language?

Not only are more male users using this digital space than their female counterparts, but roughly 85% of subjects mentioned in tweets are male. If fewer women are represented in this digital space, and if even fewer women are discussed online, then women are simply not as big a part of the social conversation online. Thus, they are also less able to contribute to this social dialogue. It’s an issue of representation and participation, and I think it’s so unbalanced because those most at liberty to fundamentally change how this technology works are largely white guys in hoodies.

Looking Forward

credit: f1x-2

The final and perhaps most important takeaway from this project is understanding just how much control software engineers have in framing social interactions, and how big a problem the lack of diversity in this industry is.

As of 2016, roughly 69% of employees at Google are male; at Apple, it’s 68%, and at Facebook, that number is sitting at around 67%. Black and Latinx representation at many of these companies is less than half of what we see in Congress.

The social technologies that these companies build influence and shape how we interact. To participate on Facebook, Twitter, Instagram, or Snapchat requires an account, and an account requires user data. The data these companies collect as a prerequisite for membership in these online spaces also frames how users’ profiles are presented and what they contain. And most collect inordinate amounts of data that can be sold at a high price to advertisers, or used internally in clandestine ways.

Facebook, for example, has a pretty awful track record of using its platform for…well…evil. In a 2014 “study”, Facebook modified the content of ~700,000 users’ news feeds, and observed how easily the platform could manipulate users to “experience the same emotions without their awareness.” Earlier this month, leaked documents from Facebook’s Australia division revealed executives nonchalantly bragging about the company’s ability to use internally-designed algorithms to target emotionally vulnerable teens. And if that wasn’t enough, Facebook now allows advertisers to target racial groups with a cryptic “ethnic affinity” filter.

I guess the point I’m trying to get at can be summed up succinctly by Amanda Hess, an American journalist and staff writer at Slate magazine:

“…as the Internet becomes increasingly central to the human experience, the ability of women to live and work freely online will be shaped, and too often limited by the technology companies that host these threats, the constellation of local and federal law enforcement officers who investigate them, and the popular commentators who dismiss them — all arenas that remain dominated by men, many of whom have little personal understanding of what women face online every day.”