What’s Almost Left Unsaid: An Analysis of Harvard Confessions
Co-authored by Jenny Gu, Melissa Kwan, Sahana Srinivasan, and Yijiang Zhao.
The Facebook page Harvard Confessions allows students to submit notes to crushes, rants about school, and generally controversial opinions under the veil of anonymity. Started in March of this year, it has amassed over 1,500 likes on Facebook. On the whole, what do Harvard students feel the need to confess to the Internet? HODP scraped the text, dates, and likes of the first 365 confessions, posted over the course of two months up through mid-April, to find out.
The most popular topics of conversation were friends, love lives, and Harvard, as seen below in a word cloud that compiles the most frequently used words across all posts, excluding basic articles and other common words. The word cloud also shows that multi-post threads focused on a single niche topic sometimes dominate: the Asian-American Association, for example, makes the list (seen between the ‘g’ and ‘i’ of ‘girl’).
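The counts behind a word cloud like this one can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the stop-word list and the tokenizing rule are assumptions, since the article does not say which common words were excluded.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the article does not specify the excluded words.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "i", "my", "that"}

def word_counts(posts, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across all posts."""
    counts = Counter()
    for post in posts:
        # Lowercase the text and treat runs of letters/apostrophes as words.
        words = re.findall(r"[a-z']+", post.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts

# Toy posts, invented for illustration only.
sample_posts = ["I love my friends", "Harvard is hard", "My friends love Harvard"]
print(word_counts(sample_posts).most_common(3))
```

A word cloud then simply sizes each word in proportion to its count, which is why one long thread on a niche topic can push an otherwise rare term into the picture.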
We correlated the number of likes with the presence of specific words to see whether certain topics made posts more popular. All correlation coefficients were below 0.3, too low to indicate a meaningful relationship: the coefficient between post length and number of likes was 0.29, between likes and explicit mention of Harvard it was 0.215, and between presence of expletives and mentions of Harvard it was 0.24. Twenty-two percent of posts mentioned Harvard explicitly.
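For readers who want to try this kind of check themselves, here is a minimal sketch of a Pearson correlation between likes and a 0/1 "mentions this word" indicator. It assumes Pearson's coefficient was the measure used, which the article does not state, and the helper names `pearson` and `mentions` are ours.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

def mentions(post, word):
    """Return 1 if the post contains the word (case-insensitive), else 0."""
    return 1 if word.lower() in post.lower() else 0

# Toy data, invented for illustration only.
posts = ["Harvard is exhausting", "I miss my dog", "Why is Harvard so cold", "Hi"]
likes = [12, 3, 9, 1]
flags = [mentions(p, "harvard") for p in posts]
print(pearson(flags, likes))
```

Substring matching is a deliberate simplification here; a stricter version would match whole words only.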
WE’RE SAD, BUT NOT THAT SAD
To students browsing through the Confessions page, it may seem that negative, angsty posts dominate the conversation. Those students are right. To find out just how sad and negative we really are, we ran sentiment analysis on the confessions to categorize each one as positive, negative, or neutral. The sentiment analysis API we used returned three values: the positive-negative split, the probability of being neutral, and the overall classification. If the probability of being neutral was greater than 0.5, it classified the excerpt as neutral overall; otherwise, it went with whichever categorization prevailed in the positive-negative breakdown.
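The decision rule described above is simple enough to write out. The sketch below is our reading of it, not the API's actual code: the function name, argument names, and the handling of an exact positive-negative tie are assumptions.

```python
def classify(p_pos, p_neg, p_neutral):
    """Label a post using the rule described in the text.

    p_pos and p_neg are the positive-negative split;
    p_neutral is the probability that the post is neutral.
    """
    if p_neutral > 0.5:
        return "neutral"
    # Assumption: an exact tie is resolved as positive.
    return "positive" if p_pos >= p_neg else "negative"

print(classify(0.7, 0.3, 0.6))  # neutral wins regardless of the split
print(classify(0.2, 0.8, 0.4))  # falls through to the positive-negative split
```

Note that a post can have a lopsided positive-negative split and still be labeled neutral, because the neutral check comes first.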
Here are some examples of how posts were categorized:
So how did we fare overall? We subtracted 0.5 from the results to change the sentiment range from [0, 1] to [-0.5, 0.5], resetting a neutral sentiment to 0. After aggregating the results, we found that the mean sentiment of the posts we analyzed was -0.129 and the variance was 0.026, meaning that the majority of posts fell in the neutral-to-moderate range. Note from the right skew that the number of extremely negative posts (< -0.3) far outweighed the number of extremely positive posts (> 0.3).
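As a rough sketch of the recentering and summary statistics, assuming the shifted value is a single score in [0, 1] per post and that the reported variance is a population variance (the article specifies neither):

```python
from statistics import mean, pvariance

def recenter(scores):
    """Shift scores from [0, 1] to [-0.5, 0.5] so that neutral sits at 0."""
    return [s - 0.5 for s in scores]

# Toy scores, invented for illustration only.
raw_scores = [0.10, 0.35, 0.40, 0.50, 0.85]
centered = recenter(raw_scores)

print(mean(centered))       # average sentiment after recentering
print(pvariance(centered))  # population variance (an assumption, see above)

# Counts in the two extreme tails mentioned in the text.
very_negative = sum(1 for s in centered if s < -0.3)
very_positive = sum(1 for s in centered if s > 0.3)
print(very_negative, very_positive)
```

For scale, a variance of 0.026 corresponds to a standard deviation of about 0.16, so a post one standard deviation below the reported mean would sit near -0.29, just inside the "extremely negative" cutoff.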
A RISE IN POPULARITY
As could be expected, the responses to a post (the aggregate of views, likes, and comments) increased over time as the page amassed more likes. The break in data in mid-March corresponds to spring break. (The current, active page has 1,641 likes as of early June; for comparison, the well-established MIT Confessions Facebook page has about 33,000.)
SPRING BREAK BLUES
Based on the graphs above, we analyzed the mood of the posts across time. Interestingly (although not surprisingly), the number of negative posts increased drastically once classes resumed after spring break (which ran from March 16 to March 24). Note when looking at the line graph that no posts were made over spring break. Admittedly, the overall number of posts spiked after spring break as well, because of the hiatus in posting, but the number of positive posts actually dropped right after break. The Confessions page appears to reflect the overall mood of the student body and the elevated levels of stress from coming back to campus.
DETAILED SENTIMENT ANALYSIS
What confession topics are the most common? And which are the most popular? To find out, we manually categorized each post by topic and as negative, positive, or neutral. Each compilation post was treated as a single post, since compilations tend to share a common theme; because “compliment compilations” recur often, this may skew the representation of positive posts. Negative posts received a marginally greater average number of likes than positive posts did, with neutral posts being noticeably less popular. It seems we more actively show approval of more polarized posts.
The seven categories of confession topics we used were love lives, campus, general life, compliments, school, replies to other posts, and friends. Posts on love, dating, and hookups made up a plurality of the page, followed by campus and then general life.
Below is a breakdown of the mood of posts within each category: posts about campus life and Harvard were mostly negative and had a considerably higher proportion of negative posts than any other category, even love. General life posts were more often positive than posts on other topics. Posts about love lives were also overwhelmingly negative or neutral.
We also broke down the average number of likes within each category of post. Campus- and Harvard-related posts received on average the most likes, perhaps because they are the most universally relatable, especially when compared to personal life confessions or individual shoutouts.
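A per-category average like this is a plain group-and-average. The sketch below assumes each post has been reduced to a (category, likes) pair; the function name and the toy numbers are ours, not data from the analysis.

```python
from collections import defaultdict

def average_likes_by_category(posts):
    """Return the mean number of likes for each category.

    posts is an iterable of (category, likes) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # category -> [sum of likes, post count]
    for category, likes in posts:
        totals[category][0] += likes
        totals[category][1] += 1
    return {category: s / n for category, (s, n) in totals.items()}

# Toy data, invented for illustration only.
toy_posts = [("campus", 10), ("campus", 20), ("love lives", 3), ("friends", 4)]
print(average_likes_by_category(toy_posts))
```

Because categories differ widely in size, an average for a small category can swing on one or two well-liked posts, which is worth keeping in mind when comparing them.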
So what do Harvard students need to confess anonymously on the Internet? Mostly lamentations on Harvard; love, hookups, and dating; and life in general. As someone just scrolling through the page might surmise, our confessions lean negative. It makes sense that it might be easier to tell a Google Form things that are hard to admit or complain about in person, and it makes sense that the page has been getting more popular as more people read, relate to, and want to submit their own confessions. And though the page is dominated by negative and neutral posts, the positive ones (mostly compliments or shout-outs to specific friends or crushes) do exist and still get likes.