Notes on quotes

alex ioana
Published in the peruser
13 min read · Oct 9, 2016

Quotes have had a weird appeal to us ever since written text became accessible to the wider population. From people you hardly know sharing obscure motivational phrases on Instagram to the Economist’s Espresso edition, quotes are thrown at us in all shapes and sizes in the hopes that they might appeal to our inner self, and so that we might click the like button.

Quotes are as much a blessing as they are a curse — in our times the fine line between art and philistinism gets shrouded by the simple fact that information is no longer something hard to attain (at least on the world wide web).

But which quotes are most popular? Is there such a thing as a recipe for success when you release a newfound quote onto the world? I wanted to see what people liked to quote most, so I hand-crawled the top 250 quotes on Goodreads and subsequently added multiple layers of data about their authors to see what we can all learn.

The importance of sentiments

Research shows that warm and cozy feelings have more momentum on social media. A study that took a sample of 3800 random tweets found that positive emotions are a lot more likely to spread than are negative emotions. Could this also influence the way quotes are received — by which I’m really asking, are quotes with warm and cozy feelings more popular than negative ones?

Now, Goodreads employs a tagging mechanism which allows quotes to get clustered by what their subject matter is. Given that members cumulatively like quotes more than 10 million times per year, this tagging system simplifies the whole ordeal for those looking for quotes to read.

So I used the tags associated with each quote and ran a sentiment analysis on them to see if the theory holds true and if people really do have a preference for positive feelings.

The sentiments associated with each tag were quantified from -1 to 1, with the following ranges:

  • Very negative — values between -1 and -0.5
  • Negative — values between -0.49 and -0.1
  • Neutral — values scored with 0
  • Positive — values between 0.1 and 0.49
  • Very positive — values between 0.5 and 1
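
As a rough illustration, the ranges above can be turned into a small lookup function. This is only a sketch and not the code used for the analysis: the name `sentiment_label` is made up, and because the list leaves small gaps (for instance between -0.5 and -0.49, or between 0 and 0.1), the cut-offs below assume that any value falling in a gap goes to the band closer to zero.

```python
def sentiment_label(score: float) -> str:
    """Map a sentiment score in [-1, 1] to one of the five labels.

    Assumption: scores that land in the gaps between the listed ranges
    (e.g. -0.495 or 0.05) are assigned to the band closer to zero.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score must be between -1 and 1, got {score}")
    if score <= -0.5:
        return "very negative"
    if score <= -0.1:
        return "negative"
    if score < 0.1:
        return "neutral"
    if score < 0.5:
        return "positive"
    return "very positive"


# A few scores quoted elsewhere in this post
for value in (0.813, 0.410, 0.304, 0.0):
    print(value, "->", sentiment_label(value))
```

A different treatment of the gaps (rounding scores to two decimals first, say) would be just as defensible; the point is only that each tag score ends up in exactly one bucket.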

What’s apparent from the initial view is that very positive quotes (as determined by the overall sentiment of their tags) score higher on average with respect to how many likes they receive. On the opposite side of the spectrum, very negative quotes receive the fewest likes on average. Yet the highest average tag usage shows up, maybe surprisingly, on negative quotes. This would imply that negative sentiments are more difficult to pinpoint than the extremes of the spectrum, as if it’s easier to express either black or white, while the shades between the two require more detail.

But does this all imply that people tend to like positive quotes more than negative ones?

From the plot of the data, it would seem so. There’s a weak trend: the further you go from the most liked quotes towards the least liked ones, the more negative the expressed sentiments become.
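
One way to put a number on a trend like this is a correlation coefficient between a quote’s rank and its sentiment score. The snippet below is a minimal sketch: the six data points are invented purely for illustration (they are not taken from the Goodreads sample and show a much stronger trend than the real data), and `pearson` is a hand-written helper rather than the method used for the plot.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: rank 1 is the most liked quote
ranks = [1, 2, 3, 4, 5, 6]
sentiments = [0.8, 0.5, 0.6, 0.1, 0.3, -0.2]

r = pearson(ranks, sentiments)
print(f"correlation between rank and sentiment: {r:.2f}")
```

A negative value means sentiment drops as you move down the ranking; a value close to zero would correspond to the weak trend described above.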

Who, when, where?

I’ve found that there’s quite the variety when it comes to professions and cultural backgrounds of quote authors (for those not already familiar with this, I’ve decided a while back to replace notions akin to “nationality” with “cultural background” when speaking of what place most authors belong to). So while crawling Goodreads for quote data, I also hand-checked information about all the authors mentioned in the top 250, and came up with some interesting insights.

Because some variables came up only on single instances, I’ve decided to not take them into account per se but highlight them as outliers (you can see them colored in red).

Dead or alive?

First off, I wanted to see if living people were more popular than the deceased. I thought that while the classics might rank high up with respect to likes, contemporary authors might well have a fighting chance to outrank them since the internet levels out the playing field, in effect making a snappy one-liner from today as popular as a quote from a Renaissance figure.

Or does it? The top 250 quotes on Goodreads come from deceased authors about 64% of the time — and this might account for them scoring highest with respect to average likes received. After all, there are more chances of them scoring higher since they’re the majority.

Living authors come in at roughly 34% and trail behind on likes (22K for living people, compared to 31K on average for the deceased). While both dead and living quote authors are in the positive range of emotions, it’s clear that the deceased in the top 250 are, on average, more cheerful than the living. They score a positive 0.410 on the sentiment scale, and this is probably why people like them more.

You’ll see that I’ve also included some other categories than the standard dead/living duality. That’s because simply put, some quotes are attributed to people whose lives aren’t that well documented (so it’s ultimately uncertain if they’re living or not) or aren’t attributed to people at all (hence the ‘inapplicable’ category shown in the above graph as an outlier; it’s really only there for context — it’s a single quote which is attributed to an institution, so you can’t really call it out as being alive or dead).

So the dead tend to be more jolly when it comes to their quotes. Knowing this, I wanted to see if there was any specific time in history that stood out.

The top 250 quotes come from as far back as, at least, 322 BC. While this is ultimately a simplification, the data points out that the 17th century was the most positive when it comes to sentiment scores on quotes. A whopping 0.813 makes for a pretty positive picture, and as such it really does stand out from the crowd. Yet strangely most likes were scored by the 20th century: 33K compared to the 26K given to the 17th century.

Could this mean that the link between popularity and positivity doesn’t hold? Not likely. Looking into the data, I saw that while the 17th century isn’t an outlier, it does represent only 1.6% of the top 250, whereas the 20th century bloated up to 49%. Since the overall volume of the former is so small, there’s no tendency for its sentiments to regress to the mean, as there is in the case of the latter.
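
To see how lopsided that comparison is, it helps to convert the percentages into quote counts. The sketch below does that and then compares how noisy an average would be for each century, using the rule of thumb that the uncertainty of an average shrinks with the square root of the sample size. It assumes (and this is only an assumption) that individual quotes vary about as much in sentiment in both centuries; the variable names are made up for the example.

```python
from math import sqrt

TOTAL_QUOTES = 250

# Shares quoted in the text, converted to approximate quote counts
n_17th = round(0.016 * TOTAL_QUOTES)  # 1.6% of the top 250
n_20th = round(0.49 * TOTAL_QUOTES)   # 49% of the top 250

# Assuming a similar per-quote spread, the standard error of an average
# scales with 1 / sqrt(n), so the ratio of the two errors is sqrt(n_20th / n_17th).
noise_ratio = sqrt(n_20th / n_17th)

print(f"17th century: {n_17th} quotes, 20th century: {n_20th} quotes")
print(f"the 17th-century average is roughly {noise_ratio:.1f}x noisier")
```

With only a handful of quotes, one or two very cheerful ones are enough to pull a century’s average far above the rest.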

Does sex count?

69% of the top 250 quotes come from men and 30% from women. The remainder is made up of quotes attributed to authors whose sex is uncertain (i.e. they’re anonymous) and of the outlier I mentioned previously, the single quote attributed to an institution.

In part due to their dominating with respect to the number of times they get quoted, men outranked women in how many likes they received on average. Men also receive more tags on average than women (3 vs 2) — and have a higher sentiment score (0.413 vs 0.304). But I’ll come back to this in a minute.

The cultural backgrounds of popular quote authors

By far the most authors in the top 250 come from the United States of America. In second place come people from England, at 23%, which is less than half of the Americans’ 56%.

And yet when looking into things like sentiment scores, like counts and tag usage, neither of these cultures seems to stand out. The happiest of the cultures which are not considered outliers is Jamaican (and we can thank Bob Marley for that), with the most overall likes coming in from India (Mahatma Gandhi proves to be quite popular).

In total, I’ve identified 22 cultural backgrounds for the authors of the top 250 quotes, with 10 of them being outliers. This lack of diversity is in part due to the majority language of Goodreads being English, and yet we can see that only from looking at this small sample there are a lot of cultural backgrounds that give out well-received quotes.

Working life and getting quoted

I’ve done my best to simplify the main activities these people have (or had) so as to get some insight into their working lives, and came up with 15 professions worth looking into and a further 8 outliers for context. Unsurprisingly, the majority of quote authors happen to be also known as authors, with 62% of them being writers.

But on average, it seems that most liked authors are polymaths and the least liked are internet personalities. I guess this too should come as no surprise — polymaths (which is a sort of anachronistic take on a person’s profession, I know, but I’m talking about ancient Greeks and Romans) lived millennia ago — and have stood the test of time because of the quality or importance of what they said.

When it comes to feelings, business people tend to be the saddest of the bunch, and they top everything else when it comes to tag usage with a huge number of 10 on average, indicating that their subject matter is hard to define emotionally. On the opposite side of the scale, artistic performers seem to be the cheeriest. So do positive feelings really count for more popularity? It’s hard to give a simple answer to this.

While it’s true that positive feelings gain more momentum, it’s just as true that in some situations quotes evocative of negative feelings might become popular. As Nietzsche put it, what doesn’t kill you only makes you stronger — and we all enjoy the odd splash of negativity from time to time. It’s all about balancing out the bad with the good, but just by enough so that your message doesn’t become a rainbow of meaningless fluff. Speaking of which…

A look at the rankings

“Don’t cry because it’s over, smile because it happened.” Sound familiar? If yes, then you shouldn’t be surprised to hear that what you just read tops the list with respect to the number of likes on Goodreads. With over 30,000 likes more than the quote in second place, the above quote from Dr. Seuss looks set to reign supreme for some time.

This is where the real contrasts come about. The second most popular quote of all time comes from Marilyn Monroe, then Oscar Wilde, Einstein, and we even find another quote by Dr. Seuss in the top 10. All this raises the question: who’s the greatest quote author of all time? And it so happens that there are a few ways of answering that question.

One of them is to look at frequencies.

If you look at the number of times somebody gets in the top 250, then the winner would be J.K. Rowling. This goes to show that when speaking of the best quotes and not the entire sample, women might well be more popular than men.

Another way of looking at this is to tally up how many likes each author has.

With respect to this, we see a change in the rankings. Oscar Wilde’s quotes, which have themselves stood the test of time, have clocked in the most likes of all authors in the top 250. Then comes Dr. Seuss, and only then do we see J.K. Rowling. So while some people might get quoted a lot more (ergo be more popular) it’s the ones that stand the test of time that eventually rank highest with respect to likes — most probably due to their universal appeal to multiple reader segments.
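
The two ways of ranking can be sketched in a few lines. The authors and like counts below are placeholders invented for the example (they are not figures from the Goodreads sample); the point is simply that counting appearances and summing likes can crown different winners.

```python
from collections import Counter

# Hypothetical (author, likes) pairs, one per quote in the sample
quotes = [
    ("Author A", 10_000),
    ("Author A", 12_000),
    ("Author A", 9_000),
    ("Author B", 40_000),
    ("Author B", 35_000),
]

# Ranking 1: how many times each author appears in the sample
appearances = Counter(author for author, _ in quotes)

# Ranking 2: how many likes each author has gathered in total
total_likes = Counter()
for author, likes in quotes:
    total_likes[author] += likes

most_quoted = appearances.most_common(1)[0][0]
most_liked = total_likes.most_common(1)[0][0]

print("most quoted:", most_quoted)
print("most liked:", most_liked)
```

Here "Author A" appears most often while "Author B" collects the most likes, which mirrors the split described above.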

So who’s liking all of these quotes?

It’s hard to get an accurate picture, but in September 2016 there were around 47 million visits to Goodreads. Most of them were from women aged 18 to 24, with a high chance of them having a grad school education. And while it’s a push to call Goodreads an entertainment website, a Nielsen study from 2013 did find that women were more inclined to access websites like Goodreads.

What’s more, Goodreads themselves stated that women are more active on their platform than men are: women read twice as many books and as such are inclined to spend more time on the website.

But here’s where things get interesting. According to Goodreads, when it comes to authorship and how each sex reacts to what is published, men are twice as likely to write 500+ page books and have, on average, 50% of their readership in women. On the other hand, 80% of the audience of female writers are women themselves. This, in turn, impacts how well specific quotes are received.

It just seems that the subject matter selected by male writers is more encompassing, or that they appeal to a wider population. This might lead to them ranking higher than women in metrics such as confirmed reads and even liked quotes.

And yet on average Goodreads state that women rate books by other women higher than they do books written by men (4.0 compared to 3.8 on a 1 to 5 scale); and the same goes for men: they rate books written by women with a 3.9 average, compared to 3.8 for male authors.

Increased activity doesn’t mean that most of the likes quotes receive are necessarily from women — but the conjecture is that more women like more quotes. At any rate, liking something is a big part of the way we all experience social media. On average, 26% of us are prone to frequently liking things — and women tend to do this more often than men.

It’s very difficult to figure out a clear pattern of this behavior on Goodreads, but I’m guessing the gist of it is this: while women write better-received (albeit shorter) books, it’s the classics which have spread out to international audiences that appeal to both men and women, and as such rank highest with respect to the popularity of their quotes.

The act of liking something isn’t only for showing off what our inner preferences are, it’s just as much a show of support for the ideas behind what we’re liking — and as such a declaration of holding a certain opinion. What’s more, liking something nowadays often shows up in other people’s social media feeds — and as such, it’s probable that liking has become something akin to sharing content.

So when somebody likes a quote, they’re saying multiple things at the same time: I like this because I resonate with what it says, and I also want my friends to see what my tastes are.

Takeaways

While I don’t have a really good picture of what the overall user base of Goodreads is like and how it reacts to quotes, I can say for sure that the majority of them are English-speaking, at least with respect to their behavior on the internet. And this might impose an unwanted limit on my analysis.

Why do I say this? Because there’s a hefty chance that some really good quotes don’t convey their full emotional range when translated into English, and as such, they won’t become very popular. While it might be tempting to say that, overall, English-speaking writers are the best quote authors (think Oscar Wilde, J.K. Rowling, and Dr. Seuss), it’s very likely that the results are skewed by the fact that Goodreads is, ultimately, an English-speaker oriented website.

And while this might not be a bad thing (as I’ve previously stated here), it’s more than likely that the picture we see on Goodreads doesn’t include the full amplitude of what people really like to see quoted.

Another gap (but one I can’t really get around) is that this is all positively biased: I can see what people like (and even quantify such data), but I can’t see what people hate, like those brain-melting quotes we sometimes get to see on Twitter or Facebook.

Anyway, I’ll leave you with this, as food for thought:

“[A] quotation is a handy thing to have about, saving one the trouble of thinking for oneself, always a laborious business.” ― A.A. Milne; tags: independent-thought, quotations, quotes; 704 likes (as of early October 2016)

Thanks for reading.

NB special thanks go to Evelina for making the sentiment analysis possible.
