When a text (like a birthday card) needs to connect with a large audience, why is it written by a small number of people? Wouldn’t it be more effective if lots of people had the chance to weigh in?
I wanted to write a great birthday card by harnessing the power of a large number of writers. To do it, I used Mechanical Turk, a popular crowdsourcing marketplace, and an experimental technique that I’ve outlined below.
While validating my results, I found that people tend to prefer cards written with this technique when choosing cards for their friends and family.
What I Did
This project can be broken into three distinct parts. The first part involved generating a large (n=109) set of crowdsourced birthday cards. The second involved filtering the set of crowd cards down to a small set (n=5, 4.6%) of the best card options. Finally, in the third part, the best crowd cards were compared against a sample (n=5) of storebought birthday cards.
Part 1: Generating the Card Set
Card designs were solicited via Mechanical Turk, where card writers were paid $0.50 to respond to a prompt.
First, the instructions encouraged the crowd writer to think back to the last time they had given a birthday card and write a card to the person they had intended to give it to. Second, four prompts were listed, based off of common greeting card themes such as sincerity and humor.
I based these instructions on my own experience with creative writing classes, where writing prompts are often used to help students think creatively within a short period of time. Most importantly, the prompts were designed to solicit sincere, heartfelt cards based off of personal experience, rather than generic cards based off of the writer’s idea of what a birthday card should be.
In total, 109 cards were gathered. Of these, cards that did not meet the guidelines were removed: namely, cards that were written to one specific person and did not apply generically, cards that were profane or violent, and cards that were plagiarized. Newlines and special characters were removed, but spelling and grammatical errors were left unfixed to prevent any misunderstanding of the crowd writer’s original intent. After the cleaning process, 72 cards remained.
I’ve randomly picked three example crowd cards to give an idea of what kind of cards the crowd submitted.
Card 1 (Removed for being too specific, because the crowd writer references their nephew)
- Front: You are Special to God
- Inside: He was there on the day you were born and His hand is evident in the person you have become! Have a blessed birthday Nephew!
Card 2
- Front: Happy Bday! It’s been 10 years that we have known each other and you have been the best friend that anyone could ever have!
- Inside: You are the most fabulous, special, kind, caring and most talented person ever!!
Card 3
- Front: I love birthdays but more importantly i love you
- Inside: I wish you a happy birthday, i wish we could spend rest of our lives together!
Part 2: Filtering the Crowd Set
To filter down the set of crowd-written cards, Mechanical Turk workers were paid $0.05 to offer their opinion on the cards. Each card was compared twice against a panel of five storebought cards, totaling ten comparisons. In one comparison, the crowd card was presented first, while in the second comparison the order was flipped to prevent ordering bias. The ordering of the set of all comparisons was also randomized.
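A minimal sketch of how such a set of comparisons could be generated. It assumes that every crowd card meets each of the five panel cards once in each presentation order, which is one reading of the description above. The function name, the dictionary keys, and the fixed seed are all hypothetical.

```python
import random


def build_comparisons(crowd_cards, panel_cards, seed=0):
    """Pair every crowd card with every panel card in both presentation orders."""
    tasks = []
    for crowd in crowd_cards:
        for store in panel_cards:
            # Crowd card shown first.
            tasks.append({"first": crowd, "second": store, "crowd_card": crowd})
            # Same pair with the order flipped, to counter ordering bias.
            tasks.append({"first": store, "second": crowd, "crowd_card": crowd})
    # Shuffle the whole task list so raters see the comparisons in random order.
    random.Random(seed).shuffle(tasks)
    return tasks
```

With five panel cards, this produces ten tasks per crowd card, matching the ten comparisons described above.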
The cards were rated on a five-point Likert scale, and earned two points for being strongly preferred over the storebought card, one point for being slightly preferred, zero points for no preference, minus one point for a slight preference for the storebought card, and minus two points for a strong preference for the storebought card. The maximum possible score was ten, and the minimum possible score was negative ten. The cards were then ordered from lowest score to highest score.
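The point scheme can be sketched as a lookup table plus a sum. The response labels and function names below are hypothetical. Only the point values and the lowest-to-highest ordering come from the description above.

```python
# Points for each Likert response, from the crowd card's point of view.
POINTS = {
    "strong_crowd": 2,    # crowd card strongly preferred
    "slight_crowd": 1,    # crowd card slightly preferred
    "no_preference": 0,
    "slight_store": -1,   # storebought card slightly preferred
    "strong_store": -2,   # storebought card strongly preferred
}


def card_score(responses):
    """Sum the points earned by one crowd card across its comparisons."""
    return sum(POINTS[response] for response in responses)


def rank_cards(responses_by_card):
    """Return (card, score) pairs ordered from lowest score to highest."""
    scores = {card: card_score(r) for card, r in responses_by_card.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

For example, a card that earns one strong preference, one slight preference for the storebought card, and one "no preference" would score 2 - 1 + 0 = 1.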
This method of scoring the cards represents an improvement over traditional methods for getting the crowd’s opinion in two ways.
First, rather than asking participants to imagine a hypothetical scenario where they might be buying a birthday card in the future, they were asked to think back to a concrete situation they had encountered in the past.
Second, rather than asking the participants to compare a single birthday card against an abstract quality like “funniness,” two card designs were compared directly against each other, presenting a choice that was more similar to the actual choice a buyer would make in a store.
There was a wide distribution of card scores, ranging from negative 9 to positive 10. Scores followed a roughly normal distribution, with a roughly equal number of cards receiving a positive (35) or negative (31) score. Twelve cards received a score of five or more, suggesting that they were consistently rated better than the panel of storebought cards.
Part 3: Comparison Against Storebought Cards
The top five cards from Part 2 were compared against a randomly chosen sample of five storebought cards. Each card was compared against a fresh sample of five storebought cards, and the number of comparisons was increased from two to five, meaning that each card in the sample was involved in a total of fifty comparisons. The maximum possible score for a card was one hundred, while the minimum possible score was negative one hundred.
A fresh panel of cards was used to avoid overfitting the crowd cards against the same panel (i.e., without using a new panel of cards, it’s possible that the crowd cards performed especially well against the cards in the panel, but were not good cards in general). Similarly, a sample of storebought cards was compared against other storebought cards to guard against any methodological bias that might occur, such as Mechanical Turk workers who rated multiple cards getting tired of the cards in the testing panel.
The scores in the third part were much closer than in the second part, suggesting that the cards in the set were more evenly matched. Likely, the panel of cards was also stronger, as the median score was slightly negative, rather than slightly positive as in the second part. Three crowd cards received a positive score, suggesting that they were preferred over the cards on the panel, while only one storebought card received a positive score. The top three cards were also all written by Mechanical Turk workers.
This is the final ranking of the top five crowd cards and five randomly selected storebought cards. The number in parentheses is the card’s score, with higher numbers indicating that the card was more favored as compared to the panel of storebought testing cards. The card’s type (storebought or crowd) is in brackets.
#10 Card (-35) [Storebought]
- Front: Birthday Todo List: 1. Sit Back. 2. Relax. 3. Take it easy
- Inside: 4. Have a great birthday
#9 Card (-31) [Crowd]
- Front: Your name was not in the obituaries today.
- Inside: So I guess you made it! Happy Birthday
#8 Card (-15) [Storebought]
- Front: Oops, this isn’t my house
- Inside: One good thing about getting older is getting to visit new places. Happy birthday
#7 Card (-9) [Storebought]
- Front: Feeling celebrated and loved yet?
- Inside: Good! Because you are. Happy birthday
#6 Card (-5) [Crowd]
- Front: Happy Birthday!
- Inside: Thanks a lot for all the time spent together. I’m glad to have you in my life.
#5 Card (-1) [Storebought]
- Front: It’s your birthday
- Inside: Hope the whole world smiles at you today!
#4 Card (5) [Storebought]
- Front: You’re the absolute best and I’m not even kidding
- Inside: Thank you so much!
#3 Card (7) [Crowd]
- Front: It’s the best time of year when you’re birthday comes. Now, go out there and have some fun. It’s your day, you can do what you want. Just remember, in 24 hours, its gone. So, go out, have a good time with friends, and don’t stop dancing ‘till the party ends.
- Inside: I want to see you smile from ear to ear. And I’ll see you back here, same time next year.
#2 Card (12) [Crowd]
- Front: Happy Birthday!
- Inside: Each day I’m astounded at the person you have become. I can’t wait to see what life has in store for you!
#1 Card (13) [Crowd]
- Front: With another year complete, we’re hoping you accomplished everything you wanted to.
- Inside: And if you didn’t, my present to you will be to hire this retired drill sergeant to keep you motivated. “get back to work, don’t eat that donut, exercise more!”
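The summary claims about this ranking (a slightly negative median, three crowd cards and one storebought card with positive scores, and a top three made up entirely of crowd cards) can be checked from the listed scores. The short sketch below does that. The variable names are hypothetical, and the scores and types are copied from the ranking above.

```python
from collections import Counter
from statistics import median

# (score, type) pairs copied from the final ranking above.
RANKING = [
    (-35, "Storebought"), (-31, "Crowd"), (-15, "Storebought"),
    (-9, "Storebought"), (-5, "Crowd"), (-1, "Storebought"),
    (5, "Storebought"), (7, "Crowd"), (12, "Crowd"), (13, "Crowd"),
]

# Median of all ten scores.
median_score = median(score for score, _ in RANKING)

# Number of cards with a positive score, split by card type.
positive_by_type = dict(Counter(kind for score, kind in RANKING if score > 0))

# Types of the three highest-scoring cards.
top_three_types = [kind for _, kind in sorted(RANKING, reverse=True)[:3]]

print(median_score, positive_by_type, top_three_types)
```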
Wait, what just happened?
On the face of it, this result seems surprising. Mechanical Turk workers with no formal training or card writing experience were able to create cards that people liked better than cards written by professionals. Here’s my hypothesis about why this works:
People buy cards to emotionally connect with their friends and relatives, and for a birthday card to be popular it needs to express a sentiment that feels heartfelt and intimate. But, for a birthday card to sell well, it needs to express a sentiment that appeals to a large audience. Therefore, birthday card designers need to navigate a tough balance: short but not trite, common but not cliché, fresh but familiar.
Expert writers would appear to have the advantage in this. After all, they have years of training and experience, and probably spend more time crafting each card. But they are subject to expert biases, such as groupthink and herding. More fundamentally, expert writers are far removed from the people who read their cards. In an area where fresh thinking and an emotional connection are important (like writing a birthday card), it seems plausible that the crowd would have an advantage.
Limitations
Our cards were designed in low fidelity (without artwork). Storebought cards were likely designed to work with artwork, and our process might be biased toward card text that works on its own as text, but not when paired with artwork.
Mechanical Turk workers aren’t representative of actual consumers. Our cards might perform well with Mechanical Turk workers, but not with real customers in a real store.
In this project, nothing was done to make sure that actual humans were doing the rating, or that they were paying attention.
Although the results of this project are promising, further research with larger sample sizes is necessary. However, this study suggests that — when the work is structured in the correct way — crowds can effectively write a sentimental artifact such as a birthday card in a way that is comparable to or better than the work of experts.
Ask me about:
- Validating that the card scoring method actually works
- When it would make more sense to hire an expert or to write cards with an algorithm
- How, rather than buying a sample of cards to compare against, crowdsourced cards could be ranked relative to each other