How Mechanical Turk Accelerated our Product R&D

Mitchel Seaman
Product Labs

--

Mechanical Turk lets you apply human judgment to lots of small processing tasks. This is pretty useful, because it means that with a little effort and about fifty bucks, you can get thousands of humans to answer basic questions for you. My team and I used it to tackle a research question for songmonkey.io that interviews and quantitative research alone couldn’t quite crack.

There are some Mechanical Turk tips at the bottom of this article if you’re looking for more of a how-to. But first, the story!

Background

We wanted to publish videos to teach people how to play popular songs on the piano. Research and interviews told us that a beginner musician’s go-to method for learning to play a song was to head to Google and type something like “how to play All of Me on piano.” Their number-one result? A ton of YouTube videos.

Some of the videos attracted a lot more views and likes than others, so we set out to discover what separated the million-view videos from the 10-view videos.

Setup

First, I performed a bunch of YouTube searches for instructional videos, sorted the results by popularity, and looked for things the top videos had in common.

Synthesia in action

Some included video of an instructor playing piano, some used a tool called Synthesia to create a Guitar Hero effect, some displayed sheet music, some broke the songs up into different sections, and so on. After a few dozen videos, a consistent set of attributes started to emerge.

Great, but what made a video popular? Manual review alone wasn’t conclusive; we needed a bigger sample of videos to figure out the relationship between a video’s content and its popularity.

With the help of the engineer on our team, we wrote a script to pull the top ~5,000 video results for our search terms, along with some stats (date published, total views and likes, URL, etc.), from the YouTube Data API. Some interesting patterns emerged; for example, certain naming conventions and video lengths correlated with video popularity.
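
We haven’t published the original scripts here, but a minimal sketch of that kind of pull looks something like this (Python; the API key, search term, and page count are placeholders, and a single query tops out at a few hundred results, so reaching thousands of videos means looping over many search terms):

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; a real YouTube Data API v3 key goes here
SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"
VIDEOS_URL = "https://www.googleapis.com/youtube/v3/videos"

def search_video_ids(query, pages=5):
    """Collect video IDs for one search term, 50 results per page."""
    ids, token = [], None
    for _ in range(pages):
        params = {"part": "id", "q": query, "type": "video",
                  "maxResults": 50, "key": API_KEY}
        if token:
            params["pageToken"] = token
        data = requests.get(SEARCH_URL, params=params).json()
        ids += [item["id"]["videoId"] for item in data.get("items", [])]
        token = data.get("nextPageToken")
        if not token:
            break
    return ids

def video_stats(video_ids):
    """Fetch publish date, title, and view/like counts, 50 IDs per call."""
    rows = []
    for i in range(0, len(video_ids), 50):
        params = {"part": "snippet,statistics",
                  "id": ",".join(video_ids[i:i + 50]), "key": API_KEY}
        data = requests.get(VIDEOS_URL, params=params).json()
        for item in data.get("items", []):
            rows.append({
                "id": item["id"],
                "published": item["snippet"]["publishedAt"],
                "title": item["snippet"]["title"],
                "views": int(item["statistics"].get("viewCount", 0)),
                "likes": int(item["statistics"].get("likeCount", 0)),
            })
    return rows

stats = video_stats(search_video_ids("how to play all of me on piano"))
```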

The API gave us some useful information, but to classify the actual content of each video, we needed a human eye.

Enter Mechanical Turk.

MTurk was developed for just this type of situation — getting a lot of people to perform a bunch of small, straightforward, digital tasks. We asked them to categorize our YouTube videos.

I made a Mechanical Turk project that asked workers to categorize each video and identify some of its attributes.

The workers categorized each video by its purpose, which allowed us to filter out unrelated videos. Workers also identified the elements of the video that we found in our initial qualitative research, so we could see which of those elements correlated most strongly with video popularity.

We were able to filter out a lot of videos that weren’t really about teaching songs on piano, which gave us a cleaner data set than just the YouTube search results on their own.
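
We set the project up through the MTurk web interface, but the same kind of HIT can also be created from code. Here’s a rough sketch with boto3; the title, reward, timings, and the QuestionForm XML file are all hypothetical, and the sandbox endpoint keeps test HITs free and invisible to real workers:

```python
import boto3

# Sandbox endpoint: use the production endpoint only once the task is tested.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# QuestionForm XML describing the categorization form (file name is hypothetical).
with open("video_question_form.xml") as f:
    question_xml = f.read()

hit = mturk.create_hit(
    Title="Categorize a piano tutorial video",
    Description="Watch a short YouTube video and answer a few questions.",
    Keywords="video, categorization, music",
    Reward="0.05",                    # dollars, passed as a string
    MaxAssignments=3,                 # three workers per video for redundancy
    LifetimeInSeconds=7 * 24 * 3600,  # how long the HIT stays listed
    AssignmentDurationInSeconds=600,  # time a worker gets per assignment
    Question=question_xml,
)
print("HIT ID:", hit["HIT"]["HITId"])
```

Once workers submit, mturk.list_assignments_for_hit(HITId=...) returns their answers for scoring.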

Results

Synthesia “falling bars” and a clear view of hands on the keyboard were found more often in popular, well-liked videos than were verbal lessons, song segmentation, or notes displayed onscreen. That was useful to know, because it encouraged us to focus on producing a certain type of video.

The team decided to make videos using the Synthesia tool and a teacher’s hands on the keyboard. They were relatively cheap to make and turned out to be really popular in usability tests.

MTurk and some data analysis were a great place to start making decisions about how to move forward. The larger video sample gave us greater confidence that the trends we noticed were real, not just anecdotal. It’s a solid tool to consider next time you want to add a little en masse human judgment to your research.

Thanks to Pivotal Labs designers Aaron Lawrence and Becky Gessler, and the team at songmonkey.io. Amy, Tanner and the team have crafted a pretty fantastic piano teaching tool. Grab a keyboard and check it out; you’ll be playing Taylor Swift in 5 minutes.

Mechanical Turk Tips:

  • Make questions as straightforward as possible, breaking them down into multiple easier questions if necessary.
  • To verify responses, ask multiple MTurk workers to answer each of your questions. For example, I asked 3 workers to categorize each of my videos and made sure their responses matched before adding that video to my results (see the consensus sketch after this list).
  • Mechanical Turk has some tools you can use to vet workers: you can create qualifications that include tests to make sure you’re asking the right people your questions, and you can enlist Mechanical Turk Masters, who have demonstrated proficiency in a given area. Use these if they make sense. They didn’t for me, because my task was simple and I used redundancy instead (see the previous tip).
  • Work in small batches of increasing size. Try starting with a one-item batch, then 10, then 100, then 200, and so on. I made a lot of mistakes setting up my tasks and could have saved time and money by being a little more patient about ramping up.
  • Do your truly qualitative research separately in user interviews — MTurk workers make more money by performing tasks quickly, rather than answering questions thoughtfully, so don’t expect essays or advanced insight. That’s not really MTurk’s sweet spot.
  • Like surveys, Mechanical Turk can give you a lot of data that looks authoritative because it’s in a spreadsheet. Be on the lookout for biases like leading questions, selection bias, and confusion between correlation and causation. MTurk is best used alongside other experiments. My experiment, for example, included some sampling bias because I performed the YouTube searches that fed the video data set, rather than including every YouTube video.
  • Control for external variables. The most obvious external variable that gave us bogus results at first was time: we judged videos by their absolute number of views rather than their rate of views (per month, per year, etc.). Older videos have more views than newer videos, even though their quality isn’t necessarily better. The normalization sketch after this list shows one way to account for that.
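
Two of the tips above are easy to mechanize. Here’s a minimal sketch of the three-worker consensus check and the views-per-month normalization; the label values, view counts, and date format are hypothetical, standing in for however your MTurk results export and YouTube data happen to look:

```python
from collections import Counter
from datetime import datetime, timezone

def consensus(labels, required=3):
    """Return the majority label if all `required` workers agree, else None.

    Looser variants (e.g. 2-of-3 agreement) are a one-line change.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes == required else None

def views_per_month(views, published_at):
    """Normalize absolute views by the video's age in months (~30.4 days each)."""
    published = datetime.fromisoformat(published_at.replace("Z", "+00:00"))
    months = max((datetime.now(timezone.utc) - published).days / 30.4, 1.0)
    return views / months

# Hypothetical worker responses for one video:
print(consensus(["tutorial", "tutorial", "tutorial"]))  # -> "tutorial"
print(consensus(["tutorial", "cover", "tutorial"]))     # -> None
print(views_per_month(1_200_000, "2013-06-01T00:00:00Z"))
```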

I’d love to hear tips and questions if you have them. We also kept the YouTube scripts if you’re interested in specifics. Hit me up in the comments or on Twitter: @mitchelseaman.
