Building Smarter Search
Using crowdsourcing and data science to improve discovery
by Airong Cai, Yan Huang
See also the TechCrunch article about this topic.
Here at Coursera, we care deeply about connecting learners with the right educational content to reach their goals. The good news is that, with over 1,800 courses on our platform in just about every subject and at a range of levels, we almost certainly have the right content. However, finding the perfect course in a catalog of 1,800 is often no easy task.
We are continuously innovating and iterating on our data-driven discovery processes, which include search ranking, personalized course recommendations, featured lists, browse paths, related courses, and more. Some of these innovations are highly visible to the user; it’s hard to miss the new goal-oriented onboarding flow, for example. But many are more “under the hood,” at least at first. One of these is skills-based discovery.
The development of skills-based search has been a months-long exercise in crowdsourcing, data science, and machine learning. Here’s an example of the result, showing how skills mapping improves search results for one skill — “P-value”, an important concept used in statistical hypothesis testing:
[Figure: search results for “P-value” before skills-based search]
[Figure: search results for “P-value” after skills-based search]
Pretty cool, right? How did we do it? We leveraged the incredible scale of our learner community — now over 24 million strong — to build a system powered by insights and recommendations from our learners.
First, our learners provided inspiration. Most learners who search the Coursera catalog are searching for skills, from broad skills like “Marketing” or “Programming” to more granular ones like “SEO” or “NumPy”. Searching by skill makes sense, given that many learners come to Coursera to build the specific skills they need to start a new career or advance in their current one. But because our catalog isn’t structured around skills, we noticed that learners who searched for skills weren’t always finding the courses most relevant to their needs.
To improve the results from skills searches, we needed to answer one primary question: For any given skill, what are the courses in which the learner is most likely to learn that skill?
We started by constructing an initial list of about two thousand relevant and popular business, tech, and data science skills. Then came the hard part — identifying the skills that are taught in each course on our platform.
For that second step, we turned to crowdsourcing, leveraging the potential of our extensive global community. At first, we asked Coursera instructors and Community Mentors to tag skills for their courses. However, we quickly realized that we needed more data, faster — and we also noticed that instructors tended to overlook certain key skills, especially the concrete tools like programming languages and software packages that so many learners search for.
We then started asking learners to provide the data we needed by answering a short “skills you learned” questionnaire at the end of every course they complete. And wow, are we glad we asked! Our learners have been doing a phenomenal job, giving us a steady stream of rich, continuously evolving information that reflects the skills they most care about learning, which are often exactly the terms they’d use to search our catalog. They’ve also been adding to our skills list by suggesting skills we might not have thought to tag, such as “Tableau” (the interactive data visualization software) and “Cocoa” (Apple’s native object-oriented application programming interface).
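One simple way to turn free-form questionnaire answers into per-course skill tags is to normalize each reported skill and count how many learners mention it. The sketch below assumes a hypothetical response format of `(course_id, [skills])` pairs and an illustrative `min_votes` noise threshold; it is not Coursera’s actual pipeline.

```python
from collections import Counter, defaultdict
from typing import Iterable

# Hypothetical input: one (course_id, reported_skills) pair per questionnaire response.
Response = tuple[str, list[str]]


def normalize_skill(raw: str) -> str:
    """Lower-case and collapse whitespace so "NumPy " and "numpy" count as one skill."""
    return " ".join(raw.lower().split())


def aggregate_skill_votes(
    responses: Iterable[Response], min_votes: int = 2
) -> dict[str, dict[str, int]]:
    """Count learners reporting each skill per course, dropping skills below min_votes.

    min_votes is an illustrative noise threshold, not a documented value.
    """
    votes: dict[str, Counter] = defaultdict(Counter)
    for course_id, skills in responses:
        # A set, so one learner listing the same skill twice counts only once.
        for skill in {normalize_skill(s) for s in skills if s.strip()}:
            votes[course_id][skill] += 1
    return {
        course_id: {skill: n for skill, n in counts.items() if n >= min_votes}
        for course_id, counts in votes.items()
    }


if __name__ == "__main__":
    responses = [
        ("data-viz", ["Tableau", "tableau ", "Data Visualization"]),
        ("data-viz", ["Tableau"]),
        ("data-viz", ["Data Visualization", "SQL"]),
        ("python-101", ["NumPy"]),
    ]
    print(aggregate_skill_votes(responses))
```

In this toy data, “tableau” and “data visualization” each clear the threshold for the hypothetical `data-viz` course, while single mentions such as “SQL” and “NumPy” are filtered out until more learners confirm them.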
By combining all of this great crowdsourced data with data mined directly from course content, we’ve built a skills-tagging model that returns, for each skill, an ordered list of courses in which the learner is most likely to learn that skill. And these predictions are always improving as new inputs from learners around the world are automatically integrated.
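The post does not describe the model itself. As a rough illustration only, the sketch below blends three hypothetical signals for each (course, skill) pair (the share of questionnaire respondents who reported the skill, whether an instructor or Community Mentor tagged it, and a relevance score mined from course content) using made-up weights, then sorts courses by the result. A production model would more likely learn such weights from data.

```python
from dataclasses import dataclass


@dataclass
class SkillEvidence:
    """Hypothetical evidence that one course teaches one skill."""
    learner_votes: int       # learners who reported the skill in the questionnaire
    respondents: int         # learners who answered the questionnaire for the course
    instructor_tagged: bool  # tagged by an instructor or Community Mentor
    content_score: float     # assumed 0..1 relevance mined from course content


# Made-up weights for illustration; not Coursera's model.
WEIGHTS = {"learners": 0.5, "instructor": 0.2, "content": 0.3}


def skill_score(ev: SkillEvidence) -> float:
    """Weighted blend of the three signals, each expected to lie in [0, 1]."""
    vote_share = ev.learner_votes / ev.respondents if ev.respondents else 0.0
    return (
        WEIGHTS["learners"] * vote_share
        + WEIGHTS["instructor"] * float(ev.instructor_tagged)
        + WEIGHTS["content"] * ev.content_score
    )


def rank_courses_for_skill(evidence: dict[str, SkillEvidence]) -> list[tuple[str, float]]:
    """Return (course_id, score) pairs for one skill, best match first."""
    scored = [(course_id, skill_score(ev)) for course_id, ev in evidence.items()]
    # Highest score first; course_id breaks ties so the order is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    p_value = {
        "stats-inference": SkillEvidence(80, 100, True, 0.9),
        "intro-data-science": SkillEvidence(10, 100, False, 0.4),
        "ml-foundations": SkillEvidence(30, 60, False, 0.2),
    }
    # Prints stats-inference: 0.87, then ml-foundations: 0.31, then intro-data-science: 0.17
    for course_id, score in rank_courses_for_skill(p_value):
        print(f"{course_id}: {score:.2f}")
```

Because the scores are recomputed from raw counts, re-running the ranking as new questionnaire responses arrive is one way fresh learner input could flow into the ordering.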
To enable skills-based search, we load the skills tagged to each course into Solr, our search platform, as a list of keywords for indexing. We use Solr’s Query Fields (qf) parameter to configure dynamically whether the skill keywords are searchable, which lets us A/B test the feature. When a user searches for a skill, Solr returns all the courses tagged with that skill, and we then re-order them using the ranking scores from the skills-tagging model.
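A minimal sketch of these two steps, assuming Solr’s `edismax` query parser (where `qf` lists the fields to search, with optional `^` boosts) and a hypothetical `skills` field holding each course’s skill keywords. The field names, boosts, the hash-based variant assignment, and the choice to keep untagged matches after tagged ones are all illustrative assumptions. The code only builds request parameters and re-orders IDs; it does not call a Solr server.

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical Solr field names and boosts; "skills" holds the tagged skill keywords.
BASE_QF = "title^3 description"
SKILLS_QF = "skills^2"


def in_skills_variant(user_id: str, percent: int = 50) -> bool:
    """Deterministically assign a user to the skills-search A/B variant."""
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < percent


def build_search_params(query: str, skills_enabled: bool, rows: int = 50) -> dict[str, str]:
    """Build edismax parameters; the skills field is searchable only in the test variant."""
    qf = f"{BASE_QF} {SKILLS_QF}" if skills_enabled else BASE_QF
    return {"q": query, "defType": "edismax", "qf": qf, "rows": str(rows), "fl": "id"}


def rerank(solr_ids: list[str], model_scores: dict[str, float]) -> list[str]:
    """Order courses that have a skills-model score by that score; keep the rest after them."""
    tagged = [cid for cid in solr_ids if cid in model_scores]
    untagged = [cid for cid in solr_ids if cid not in model_scores]
    # list.sort is stable, so courses with equal scores keep Solr's relative order.
    tagged.sort(key=lambda cid: -model_scores[cid])
    return tagged + untagged


if __name__ == "__main__":
    params = build_search_params("p-value", skills_enabled=in_skills_variant("learner-42"))
    # "courses" is a hypothetical Solr core name.
    print("http://localhost:8983/solr/courses/select?" + urlencode(params))
    print(rerank(["a", "b", "c", "d"], {"c": 0.9, "a": 0.4}))  # ['c', 'a', 'b', 'd']
```

Because `qf` is a query-time parameter, switching the skill keywords on or off for a variant requires no reindexing: the control group simply searches titles and descriptions, while the test group also matches on skills.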
Skills-based search, which is currently in testing and will be available to all learners on Coursera soon, is just one of many possible applications of the skills-tagging model. We’ll be exploring more applications throughout 2017, all aimed at making the learner’s experience on Coursera more personalized, responsive, and impactful. Stay tuned for more exciting updates to come!
Airong Cai is a Data Scientist working on Discovery Data Products and Yan Huang is a Software Engineer working on the Growth Platform team.
Originally published at building.coursera.org on January 31, 2017.