How our Skills Graph is helping learners find the right content to reach their goals

Emily Glassberg Sands
Coursera Engineering
Jul 10, 2018

At Coursera, we use data to power our product and better serve our learners. One example is our Skills Graph, a series of algorithms connecting learners, content, and careers through a common skills currency. At its essence, the graph maps a robust library of skills to each other, to the content that teaches them, to the careers that require them, and to the learners who have or want them. It’s built on data from across the site and powers a range of applications in content discovery and beyond.

Take as one example the is-taught-by edge between the skill node and the content node. It’s generated by a machine learning model whose features include attributes of the material, such as how often it references the skill or concepts related to the skill. One of the most valuable features, though, is what learners on the platform report learning as they move through their experience. This edge powers a few data products on the site; one is skills-based search.
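To make the idea concrete, here is a minimal sketch of scoring one is-taught-by edge from the two kinds of features described above. It assumes a simple logistic model; the feature names, the weights, and the `is_taught_by_score` function are hypothetical and chosen only for illustration, not taken from our production model.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeFeatures:
    """Hypothetical features for one (skill, course) pair."""
    mention_rate: float      # share of course passages that reference the skill
    learner_tag_rate: float  # share of responding learners who report learning it


# Illustrative weights only; a real model would learn these from labeled data.
WEIGHTS = {"bias": -3.0, "mention_rate": 4.0, "learner_tag_rate": 6.0}


def is_taught_by_score(features: EdgeFeatures) -> float:
    """Return a logistic score in (0, 1) for the is-taught-by edge."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["mention_rate"] * features.mention_rate
        + WEIGHTS["learner_tag_rate"] * features.learner_tag_rate
    )
    return 1.0 / (1.0 + math.exp(-z))


# A course that never names the skill, but where most learners report learning it.
reported_not_mentioned = EdgeFeatures(mention_rate=0.0, learner_tag_rate=0.8)
# A course that mentions the skill in passing, with no learner reports.
mentioned_not_reported = EdgeFeatures(mention_rate=0.1, learner_tag_rate=0.0)

print(is_taught_by_score(reported_not_mentioned))
print(is_taught_by_score(mentioned_not_reported))
```

With these made-up weights, learner reports alone are enough to push the first pair above a 0.5 threshold, while a passing mention is not.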

Imagine a learner looking to learn a specific tool or technology, maybe because she needs it for a freelancing job, or because she sees it listed on the requisition for a job to which she wants to apply. While tools and technologies are often taught in courses on Coursera, instructors may not mention them in describing the course. An example is NumPy, a package for scientific computing in Python. With just a standard text similarity-based relevance model, searching the catalog for ‘NumPy’ would return no results, and in fact it did until we built this edge and deployed it in search. Now the query returns 21 courses where, from the graph, we know learners are learning NumPy. This extends across a range of hard skills, from the very broad to the very granular.
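The NumPy case can be sketched in a few lines. This is a toy illustration, assuming a substring match stands in for the text relevance model and a dictionary stands in for the is-taught-by edges; the catalog entries and the `text_search` and `skills_search` functions are hypothetical.

```python
from typing import Dict, List, Set

# Toy catalog: course title -> description (made-up entries).
CATALOG: Dict[str, str] = {
    "Applied Data Science with Python": "Learn to analyze data and build charts.",
    "Intro to Public Speaking": "Practice delivering clear presentations.",
}

# Toy is-taught-by edges: skill -> courses where learners report learning it.
IS_TAUGHT_BY: Dict[str, Set[str]] = {
    "numpy": {"Applied Data Science with Python"},
}


def text_search(query: str) -> List[str]:
    """Return courses whose title or description contains the query text."""
    q = query.lower()
    return [
        title
        for title, description in CATALOG.items()
        if q in title.lower() or q in description.lower()
    ]


def skills_search(query: str) -> List[str]:
    """Text matches first, then courses linked to the queried skill in the graph."""
    results = text_search(query)
    for title in sorted(IS_TAUGHT_BY.get(query.lower(), set())):
        if title not in results:
            results.append(title)
    return results


print(text_search("NumPy"))    # no course text mentions the skill
print(skills_search("NumPy"))  # the graph edge surfaces the course anyway
```

A production system would blend the two signals in ranking rather than simply appending graph matches, but the sketch shows why the edge recovers courses that text matching alone misses.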

Once we built the graph infrastructure and opened it up to learner-reported tags, it surpassed what we could have come up with on our own. For example, our initial set of skill tags was exclusively in the business, computer science, and data domains. But today, thanks to the graph, learners can easily find content that teaches soft skills too, even where they are taught only indirectly. When a learner searches for ‘confidence’, she is returned several courses on public speaking, the well-known Learning How to Learn course, and more, all powered by a rich stream of learner-reported data that feeds and updates the graph each day.

While the skills-based search application produced our single biggest algorithmic win yet in search, it assumes the learner knows what she wants to learn. Since many learners are more focused on the outcome they want, for example a particular job, we extended the graph to include a mapping between careers and the set of skills they require. The mapping is based on how frequently skills appear in postings for that job and on the skills we observe, through in-course performance, that real learners in those jobs have. Here’s one application: as the learner browses Coursera content, she can filter by career relevance.
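As a rough sketch of the career mapping, the snippet below weights each skill by the share of job postings that mention it and then scores a course by how much of that weight it covers. It covers only the posting-frequency signal, not the in-course performance signal, and the postings, course skills, and function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def skill_weights(postings: List[Set[str]]) -> Dict[str, float]:
    """Share of postings for one career that mention each skill."""
    counts = Counter(skill for posting in postings for skill in posting)
    return {skill: n / len(postings) for skill, n in counts.items()}


def career_relevance(course_skills: Set[str], weights: Dict[str, float]) -> float:
    """Fraction of the career's total skill weight that the course covers."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    covered = sum(w for skill, w in weights.items() if skill in course_skills)
    return covered / total


# Made-up postings for a single career, each reduced to a set of skills.
DATA_SCIENTIST_POSTINGS: List[Set[str]] = [
    {"sql", "python"},
    {"sql", "statistics"},
    {"sql", "python"},
    {"python", "machine learning"},
]

weights = skill_weights(DATA_SCIENTIST_POSTINGS)

# Made-up courses and the skills the graph says they teach.
COURSES: Dict[str, Set[str]] = {
    "Databases and SQL for Data Science": {"sql", "python"},
    "Intro to Public Speaking": {"public speaking"},
}

# Filter the catalog down to courses above an illustrative relevance threshold.
relevant = [
    title
    for title, skills in COURSES.items()
    if career_relevance(skills, weights) >= 0.5
]
print(relevant)
```

The 0.5 threshold is arbitrary; in practice the cutoff, and how the two signals are combined, would be tuned against real learner behavior.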

We can do better still by incorporating data on each individual, for example by using our platform data to rigorously measure what each learner already knows and then landing her in the right level of content. This starts with item-response theory models trained on the hundreds of millions of questions that have been attempted on the platform. In a nutshell, the models output an estimated difficulty for each question. Marrying these estimates with a given learner’s performance on the assessments she’s attempted, we can infer her level in each of a range of skills. Consider sample output for a single learner: she is relatively stronger in Data Management but weaker in Machine Learning. Knowing this allows us to, among other things, recommend beginner ML content.
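The inference step can be illustrated with the simplest item-response model. The sketch below assumes a one-parameter (Rasch) model, in which the probability of a correct answer is a logistic function of ability minus difficulty, and treats question difficulties as already estimated. The specific model, the difficulty values, the responses, and the `estimate_ability` function are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch model: probability that the learner answers the question correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses: List[Tuple[float, bool]], steps: int = 25) -> float:
    """Maximum-likelihood ability via Newton's method, clamped to [-4, 4].

    Each response is (question difficulty, whether the answer was correct).
    The clamp keeps the estimate finite when every answer is right or wrong.
    """
    theta = 0.0
    for _ in range(steps):
        probs = [p_correct(theta, difficulty) for difficulty, _ in responses]
        # Gradient of the log-likelihood: observed minus expected correct answers.
        gradient = sum(
            (1.0 if correct else 0.0) - p
            for (_, correct), p in zip(responses, probs)
        )
        # Negative second derivative of the log-likelihood (Fisher information).
        curvature = sum(p * (1.0 - p) for p in probs)
        if curvature < 1e-9:
            break
        theta = max(-4.0, min(4.0, theta + gradient / curvature))
    return theta


# Made-up responses for one learner, grouped by skill.
data_management = [(-1.0, True), (0.0, True), (1.0, True), (2.0, False)]
machine_learning = [(-1.0, True), (0.0, False), (1.0, False), (2.0, False)]

dm_ability = estimate_ability(data_management)
ml_ability = estimate_ability(machine_learning)
print(dm_ability, ml_ability)
```

On these made-up responses the learner's estimated ability is higher in Data Management than in Machine Learning, which is the kind of gap that would point toward beginner ML content.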

Today we’ve shared some examples of how our Skills Graph, built on rich data captured across the platform, allows us to develop a more robust understanding of learners, content, and careers and, when fed back into the product, helps each learner find the right content for them.

In the coming weeks, we’ll share other applications of the graph, including how the graph is unlocking valuable insights for our enterprise customers in an application called Skills Benchmarking.

Interested in applying data science to education? Coursera is hiring!
