The Evolution of Coursera Search: Enabling Product Innovation Through Technical Innovation

Chris Liu
Sep 26, 2018

At Coursera, millions of learners use search to discover courses. For learners with a specific intent, we need relevant results. For learners with less concrete goals, we need to give a feeling of serendipity by injecting novelty and diversity into the results. In this blog post, we detail how our new search system, powered by Algolia, allows us to iterate toward this future.

Previous Search System

Search at Coursera has undergone two major revamps. Our initial approach was to return all the course data and search on the frontend. This approach became untenable as our catalog grew to hundreds of courses. We then revamped and constructed a search system powered by Solr. The architecture is as follows:

Our Solr-based search system architecture diagram.

We indexed data from our online systems and extracted associated metadata such as the instructors’ names. We supported features such as spell checking, stemming, stop word filtering, and word canonicalization. As seen in the above figure, relying on online systems added complexity to data retrieval and processing.
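To make the text-processing features above concrete, here is a minimal sketch of stop word filtering, word canonicalization, and stemming in plain Python. It is not how Solr implements them: the stop word list, the canonical forms, and the suffix-stripping `naive_stem` helper are all assumptions made up for illustration, and spell checking is left out.

```python
from __future__ import annotations

import re

# Hypothetical stop words and canonical forms, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "to", "and"}
CANONICAL = {"ml": "machine learning", "js": "javascript"}


def naive_stem(token: str) -> str:
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(query: str) -> list[str]:
    """Lowercase, tokenize, canonicalize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    expanded = []
    for token in tokens:
        # A canonical form may expand one token into several words.
        expanded.extend(CANONICAL.get(token, token).split())
    return [naive_stem(t) for t in expanded if t not in STOP_WORDS]
```

Applying the same `normalize` step to both the indexed text and the incoming query is what lets a query like "ML courses" match a document that says "machine learning course".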

For relevance tuning, our Solr schema contained fields with hand-tuned weights. For instance, the title of a course should have more influence on the score than the description. We also had a dynamic boosting system that allowed for behaviors like boosting the scores of documents in the learner’s native language. Lastly, a reranking module allowed for skills-based search by taking the Solr-scored entities and applying custom reorderings for specific types of queries.
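The three layers described above (field weights, dynamic boosts, and reranking) can be sketched in a few lines of Python. This is an illustrative toy, not our Solr configuration: the weights, the boost multiplier, the `Course` fields, and the term-counting score are all assumed values chosen to show how the layers compose.

```python
from dataclasses import dataclass

# Hypothetical hand-tuned weights: a title match counts more than a description match.
FIELD_WEIGHTS = {"title": 3.0, "description": 1.0}
LANGUAGE_BOOST = 1.5  # assumed multiplier for documents in the learner's language


@dataclass
class Course:
    title: str
    description: str
    language: str
    skills: tuple = ()


def score(course, query, learner_language):
    """Weighted count of query terms per field, then a language boost."""
    terms = query.lower().split()
    base = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        words = getattr(course, field_name).lower().split()
        base += weight * sum(words.count(term) for term in terms)
    if course.language == learner_language:
        base *= LANGUAGE_BOOST
    return base


def search(courses, query, learner_language, skill=None):
    """Score and sort matching courses, then optionally rerank by skill."""
    scored = [(score(c, query, learner_language), c) for c in courses]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [course for s, course in scored if s > 0]
    if skill is not None:
        # sorted() is stable, so the scored order is kept within each group.
        ranked = sorted(ranked, key=lambda c: skill not in c.skills)
    return ranked
```

The reranking step only reorders what the scorer already returned, which mirrors the way the reranking module sat on top of the Solr-scored entities.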

This system has powered our search for the last four years, but we faced some challenges.

Design Requirements

The main requirements we identified as we looked at iterating on search are the following:

Development productivity

Performance

In our current system, the median response time is several hundred milliseconds, which blocks experiences like search-as-you-type. Our new search system should return all the data necessary to power the search experience, while trimming the median response time to less than 10ms.

Where we are today with Algolia

Our current architecture diagram, while we experiment with a new search experience.

Today, we’ve simplified the search system by factoring it into three parts:

  1. Processing is consolidated within our Enterprise Data Warehouse (EDW). Data scientists and engineers can use familiar tools such as notebooks to process the data to be indexed for search. To begin, we’ve ported the processing logic from our Solr search system, including skills-based search. A standard export workflow then takes the processed data and loads it into Algolia indices.
  2. Tuning is done through the Algolia UI, which is where algorithmic iterations happen. For instance, skills-based search is implemented by adding a custom ranking criterion.
  3. Display is handled by the react-instantsearch library. This UI widget library frees us from low-level concerns like maintaining search state or calling search APIs.
Our modern search offers a search-as-you-type experience because it meets our performance goals.
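As a rough sketch of the processing and tuning steps, the snippet below shapes processed warehouse rows into records ready for an Algolia index and pairs them with index settings. `objectID`, `searchableAttributes`, and `customRanking` are standard Algolia names; the row layout, the `skill_score` attribute, and the `build_records` helper are hypothetical, and the upload call itself is omitted. In our setup the ranking settings are adjusted in the Algolia UI; they are shown as a dictionary here only to make them visible.

```python
def build_records(rows):
    """Turn processed warehouse rows into Algolia-style records."""
    records = []
    for row in rows:
        records.append({
            "objectID": row["course_id"],  # Algolia's unique record identifier
            "title": row["title"],
            "description": row["description"],
            "skills": sorted(row["skills"]),  # skill names, sorted for stable output
            # Hypothetical precomputed signal used for skills-based ranking.
            "skill_score": round(max(row["skills"].values(), default=0.0), 3),
        })
    return records


# Assumed settings: attribute order sets matching priority, and the custom
# ranking breaks ties in favor of courses with a stronger skill signal.
INDEX_SETTINGS = {
    "searchableAttributes": ["title", "skills", "description"],
    "customRanking": ["desc(skill_score)"],
}
```

Because the records are plain dictionaries, the same function can be run and inspected in a notebook before anything is exported.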

In the future

In the future, search makes use of behavioral and custom data, while powering other types of discovery experiences.

In the future, we envision a search system that is flexible in incorporating data, allows for algorithmic innovation, and powers all of our content discovery. We’re not there yet, but by factoring the search system into processing, tuning, and display, we’re one step closer.

Coursera Engineering

We're changing the way the world learns!