Coursera Engineering

We're changing the way the world learns! Posts from Coursera engineers and data scientists.


Order from Chaos: Understanding Search Queries through Vectors

6 min read · Sep 13, 2019


How Query2Vec Works

The Query2Vec algorithm pipeline. In this case, the features are a dataset of the skills that learners can gain from our courses.

Preprocessing

A few of the queries related to “deep learning.”
Queries after preprocessing.

Vectorizing

T-distributed Stochastic Neighbor Embedding (t-SNE) of some sample query vectors. t-SNE is a popular method for visualizing high-dimensional vectors; it is also used to categorize our learning content. We can see clusters indicating relationships among words: “free” and “introduction”; all the terms “with r”; “arts” and “music”; “politics” and “governance.”

Mapping to Features (Similarity Search)

Sample results. The “query” column contains raw search queries, the “top features” are the most relevant skills, and the “distances” measure how far each skill is from the query on a scale from 0 to 1, where 0 means the two are identical.
Sample results with a higher distance threshold, allowing less similar features to be tagged.
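The two captions above describe tagging a query with its nearest features, subject to a distance threshold. Below is a minimal sketch of that idea, not the Query2Vec implementation. It assumes cosine distance as the metric (the captions do not name one) and uses made-up three-dimensional vectors. The names `cosine_distance`, `top_features`, `FEATURES`, and `QUERY` are hypothetical. For non-negative vectors like these, cosine distance stays between 0 and 1, which matches the scale in the caption; with real embeddings that contain negative components it can reach 2.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat an all-zero vector as unrelated to everything.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def top_features(query_vec, feature_vecs, threshold, k=3):
    """Return up to k (feature, distance) pairs within the distance threshold,
    closest first."""
    scored = [(name, cosine_distance(query_vec, vec))
              for name, vec in feature_vecs.items()]
    kept = [pair for pair in scored if pair[1] <= threshold]
    kept.sort(key=lambda pair: pair[1])
    return kept[:k]

# Hypothetical toy vectors, chosen only for illustration.
FEATURES = {
    "deep learning": [0.9, 0.1, 0.0],
    "machine learning": [0.7, 0.3, 0.1],
    "creative writing": [0.0, 0.1, 0.9],
}
QUERY = [0.9, 0.1, 0.0]

strict = top_features(QUERY, FEATURES, threshold=0.02)  # only near-identical skills
loose = top_features(QUERY, FEATURES, threshold=0.2)    # admits less similar skills

print("strict:", [name for name, _ in strict])
print("loose: ", [name for name, _ in loose])
```

Raising the threshold from 0.02 to 0.2 lets "machine learning" through alongside "deep learning", while "creative writing" stays out in both cases. This mirrors the trade-off in the second caption: a looser threshold tags more queries at the cost of weaker matches.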

Analysis Possibilities

t-SNE on queries that have been mapped to skills in our database. They form four distinct clusters, representing the skills “digital media,” “agile software development,” “mechanical engineering,” and “creative writing.”
Enrollment rates for popular skills (popularity determined by Query2Vec).

Acknowledgments

About the Author
