A top-down view of search

Twinkl Data Team
Twinkl Educational Publishers
4 min read · Jun 16, 2022

How we think about search on the Twinkl data team.

Originally published by Tom Gibbs, Lead Data Scientist at Twinkl, at https://www.twinkl.co.uk/blog/a-top-down-view-of-search

The problem of creating a search engine gets more and more complicated the longer you think about it.

On the face of it, search seems simple: Look for all the resources that feature the words in the search. That’s it. You search for the words that appear in the title of a resource, and it shows up.
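To make that simple view concrete, here is a minimal sketch of word matching against titles. The titles and the function names are made up for illustration; this is not how Twinkl's search is implemented.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def naive_search(query, titles):
    """Return the titles that contain every word of the query, in any order."""
    query_words = set(tokenize(query))
    return [title for title in titles if query_words <= set(tokenize(title))]

# Hypothetical resource titles, for illustration only.
titles = [
    "Phase 2 Phonics Activity Booklet",
    "Year 1 Maths Worksheet",
    "Christmas Phonics Colouring Sheet",
]

results = naive_search("phonics", titles)
missed = naive_search("phase two phonics", titles)
```

Here `results` holds both phonics titles, but `missed` is empty: the first title says "2" where the search says "two", so exact word matching already falls short.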

But that’s not usually how people use search. Often they type a single word, or several words in a different order than we might expect, or words that mean different things depending on their context and order.

Take “Year 1 phonics phase two week 3”. It mixes words and digits, so we have to handle both. It contains 3 different numbers, each of which pairs up with the word before it. And the words can be arranged in loads of ways (“phonics year 1 phase two…”, “year 1 week 3 phonics …”, etc.).

All of a sudden search feels a lot more complicated — how to deal with every permutation?
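One way to think about the permutation problem is to reduce each search to an order-independent form before matching. The sketch below is an illustration under simple assumptions, not a description of our production search: it assumes a number always belongs to the word directly before it, and the `NUMBER_WORDS` table and function names are hypothetical.

```python
import re

# Hypothetical lookup for spelled-out numbers; extend as needed.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

def to_number(token):
    """Return the integer value of a digit string or number word, else None."""
    if token.isdigit():
        return int(token)
    return NUMBER_WORDS.get(token)

def normalise(query):
    """Split a query into (word, number) pairs and leftover words, ignoring order."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    pairs, words = set(), set()
    i = 0
    while i < len(tokens):
        following = to_number(tokens[i + 1]) if i + 1 < len(tokens) else None
        if to_number(tokens[i]) is None and following is not None:
            # A word followed by a number: keep them together as one unit.
            pairs.add((tokens[i], following))
            i += 2
        else:
            words.add(tokens[i])
            i += 1
    return frozenset(pairs), frozenset(words)

a = normalise("Year 1 phonics phase two week 3")
b = normalise("phonics year 1 phase two week 3")
c = normalise("year 1 week 3 phonics phase 2")
```

All three searches reduce to the same pairs (`year 1`, `phase 2`, `week 3`) plus the single word `phonics`, so `a`, `b` and `c` compare equal. Real queries break the "number follows its word" assumption often enough that this is only a starting point.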

We also have a different problem — in the example above, there’s a lot of information.

7 words!

We can probably make a decent guess at what the user is looking for when they search with 7 words. But a lot of the time we don’t have that much information. Users won’t often search using 7 words; they might only use 1 or 2.

For instance, the word “phonics”, with no other context, has been searched nearly 10,000 times on our site in just the last 2 weeks. Looking only at resources with the word “phonics” in the title (let alone in the keywords or content), there are over 4,000 resources which match that search.

The brilliance of Twinkl is having a resource for every occasion, but it also makes search a much harder problem to solve.

My goal as Key Algorithms Lead is to take these difficult problems and break them down into things we can work on as a team. In the case of search I looked at the 3 main pillars of information that we have available to us:

What someone has typed (their “search term”)

Metadata about each resource (the resource titles, the keywords, etc.)

Information about who the user is (the country they’re in, the domain they’re on, what language setting they have, their behaviour on site)

Every idea we have to improve search should improve one of these 3 things or — even better — help us to strengthen the relationships between them.

Data visualisation is my passion :)

Evaluating how well our search matches the resource metadata (what most people think of when they discuss search) only addresses link A, the link between the search term and the resource metadata.

If we want to build a really effective search we also need to evaluate what a specific user means when they search for “phonics”. We need to build link B, the link between the user and their search term.

We also need to know how well each resource fits what users are looking for right now. Christmas Phonics resources probably shouldn’t come top of our list in July. We need to build link C, the link between the user and the resource.
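As a rough illustration of how the three links could feed a single ranking, here is a toy scoring sketch. Everything in it is an assumption made for the example: the `Resource` fields, the `peak_month` idea, the use of "context words" to stand in for what we know about a user, and the way the scores are multiplied together. It is not our actual ranking.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Resource:
    title: str
    # Hypothetical field: the month a seasonal resource is most relevant in.
    peak_month: Optional[int] = None

def link_a(query_words, resource):
    """Link A: fraction of the query words that appear in the resource title."""
    title_words = set(resource.title.lower().split())
    return len(query_words & title_words) / len(query_words) if query_words else 0.0

def link_b(query, user_context_words):
    """Link B: interpret the query for this user by adding words implied by their context."""
    return set(query.lower().split()) | set(user_context_words)

def link_c(resource, today):
    """Link C: down-weight seasonal resources the further we are from their peak month."""
    if resource.peak_month is None:
        return 1.0
    gap = abs(today.month - resource.peak_month)
    gap = min(gap, 12 - gap)  # months wrap around the year
    return 1.0 / (1.0 + gap)

def score(query, user_context_words, resource, today):
    """Combine the three links into one number (a simple product, chosen for illustration)."""
    return link_a(link_b(query, user_context_words), resource) * link_c(resource, today)

resources = [
    Resource("Christmas Phonics Game", peak_month=12),
    Resource("Year 1 Phonics Game"),
]
july = date(2022, 7, 1)
ranked = sorted(
    resources,
    key=lambda r: score("phonics", {"year", "1"}, r, july),
    reverse=True,
)
```

With a user whose context suggests Year 1, searching “phonics” in July puts the Year 1 resource ahead of the Christmas one in `ranked`. The point is not the particular formula but that each link contributes something the other two cannot.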

This has been a really useful framework for idea generation and also idea evaluation & prioritisation. We don’t have infinite time and resources, so how do we decide which ideas to invest our time into?

We can begin to do so without writing a single piece of code by keeping this overview of the problem in mind, helping us to be more effective, efficient and concise in our work.

If you like the sound of what we do here, then you’ll be happy to know that our Data Scientist team is currently hiring.

Check out some of the other posts from members of the data team.

And why not try out these changes for yourself? Below, you’ll find a handy little video that will help you navigate the Twinkl website, allowing you to find what you need with just a few clicks. And make sure to browse for anything specific with the handy search bar!

https://youtu.be/fQICaiLQEkw
