Anticipating What Teachers Need Using Data Science

Published in Twinkl Educational Publishers

Mar 2, 2023

Learn how we develop data processes to anticipate and provide what our customers need.

This article was published by Sarah Hinsley, Data Scientist at Twinkl.

I’ve been working at Twinkl as a data scientist for about 6 months. Prior to this, I was a chemistry teacher for too many years, and before that I was a computational chemist for AstraZeneca, after completing a Ph.D. in the same field.

The main thrust of my work at Twinkl has been selecting the resources that we recommend down the side of the webpage. See the picture below:

The systems that choose these resources are called ‘recommender systems’, and you may have seen them on other websites. Most retail, video and music websites have them, although they may go by a different name.

My day-to-day work involves writing Python code to analyse data from users, such as the resources they download or click on. I then use the results of the analysis to plan revisions of the backend code, producing improved lists of recommendations on the website. For example, earlier tests on the site suggested that users click on the recommended resources more if those resources are similar, but not too similar, to the main resource shown on the page. Maybe the teacher vaguely knows they need some word cards about solids, liquids and gases, but they would also be interested in any other new or different resource that can aid learning of this topic. Perhaps, as well as word cards, they know they’ll probably need a PowerPoint or video, a fun quiz game to assess learning, or a wall display on this topic.

The first thing I needed to do was group the resources. The need to group data is a common problem in data science, and experienced data scientists will often use clustering algorithms to split data into groups. However, I was not, at this point, an experienced data scientist, but I did know from my years of teaching that resources mainly fall into 6 categories. The categories I outlined included:

Items that go on the wall (displays)
Work that the children do (worksheets)
Resources to support teacher exposition (PowerPoints, videos)
Activities that encourage children out of their seats (science investigations, games, challenges)
Assessment materials (tests, knowledge organisers)
Lesson planning (schemes of work, lesson plans, resource bundles)

I was able to assign groups to most resources by writing some SQL code that looked for certain words in the title. For example, if a resource’s title contained the word ‘display’, it would go into group 1.
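To make the idea concrete, here is a minimal sketch of the same keyword matching written in Python rather than SQL. It is an illustration, not our production code: apart from ‘display’ going into group 1, the keyword lists, the group numbers and the function name `assign_group` are all assumptions made up for this example.

```python
# Hypothetical keyword lists. Only "display" -> group 1 comes from the
# article; every other keyword and group number is an assumption.
GROUP_KEYWORDS = {
    1: ["display", "banner", "poster"],              # items for the wall
    2: ["worksheet", "activity sheet"],              # work the children do
    3: ["powerpoint", "video"],                      # teacher exposition
    4: ["investigation", "game", "challenge"],       # out-of-seat activities
    5: ["test", "assessment", "knowledge organiser"],  # assessment
    6: ["lesson plan", "scheme of work", "bundle"],  # lesson planning
}


def assign_group(title):
    """Return the first group whose keyword appears in the title, else None."""
    lowered = title.lower()
    for group, keywords in GROUP_KEYWORDS.items():
        # Simple substring check, similar in spirit to a SQL LIKE '%word%'.
        if any(keyword in lowered for keyword in keywords):
            return group
    return None  # no keyword matched, so the resource stays ungrouped


titles = [
    "Solids, Liquids and Gases Display Banner",
    "States of Matter PowerPoint",
    "Changing States Word Cards",
]
for title in titles:
    print(title, "->", assign_group(title))
```

Substring matching is crude (a short keyword such as ‘test’ would also match ‘latest’), which is one reason an approach like this assigns groups to most resources rather than all of them.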

Once the resources were grouped correctly, my team leader and I rewrote the algorithm for the item recommendations. We were able to implement the process at the database level: the SQL code first looked for resources whose top 8 item recommendations all fell into only one or two groups. For those resources, we then boosted recommendations from further down the list that belonged to a different group.
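The re-ranking step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SQL we actually run: the function name `diversify`, the `n_boost` parameter and the choice to place boosted items in the last slots of the top 8 are all hypothetical.

```python
def diversify(ranked, groups, top_n=8, max_groups=2, n_boost=2):
    """Boost lower-ranked items from new groups into a narrow top-N list.

    ranked: resource ids, best recommendation first.
    groups: dict mapping resource id -> group number (ungrouped ids absent).
    n_boost and the slot placement are assumptions made for this sketch.
    """
    top = ranked[:top_n]
    top_groups = {groups[r] for r in top if r in groups}

    # Only intervene when the top N are concentrated in one or two groups.
    if len(top_groups) > max_groups:
        return list(ranked)

    # Walk further down the list, picking items from groups not yet shown.
    promoted = []
    seen_groups = set(top_groups)
    for resource in ranked[top_n:]:
        group = groups.get(resource)
        if group is not None and group not in seen_groups:
            promoted.append(resource)
            seen_groups.add(group)
        if len(promoted) == n_boost:
            break

    if not promoted:
        return list(ranked)

    # Promoted items take the last slots of the top N; the displaced items
    # and everything else follow in their original order.
    keep = top[: top_n - len(promoted)]
    chosen = set(keep) | set(promoted)
    rest = [r for r in ranked if r not in chosen]
    return keep + promoted + rest


ranked = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"]
groups = {r: 1 for r in "abcdefghi"}  # top results are all in group 1
groups.update({"j": 3, "k": 5})       # different groups further down
print(diversify(ranked, groups)[:8])
```

In this toy example the two items from other groups move into the top 8, while a list that already spans three or more groups is returned unchanged.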

This approach probably worked because the final result was a wider variety of resource types in the top few recommendations. For example, instead of only word cards being shown, the new algorithm would show knowledge organisers, PowerPoints, display banners and worksheets. Previously, the algorithm was almost too good at finding resources that were very similar to each other, which isn’t always what teachers want.

If you like the sound of what we do here, then you’ll be happy to know that our Data Scientist team is currently hiring.

Check out some more articles from other members of the data team.

Written by Twinkl Data Team