Creating Teaching Resources With Data Science

Twinkl Data Team
Twinkl Educational Publishers
3 min read · Jun 16, 2022

Find out how we’ve used internal data, external data and machine learning to respond to customers in real time.

This article was originally published by Bradley Pearson, Data Scientist at Twinkl, at https://www.twinkl.co.uk/blog/creating-resources-with-data-science

Leading the Way

Here at Twinkl we have four core values, one of which is “Lead the Way”. That is what the data team set out to do with this new system, built in close collaboration with the whole business. In an educational context, this system is truly a first. We use both internal and external data to serve our customers as quickly and effectively as possible, all from one simple starting point: searches. Nobody could review these manually, given that we receive upwards of 100,000 searches a day! The system’s aim is to turn these search terms into real insights about our customers’ needs, so that we can create the resources they are looking for.

What does the system do?

The system looks through the searches on our website in real time and determines whether they are of interest. But how do we determine what is interesting? We aggregate data from across our systems to create variables, and give each variable a weight. The weights are set by collaborating with people from across the business to rank each variable’s importance. Because every variable and its weight is known, it is possible to understand why certain searches are flagged whilst others are not.

One example of a factor used in the system is search success: in essence, did a search lead to a download? If not, there may be an opportunity. There are many reasons why a user might not download anything after a search, which is why further factors are integrated into the system. At the time of writing there are over 30 variables in the system, each weighted differently, so that the top results are those most likely to help us help those who teach.
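To make the weighting idea concrete, here is a minimal sketch in Python. It is not our production code: the three variable names, the weights and the helper functions are all made up for illustration (the real system has over 30 variables), and it assumes each variable has already been scaled to a value between 0 and 1.

```python
# Hypothetical variables and weights, for illustration only.
# Each variable is assumed to be normalised to the range [0, 1].
WEIGHTS = {
    "search_volume": 0.5,
    "failure_rate": 0.3,
    "trend_growth": 0.2,
}


def failure_rate(searches: int, successful_searches: int) -> float:
    """Share of searches that did NOT lead to a download."""
    if searches == 0:
        return 0.0
    return 1.0 - min(successful_searches, searches) / searches


def contributions(features: dict, weights: dict = WEIGHTS) -> dict:
    """Weighted contribution of each variable (missing variables count as 0).

    Keeping the per-variable breakdown is what lets us explain
    why one search was flagged and another was not.
    """
    return {name: w * features.get(name, 0.0) for name, w in weights.items()}


def interest_score(features: dict, weights: dict = WEIGHTS) -> float:
    """Overall score: the sum of the weighted contributions."""
    return sum(contributions(features, weights).values())


def rank_searches(searches: dict, top_n: int = 10) -> list:
    """Return the top_n (term, score) pairs, highest score first."""
    scored = [(term, interest_score(f)) for term, f in searches.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```

Calling `contributions` on a flagged search shows how much each variable added to its score, which is the kind of transparency described above.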

Overcoming Issues

From a data science perspective, there were many issues to overcome whilst developing this system. One of these was data cleaning. This was an extensive process that involved steps such as removing searches made by staff members (which skew the results, particularly after meetings) and removing searches containing offensive language. As a result, the final output is significantly cleaner and more actionable.
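The two cleaning steps mentioned above could look something like the sketch below. The record fields (`user_id`, `term`), the set of staff IDs and the blocklist are assumptions made for this example, not a description of our actual schema or rules.

```python
def clean_searches(records, staff_ids, blocked_words):
    """Drop staff searches and searches containing blocked words.

    records       -- iterable of dicts with hypothetical keys "user_id" and "term"
    staff_ids     -- set of user IDs belonging to staff (assumed to be available)
    blocked_words -- set of lower-case words treated as offensive
    """
    cleaned = []
    for record in records:
        # Staff searches skew the results, so skip them entirely.
        if record["user_id"] in staff_ids:
            continue

        # Normalise the term so that matching is case-insensitive.
        term = record["term"].strip().lower()
        if not term:
            continue

        # Skip any search that contains a blocked word.
        if any(word in blocked_words for word in term.split()):
            continue

        cleaned.append({**record, "term": term})
    return cleaned
```

A simple word-level blocklist like this will miss misspellings and phrases, so a real pipeline would likely need more than an exact match.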

Another issue we needed to address was searches in languages other than English. How do you deal with these without asking a native speaker to translate each search term manually? That’s where a smart data science solution was required. A simple approach would be to translate every search that comes through, but given the volume of searches we receive, that is neither scalable nor cheap! The solution we arrived at was to identify the language of each search first, using a language detector model. This model was trained on internal data to ensure that Twinkl-specific words were not flagged as a foreign language. Once the model was in place, the non-English terms could be identified and subsequently translated. Translating only this smaller subset is much cheaper and more efficient!
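The detect-first, translate-second flow can be sketched as follows. The `detect` and `translate` arguments are placeholders for whatever language detector and translation service are in use; the function name and signatures are assumptions for this example rather than our real interfaces.

```python
from typing import Callable, Dict, Iterable


def translate_non_english(
    terms: Iterable[str],
    detect: Callable[[str], str],
    translate: Callable[[str, str], str],
) -> Dict[str, str]:
    """Map each search term to an English version.

    detect(term)          -- returns a language code such as "en" or "es"
    translate(term, lang) -- returns the English translation of term

    Only terms detected as non-English are sent to translate(), and each
    distinct term is translated at most once.
    """
    english: Dict[str, str] = {}
    for term in terms:
        if term in english:
            continue  # already handled this exact term
        lang = detect(term)
        if lang == "en":
            english[term] = term  # no translation call needed
        else:
            english[term] = translate(term, lang)
    return english
```

Because the translation call is usually the expensive part, putting the detector in front of it means the cost scales with the number of distinct non-English terms rather than with total search volume.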

The Final Output

All of this is great, but I’m sure you’re wondering what tangible results have come directly out of the system. Well, we’ve created over 100 resources since the beginning of 2022, and that’s in only 10 working days! Examples of the resources that have been created from the system can be found here and here. This system is becoming vital to how Twinkl dynamically identifies and meets the needs of our customers quickly and directly, showing how data science can lead change within an organisation. Data science is not a solo endeavour but a collaboration across the business to create actionable insights.

If you would like to be part of projects just like this, remember we’re always looking to hire into our Data Scientist team.

Check out some of the other posts from members of the data team.
