PBS Kids Measure Up — Learning Analytics

boxinthemiddle
9 min read · Feb 11, 2020


Using analytics to understand and optimize learning.

Recently, PBS Kids, in partnership with Booz Allen Hamilton, ran a data challenge on Kaggle, the 2019 Data Science Bowl, releasing interaction data collected from the “PBS KIDS Measure Up!” app, a game-based learning tool. The game aims to teach children measurement-related skills around the concepts of length, height, weight, and volume.

The data is anonymized and captures each user’s interaction path through the game. The objective of the challenge is to predict an outcome: the number of attempts a child takes to solve an assessment. The outcomes in this competition are grouped into 4 groups (labeled accuracy_group in the data):

  • 3: the assessment was solved on the first attempt
  • 2: the assessment was solved on the second attempt
  • 1: the assessment was solved after 3 or more attempts
  • 0: the assessment was never solved
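The grouping above can be written out as a small function. This is a minimal sketch (the function name is my own) that maps an assessment’s attempt counts, as they appear in the competition’s train_labels.csv (num_correct, num_incorrect), to the 0–3 label:

```python
def accuracy_group(num_correct: int, num_incorrect: int) -> int:
    """Map one assessment's attempt counts to its 0-3 accuracy group."""
    if num_correct == 0:
        return 0   # the assessment was never solved
    if num_incorrect == 0:
        return 3   # solved on the first attempt
    if num_incorrect == 1:
        return 2   # solved on the second attempt
    return 1       # solved after 3 or more attempts

print(accuracy_group(1, 0))  # 3
print(accuracy_group(0, 5))  # 0
```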

The Kaggle 2019 Data Science Bowl challenge has a detailed description of the challenge and of course you can access the data there too.

I stumbled into the challenge somewhat late in the competition timeline, but I found the interaction data provided by PBS fantastic (kudos to PBS and all parties involved in the Data Science Bowl for this effort!). As a data scientist interested in learning analytics, I’m excited about this data and hoping it will give us insight into how a child learns. In subsequent articles, we will look at the challenge itself; in this article, we analyze the data from the perspective of an educator or a parent of the child playing the game. First, let’s take a bird’s-eye view of the data.

  • Training data: There are 17,000 installation IDs, of which 3,614 have at least one assessment event. We end up with 17,690 assessments across all installation IDs. df_train_labels has the ground-truth data for the accuracy group of these assessments.
  • Testing data: There are 1,000 installation IDs, all of which have at least one assessment event and at least one unfinished assessment to be predicted. So, 1,000 predictions to be made.
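These counts come straight out of the raw event table. The sketch below uses a toy frame standing in for train.csv (the real columns installation_id and type are documented on the Kaggle data page) to show how the two numbers are computed:

```python
import pandas as pd

# Toy stand-in for train.csv; the real file has the same two columns
# (plus many more) with one row per logged event.
df_train = pd.DataFrame({
    "installation_id": ["a", "a", "b", "c"],
    "type": ["Clip", "Assessment", "Game", "Assessment"],
})

# Total distinct installs
n_installs = df_train["installation_id"].nunique()

# Installs with at least one Assessment event
n_with_assessment = (
    df_train.loc[df_train["type"] == "Assessment", "installation_id"].nunique()
)
print(n_installs, n_with_assessment)  # 3 2
```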

Each application install is represented by an installation_id, a randomly generated unique identifier that groups game sessions within a single installed application instance. The competition information states that each installation_id should correspond to a single child, but we should expect noise from issues such as shared devices. The app itself definitely accommodates multiple users; in the scope of this article, however, we work with the single-child assumption and do not account for multiple users within an installation_id.

Next, let’s visualize the user’s interaction path through the app. A Kaggle notebook with the code for this whole article is here.

In the code, each installation_id is represented as an Installation object, which holds a list of EventGraph_Node objects as an event graph to represent the interaction path.

event_graph = draw_event_graph(install_obj)

In the notebook, the above code will draw an event graph for a given installation object. Now, let’s explore the interaction event path for a random installation_id in detail.
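The actual classes live in the notebook; the hypothetical minimal versions below (names and fields are my own simplification) show the idea of chaining one node per visited game element into an event graph:

```python
from dataclasses import dataclass, field

@dataclass
class EventGraphNode:
    title: str   # e.g. "Sandcastle Builder (Activity)"
    world: str   # NONE, TREETOPCITY, MAGMAPEAK, or CRYSTALCAVES
    etype: str   # Clip, Activity, Game, or Assessment

@dataclass
class Installation:
    installation_id: str
    event_graph: list = field(default_factory=list)  # nodes in visit order

    def add_event(self, title: str, world: str, etype: str) -> None:
        self.event_graph.append(EventGraphNode(title, world, etype))

# Build a tiny two-node path for one install
install_obj = Installation("0006a69f")
install_obj.add_event("Welcome to Lost Lagoon!", "NONE", "Clip")
install_obj.add_event("Sandcastle Builder (Activity)", "MAGMAPEAK", "Activity")
print([n.title for n in install_obj.event_graph])
```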

Interaction Event Path

The game is heavily instrumented, which is fantastic for data analysis. One of the cool things in the data is that you can see the event path of the user as they interact with the app. In the event path shown on the left, the first node is “Welcome to the Lost Lagoon”. This is the entry screen where a child chooses a world to enter; it belongs to no actual game world and is marked NONE in the data. There are three game worlds: TREETOPCITY, MAGMAPEAK, and CRYSTALCAVES. Each world is associated with certain educational goals. For example, the game elements in TREETOPCITY are associated with the topics of length and height, MAGMAPEAK teaches topics in capacity, and CRYSTALCAVES is the world where the child learns the concept of weight.

There are 4 types of game or media elements in each world: Clip, Activity, Game, and Assessment. In the event graph on the left, the yellow nodes are Activity elements, such as the Sandcastle Builder activity in node (2). Clips are marked as green nodes, like the Slop Problem clip in node (3). The orange nodes are games, such as the Scrub-A-Dub game in node (4). Finally, assessments are marked as red nodes, such as the Mushroom Sorter in node (15). The game has five assessments: Bird Measurer, Cart Balancer, Cauldron Filler, Chest Sorter, and Mushroom Sorter. Each assessment is designed to test a child’s comprehension of a certain set of measurement-related skills.
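The world-to-topic mapping and the node-color convention described above can be written out as plain lookups (the colors are just this article’s charting convention, not anything in the dataset):

```python
# Each world targets a measurement topic
WORLD_TOPICS = {
    "TREETOPCITY": "length / height",
    "MAGMAPEAK": "capacity",
    "CRYSTALCAVES": "weight",
}

# Node colors used in the event-graph figures
NODE_COLORS = {
    "Activity": "yellow",
    "Clip": "green",
    "Game": "orange",
    "Assessment": "red",
}

# The five assessments in the game
ASSESSMENTS = [
    "Bird Measurer", "Cart Balancer", "Cauldron Filler",
    "Chest Sorter", "Mushroom Sorter",
]

print(WORLD_TOPICS["CRYSTALCAVES"])  # weight
```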

If you would like to look at the event data for other users/installation IDs, take a look at the code in the notebook here.

Start with the user in mind

After reading the competition rules and the challenge description, I still wasn’t sure what many of the terms actually meant with respect to the game itself. What was an assessment? Was it a multiple-choice test, or a ball to throw? What was a “clip”? I know what a clip is in general, but was it a YouTube-style video of someone explaining a concept, or a cartoon? So I searched for the app on the Google Play store and hit a hurdle: there was no version that would load on my Pixel 3 phone. There is a web version, which was shared on the discussion board here.

But I wanted the version on a handheld device, which I felt was where most kids would consume this content. (That was just a gut feeling, not substantiated with data and not relevant to the actual challenge.) I did have an old tablet that supported the game, and I was finally able to load and play it, with a noticeable amount of lag. But it was good enough for me to play the game and really understand what an assessment or an activity was.

Anecdotal Observations:

The game itself is geared towards preschoolers and kindergartners and reinforces the measurement concepts of length, height, and volume. (My 2nd grader showed some interest and then ran to his sister saying, “Mom’s gone crazy. She’s playing baby games and learning to count!!” Lost some mommy respect in my house that day.. :))

The All Star Sorting game teaches the concept of sorting by height.

Games are reinforcements of the concepts. They are somewhat long and, to an adult, tedious, but I can see how they would add practice. For example, the All Star Sorting game has the child match dinosaurs, based on their height, to homes of varying heights. There are several rounds with an increasing number of dinosaurs, adding repetitive practice toward mastery of the concepts.

A video clip explaining the concept of weight

Clips are mostly short videos explaining a concept in a kid-friendly form. If you’ve seen any PBS-style cartoons, these are in the same style, where a character has a conversation with his friend/stuffed animal or talks to the screen and explains a concept. Some characters were vaguely familiar to me as my own kids have grown up with PBS. Nothing really special to note here except that this sort of content is popular with preschool kids and I can understand how they would watch this with attention.

Sandcastle Builder Activity

Activities are longer events in which the child has a certain degree of freedom to explore, compared to a game. They are not goal-based: in a game there is a goal to meet (for example, a number of animals have to be matched to their tubs), whereas in an activity there is no such obvious goal to finish. For example, in the Bug Measurer activity, you could drag the bugs down to measure them, but the choice of which bugs to measure, and in what order, was left to the user. In the Sandcastle Builder activity (above left figure), the child fills molds with sand and inverts them to build sand shapes. The activity finishes when the child is ready to move on, so the event could complete with the child doing the measurement as many times as they wanted, or not at all. This was interesting since it’s more of a self-driven activity.

Assessments are the last type of event, and they are a culmination of everything the child would have learned through prior activities. For example, in my own usage, the Bird Measurer assessment came after the Crystal Rule game and the Bug Measurer activity. I could see how the games and activities led up to the assessment. The game was cleverly crafted for teaching a child.


Above is a video of the Bird Measurer assessment. You can see how my mom fingers struggle to complete it (and dare I say mom-brain, because apparently I can’t drag the hat onto the head!). You can hear my second-grader whispering his guidance.

In the next part, we’ll explore the interaction data and look at features that would be interesting for a parent or guardian of the child. As a parent, I would be interested in how often my child is logging into the app, how long he/she plays in each session, and, importantly, whether he/she is learning the concepts. I would like to see a progress report across sessions, similar to what Khan Academy or Prodigy sends out periodically to parents on how the child is progressing with the service. Interestingly, PBS Kids Measure Up has a sister app that can be paired with it to give parents reporting insights. We will perhaps explore that app in the future.

Interaction insights

Now that we’ve seen the app and understood how users engage with it, let’s look at some interaction insights. We consider the random installation_id 3102, which has 7 assessments in total.

Event count in each world per game element

This child has events in each of the three worlds and for nearly all of the game elements, although in some cases there is just a single event, which could imply that the child is just cruising through. Let’s look at the game time spent in each interaction to get a better understanding.
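A per-world, per-element count table like the one charted above can be produced with a pivot. The sketch below uses a few toy event rows in place of the real filtered train.csv slice for one installation_id:

```python
import pandas as pd

# Toy events for one install; real rows come from filtering train.csv
# to a single installation_id.
events = pd.DataFrame({
    "world": ["TREETOPCITY", "TREETOPCITY", "MAGMAPEAK", "CRYSTALCAVES"],
    "type":  ["Game", "Clip", "Activity", "Game"],
})

# Rows: world, columns: element type, values: event counts
counts = events.pivot_table(
    index="world", columns="type", aggfunc="size", fill_value=0
)
print(counts)
```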

Time spent in each world per type of interaction (in seconds)

With this, we get an idea that the child watches clips and plays games slightly longer in TREETOPCITY than in the other worlds. Also interesting is that the child spends longer in the MAGMAPEAK activities than in those of the other worlds, and has taken assessments in all of the worlds.
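In the dataset, game_time is the milliseconds elapsed within a game_session, so the maximum per session approximates that session’s duration. Summing those maxima per world and type gives the time-spent breakdown; the sketch below uses toy rows:

```python
import pandas as pd

events = pd.DataFrame({
    "game_session": ["s1", "s1", "s2"],
    "world": ["MAGMAPEAK", "MAGMAPEAK", "TREETOPCITY"],
    "type": ["Activity", "Activity", "Clip"],
    "game_time": [0, 120000, 45000],  # milliseconds since session start
})

# Max game_time per session ~= session duration, converted to seconds
session_secs = (
    events.groupby(["world", "type", "game_session"])["game_time"].max() / 1000
)

# Total seconds per world and interaction type
time_per_world = session_secs.groupby(level=["world", "type"]).sum()
print(time_per_world)
```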

When we look at games in particular, we can break down how many rounds were passed and missed based on the event information. Below, we see that the child passes more rounds in the Scrub-A-Dub game, themed around the concept of capacity, than in the three games played in CRYSTALCAVES, themed around the concept of weight.

In fact, the child has no passing rounds in TREETOPCITY, where games are based on height and length, even though more time is spent watching clips and playing games in this world than in the others. This may be useful information for an educator looking to add supplementary material around these concepts.
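One way to produce this pass/miss tally is to parse the event_data JSON payload for a correct flag. The exact event codes that carry the flag vary per game in the real data, so treat this as an illustrative sketch on toy rows, not the competition’s canonical parsing:

```python
import json
import pandas as pd

events = pd.DataFrame({
    "title": ["Scrub-A-Dub", "Scrub-A-Dub", "Chow Time"],
    "event_data": [
        '{"correct": true, "round": 1}',
        '{"correct": false, "round": 2}',
        '{"correct": true, "round": 1}',
    ],
})

# Decode the JSON payload and pull out the pass/miss flag
parsed = events["event_data"].map(json.loads)
events["passed"] = parsed.map(lambda d: d.get("correct"))

# Passed vs. missed rounds per game
tally = events.groupby(["title", "passed"]).size()
print(tally)
```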

The chart on the left provides additional support that capacity is a topic where the child progresses much further than other topics, at least in the time span of the data provided.

accuracy group of assessments taken by child

When we finally look at the outcome of the assessments in the three worlds (above figure), we see that the child has successfully solved the capacity-based assessments in MAGMAPEAK, albeit with multiple attempts, adding further evidence that the child understands the concept of capacity. This in itself is useful information for an educator or homeschooling parent, perhaps to augment or modify the child’s curriculum.
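Pulling one child’s assessment outcomes is a simple filter on the labels table. The sketch below stands in toy rows for train_labels.csv (real columns include installation_id, title, and accuracy_group):

```python
import pandas as pd

# Toy stand-in for df_train_labels
df_train_labels = pd.DataFrame({
    "installation_id": ["3102", "3102", "9f0a"],
    "title": ["Cauldron Filler (Assessment)",
              "Bird Measurer (Assessment)",
              "Chest Sorter (Assessment)"],
    "accuracy_group": [2, 0, 3],
})

# All assessment outcomes for one child
child = df_train_labels[df_train_labels["installation_id"] == "3102"]
print(child[["title", "accuracy_group"]])
```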

If you enjoyed this article, we would love some claps, both to motivate us to write more and to help Medium show our writing to a wider audience. Thanks!

All images, data, and source code used for this article are here:


boxinthemiddle

We’re on a mission to impart 21st Century skills to children in K-5 through powerful stories and the use of learning analytics.