Phase 2: BrainPop

Chian Huang
Published in Data Visulization
Nov 8, 2018

The community partner I have decided to work with is BrainPop. The dataset BrainPop proposes interests me because I look forward to seeing which attributes affect the relationship between teachers and donors and the funding for classroom supplies.

After a short interview with Kevin, our community partner and the representative of BrainPop, I have a better understanding of the background of the dataset and the process of working through it.

The interview began with a very general question: what does BrainPop focus on in a project, from the early stage of collecting the data through to the stage of visualizing it? According to Kevin, it depends on the project and on the scale his team is going for. 75% of the whole process, from start to end, is about cleaning the data, understanding the data, and reorganizing the data. “Cleaning the data is the most painful process but you can say it is also the most fun part,” Kevin said. It requires patience and experience, and sometimes thinking outside the box. For instance, Kevin mentioned that when he began looking into the donor datasets, he discovered that company names written as two words, or in different letter cases, can degrade the dataset, since the same company ends up listed under several spellings. The way to solve this problem is to dig out all the possible spellings and unify them.
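The unification step Kevin described could be sketched roughly as below. This is only an illustration under my own assumptions, not Kevin's actual method: the function names `normalize_name` and `unify_names` and the sample company names are hypothetical, and the rule (ignore letter case, spaces, and punctuation, then keep the first spelling seen) is just one simple choice.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Build a comparison key that ignores case, spaces, and punctuation.

    Assumption: names are plain ASCII; any other character is dropped.
    """
    return re.sub(r"[^a-z0-9]", "", name.casefold())


def unify_names(names):
    """Replace every spelling variant with the first spelling seen for its key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)

    # Hypothetical rule: the first variant encountered becomes the canonical one.
    canonical = {key: variants[0] for key, variants in groups.items()}
    return [canonical[normalize_name(name)] for name in names]


if __name__ == "__main__":
    # Made-up sample values, not taken from the DonorsChoose.org data.
    sample = ["Home Depot", "HomeDepot", "home depot", "Target"]
    print(unify_names(sample))
```

A key-based rule like this only catches differences in case, spacing, and punctuation. Misspellings or abbreviations would still need to be found and mapped by hand, which matches Kevin's point that the work takes patience.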

After discussing the early-stage process, the conversation turned back to the datasets that BrainPop proposed. Since the datasets are open to the public and provided by DonorsChoose.org, how the data were collected remains unknown. Without much control on the source side, the only suggestion we received from Kevin was to use our own good judgment. In his experience, a messy dataset like this requires a lot of time and effort before we can study what it has to say.

In terms of the data visualization that Kevin is expecting, he suggested using R to plot the data, which he considers the best language for visualizing scientific datasets.

Sketch from Kevin

I proposed other tools and approaches, and Kevin seemed open to them. However, Kevin reminded us to start visualizing the dataset as soon as possible, even if it starts from a very rough sketch. After the first version of the visualization, I will be able to start seeing the patterns or interesting lines in it. He even suggested creating two visualizations so they can be compared with each other, which is also a good way to find patterns.

Next Step:
Thankfully, Kevin helped us combine the two datasets into one and add the headline ID, so we will have an easier start. Yan and I will follow Kevin’s suggestions and begin cleaning and organizing the dataset.
