Vendor’s Donation Impacts in Education
To recap, for our final project we worked with Brainpop, a non-profit organization focusing on education. Kevin, the Brainpop representative, gave us two open datasets from the DonorsChoose website; he was interested in project success and project descriptions. He had two general questions. First, does the data show any pattern in what kinds of projects are most successful, and any variation in those patterns over time (which edtech projects from which companies, digital vs. physical, which price points, which kinds of descriptions)? Second, what about the freeform text description has made certain projects more successful than others (certain keywords, description length, time of year, etc.)?
My teammate, Yan, and I began by downloading the two datasets and merging them in Python. Then we tried out our first iterations.
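For readers curious about the merge step, here is a minimal sketch of how two tables like these could be joined in Pandas. The column names (`project_id`, `funding_status`, `item_name`, `vendor_name`) and the rows are made up for illustration; they are assumptions, not the actual layout of the files we received.

```python
import pandas as pd


def merge_datasets(projects: pd.DataFrame, resources: pd.DataFrame,
                   key: str = "project_id") -> pd.DataFrame:
    """Left-join resources onto projects.

    indicator=True adds a `_merge` column that shows which projects
    found no matching resource rows ("left_only").
    """
    return projects.merge(resources, on=key, how="left", indicator=True)


# Toy rows, invented for this example only.
projects = pd.DataFrame({
    "project_id": ["p1", "p2", "p3"],
    "funding_status": ["completed", "expired", "completed"],
})
resources = pd.DataFrame({
    "project_id": ["p1", "p1", "p2"],
    "item_name": ["tablet", "tablet case", "box of pencils"],
    "vendor_name": ["Vendor A", "Vendor A", "Vendor B"],
})

merged = merge_datasets(projects, resources)
unmatched = merged[merged["_merge"] == "left_only"]
print(merged)
print("projects without resources:", list(unmatched["project_id"]))
```

A left join keeps every project even when it has no resource rows, which makes gaps in the data visible early instead of silently dropping them.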
After the first iteration, we found some problems with the dataset. For example, the vendor's name often appears inside the item's name instead of in the vendor column, so searching for a specific vendor under the vendor category returns no match. This made it much harder to build a valid visualization, but we kept trying.
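The misplaced vendor names described above could, in principle, also be recovered in Pandas. The sketch below is only an illustration of that idea, not what we actually did (our fix was made in Tableau). The function `recover_vendor`, the column names, and the vendor names are all hypothetical.

```python
import pandas as pd


def recover_vendor(df: pd.DataFrame, known_vendors: list[str],
                   item_col: str = "item_name",
                   vendor_col: str = "vendor_name") -> pd.DataFrame:
    """Fill a missing vendor when a known vendor name appears in the item name."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for vendor in known_vendors:
        # Plain substring match, case-insensitive; missing item names count as no match.
        in_item = out[item_col].str.contains(vendor, case=False, na=False, regex=False)
        missing = out[vendor_col].isna()
        out.loc[in_item & missing, vendor_col] = vendor
    return out


# Toy rows, invented for this example only.
resources = pd.DataFrame({
    "item_name": ["Acme Learning 1-year subscription", "Box of pencils", "Classroom tablet"],
    "vendor_name": [None, "Office Supply Co", None],
})

fixed = recover_vendor(resources, ["Acme Learning"])
print(fixed)
```

Only rows with an empty vendor are filled, so existing vendor values are never overwritten; rows that match no known vendor stay empty and can be reviewed by hand.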
We asked the NYU Data Service for help, and our consultant, Amy, fixed several of the problems we had. First, she helped us combine the datasets in Tableau. Then, she created variable sets to find the correct data that sat under the wrong categories. Finally, she made all the visualizations correspond to each other. Without her, we wouldn't have the result we have now.
Please see the full data visualization below. Click here to interact with the data.
We ran into many difficulties on the way to the visualization we have now. Beyond the misplaced data, the sheer size of the dataset was already a pain. Using Python and Pandas made the cleaning process much easier and quicker.
Overall, I learned that a good visualization requires a huge amount of time spent understanding and cleaning the data. It also really helps to have a clear goal of what to visualize. In our case, we only figured out our approach after several iterations of visualizing the data. I don't know if this is the correct process. From my point of view, a nicely organized dataset is already half of the finished data visualization, but such a dataset is unlikely to exist in reality.
Credits to:
Arlene Ducao & Amy Lei