Visualizing Activity in a Discussion Forum
As our first warm-up for this summer's research, we spent some time analyzing the CS 111 Spring 2015 course at Wellesley College. Seventy-two students completed the course, with the assistance of four faculty, two supplemental instructors, and seven student tutors.
Almost every member of the DAV lab has taken CS 111, so we felt well qualified to perform this task. Most of the analysis was qualitative, but we found one source of data for a quantitative analysis: emails sent to a communal CS 111 Google Group.
Using the Gmail API
The first step in analyzing and visualizing the emails sent to the CS 111 Google Group was to retrieve them. Google does not provide an API to access this data directly, so the only way to retrieve the emails was through a Gmail account that contained all of them (thank you, Whitney, for never deleting them).
We first used the Gmail API to retrieve the IDs of emails that contain a certain field. Then we sent a separate request for each ID to obtain the content of that message. We decoded the messages and organized them in a list of dictionaries to make the emails readable. To specify who sent each email, we created the following labels: faculty, students, administration, and student leaders.
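The decoding and labeling steps can be sketched roughly as follows. This is a minimal illustration, not our exact code: it assumes each message was fetched in the API's full format (for example with the Python client's `users().messages().list(...)` followed by `users().messages().get(...)`), where headers arrive as name/value pairs and body text is base64url-encoded. The `ROLES` lookup, the sample message, and the helper names are hypothetical.

```python
import base64
from email.utils import parseaddr


def decode_body(data):
    """Decode the base64url string used for Gmail message bodies."""
    # Padding may be missing, so restore it before decoding.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def find_plain_text(payload):
    """Return the first text/plain body in a (possibly nested) payload."""
    body = payload.get("body", {})
    if payload.get("mimeType") == "text/plain" and body.get("data"):
        return decode_body(body["data"])
    for part in payload.get("parts", []):
        text = find_plain_text(part)
        if text:
            return text
    return ""


def to_record(message, roles):
    """Flatten one message resource into a readable dictionary."""
    headers = {h["name"].lower(): h["value"]
               for h in message["payload"]["headers"]}
    _, address = parseaddr(headers.get("from", ""))
    address = address.lower()
    return {
        "id": message["id"],
        "from": address,
        "date": headers.get("date", ""),
        "subject": headers.get("subject", ""),
        "body": find_plain_text(message["payload"]),
        # Senders missing from the lookup are marked "unknown" (an assumption).
        "label": roles.get(address, "unknown"),
    }


# Hypothetical sender lookup and a hand-built message for illustration.
ROLES = {"prof@example.edu": "faculty"}
sample = {
    "id": "abc123",
    "payload": {
        "mimeType": "multipart/alternative",
        "headers": [
            {"name": "From", "value": "Example Prof <prof@example.edu>"},
            {"name": "Date", "value": "Mon, 02 Feb 2015 10:15:00 -0500"},
            {"name": "Subject", "value": "Problem set 1 posted"},
        ],
        "parts": [{
            "mimeType": "text/plain",
            "body": {"data": base64.urlsafe_b64encode(
                b"See the course site.").decode("ascii")},
        }],
    },
}
emails = [to_record(sample, ROLES)]
```

Keeping each email as a flat dictionary makes the later counting steps simple, since every chart only needs the `date` and `label` fields.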
Our lab is currently using Bokeh, a Python visualization library. We decided to create a donut chart, a heat map, and a histogram. The donut chart visualizes the percentage of emails sent by each category of sender (e.g. faculty, students, tutors, administrator). Here is a snapshot of our chart:
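For readers who want to build a similar chart, the numbers behind a donut chart are just each category's share of the total, converted to a start and end angle. The sketch below shows that arithmetic only; the label counts are made up for illustration and are not our data. The resulting angles could then be passed to a wedge-style glyph such as Bokeh's `annular_wedge`, though that is one possible approach rather than a description of our code.

```python
from collections import Counter
from math import pi


def donut_wedges(labels):
    """Turn a list of sender labels into percentages and wedge angles."""
    counts = Counter(labels)
    total = sum(counts.values())
    wedges = []
    start = 0.0
    # Largest category first, each wedge starting where the previous ended.
    for label, count in counts.most_common():
        share = count / total
        end = start + share * 2 * pi
        wedges.append({
            "label": label,
            "percent": round(share * 100, 1),
            "start_angle": start,
            "end_angle": end,
        })
        start = end
    return wedges


# Made-up labels, one per email, purely for illustration.
labels = ["students"] * 6 + ["faculty"] * 3 + ["student leaders"] * 1
wedges = donut_wedges(labels)
```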
To visualize email activity, we created a heat map representing the number of emails sent to the CS 111 Google Group per day. The intensity of the color increases as the number of emails increases.
Hovering over a rectangle displays the date, the number of emails, and the event corresponding to that day (lecture, lab, problem set, etc.). This lets us see not only how the number of emails changed over time, but also whether email activity corresponds to certain events.
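The data preparation for a heat map like this can be sketched as follows: count emails per calendar day, scale each count against the busiest day to get a color intensity, and attach the day's event for the hover text. The date strings and the `EVENTS` calendar below are hypothetical, and the linear intensity scale is an assumption. In Bokeh, records like these could feed rectangle glyphs and a `HoverTool`, but the plotting itself is left out here.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def emails_per_day(date_headers):
    """Count emails per calendar day from RFC 2822 Date headers."""
    days = Counter()
    for header in date_headers:
        days[parsedate_to_datetime(header).date()] += 1
    return days


def heat_map_cells(days, events):
    """Build one record per day with the fields shown on hover."""
    peak = max(days.values())
    cells = []
    for day in sorted(days):
        cells.append({
            "date": day.isoformat(),
            "emails": days[day],
            # 1.0 is the busiest day; lighter colors map to smaller values.
            "intensity": days[day] / peak,
            "event": events.get(day.isoformat(), ""),
        })
    return cells


# Hypothetical Date headers and course calendar for illustration.
dates = [
    "Mon, 02 Feb 2015 10:15:00 -0500",
    "Mon, 02 Feb 2015 21:40:00 -0500",
    "Tue, 03 Feb 2015 09:05:00 -0500",
]
EVENTS = {"2015-02-02": "lecture", "2015-02-03": "lab"}
cells = heat_map_cells(emails_per_day(dates), EVENTS)
```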
Finally, we visualized our data using a histogram: