RegistrarVis: A Visualization for College Registrars

Published in VisUMD, Dec 15, 2019

Visualizing college class registration data.

RegistrarVis being used to visualize class registration data.

In modern colleges and universities, the academic expertise of undergraduate students is usually defined by their major. However, the traditional concept of a major (e.g., Economics, Chemistry, or Philosophy) is slowly being replaced by fuzzier definitions. Numerous universities offer dual-degree programs in which students take courses from two different majors and earn degrees from both departments. Careers are also becoming increasingly interdisciplinary. For example, in the field of human-computer interaction (HCI), researchers and practitioners often need knowledge from multiple domains, e.g., Computer Science, Psychology, and Electrical Engineering.

From a student’s perspective, a degree in a specific major is valuable for understanding their own academic background. However, it is difficult to discover at a granular level which parts of the major are their strengths or weaknesses. A Computer Science major, for example, could be proficient in data science but not in low-level subjects such as computer architecture.

Motivated by these issues, I set out to develop a visual analytics tool, RegistrarVis, that profiles students’ academic abilities based on the courses each student has taken so far. By profiling abilities in this way, I expect to reveal students’ academic strengths and weaknesses in a granular, nuanced manner. I hope that this approach gives students a better opportunity to reflect on their academic strengths and weaknesses than their degree alone, which in the end will help them design their career paths.

Project Design

I present the project design in two parts: (1) the design considerations and (2) the mechanism behind the tool. The system is built in three steps: (1) preprocessing the data, (2) building the architecture, and (3) connecting a back-end server to a front-end web interface.

Dataset

In this project, I used two datasets. The first is a set of student class registration logs from Korea University spanning six years, from 2013 to 2018. This dataset is in Korean and is provided in comma-separated values (.csv) format. Its size is 271MB.

The second is a catalog of all courses offered at Korea University in 2018. It contains a short summary of each course. This dataset is largely in English and is also provided in .csv format. Its size is 26MB. Both datasets were provided by Korea University, and I was given permission to use them for research purposes.

Design Considerations

As mentioned in the introduction, the purpose of this tool is to let students discover their aptitudes by analyzing their class registration records. To that end, I took several design considerations into account when designing the visual analytics:

(1) Providing an overview of the data. According to Shneiderman’s mantra (1996), “Overview first, zoom and filter, then details-on-demand,” it is effective to start from an overview of the entire dataset when exploring it. I therefore provide an overview of a student’s class registration records.

(2) Presenting textual patterns from class registration logs. Although the title of each course gives an overall summary, it does not sufficiently show what knowledge the course is trying to convey. To address this issue, I use the short course summaries to extract and analyze textual patterns.

(3) Enabling comparison with others. To understand their registration trends more objectively, students should be able to juxtapose their own data with that of others. Hence, the tool shows comparisons between a student’s data and how other students have been doing.

(4) Considering the constraints of the data. In choosing the visual components, I consider the constraints imposed by the structure of the dataset, such as text lengths and the available metadata.

Visualization Design

The top image gives an overview of the RegistrarVis system, which is composed of four parts. In the first part, I show an overview of a student’s class registration records using a dot matrix plot that works as a discrete stacked bar chart. Each dot represents a course taken by the student in a particular semester. I divided the courses into two types: (1) major and (2) non-major. The two types are distinguished by their course codes. For example, a course in computer science has a code such as ‘COSE203’. The prefix ‘COSE’ stands for the department of COmputer SciencE, and 203 is the unique number of the course within the department. For a computer science student, a course whose code starts with ‘COSE’ is classified as a major course, and any other course is classified as a non-major course. The two types are shown in two different colors.
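
As a minimal sketch of this prefix rule (not the actual RegistrarVis code), the classification could be written in Python as follows. The function name `classify_course` and the non-major code ‘ECON151’ are hypothetical and only serve as an illustration.

```python
def classify_course(course_code: str, major_prefix: str) -> str:
    """Return "major" if the code starts with the student's department prefix."""
    # Normalize case and whitespace so that "cose203 " still matches "COSE".
    code = course_code.strip().upper()
    return "major" if code.startswith(major_prefix.upper()) else "non-major"


# Hypothetical course codes for a computer science student.
codes = ["COSE203", "COSE101", "ECON151"]
labels = {code: classify_course(code, "COSE") for code in codes}
print(labels)
```

A plain prefix test is an assumption that holds only if no other department’s prefix begins with the same letters; otherwise the letter part of the code would need to be compared as a whole.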

At first, I showed the temporal trend of the courses with a stacked bar chart. However, a stacked bar chart has a disadvantage: it does not intuitively show the number of courses taken per semester, since each count is encoded as the length of a bar. Although a legend helps, counting courses is confusing in areas far from it. Hence, I felt the need to discretize each course so that the number of courses taken is conveyed intuitively, and adopted a dot matrix plot. Blue represents major courses, and green represents non-major courses. By hovering over a dot, the student can check the details of the course in the tooltip that appears above it.

Under the discrete stacked bar chart, I placed a table that shows the raw course records. The table has five columns: (1) the course code, (2) the course title, (3) the year and (4) the semester in which the course was taken, and (5) the short summary of the course. Not all the summaries were provided in English; those without an English version were reluctantly left in Korean. By clicking on a column header at the top of the table, users can sort by that column in either ascending or descending order.
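
To make the discretization concrete, here is a rough Python sketch of how course records could be mapped to dot positions, one column per semester and one row per course. The record layout, the function name `dot_positions`, and the choice to stack major courses below non-major ones are all assumptions for illustration; the actual layout is drawn with D3 in the browser.

```python
from collections import defaultdict


def dot_positions(courses):
    """Map (semester, code, kind) records to cells of a dot matrix.

    Columns are semesters in chronological order; rows count upward
    from zero, so the height of a column equals the number of courses.
    """
    by_semester = defaultdict(list)
    for semester, code, kind in courses:
        by_semester[semester].append((code, kind))

    positions = []
    for column, semester in enumerate(sorted(by_semester)):
        # Assumption: major dots are stacked first so each color forms one band.
        stacked = sorted(by_semester[semester], key=lambda c: (c[1] != "major", c[0]))
        for row, (code, kind) in enumerate(stacked):
            positions.append({"code": code, "kind": kind, "column": column, "row": row})
    return positions


# Hypothetical records; semesters are (year, term) tuples so they sort in time order.
courses = [
    ((2013, 1), "ECON151", "non-major"),
    ((2013, 1), "COSE101", "major"),
    ((2013, 2), "COSE203", "major"),
]
positions = dot_positions(courses)
```

Because every course occupies exactly one cell, a reader can count dots directly instead of estimating a bar length against an axis.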

The upper part of the second column has three visual components: one comparative bar chart and two stacked bar charts. The comparative bar chart shows the number of courses the student has taken of each type, compared with the cohort of the same department that entered in the same year. I judged that the most reasonable standard for comparison was the group of students who entered the university in the same year and in the same department. Of the two stacked bar charts, the upper one shows the number of courses the student took in each semester, and the lower one shows the mean number of courses taken by students of the same entrance year and department. The two stacked bar charts let students juxtapose their own records with those of students with a similar background.
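
The cohort average described above could be computed along the following lines. This is a sketch under stated assumptions rather than the code used in RegistrarVis: the record keys (`student_id`, `entrance_year`, `department`, `semester`) are hypothetical, and I assume that the mean is taken over the whole cohort, so a student who took nothing in a semester counts as zero.

```python
from collections import defaultdict


def cohort_mean_per_semester(records, entrance_year, department):
    """Mean number of courses per semester for one entrance-year/department cohort."""
    counts = defaultdict(lambda: defaultdict(int))  # semester -> student -> courses
    students = set()
    for r in records:
        if r["entrance_year"] == entrance_year and r["department"] == department:
            counts[r["semester"]][r["student_id"]] += 1
            students.add(r["student_id"])
    if not students:
        return {}
    # Assumption: divide by the full cohort size, not only by students enrolled that semester.
    return {
        semester: sum(per_student.values()) / len(students)
        for semester, per_student in sorted(counts.items())
    }


# Hypothetical registration records, one dict per course taken.
records = [
    {"student_id": "A", "entrance_year": 2013, "department": "COSE", "semester": (2013, 1)},
    {"student_id": "A", "entrance_year": 2013, "department": "COSE", "semester": (2013, 1)},
    {"student_id": "B", "entrance_year": 2013, "department": "COSE", "semester": (2013, 1)},
    {"student_id": "B", "entrance_year": 2013, "department": "COSE", "semester": (2013, 2)},
    {"student_id": "C", "entrance_year": 2014, "department": "COSE", "semester": (2014, 1)},
]
means = cohort_mean_per_semester(records, 2013, "COSE")
```

Dividing only by the students who registered in a given semester would be an equally defensible choice; it would raise the mean in semesters where part of the cohort was on leave.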

The two word clouds underneath show the keywords that appear most frequently in the summaries of the student’s major courses and of their non-major courses. I originally intended to use topic models. Topic models are useful for summarizing a large collection of documents into a small number of coherent topics, where each course summary is treated as one document. However, in this dataset most students had taken only a small number of courses (more than 60% of the students took fewer than 20 courses), so I judged that topic models were not appropriate. Instead, after removing stop words (using the stop word list provided by the NLTK library in Python), I listed the keywords found in the course summaries, separately for each course type, in descending order of frequency.
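
The keyword counting step can be sketched in a few lines of Python. To keep the example self-contained, it uses a tiny hand-written stop word set as a stand-in for the NLTK list mentioned above; the function name `keyword_counts`, the tokenization rule, and the sample summaries are hypothetical.

```python
import re
from collections import Counter

# Stand-in stop word set; the article used the list shipped with NLTK instead.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "is", "this", "with"}


def keyword_counts(summaries, stop_words=STOP_WORDS):
    """Count non-stop-word tokens across course summaries, most frequent first."""
    counts = Counter()
    for text in summaries:
        # Assumption: lowercase runs of ASCII letters are good enough as tokens.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common()


# Hypothetical summaries of two major courses.
major_summaries = [
    "Introduction to data structures and algorithms",
    "Algorithms for data analysis",
]
major_keywords = keyword_counts(major_summaries)
```

Running the same function once on the major summaries and once on the non-major summaries gives the two frequency lists behind the two word clouds. Note that this simple tokenizer ignores Korean text, so summaries left in Korean would need a different treatment.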

System Design

I designed the system as a back-end server and a front-end web interface. For the back-end server, I used Python 3.6 with the Flask library. The server and the interface communicate over the WebSocket protocol. For the front-end web interface, I used JavaScript and the D3 library (v3) with jQuery.

Usage Scenario

A short video demonstration is available at the following link.

The story of the visualization starts with the student entering their student number and clicking on the ‘search’ button. The visual interface is then updated automatically. The discrete stacked bar chart shows an overview of the courses taken by the student. By hovering over each dot, they can check the title of the course. Afterwards, they look at the table to view detailed information about the courses taken, including the short summary of each course. By looking at the comparative bar chart, they can see how many major courses they have taken compared with other students of the same entrance year and department. With the two stacked bar charts, they compare their class registration trend with that of others. Then, with the two word clouds, they can find out which topics or keywords they focused on when taking courses in college. Based on this information, they can decide whether to (1) expand on their interests or (2) take courses that supplement their underrepresented areas.

Limitations, Conclusion, and Future Work

In this project, I created a visual analytics tool for understanding students’ academic interests in college using class registration logs. Based on the design considerations, I created an overview of a student’s data and deployed visual interfaces that show textual and temporal trends and compare them with how other students have taken courses.

One limitation I want to mention concerns the evaluation of the work. Because of limited time, I was not able to evaluate the effectiveness of the system with real students. Given time to evaluate the work, I would like to assess the following points: (1) the efficacy of each interface compared with the originally planned interfaces, (2) the insights students gain from the system, and (3) possible applications and extensions of this interface to help students understand their course registration logs.

Furthermore, as future work, I plan to create a more sophisticated analytics tool that helps students make decisions about class registration using other datasets, such as students’ postgraduate career records and comments about each course.

References

Shneiderman, Ben (1996). “The eyes have it: A task by data type taxonomy for information visualizations.” In Proceedings of the IEEE Symposium on Visual Languages.

Acknowledgements

I would like to thank the Korea University Office of Information Technology and Service for providing the datasets.