Data Analysis, UX Design, Visual Design, Sketching, Development
Note: the following is image heavy. If you’re on mobile, please watch your data limit!
An interactive visualization helping University of Calgary staff and students better understand the Computer Science degree.
A project from February-April 2016.
The challenge was to create a visualization in three stages:
- Exploratory Data Analysis: choose a dataset, pose an initial question, and analyze the data
- Visualization Prototypes: create multiple mockups, and illustrate the marks, visual encodings, layouts, and interactions
- Final Visualization: implement a working visualization, and describe the design process, implementation, and reflections
It can be difficult to plan an undergraduate Computer Science degree because the information is scattered. Better visualizations can help students plan their degree, discover interesting courses, and understand class restrictions. They can also help the Computer Science Department understand its courses and concentrations.
Exploring the data set
The Computer Science degree is made up of courses with the following information:
- Course names and descriptions
- Course pre-requisite, recommended, or anti-requisite relationships
- Whether consent is required and by which Department
- Course streams for a degree in BSc, Honours, or different concentrations
- When courses were recently offered
- Credits, lecture hours, and lab hours
Posing initial questions
To better understand the data set, we pose several questions in the context of relevant courses: Computer Science (CPSC), Software Engineering (SENG), Mathematics (MATH), Statistics (STAT) and Philosophy (PHIL).
- Q1. How are courses connected to each other? This helps students understand relevant course relationships.
- Q2. What relationships exist between courses and course streams? This helps students understand course fields.
- Q3. What trends exist between courses and when they were offered? This helps estimate when courses could be offered based on past trends.
Exploratory Data Analysis
To answer Q1, I created different graphs and tables. In Tableau, I created a bar chart to find which courses are fundamental by counting the number of times each course is used as a required or optional pre-requisite.
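Outside Tableau, the tally behind such a bar chart can be sketched in a few lines of Python. The course codes and relationships below are hypothetical sample data, not the real calendar, and the `prereqs` structure is an assumed format:

```python
from collections import Counter

# Hypothetical sample data, not taken from the actual course calendar:
# each course maps to (pre-requisite, kind) pairs, kind being "required" or "optional".
prereqs = {
    "CPSC 331": [("CPSC 233", "required"), ("MATH 271", "required")],
    "CPSC 313": [("CPSC 233", "required"), ("MATH 271", "optional")],
    "CPSC 457": [("CPSC 331", "required"), ("CPSC 359", "optional")],
}


def prerequisite_usage(prereqs):
    """Count how often each course is used as a required or optional pre-requisite."""
    usage = Counter()
    for reqs in prereqs.values():
        for prereq, kind in reqs:
            usage[(prereq, kind)] += 1
    return usage


def ranked_totals(usage):
    """Sum both kinds per course and rank from most to least used (bar-chart order)."""
    totals = Counter()
    for (course, _kind), n in usage.items():
        totals[course] += n
    return totals.most_common()


usage = prerequisite_usage(prereqs)
print(ranked_totals(usage))
```

Keeping the required/optional split in the keys allows the bars to be stacked by kind, while `ranked_totals` gives the overall ordering.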
In the node-link graph, each node is a course labelled with its number, and shapes represent different subjects: circles are CPSC courses, squares are SENG courses, and triangles are other courses.
Directed edges point to pre-requisite nodes (course → pre-requisite). Optional pre-requisites are green edges and required ones are blue. The number of a node’s outgoing edges indicates its total number of required pre-requisite courses.
To display anti-requisite courses, I labelled courses with letters: “C” for CPSC courses, “P” for PMAT courses, and “S” for SENG courses. Directed edges point to anti-requisite nodes (course → anti-requisite), and bi-directional edges indicate that students receive credit for only one of the two courses.
Another label, “L”, marks PHIL courses. To display recommended courses, directed edges point to recommended nodes (course → recommended course).
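Bi-directional anti-requisite edges come from pairs of courses that list each other. The following minimal sketch separates those mutual pairs from one-way edges. It assumes the anti-requisites are stored as hypothetical (course, anti-requisite) pairs.

```python
# Hypothetical directed anti-requisite edges (course -> anti-requisite).
anti_edges = [
    ("CPSC 331", "CPSC 319"),
    ("CPSC 319", "CPSC 331"),
    ("SENG 300", "SENG 301"),
]


def split_anti_requisites(edges):
    """Separate mutual pairs (drawn bi-directional) from one-way edges."""
    edge_set = set(edges)
    mutual = set()
    one_way = []
    for a, b in edges:
        if (b, a) in edge_set:
            mutual.add(tuple(sorted((a, b))))  # store each mutual pair once
        else:
            one_way.append((a, b))
    return mutual, one_way


mutual, one_way = split_anti_requisites(anti_edges)
print(sorted(mutual))
print(one_way)
```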
Some highlights when answering Q1, “How are courses connected to each other?”:
- More options exist for junior and 300-level senior courses
- A majority of required pre-requisite courses focus on programming, algorithm design, operating systems, and software engineering
- Lower-numbered courses serve both as pre-requisites and as recommended courses
To answer Q2, I created a few graphs. A bar graph counts the number of required and optional courses within each stream. Labels represent the concentrations:
- SENG: Software Engineering
- CTAG: Complexity Theory & Algorithms
- BSc: Bachelor of Science in Computer Science
- Games: Computer Game Design
- Info Sec: Information Security
- HCI: Human-Computer Interaction
- Graphics: Computer Graphics
I also created a table listing the courses within each stream. The top rows show courses that belong to multiple streams, which minimizes gaps in the table.
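The row ordering for that table can be sketched as a sort on the number of streams per course. The stream memberships below are hypothetical. The column order simply follows the list of concentrations above.

```python
# Hypothetical stream membership for a few courses.
streams = {
    "CPSC 331": {"BSc", "SENG", "CTAG", "Games"},
    "CPSC 453": {"Graphics", "Games"},
    "CPSC 481": {"HCI", "SENG", "Games"},
    "CPSC 413": {"CTAG"},
}
STREAM_ORDER = ["SENG", "CTAG", "BSc", "Games", "Info Sec", "HCI", "Graphics"]


def table_rows(streams):
    """Sort courses so those in the most streams come first (ties alphabetical)."""
    ordered = sorted(streams, key=lambda c: (-len(streams[c]), c))
    return [[c] + ["x" if s in streams[c] else "" for s in STREAM_ORDER] for c in ordered]


for row in table_rows(streams):
    print(row)
```

Putting multi-stream courses on top groups the filled cells together, which is what keeps the gaps small.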
Some highlights when answering Q2, “What relationships exist between courses and course streams?”:
- The high number of BSc courses correlates with how often those courses are used as pre-requisites
- Many courses belong to the SENG and CTAG streams, implying the program focuses on programming, algorithms, theory, and software engineering skills
- At least one SENG course can also be used for another course stream, and many courses apply to the SENG, Games & HCI streams
- Surprisingly, few courses are offered in the Info Sec or HCI streams, even though these are top programs in Canada
To answer Q3, I created a table displaying when courses were offered. The top rows list courses offered in the following term combinations, in this order:
- Fall, Spring, Summer, Winter
- Fall, Spring, Winter
- Fall, Winter
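That row ordering can be sketched as ranking each course by which term combination it matches. Courses that match none of the combinations go to the bottom. The offering records are hypothetical sample data.

```python
# Hypothetical record of which terms each course was recently offered in.
offered = {
    "MATH 211": {"Fall", "Winter", "Spring", "Summer"},
    "CPSC 413": {"Fall", "Winter"},
    "CPSC 331": {"Fall", "Winter", "Spring"},
    "CPSC 521": {"Winter"},
}

# Row groups, from most to least frequently offered, matching the list above.
PATTERNS = [
    {"Fall", "Spring", "Summer", "Winter"},
    {"Fall", "Spring", "Winter"},
    {"Fall", "Winter"},
]


def pattern_rank(terms):
    """Return the index of the matching pattern; unmatched courses sort last."""
    for i, pattern in enumerate(PATTERNS):
        if terms == pattern:
            return i
    return len(PATTERNS)


rows = sorted(offered, key=lambda c: (pattern_rank(offered[c]), c))
print(rows)
```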
Some highlights when answering Q3, “What trends exist between relevant courses and when they were offered?”:
- Courses offered in the summer are math-related, and courses offered in the spring are used as BSc requirements
- Rarely offered courses are foundational within their field: CPSC 521: Foundations of Functional Programming, SENG 541: Fundamentals of Software Evolution & Reuse, CPSC 409: History of Computation, and CPSC 559: Intro to Distributed Algorithms
- Courses offered only twice are also specialized senior courses: SENG 515: Agile Software Engineering, CPSC 550: Systems Admin, CPSC 519: Intro to Quantum Computing, CPSC 572: Fundamentals of Social Network Analysis & Data Mining, CPSC 531: Systems Modelling & Simulation
Creating the initial mockup
The visualization uses a force-directed graph, emphasizing exploration and course relationships over storytelling. An interactive tutorial introduces the reader to the visualization.
Nodes are courses (department and number), and directed links are pre-requisites (pre-requisite → course). Required courses are blue and optional are green.
Selecting a circle highlights the node and its connected links, and shows a tooltip whose content depends on the active filters (e.g. a list of course streams).
Filters reveal more information, can be combined, and change the appearance of the nodes or links.
Some filters reconfigure the graph into a scatterplot, where the x-axis is the number of lecture hours and the y-axis is the course level (x00). Nodes are clustered without overlap, and highlighting a cluster shrinks the areas not of interest.
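The scatterplot coordinates can be sketched as follows. The course level is the hundreds digit of the course number, and lecture hours map to x. The pixel scales are hypothetical, and the non-overlapping clustering, which a collision force would handle, is not shown.

```python
def course_level(code):
    """'CPSC 457' -> 400: the hundreds digit gives the course level (x00)."""
    number = int(code.split()[1])
    return (number // 100) * 100


def scatter_position(code, lecture_hours, x_scale=40, y_scale=0.5):
    """Map lecture hours to x and course level to y (pixel scales are hypothetical)."""
    return lecture_hours * x_scale, course_level(code) * y_scale


print(course_level("CPSC 457"))
print(scatter_position("SENG 300", 3))
```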
The marks are points, representing nodes and filter buttons, and lines, representing links between nodes.
A force-directed graph is used to better identify relationships and connections between courses. For an uninterrupted reading experience, filters are placed away from the graph.
To reduce conflicting links and provide a logical hierarchy, pre-requisite courses are parents and children are positioned based on their in-degree. The layout makes it easier to compare filter types without needing to remember previous states. Combining filters allows readers to explore and gain insight into the Computer Science degree.
The direction of a link indicates a pre-requisite relationship: A → B means “A is a pre-requisite of B”. Required courses are blue and optional are green. A node’s in-degree is its number of pre-requisites, and its out-degree is the number of times it is used as a pre-requisite. By default, selecting a node displays the course name.
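Under the A → B convention, both degrees fall out of a single pass over the links. This is a sketch with hypothetical sample links, not the project's actual code.

```python
from collections import defaultdict

# Hypothetical links following the A -> B convention: "A is a pre-requisite of B".
links = [
    ("CPSC 231", "CPSC 233"),
    ("CPSC 233", "CPSC 331"),
    ("MATH 271", "CPSC 331"),
    ("CPSC 331", "CPSC 413"),
]


def degrees(links):
    """In-degree = number of pre-requisites; out-degree = times used as a pre-requisite."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for source, target in links:
        out_deg[source] += 1
        in_deg[target] += 1
    return in_deg, out_deg


in_deg, out_deg = degrees(links)
print(in_deg["CPSC 331"], out_deg["CPSC 331"])
```

The in-degree values are what the layout uses to position child courses below their pre-requisites.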
Adding filters and additional tooltips
The course streams filter helps students better understand concentrations. Selecting one course stream fills a node with a shade of grey, and each additional selected stream that the course belongs to darkens the shade. The shades remain discernible because no course belongs to very many streams. Selecting a node displays the course name and course streams.
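The darkening fill can be sketched as a linear interpolation in grey. The lightest and darkest values and the `max_streams` cap are hypothetical choices, not the values used in the live visualization.

```python
def stream_fill(course_streams, selected, light=230, dark=60, max_streams=4):
    """Return a grey hex colour that darkens with each selected stream the course is in.

    `light`, `dark`, and `max_streams` are hypothetical; because courses belong to
    only a few streams, a handful of steps stays visually distinguishable.
    """
    count = len(course_streams & selected)
    if count == 0:
        return None  # no fill: the course is not in any selected stream
    t = min(count, max_streams) / max_streams
    level = round(light + (dark - light) * t)
    return "#{0:02x}{0:02x}{0:02x}".format(level)


print(stream_fill({"SENG", "Games"}, {"SENG"}))           # one stream: lighter grey
print(stream_fill({"SENG", "Games"}, {"SENG", "Games"}))  # two streams: darker grey
```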
The course availability filter lets students estimate when courses could be offered based on previous data. Combined with pre-requisite links, it helps students plan their degree. Selecting a node displays the course name and when the course was offered.
The anti-requisites filter lets students know which courses overlap in content; only one of the two courses counts for credit. A → B means “A is an anti-requisite of B”. Required courses are red, and optional are orange.
The recommendations filter lets students know which courses help prepare for a course. A→B means “A is a recommendation for B”. Required courses are blue, and optional are green with a dashed stroke type.
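The link encodings for the three relationship filters can be collected in one lookup table. This sketch assumes the dashed stroke applies to both required and optional recommendation links. The `"4,4"` dash pattern is a hypothetical value.

```python
# Link encodings described above, keyed by (filter, requirement).
LINK_STYLES = {
    ("prerequisite", "required"): {"stroke": "blue", "dash": None},
    ("prerequisite", "optional"): {"stroke": "green", "dash": None},
    ("antirequisite", "required"): {"stroke": "red", "dash": None},
    ("antirequisite", "optional"): {"stroke": "orange", "dash": None},
    ("recommended", "required"): {"stroke": "blue", "dash": "4,4"},
    ("recommended", "optional"): {"stroke": "green", "dash": "4,4"},
}


def link_style(kind, requirement):
    """Look up the stroke colour and dash pattern for one link."""
    return LINK_STYLES[(kind, requirement)]


print(link_style("recommended", "optional"))
```

A single table keeps the colour choices consistent across filters and makes it easy to check that no two link types share an encoding.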
The consent required filter lets students know which courses have additional requirements by adding a thick border around the node. Selecting a node displays the course name and consent information.
The credits, lecture hours & lab hours filters are connected and relate to the course structure. Readers select a numbered button, which highlights the corresponding nodes with distinct rings of different hues and saturation. At most three rings are stacked, and they stay distinct from the other filters. Selecting a node displays the course name and course level (junior, senior, or not included in GPA).
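The stacked rings can be sketched as radii that grow outward from the node, one step per active filter. The ring order (credits innermost) and the pixel sizes are hypothetical.

```python
# Hypothetical ring order: credits innermost, then lecture hours, then lab hours.
RING_ORDER = ["credits", "lecture", "lab"]


def rings(active, node_radius=8, ring_width=3):
    """Return (filter, radius) pairs for at most three stacked rings around a node.

    `active` maps a filter name to True when its selected button matches the node.
    """
    result = []
    for name in RING_ORDER:
        if active.get(name):
            radius = node_radius + ring_width * (len(result) + 1)
            result.append((name, radius))
    return result


print(rings({"credits": True, "lab": True}))
```

Packing the active rings inward means there are no empty gaps when only some filters match a node.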
To see different relationships, many encodings were used. Color hue is best used for categorical data like whether a course is required or optional, and the distinct categories within credits, lecture & lab hours. Color saturation captures values and quantities like the number of streams a course belongs to. Stroke width makes nodes more noticeable for binary categories like whether a course requires consent or if it was available during a semester.
Differences from the mockup
Initially, the tree was organized by pre-requisites to minimize link collisions, but readers expected courses to be ordered by seniority.
To prevent information overload, I removed the opacity options for nodes and links, and the scatterplot. The availability filter now contains a range and changes a node’s stroke type.
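A range filter needs to turn two dropdown choices into a list of terms and cope with a reversed selection. The following is a minimal sketch of that check. The term list is hypothetical. Swapping a reversed range is one possible error-handling choice, not necessarily the one the live visualization makes.

```python
# Hypothetical ordered list of terms shown in the two dropdowns.
TERMS = ["Fall 2014", "Winter 2015", "Spring 2015", "Summer 2015", "Fall 2015", "Winter 2016"]


def terms_in_range(start, end):
    """Return the terms between two dropdown choices, swapping them if reversed."""
    i, j = TERMS.index(start), TERMS.index(end)
    if i > j:
        i, j = j, i  # accept a reversed range instead of showing nothing
    return TERMS[i:j + 1]


def offered_in_range(course_terms, start, end):
    """True if the course ran in any term of the selected range (drives the stroke type)."""
    return any(t in course_terms for t in terms_in_range(start, end))


print(terms_in_range("Fall 2015", "Winter 2015"))
```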
Improving the code
I used Wrangler to create data tables, exported them to JSON, and transformed them into a D3 input format with regular expressions. Within the force-directed graph, I created x and y focus points that attract the nodes.
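The regular-expression step can be sketched as pulling course codes out of a free-text pre-requisite field and emitting D3-style `nodes` and `links`. The input records are a hypothetical format, since the actual Wrangler export is not shown. This sketch also ignores "one of" semantics, so required and optional pre-requisites are not distinguished.

```python
import json
import re

# Hypothetical Wrangler export: one record per course with a free-text
# pre-requisite field. The real export format is not shown in this write-up.
records = [
    {"course": "CPSC 331", "prereqs": "CPSC 233; MATH 271"},
    {"course": "CPSC 413", "prereqs": "CPSC 331 and one of PHIL 279 or MATH 271"},
]

# Four capital letters, a space, and three digits, e.g. "CPSC 331".
COURSE = re.compile(r"\b([A-Z]{4}) (\d{3})\b")


def to_d3_graph(records):
    """Build {"nodes": [...], "links": [...]} with pre-requisite -> course links."""
    ids, links = set(), []
    for rec in records:
        ids.add(rec["course"])
        for dept, num in COURSE.findall(rec["prereqs"]):
            prereq = f"{dept} {num}"
            ids.add(prereq)
            links.append({"source": prereq, "target": rec["course"]})
    nodes = [{"id": i} for i in sorted(ids)]
    return {"nodes": nodes, "links": links}


graph = to_d3_graph(records)
print(json.dumps(graph, indent=2))
```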
Filters change a node’s circle style. While iterating over the data, the code checks whether the selected circle’s ID matches each datum’s ID and returns the corresponding style. To handle multiple filters, I used HTML5 data attributes; tracking which filter buttons have been clicked makes it easy to change a node’s style.
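HTML5 data attributes live in the DOM, so this Python sketch only models the bookkeeping. A dictionary stands in for the `data-*` attributes, and the names `toggle`, `node_style`, and `consent_required` are hypothetical.

```python
# Hypothetical stand-in for HTML5 data attributes such as
# <button data-filter="consent" data-active="false">.
filter_state = {"consent": False}


def toggle(state, name):
    """Flip a filter's active flag, as a button's click handler would."""
    state[name] = not state[name]


def node_style(node, selected_id, state):
    """Derive a node's circle style from the selection and the active filters."""
    style = {"stroke_width": 1, "selected": node["id"] == selected_id}
    if state["consent"] and node.get("consent_required"):
        style["stroke_width"] = 4  # thick border for courses that need consent
    return style


nodes = [{"id": "CPSC 599", "consent_required": True}, {"id": "CPSC 231"}]
toggle(filter_state, "consent")
styles = [node_style(n, "CPSC 231", filter_state) for n in nodes]
print(styles)
```

The style is recomputed from the current state rather than patched in place, so combining or releasing filters never leaves a stale border behind.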
Nodes also have data attributes. A counter keeps track of the course stream states and changes a node’s fill color based on the number of buttons clicked. I kept track of changes to the node style and its associated links, so whether a node was hovered or clicked, it could return to its original style until it was interacted with again.
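The "return to its original style" behaviour can be sketched as a node that remembers its filter-driven base style. Hover only overrides that style temporarily, and a click pins the highlight until the next click. The `NodeView` class is hypothetical.

```python
class NodeView:
    """Hypothetical node wrapper that remembers its filter-driven base style."""

    def __init__(self, base_style):
        self.base_style = dict(base_style)  # style set by the active filters
        self.current = dict(base_style)
        self.pinned = False                 # True after a click

    def hover(self):
        self.current = {**self.base_style, "highlight": True}

    def unhover(self):
        if not self.pinned:                 # a clicked node keeps its highlight
            self.current = dict(self.base_style)

    def click(self):
        self.pinned = not self.pinned
        self.current = {**self.base_style, "highlight": self.pinned}


node = NodeView({"fill": "#919191"})
node.hover()
node.unhover()
print(node.current)
```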
In addition to the data attributes, I handled a lot of edge cases with the stacked filters. Designing an organized interaction allowed me to:
- Add tooltip content corresponding to filters
- Add error handling and automatic style updates for the Availability dropdown filters
- Select multiple buttons within the Course Stream filter
- Select only one button within the Credits filter, Lecture Hours filter, and Lab Hours filter
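The difference between the Course Stream filter and the Credits, Lecture Hours, and Lab Hours filters can be sketched as one button-group type with a `multi` flag. The class name and the button values are hypothetical.

```python
class ButtonGroup:
    """Track selected buttons; `multi` allows several selections at once."""

    def __init__(self, multi):
        self.multi = multi
        self.selected = set()

    def click(self, value):
        if value in self.selected:
            self.selected.remove(value)  # clicking a selected button deselects it
        elif self.multi:
            self.selected.add(value)
        else:
            self.selected = {value}      # single-select: replace the previous choice
        return self.selected


stream_buttons = ButtonGroup(multi=True)   # Course Stream filter
credit_buttons = ButtonGroup(multi=False)  # Credits, Lecture Hours, Lab Hours filters
stream_buttons.click("SENG")
stream_buttons.click("HCI")
credit_buttons.click("3")
credit_buttons.click("6")
print(sorted(stream_buttons.selected), credit_buttons.selected)
```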
Final visualization
- Code base (explaining how the data was created)
- Live visualization (optimal on a desktop & Google Chrome)
Reflections
The visualization was unique and has been shared with many teachers and students. My visualization instructor and other Teaching Assistants have shared it with others.
The visual encodings were well thought out and accommodated multiple encodings, but they come with a learning curve, especially the color coding for credits, lecture & lab hours.
Links overlap heavily, and changing to a tree layout may reduce link collisions. The focus on pre-requisites also leaves little room for inspecting specific courses. Additional flexibility, like hiding pre-requisite links or removing irrelevant courses, may be of interest.
Software: Sublime Text, Trifacta Wrangler
Have any suggestions or constructive criticism? Please comment below! I am always looking for ways to do better.