InfoSci Major Requirements @ UMD

Dominic Graziano
INST414: Data Science Techniques
4 min read · Sep 20, 2023

Data and Collection

As an InfoSci major at the University of Maryland, I wanted to analyze data related to the degree's course requirements. I found an open-source API called umd.io that provides schedule-of-classes data, including courses, times, instructors, sections, meeting days, class formats, and locations. This analysis could help inform future decisions on whether the university should allocate more sections, add more seats, or change the format of some courses to make the offerings more balanced. All of my code is in a Jupyter Notebook, and the main libraries I used were requests, pandas, and matplotlib. To get the data, I manually created a list of course codes for the required core courses and the courses in the data science track. I then used the requests library to call the API in a loop, passing each course code as a query in the URL. I appended each response to a list and loaded that list into a dataframe, which I am more comfortable analyzing.
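A minimal sketch of this collection loop is below. The endpoint URL, the `course_id` query parameter, and the two course codes are assumptions for illustration, not copied from the notebook, so check the umd.io documentation before relying on them. The helper names `fetch_sections` and `sections_to_frame` are hypothetical.

```python
import pandas as pd
import requests

# Assumed endpoint and query parameter; verify against the umd.io docs.
SECTIONS_URL = "https://api.umd.io/v1/courses/sections"

# Illustrative subset of course codes, not the full list from the notebook.
COURSE_IDS = ["INST377", "INST414"]


def fetch_sections(course_ids, session, url=SECTIONS_URL):
    """Request the sections for each course code and collect them in one list."""
    rows = []
    for course_id in course_ids:
        response = session.get(url, params={"course_id": course_id}, timeout=30)
        response.raise_for_status()  # stop on HTTP errors instead of storing bad data
        rows.extend(response.json())  # assumes the body is a JSON list of sections
    return rows


def sections_to_frame(rows):
    """Load the collected section records into a DataFrame."""
    return pd.DataFrame(rows)
```

With network access, something like `df = sections_to_frame(fetch_sections(COURSE_IDS, requests.Session()))` would produce the starting dataframe. Passing the session in as an argument keeps the loop easy to try out without hitting the API.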

Data Cleaning

I had three main cleaning and reformatting steps. First, the meetings column of the dataframe held a nested list, so I applied a lambda function to break its entries out into their own columns. Second, the instructors column had brackets left behind, so I removed them across the dataframe. Finally, I created two new columns for the department and the course level. I split the course column so that the first four characters went into the department column, while the three-digit course number had its last two digits replaced with 00, giving levels of 100, 200, 300, and 400.
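The three steps could look roughly like the sketch below. The toy records, the field names inside `meetings`, and the choice to keep only the first meeting per section are all assumptions made for illustration; the notebook may handle these differently.

```python
import pandas as pd

# Toy records in an assumed shape; the real API response may differ.
df = pd.DataFrame({
    "course": ["INST377", "MATH115"],
    "instructors": [["A. Smith", "B. Jones"], ["C. Lee"]],
    "meetings": [
        [{"days": "TuTh", "room": "ONLINE"}],
        [{"days": "MWF", "room": "0200"}],
    ],
})

# Step 1: expand the first meeting dict of each row into its own columns.
first_meeting = df["meetings"].apply(lambda m: m[0] if m else {})
meeting_cols = pd.DataFrame(first_meeting.tolist(), index=df.index)
df = pd.concat([df.drop(columns="meetings"), meeting_cols], axis=1)

# Step 2: join instructor lists into plain strings so no brackets remain.
df["instructors"] = df["instructors"].apply(", ".join)

# Step 3: the first four characters are the department, and the first
# digit of the course number sets the level (377 -> 300).
df["department"] = df["course"].str[:4]
df["level"] = df["course"].str[4].astype(int) * 100
```

Multiplying the first digit by 100 gives the same result as replacing the last two digits with 00, and it leaves the level as an integer that sorts correctly in plots.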

Analysis and Visualization

Most of what I wanted to visualize related to the format of a class (whether it is in person or online). I was also interested in whether class sizes varied by department or course.

The first visualization was plotted with matplotlib from the dataframe grouped by department and class type. As you can see, the INST department has the largest number of classes required in the major, and it also offered many online classes.
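As a rough sketch of that grouping, assuming one row per section and hypothetical column names `department` and `class_type` (the toy rows are made up):

```python
import pandas as pd

# Toy section rows; column names and values are assumptions for illustration.
sections = pd.DataFrame({
    "department": ["INST", "INST", "INST", "MATH", "STAT"],
    "class_type": ["Online", "In Person", "Online", "In Person", "In Person"],
})

# Count rows per department and format, with one column per format.
counts = (
    sections.groupby(["department", "class_type"])
    .size()
    .unstack(fill_value=0)
)
```

Calling `counts.plot(kind="bar")` on the result draws one group of bars per department through matplotlib.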

The second visualization broke the dataframe down to show each class's number of seats offered by format. I did this by grouping by course and class type and summing the seats for each course and format. It shows that many of the INST classes have online offerings, and I found it surprising that INST377 was offered only in online sections.
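A small sketch of that seat total, again with made-up rows and assumed column names (`course`, `class_type`, `seats`):

```python
import pandas as pd

# Toy section rows; seat counts are invented for illustration.
sections = pd.DataFrame({
    "course": ["INST377", "INST377", "INST414", "INST414"],
    "class_type": ["Online", "Online", "In Person", "Online"],
    "seats": [40, 35, 30, 45],
})

# Total seats per course, split into one column per format.
seats_by_format = (
    sections.groupby(["course", "class_type"])["seats"]
    .sum()
    .unstack(fill_value=0)
)

# Courses whose seats are all in online sections.
online_only = seats_by_format.index[seats_by_format["In Person"] == 0].tolist()
```

If the API returns seat counts as text rather than numbers, converting the column with `pd.to_numeric` before summing avoids concatenating strings.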

With my third visualization, I wanted to break the courses down by level and see the distribution of necessary courses within the major. I did this by grouping the dataframe by level and counting the unique courses within each group. It shows that there weren't many 200-level offerings this semester and that the most classes were offered at the 300 level.
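Counting unique courses matters here because a course with several sections appears on several rows. A sketch with toy rows and assumed column names:

```python
import pandas as pd

# Toy section rows; INST377 appears twice because it has two sections.
sections = pd.DataFrame({
    "course": ["INST377", "INST377", "INST414", "PSYC100", "MATH115"],
    "level": [300, 300, 400, 100, 100],
})

# Number of distinct courses at each level, ignoring repeated sections.
courses_per_level = sections.groupby("level")["course"].nunique()
```

Using `.size()` instead of `.nunique()` would count sections rather than courses, which would inflate the levels that have many sections per course.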

With the fourth and final visualization, I wanted to show the difference in class sizes by department and level. I grouped the dataframe by department and level and took the mean number of seats for each group. I thought it was interesting that every INST level had a higher mean class size than the required MATH, PSYC, and STAT courses.
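The mean class size per group can be sketched the same way. The rows below are invented and the column names are assumptions; the mean is taken over sections, so each section counts once.

```python
import pandas as pd

# Toy section rows; seat counts are invented for illustration.
sections = pd.DataFrame({
    "department": ["INST", "INST", "INST", "MATH", "MATH"],
    "level": [300, 300, 400, 100, 100],
    "seats": [40, 60, 45, 30, 20],
})

# Average seats per section for every department and level pair.
mean_seats = sections.groupby(["department", "level"])["seats"].mean()
```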

Limitations

Overall, I think the data I pulled was limited because it came only from this semester. If I had pulled multiple semesters of data, I could have seen whether this was a unique semester or whether it fit in line with the past. It would have been interesting to see the effect the pandemic had on course offerings within this major, or whether the department offered as many online courses pre-pandemic. Additionally, some of the data may have been misclassified: I only had in-person and online designations, while some classes may have been hybrid in format, which could have skewed some of the visualizations.

Link to my repository: https://github.com/Domgraz/UMD-InfoSci-Course-Analysis/blob/ba3a4069f01e39bb627e04d9d54d7bde15835b38/INST414_Mod1_Assignment.ipynb
