The DAP Journey: Bidding on BOSs

How to bid like a boss with analytics.

ZHANG, CHENGZI
SMUBIA
May 18, 2019

In this Medium series, BIA shares the reflections of our Data Associates as they recall their academic journeys. This post features an analytics project on SMU’s BOSs bidding system, led by Chelsea, Linus and May, and supervised by Long.

Who’s the boss?

First presentation of DA’s showcase!

Chelsea, SIS

I joined DAP because I wanted to get hands-on experience in data analytics through projects. I also wanted to meet new friends and learn from them.

Linus, SIS

I joined DAP as I wanted to find like-minded individuals who shared the same passion. It’s a community where we can share our experiences and insights with one another.

May, SIS

Having declared Business Analytics as my IS major, I believed that joining DAP as a Data Associate would let me learn more about data analytics alongside like-minded individuals. I especially enjoyed DAP because it gave me knowledge and opportunities that would be hard to find in a weekly classroom setting.

BOSS.

BOSs (Bidding Online SyStem) is the platform where all undergraduate students register for and enroll in classes. Each term, students bid for courses and workshops using the e$ (e-points) allotted to them. This lets them draw up their own timetable based on their personal preferences and study plan, subject to curriculum requirements and the supply and demand of classes.

The Big Plan.

As a team, we aimed to suggest to students the next possible bid amount for each module. After all, it would be nice if students could save up enough e$ to secure a dream mod.

We initially planned to analyze past BOSs bidding results and inflation trends. We would clean the data, carry out Exploratory Data Analysis, build a prediction model and cross-validate it.
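As a rough idea, the cleaning and EDA step would look something like the sketch below. The file name and column names here are assumptions for illustration, not the actual BOSs export we worked with.

    # A minimal sketch of the planned cleaning/EDA step, assuming the bidding
    # results are exported to a CSV. File and column names are hypothetical.
    import pandas as pd

    df = pd.read_csv("boss_bidding_results.csv")

    # Basic cleaning: drop rows with missing bids, normalise the course codes
    df = df.dropna(subset=["median_bid"])
    df["course_code"] = df["course_code"].str.upper().str.strip()

    # Quick EDA: how median bids are distributed per term
    print(df.groupby("term")["median_bid"].describe())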

However, there were two main difficulties: the data set was limited, and some of its attributes had little relevance to our goal.

  1. Specifically, we only had 5 years’ worth of data. The number of records fell to as low as 20 when we filtered for a specific course, window and semester (see the sketch after this list).
  2. Attributes such as semester, section and vacancy carried little to no weight towards our analytical aim. In fact, more important factors, such as the popularity and reputation of the instructors, were not captured in the dataset, which made it difficult for us to build an accurate model.
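To make the first point concrete, here is a self-contained illustration of the filtering that shrank the data; the file and column names are hypothetical stand-ins for the actual export.

    import pandas as pd

    df = pd.read_csv("boss_bidding_results.csv")  # assumed export

    subset = df[
        (df["course_code"] == "IS103")       # one course (illustrative code)
        & (df["bidding_window"] == "R1 W1")  # one bidding window
        & (df["semester"] == "2017-18 T1")   # one semester
    ]
    print(len(subset))  # for some combinations this fell to around 20 rows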

Hence, we decided to visualize the data set instead of building a model.

Visualizing data on Tableau

Our team developed an interactive Tableau dashboard that visualised the bidding data from 2013 to 2018. Here are some of our insights:

1. This shows you which profs are popular and lets you gauge who is easy or difficult to bid for.

Figure 1: Comparing different profs across different years based on one term, one window and the module chosen

2. This gives you the trend for one prof over the years: whether their bids are inflating or deflating.

Figure 2: Comparing min/median/max bids of one prof across different years and different terms

3. This helps you predict the range of the max bid for the upcoming window, although it may not be accurate due to limited data points.

Figure 3: Prediction of the max bid of one prof for a particular term and window, based on past years’ data

4. This allows you to visualise which profs have more vacancies to bid for.

Figure 4: Pie Chart of number of students enrolled for different profs in one module based on years/terms/windows chosen

5. This gives you a rough idea of how many classes each prof is likely to teach.

Figure 5: Number of classes profs teach based on selected module/terms/years
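For views like Figures 2 and 3, the underlying summary is essentially a group-by aggregation of bids per prof, year and term. A rough sketch of that preparation step, again with hypothetical file and column names, might look like this:

    import pandas as pd

    df = pd.read_csv("boss_bidding_results.csv")  # assumed export

    # Summarise bids per instructor, academic year and term
    bid_summary = (
        df.groupby(["instructor", "academic_year", "term"])["median_bid"]
          .agg(["min", "median", "max"])
          .reset_index()
    )
    print(bid_summary.head())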

Analytical Tools & other powerful weapons

Some key technologies that helped us with the project were the scikit-learn library and SAS tools such as Enterprise Miner and Enterprise Guide. As shown above, we used Tableau to perform exploratory data analysis and get a general idea of the data (e.g. its skewness).

We explored machine learning techniques such as Random Forest and Boosting, specifically AdaBoost.
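As a rough sketch of what trying those two regressors with cross-validation can look like in scikit-learn (synthetic placeholder data stands in for our actual features and bid amounts):

    import numpy as np
    from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    # Placeholder data: in practice X would hold encoded bidding attributes
    # and y the bid amounts in e$.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = rng.normal(loc=30, scale=10, size=200)

    models = {
        "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
        "AdaBoost": AdaBoostRegressor(n_estimators=100, random_state=0),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5,
                                 scoring="neg_root_mean_squared_error")
        print(f"{name}: RMSE {-scores.mean():.1f} (+/- {scores.std():.1f})")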

Best bid amount?

We learnt that most of the dataset variables were not statistically significant, and some even caused multicollinearity problems, i.e. the variables were not independent of one another. The data set was also very sparse, which made it difficult to train the models. Overall, the dataset was especially noisy, and a lot of time was spent cleaning the data.
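One common way to spot such multicollinearity is to compute variance inflation factors. Here is a sketch using statsmodels, with hypothetical column names standing in for the actual predictors:

    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools.tools import add_constant

    df = pd.read_csv("boss_bidding_results.csv")  # assumed export

    # Hypothetical numeric columns; swap in the real predictors
    features = df[["vacancy", "opening_vacancy", "enrolled_students"]].dropna()
    X = add_constant(features)

    vif = pd.Series(
        [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
        index=X.columns,
    )
    print(vif)  # values well above ~5-10 point to collinear predictors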

In the end, using the variables provided to us, we managed to get an RMSE of about 20.3–28.6, which in the context of bidding is not particularly useful (a prediction is only helpful if its error is much smaller). We hope to improve this value by performing feature engineering in the future.

+ chelsea
