Analysis of the 2022 ACSL Senior Scoring Distribution with Python

May 30, 2022

Introduction

Like many people in the United States and some around the world, I recently participated in the Finals round of the American Computer Science League (ACSL) competition. I participated in the Senior division.

After the competition, I was curious about the scoring distribution of the Senior division. For context, the competition consists of two sections: a programming section with two problems worth ten points each, and a multiple-choice section worth 20 points.

When I finished the programming competition, I felt that the second problem in the programming section was quite difficult compared to the first one, and wondered how many people were actually able to finish it. So, I decided to analyze the data.

I did this using the Python libraries BeautifulSoup and Pandas: BeautifulSoup to grab the data from the Internet, and Pandas to analyze it.

Lucky for me, all of the scoring data was publicly available at https://www.scores.acsl.org.

I’m now going to walk through the code I used, and you can check it out yourself on my GitHub: https://github.com/BlobKnight/ACSLDataAnalyzation

The Code

I started off by importing the libraries I used and setting up the request headers.
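A minimal sketch of this setup might look like the following. The `User-Agent` string and the `fetch_html` helper are illustrative assumptions, not necessarily what the repository uses.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Assumed header: some sites reject requests without a browser-like User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; acsl-score-analysis/0.1)"}


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (hypothetical helper)."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text
```

Keeping the download logic in one small function makes it easy to reuse for both the division pages and the individual team pages.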

Here, I’m parsing through both Senior division URLs and grabbing each team sub-link within them.
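As a rough illustration, collecting the team sub-links from one division page could be sketched like this. The page structure, the `extract_team_links` name, and the sample HTML are assumptions made for the example; the real pages may need a more specific selector.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_team_links(html, base_url):
    """Return the absolute URL of every link on the page, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        full_url = urljoin(base_url, anchor["href"])  # resolve relative links
        if full_url not in links:
            links.append(full_url)
    return links


# Made-up HTML standing in for a division page.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/team/1">Team One</a></td></tr>
  <tr><td><a href="/team/2">Team Two</a></td></tr>
  <tr><td><a href="/team/1">Team One (repeat)</a></td></tr>
</table>
"""

team_links = extract_team_links(SAMPLE_HTML, "https://example.org/senior")
```

Running the same function over both Senior division URLs and concatenating the results would give the full list of team pages to visit.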

From there, I created the Pandas DataFrame and went through each sub-link, grabbing the non-empty data from the ‘Finals Prog’ and ‘Finals Shorts’ columns for each team. I also created my own ‘Total’ column by adding the Shorts and Programming scores together.
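One way to sketch this step is shown below. It assumes each team page holds a single HTML table whose header row contains ‘Finals Prog’ and ‘Finals Shorts’, and that a row is skipped when either cell is empty; the `parse_team_scores` helper and the sample table are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["Finals Prog", "Finals Shorts"]


def parse_team_scores(html):
    """Extract (prog, shorts) score pairs from the first table on a team page."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = table.find_all("tr") if table else []
    if not rows:
        return []
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    if any(name not in header for name in COLUMNS):
        return []
    prog_idx, shorts_idx = header.index(COLUMNS[0]), header.index(COLUMNS[1])

    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) <= max(prog_idx, shorts_idx):
            continue  # malformed or short row
        prog, shorts = cells[prog_idx], cells[shorts_idx]
        if not prog or not shorts:
            continue  # skip participants with no Finals data
        records.append({COLUMNS[0]: int(prog), COLUMNS[1]: int(shorts)})
    return records


# Made-up table standing in for one team page.
SAMPLE_TEAM_HTML = """
<table>
  <tr><th>Name</th><th>Finals Prog</th><th>Finals Shorts</th></tr>
  <tr><td>Student A</td><td>10</td><td>14</td></tr>
  <tr><td>Student B</td><td></td><td></td></tr>
  <tr><td>Student C</td><td>20</td><td>18</td></tr>
</table>
"""

df = pd.DataFrame(parse_team_scores(SAMPLE_TEAM_HTML), columns=COLUMNS)
df["Total"] = df["Finals Prog"] + df["Finals Shorts"]
```

In the full script, the records from every team page would be collected into one list before building the DataFrame.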

Now that I had the data saved, I exported it to a CSV file to store it permanently. After that, I plotted histograms for the Programming section, the Shorts section, and the Total scores for all Senior Finals participants. Finally, I printed the median for each column.

With the data obtained and plotted, it was time to analyze it.
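The export, plotting, and median steps could be sketched roughly as follows. The scores in this DataFrame are made-up placeholders, not the real ACSL results, and the file names and the `Agg` backend (which saves figures to disk instead of opening a window) are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: write the figure to a file, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder scores for illustration only.
df = pd.DataFrame({
    "Finals Prog": [10, 8, 20, 6, 10],
    "Finals Shorts": [14, 12, 18, 16, 10],
})
df["Total"] = df["Finals Prog"] + df["Finals Shorts"]

# Store the scraped data so it does not have to be downloaded again.
df.to_csv("senior_finals_scores.csv", index=False)

# One histogram per column, with one bin per integer score.
columns = ["Finals Prog", "Finals Shorts", "Total"]
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, column in zip(axes, columns):
    bins = list(range(0, int(df[column].max()) + 2))
    ax.hist(df[column], bins=bins)
    ax.set_title(column)
    ax.set_xlabel("Score")
    ax.set_ylabel("Participants")
fig.tight_layout()
fig.savefig("senior_finals_histograms.png")

medians = df[columns].median()
print(medians)
```

Using integer-wide bins keeps each bar tied to exactly one score, which suits data where only whole-number scores are possible.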

Data Analysis

NOTICE: I am ignoring all scores of 0 in both sections, as they significantly skewed the data and I don’t believe they are worth accounting for.
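A small sketch of how zeros can be dropped before plotting or taking medians is shown below. It assumes each column is filtered independently, so a zero in one section does not discard that participant's score in the other. The numbers are placeholders, not real results.

```python
import pandas as pd

# Placeholder scores for illustration only.
scores = pd.DataFrame({
    "Finals Prog": [0, 8, 10, 20],
    "Finals Shorts": [0, 14, 0, 16],
})

# Keep only the non-zero entries of each column, independently of the other.
nonzero_prog = scores.loc[scores["Finals Prog"] != 0, "Finals Prog"]
nonzero_shorts = scores.loc[scores["Finals Shorts"] != 0, "Finals Shorts"]

# Compare the medians with and without the zeros.
print(scores["Finals Prog"].median(), nonzero_prog.median())
print(scores["Finals Shorts"].median(), nonzero_shorts.median())
```

Filtering per column rather than per row is a judgment call: dropping whole rows would also remove valid scores from participants who scored zero in only one section.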

Score Distribution of the Senior Programming Section

This is the histogram my program produced for the Programming section results. It supported my suspicion that many people would find the second problem harder than the first. The frequency drops off sharply after the ten-point mark: since both problems are out of ten, any participant with more than ten points must have had some success on the second problem.

The median for this section was eight. It is worth noting that each problem is scored on the results of ten test inputs run against the participant-submitted program. This means that the median number of passed test inputs was 8 out of 20.

Next, I looked at the results for the Shorts section.

Score Distribution of the Senior Shorts Section

This data is not surprising to me, as the Shorts section felt on par with previous years. The median score for this section was 14, meaning that the median participant answered 14 out of 20 questions correctly.

After examining the data for the senior division, I wondered whether participants in lower divisions also faced similar challenges. So, I modified my code to plot histograms of the Intermediate and Junior division sections.
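A sketch of how the plotting code might be generalized to several divisions follows. The `plot_divisions` function, the output file name, and the placeholder scores are all hypothetical; in practice each DataFrame would come from scraping that division's pages.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: save to a file rather than open a window
import matplotlib.pyplot as plt
import pandas as pd


def plot_divisions(frames, column, path):
    """Draw one histogram per division, side by side, for a single column."""
    fig, axes = plt.subplots(
        1, len(frames), figsize=(5 * len(frames), 4), sharey=True, squeeze=False
    )
    for ax, (division, df) in zip(axes[0], frames.items()):
        values = df.loc[df[column] != 0, column]  # ignore zero scores
        ax.hist(values, bins=list(range(0, 22)))  # each section is out of 20
        ax.set_title(f"{division}: {column}")
        ax.set_xlabel("Score")
    axes[0][0].set_ylabel("Participants")
    fig.tight_layout()
    fig.savefig(path)
    return fig


# Placeholder scores for illustration only.
frames = {
    "Senior": pd.DataFrame({"Finals Prog": [8, 10, 0, 20]}),
    "Intermediate": pd.DataFrame({"Finals Prog": [20, 20, 10, 16]}),
    "Junior": pd.DataFrame({"Finals Prog": [20, 12, 20, 0]}),
}
figure = plot_divisions(frames, "Finals Prog", "prog_by_division.png")
```

Sharing the y-axis across the panels makes the heights of the peaks directly comparable between divisions, provided the divisions have similar numbers of participants.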

Comparison of Score Distributions between ACSL Divisions

Here are the generated histograms placed next to each other.

It’s clear that in the programming section, participants at the Intermediate and Junior levels had a much easier time completing their second program, since the peak at 20 is much higher than in the Senior graph. This suggests that only the Senior division had a very difficult problem.

As for the Shorts, the distribution is not very different between the divisions. However, the Intermediate division had the highest median score, at 16 (the Senior median was 14 and the Junior median was 15).

Closing Thoughts

This concludes my analysis of the ACSL score distributions. I learned a lot from this project, such as how to parse tables with BeautifulSoup and how to display data with Pandas.

I now understand why Python is so popular for data science, as I don’t think this job would have been easier in any other language.

Written by Shubham Bhatnagar