Digging a hole: Columbus Blue Jackets performance over the course of the season

Meg Ellingwood
Kenyon College Sports Analytics
Sep 4, 2020
Photo by Jamie Sabau/NHLI via Getty Images, from Jackets Cannon

As a Blue Jackets fan, I am used to lamenting that the team seems to start each season on the wrong foot. Anecdotally, the Blue Jackets perform worse at the beginning of the season, which makes it difficult for them to make the playoffs even when they gain momentum in the latter part of the season. Fans talk about the Jackets “digging a hole” for themselves early in the season that forces them to make up lost ground later. I was curious whether this pattern holds up statistically, or whether it is something that fans perceive but that is not borne out by the data.

To answer this question, I obtained data from hockey-reference.com about every game the Blue Jackets played in the 2013–14 through 2018–19 seasons. For each game, the dataset contained the location (home vs. away), the opponent, the result, and various statistics for both the Jackets and their opponents, including goals scored, Corsi For and Against, Fenwick For and Against, and faceoff percentage.

I created new variables that divide the season in two different ways so that I could compare performance across it. First, I assigned each game to one of three subseasons: before the holiday break, between the holiday break and the All-Star break, and after the All-Star break. This division follows the natural breaks in the schedule, but it is uneven, as the middle subseason frequently had far fewer games than the other two. Because of this, I also separated the season into four quarters of 20 or 21 games each. This division is even but does not follow any natural breaks, so I conducted the analysis using both kinds of divisions.
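The two divisions can be sketched in Python as follows. This is a minimal illustration, not the code used for the analysis: the function names, the choice to give the extra games to the earlier quarters, and the cutoff dates in the example are all assumptions.

```python
from datetime import date

def assign_quarter(game_number, total_games=82, parts=4):
    """Return the quarter (1..parts) for a 1-indexed game number."""
    if not 1 <= game_number <= total_games:
        raise ValueError("game_number out of range")
    base, extra = divmod(total_games, parts)  # 82 games -> 20 per part, 2 left over
    start = 1
    for part in range(1, parts + 1):
        # Assumption: the earlier parts absorb the leftover games (21, 21, 20, 20).
        size = base + (1 if part <= extra else 0)
        if game_number < start + size:
            return part
        start += size

def assign_subseason(game_date, holiday_break, all_star_break):
    """Label a game by where it falls relative to the two breaks."""
    if game_date < holiday_break:
        return "early"
    if game_date < all_star_break:
        return "middle"
    return "late"

sizes = [sum(1 for g in range(1, 83) if assign_quarter(g) == q) for q in (1, 2, 3, 4)]
print(sizes)  # [21, 21, 20, 20]

# Placeholder cutoffs for illustration only, not verified schedule dates.
label = assign_subseason(date(2019, 1, 10), date(2018, 12, 24), date(2019, 1, 25))
print(label)  # middle
```

The break dates change every season, so in practice each season would need its own pair of cutoffs.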

The first response variable I wanted to look at was win percentage because that was the most logical measure of team success in a period of time. I recorded the win percentage for each subseason across all the seasons in the dataset.
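A minimal sketch of that aggregation step, assuming each game has already been reduced to a (season, subseason, won) record. The record layout and the function name are hypothetical, and how overtime and shootout losses are counted is a choice this sketch leaves to whoever builds the `won` flag.

```python
from collections import defaultdict

def win_percentages(games):
    """Map (season, subseason) to the fraction of games won in that chunk."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for season, subseason, won in games:
        key = (season, subseason)
        played[key] += 1
        if won:
            wins[key] += 1
    return {key: wins[key] / played[key] for key in played}

# Made-up records for illustration, not real results.
toy_games = [
    ("2018-19", "early", True),
    ("2018-19", "early", False),
    ("2018-19", "late", True),
    ("2018-19", "late", True),
]
print(win_percentages(toy_games))
```

Each (season, subseason) pair becomes a single observation, which is why the tests on win percentage end up with so few data points.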

Looking at the side-by-side boxplots of win percentage across the subseasons, it does appear that the Jackets perform better in the later part of the season than in the early part of the season. However, there is quite a lot of variability in the early part of the season, so it would require a formal statistical test to see if this conclusion was true.

I decided to conduct an ANOVA to see if the difference between the subseasons is statistically significant. The null hypothesis for the test is that subseason has no effect on win percentage, while the alternative hypothesis is that at least one subseason has an effect on win percentage.

H₀: α₁ = α₂ = α₃ = 0

H₁: at least one αᵢ ≠ 0

Assessing the conditions for the ANOVA, there is some discrepancy in the standard deviations between groups, and there might be some problems with normality, so the conditions for ANOVA may not be perfectly met here.

However, the result of the ANOVA is far enough from significance that these possible violations are unlikely to matter: the F-value on 2 and 15 degrees of freedom is only 0.237, corresponding to a p-value of 0.792. Because the p-value is greater than the alpha level of 0.05, we fail to reject the null hypothesis and conclude that we do not have evidence that the Blue Jackets’ win percentage differs across the three subseasons.
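To show where the F-value and its degrees of freedom come from, here is a minimal one-way ANOVA sketch in plain Python. The function name and the numbers are made up for illustration; a statistics package (for example `scipy.stats.f_oneway`) would also return the p-value, which this sketch does not compute.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Three subseasons with six seasons each (made-up win percentages).
early = [0.45, 0.55, 0.48, 0.60, 0.35, 0.52]
middle = [0.50, 0.40, 0.62, 0.57, 0.50, 0.47]
late = [0.60, 0.58, 0.66, 0.52, 0.55, 0.63]
f_value, df_between, df_within = one_way_anova([early, middle, late])
print(df_between, df_within)  # 2 15
```

With six seasons and three subseasons there are 18 observations, which gives the 2 and 15 degrees of freedom reported above.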

I also ran another ANOVA using a block design taking into account the season, but this did not change the conclusion: the p-value for the effect of subseason was 0.752, so again we fail to reject the null hypothesis.
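A blocked ANOVA removes the season-to-season variation before testing the subseason effect. The sketch below assumes a complete table with one value per season and subseason; the function name and the numbers are hypothetical, and it returns only the F-value and degrees of freedom, not the p-value.

```python
from statistics import mean

def blocked_anova(table):
    """table[b][t] is the response for block b (season) and treatment t (subseason).

    Returns (F_treatment, df_treatment, df_error) for a randomized complete block design.
    """
    n_blocks = len(table)
    n_treat = len(table[0])
    grand = mean(x for row in table for x in row)
    treat_means = [mean(row[t] for row in table) for t in range(n_treat)]
    block_means = [mean(row) for row in table]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block  # what is left after both effects
    df_treat = n_treat - 1
    df_error = (n_treat - 1) * (n_blocks - 1)
    f_value = (ss_treat / df_treat) / (ss_error / df_error)
    return f_value, df_treat, df_error

# Six seasons (rows) by three subseasons (columns), made-up win percentages.
toy = [
    [0.45, 0.50, 0.60],
    [0.55, 0.40, 0.58],
    [0.48, 0.62, 0.66],
    [0.60, 0.57, 0.52],
    [0.35, 0.50, 0.55],
    [0.52, 0.47, 0.63],
]
f_value, df_treat, df_error = blocked_anova(toy)
print(df_treat, df_error)  # 2 10
```

The same function applied to a six-season by four-quarter table would have 3 and 15 degrees of freedom.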

To be thorough, I also conducted a similar analysis using the division of the season into four even quarters to see if the fact that the subseasons were not evenly balanced had an effect on the conclusions.

Once again, it looks like there might be some differences between quarters, but it will take a test to tell. Note the outliers in the second quarter on this boxplot.

Again, I conducted an ANOVA to examine the effect of season quarter on win percentage. The null hypothesis is again that win percentage is the same across the quarters, while the alternative hypothesis is that at least one quarter differs from the others in win percentage.

H₀: α₁ = α₂ = α₃ = α₄ = 0

H₁: at least one αᵢ ≠ 0

Assessing the conditions for the ANOVA, I found that the assumption of equal variances was met, as the standard deviations for the four groups were much closer together, at 0.149, 0.155, 0.109, and 0.122. The largest is well under twice the smallest, which is the usual rule of thumb for equal variances. However, the normality assumption appears to be somewhat problematic, as there is some bowing at the lower end of the normality plot.
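The rule-of-thumb check is simple arithmetic on the four standard deviations quoted above. A short sketch, where the function name is just for illustration:

```python
def sd_ratio(sds):
    """Ratio of the largest to the smallest group standard deviation."""
    return max(sds) / min(sds)

quarter_sds = [0.149, 0.155, 0.109, 0.122]  # values reported in the text
ratio = sd_ratio(quarter_sds)
print(round(ratio, 2))  # 1.42
print(ratio < 2)        # True
```

A ratio of about 1.42 is comfortably below the cutoff of 2.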

Once again, though, the status of the assumptions does not end up making a difference, as the F-value for the effect of quarter on win percentage, with 3 and 20 degrees of freedom, is only 1.468, corresponding to a p-value of 0.253. Because the p-value is greater than the alpha level of 0.05, we fail to reject the null hypothesis and conclude that we do not have evidence to say that the Blue Jackets’ win percentage differs across the quarters of the season.

For completeness, I also ran a blocked ANOVA taking season into account, as well as the nonparametric counterparts of these tests, the Kruskal-Wallis test and the Friedman test, because of the outliers that appeared on the initial boxplot. However, all of these tests came to the same conclusion as the original ANOVA (blocked ANOVA: F(3, 15) = 1.462, p = 0.265; Kruskal-Wallis: χ² = 2.7729, p = 0.428; Friedman: χ² = 3.1034, p = 0.3759). So the conclusion does not change: no matter how I divide the season, the tests find no evidence that the Blue Jackets’ win percentage differs between chunks of the season.
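The Kruskal-Wallis test replaces the raw values with their ranks, which is why outliers affect it less. Here is a minimal sketch of the test statistic, with tied values given the average of their ranks. The function name and data are hypothetical, and the p-value (from a chi-square distribution with the returned degrees of freedom) is left to a statistics package such as `scipy.stats.kruskal`.

```python
def kruskal_wallis_h(groups):
    """Return (H, df) for the Kruskal-Wallis test, with the usual tie correction."""
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j for a run of ties
        for k in range(i, j):
            rank_sums[pooled[k][1]] += midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)  # equals 1 when there are no ties
    return h / correction, len(groups) - 1

# Made-up values for four quarters, not the Blue Jackets data.
toy_quarters = [[0.45, 0.52, 0.60], [0.20, 0.50, 0.55], [0.58, 0.61, 0.66], [0.48, 0.57, 0.63]]
h_value, df = kruskal_wallis_h(toy_quarters)
print(df)  # 3
```

Because only the ordering of the values matters, one unusually bad quarter moves the statistic no more than a slightly bad one would.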

However, it is important to note that because of how win percentage had to be calculated, the sample size for the tests was quite small. Rather than looking at the results of individual games, I had to look at the results of a whole subseason or quarter to get a win percentage, so the sample size was only the number of subseasons or quarters in the dataset: 18 subseasons or 24 quarters across the six seasons. Because of this, it is entirely possible that the power of the tests was too low to detect any true differences in win percentage across the season.

One way to address this would be to use logistic regression to look at the outcomes of individual games. Instead, I decided to look at more specific statistics that are measured for individual games. This gives me a better look at which aspects of the Jackets’ game might differ across the season, and it gets around the small sample size created by combining games into subseasons and quarters to compute win percentage.
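For a sense of what the logistic regression would estimate: if subseason were the only predictor and were entered as a categorical variable, the fitted model would reproduce each subseason's observed win rate, so its coefficients would simply be differences in log-odds between subseasons. The sketch below computes those differences directly. The function names and win counts are made up, and it assumes no subseason has a win rate of exactly 0 or 1.

```python
import math

def log_odds(wins, games):
    """Log-odds of winning, given wins out of games played."""
    p = wins / games
    return math.log(p / (1 - p))

def subseason_coefficients(records, baseline="early"):
    """records maps subseason -> (wins, games); returns log-odds differences vs. baseline."""
    base = log_odds(*records[baseline])
    return {
        name: log_odds(wins, games) - base
        for name, (wins, games) in records.items()
        if name != baseline
    }

# Hypothetical win counts for illustration only.
toy_records = {"early": (10, 20), "middle": (6, 10), "late": (15, 20)}
coefs = subseason_coefficients(toy_records)
print({name: round(value, 3) for name, value in coefs.items()})
```

A positive coefficient means better odds of winning than in the baseline subseason. Fitting the model on individual games would also provide standard errors, which this sketch does not.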

Corsi is an advanced metric measuring shot attempts at even strength, calculated by summing shots on goal, blocked shots, and missed shots. Corsi For (CF), the shot attempts by the Jackets, is used as a proxy for offensive performance, while Corsi Against (CA), the shot attempts by the Jackets’ opponent, is used as a proxy for defensive performance.
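As a small worked example of the Corsi sum, here is a sketch with hypothetical shot counts for a single game. The function and parameter names are illustrative only.

```python
def corsi(shots_on_goal, missed_shots, blocked_shots):
    """Total shot attempts: shots on goal plus misses plus attempts that were blocked."""
    return shots_on_goal + missed_shots + blocked_shots

# Hypothetical even-strength counts for one game.
cf = corsi(shots_on_goal=28, missed_shots=11, blocked_shots=14)  # Jackets' attempts
ca = corsi(shots_on_goal=25, missed_shots=9, blocked_shots=12)   # opponent's attempts
print(cf, ca)  # 53 46
```

In this made-up game the Jackets attempted seven more shots than their opponent.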

First, I wanted to look at the impact of time across the season on offensive performance, so I created side-by-side boxplots to examine the relationship between subseason and CF.

There does appear to be a slight difference between subseasons, increasing over the course of the season, but we will have to see if it is significant. Also, we may be concerned about those outliers.

I conducted an ANOVA to see if the effect of subseason on CF is significant. Once again the null hypothesis is that there is no difference in CF across the subseasons, while the alternative hypothesis is that at least one of the subseasons differs in its CF per game.

H₀: α₁ = α₂ = α₃ = 0

H₁: at least one αᵢ ≠ 0

I first assessed the conditions for ANOVA and found that the assumption of equal variance was satisfied, but the assumption of normality might be slightly problematic. Even with this problem, I decided to proceed with the ANOVA to see if the relationship was statistically significant. The F-value on 2 and 488 degrees of freedom is 5.905, corresponding to a p-value of 0.00292. Because the p-value is less than the alpha level of 0.05, we reject the null hypothesis and conclude that there is a difference in CF across the subseasons.

After reaching this conclusion, I used follow-up multiple comparisons to determine the nature of the difference between the subseasons. Tukey’s HSD procedure revealed that the late subseason was significantly different from the early subseason, but none of the other comparisons were significant.
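Tukey's procedure compares every pair of group means using the pooled within-group variance from the ANOVA. The sketch below computes the studentized range statistic for one pair, using the Tukey-Kramer form that allows unequal group sizes. The function names and CF values are hypothetical. Deciding significance still requires the critical value of the studentized range distribution, which a package such as `scipy.stats.tukey_hsd` handles.

```python
import math
from statistics import mean

def pooled_mse(groups):
    """Within-group mean square, pooled over all groups."""
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_within = sum(len(g) for g in groups) - len(groups)
    return ss_within / df_within

def tukey_q(group_a, group_b, mse):
    """Studentized range statistic for one pair of groups (Tukey-Kramer form)."""
    se = math.sqrt(mse / 2 * (1 / len(group_a) + 1 / len(group_b)))
    return abs(mean(group_a) - mean(group_b)) / se

# Made-up CF values per game, not the real data.
early_cf = [48, 52, 45, 50, 47]
middle_cf = [51, 49, 53, 50]
late_cf = [55, 53, 50, 56, 54]
mse = pooled_mse([early_cf, middle_cf, late_cf])
q_early_late = tukey_q(early_cf, late_cf, mse)
print(round(q_early_late, 2))
```

The larger the statistic, the stronger the evidence that the two subseasons differ. Using one shared critical value for every pair is what keeps the overall error rate under control.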

So we can conclude that offensive performance as measured by CF is significantly better in the later part of the season than the early part of the season.

As before, I also ran an ANOVA blocked on season as well as a Kruskal-Wallis test, both of which led to the same conclusion as the original F-test (blocked ANOVA: F(2, 483) = 6.507, p = 0.00163; Kruskal-Wallis: χ² = 11.09, p = 0.00397). The conclusion remains the same: we reject the null hypothesis and conclude that there is a difference in CF across the subseasons.

I also wanted to see if there was a difference in defensive performance across the subseasons, so I examined CA between the different parts of the season.

Looking at the side-by-side boxplots, the subseasons really do not appear to differ in their CA numbers, but it would require a test to verify that there is no statistically significant difference. As with the boxplots for CF, there are some outliers that might cause problems for the analysis.

To specify the hypotheses for the ANOVA F-test, the null hypothesis is that there is no effect of subseason on CA, while the alternative hypothesis is that at least one subseason differs in CA.

H₀: α₁ = α₂ = α₃ = 0

H₁: at least one αᵢ ≠ 0

Assessing the conditions for the test, the standard deviations appear to satisfy the assumption of equal variance, but just as with CF, there is some bowing in the normality plot, indicating that this assumption may be violated.

The F-value on 2 and 488 degrees of freedom was 0.159, corresponding to a p-value of 0.853. Because the p-value is greater than the alpha level of 0.05, we fail to reject the null hypothesis and conclude that we do not have evidence to say that CA differed across the subseasons.

Once again, I also ran the blocked ANOVA and the Kruskal-Wallis test to see if the conclusion changed, but both tests gave the same result as the original ANOVA (blocked ANOVA: F(2, 483) = 0.162, p = 0.8501; Kruskal-Wallis: χ² = 0.165, p = 0.9207). We therefore fail to reject the null hypothesis and conclude that we do not have evidence of a relationship between time in the season and defensive performance.

Though fans have observed that the Jackets seem to play differently at different times in the season, this analysis did not find a significant difference in win percentage across the three subseasons or the four quarters. However, this may be due to a lack of power resulting from the small sample sizes created by computing win percentage across subseasons rather than taking game results individually. Future work would benefit from taking a closer look at individual game results, perhaps by using logistic regression.

In spite of not finding any sweeping differences in overall win percentage, there was a statistically significant difference in Corsi For (CF) between subseasons: the Jackets had significantly higher CF in the late part of the season than in the early part. This suggests that the fan perception may not be too far off, and that the Blue Jackets start the season with a weaker offense and ramp up their offensive performance late in the season. However, it is important to consider the practical importance of a conclusion like this. A statistically significant difference in CF between subseasons may not mean much for the team overall. How impactful is 3.5 extra CF per game in the later part of the season compared with the early part? Can three or four extra shot attempts make that much of a difference in outcomes for the team? It is unclear.
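One rough way to put a number on practical importance is eta squared, the share of the game-to-game variation in CF that is explained by subseason. For a one-way ANOVA it can be recovered from the F-value and degrees of freedom reported earlier (F = 5.905 on 2 and 488). Eta squared was not part of the original analysis, so treat this as an illustrative add-on.

```python
def eta_squared(f_value, df_between, df_within):
    """Share of total variation explained by the grouping in a one-way ANOVA."""
    return f_value * df_between / (f_value * df_between + df_within)

eta = eta_squared(5.905, 2, 488)  # F and degrees of freedom reported for CF
print(round(eta, 3))  # 0.024
```

By this measure subseason accounts for only about 2.4% of the variation in CF from game to game, which fits the caution above about reading too much into a statistically significant result.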

On the other hand, this analysis did not find any difference across the subseasons in defensive performance as measured by Corsi Against (CA). It appears that the Blue Jackets are actually pretty consistent in their defensive play on average. Looking back at the boxplots of CA across the subseasons raises another question, though: the variability appears to differ between subseasons, so future work could examine whether the Blue Jackets are more consistent in one subseason than in another. In other words, the average CA might be similar in all three subseasons, but in one subseason they might have some games where they really shut down the opposition and others where they allow a lot of opposing chances, while in another subseason they might perform close to the middle the whole time. Other response variables could benefit from this kind of analysis as well, especially win percentage, as this analysis found large differences in the standard deviation of win percentage across the subseasons.

Finally, it is important to look at the impact on the team’s season outcomes from these subseason performance questions. After all, it is the team’s goal (and fans’ hope) to make the playoffs at the end of the season, so it is important to assess how performance across the subseasons relates to overall performance. Future work could look at ranking the performance in each subseason and seeing if the Blue Jackets tend to do better in the end when they have a good start vs. a solid middle vs. a strong finish.

References:

Sports Reference, LLC. (2020). Columbus Blue Jackets team gamelog, 2013–14 through 2018–19 [Hockey Reference pages]. Retrieved from https://www.hockey-reference.com/teams/CBJ/2019_gamelog.html

Wikipedia. (2020). NHL Seasons, 2013–14 through 2018–19 [Wikipedia pages]. Retrieved from https://en.wikipedia.org/wiki/2018%E2%80%9319_NHL_season

Wikipedia. (2020). Corsi (statistic) [Wikipedia page]. Retrieved from https://en.wikipedia.org/wiki/Corsi_(statistic)
