The Hockey Marcels, Version 2.0

DatsyukToZetterberg
Jul 2, 2022

Hello everyone, today I am releasing an updated version of the Hockey Marcels I posted 2 years ago. The “Hockey Marcels” are inspired by the baseball version created by Tangotiger. His original baseball version was designed to be the simplest and most basic forecasting system. It uses a weighted average, where the most recent seasons are weighted more heavily, and then regresses that weighted average towards the league average. The Marcels proved highly accurate for such a simple system, so much so that they can act as a barometer for determining how effective more complex systems actually are.

In this post, I will outline the changes that I made to create this new Marcel model, walk through an example of how the model works, and report the model’s overall errors. Should you wish to skip ahead, the conclusion section has a link to my GitHub with the code for the projections and a link to a Google doc with all of the projections.

Changes To The Marcel Model

When I created my version of the Hockey Marcels 2 years ago, I made a number of incorrect assumptions that this updated version corrects. The most significant changes are that this version comes with games played, henceforth “GP”, projections and the weights for each season have been slightly changed.

The previous version operated under the assumption that all players would play all 82 games; even though I knew that wasn’t possible, I wasn’t sure how to properly implement a forecasting method. Tangotiger had a blog post where he discussed the method of binning players, and that’s when I realized that such a method would be perfect for the Marcels. While Tango suggests capping players at 74 or 76 games, I ultimately settled on a method that caps GP at 79 games. The general idea of this GP forecasting method is to try and separate the regulars who had an unfortunate injury from the regulars who are injury-prone from those that are call-ups or depth pieces.

Each player is sorted into 1 of 16 potential bins, and each bin has a slightly different formula for projecting GP. These bins share a similar formula structure; what changes is the amount of regression to the mean that occurs or the number that anchors the regression. Regulars that have played enough in the 2 most recent seasons are regressed towards 76 GP, regulars who have missed time are regressed towards 74 GP, and depth players are regressed towards 40 GP. While this method is a bit cumbersome and not as clean as the other parts of the Marcels, I believe it’s still within the originals’ spirit. It doesn’t use a formal statistical analysis; the only real technique is the binning of players. It still uses the regression to the mean component, which is a core piece of the Marcels.

There are some downsides to this method. Due to how I’ve implemented the “binning”, there are essentially 2 different distributions: one for those classified as “regulars” and one for those classified as “depth”. Ideally, we would have a smooth transition between the two, but it isn’t possible due to the way the binning works.

Density plot of GP in the 2022 season. Min 15 GP

To clearly show the effect, I’ve included a density plot for all players that played more than 15 games in the 2022 season. On top, we have the projected GP; on the bottom is the actual GP. As you can see, the projected GP has a grouping from around 25 to 45 and another from 55 to 79. The actual GP is roughly flat from about 15 to 50 games and then rises steadily until it peaks near 80. This discrepancy between the two makes sense; it’s quite difficult to project long-term injuries. There is very little repeatability in long-term injuries and most seem to occur randomly. Given how random these long-term injuries are, I’m okay with most “regular” type players having projections to play in the 60–79 game range.

The next change involved the weights used to create the weighted averages. The Baseball Marcels use a 5–4–3 weighting system where 5 is the weight for the most recent season, 4 for the second most recent, and 3 for the least recent. Through creating my own forecasting system, I’ve found that most stats in hockey are more heavily weighted in the most recent season than the 5–4–3 weights would allow. For that reason, I used 6–3–1 weights which are more in line with what I’ve found in my research.

This weighting system means that players with just 1 season of data, or less playing time overall, will be more heavily regressed towards the mean than those with 3 full seasons’ worth of data.

Lastly, there is a change to how TOI is projected. The previous version had just two methods for projecting TOI, one for forwards and one for defencemen.

F Proj TOI = 0.5 * N-1 TOI + 0.2 * N-2 TOI + 5

D Proj TOI = 0.5 * N-1 TOI + 0.2 * N-2 TOI + 6

If a player only played in the most recent season, their TOI projection would be substantially lower than it should have been. Under this old formula, a player like Moritz Seider would have been projected to play just 17.50 minutes a night. That’s 5.50 minutes less than he played last year. To account for this, I used another binning method where players were sorted into 1 of 6 bins based on their position & which seasons they played in.
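To see where the 17.50 figure comes from, here is a minimal Python sketch of the old formulas. The function name `old_proj_toi` is my own label for illustration (the actual project code is in R), and Seider's 23.00 minutes per game is inferred from the 17.50 and 5.50 figures above. A season the player did not play is treated as 0 minutes.

```python
def old_proj_toi(position, toi_n1, toi_n2=0.0):
    """Old TOI-per-game projection; a missing season counts as 0 minutes."""
    if position not in ("F", "D"):
        raise ValueError("position must be 'F' or 'D'")
    constant = 5.0 if position == "F" else 6.0
    return 0.5 * toi_n1 + 0.2 * toi_n2 + constant


# Seider (D) with a single NHL season: 23.00 min/game in N-1, nothing in N-2.
seider_old = old_proj_toi("D", 23.00)
print(f"{seider_old:.2f}")  # 17.50
```

With no N-2 season to contribute, the old formula keeps only half of the N-1 value plus the constant, which is why single-season players were projected so low.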

The new method has less regression to the mean than the previous version had. NHL players have very little change in their TOI, and you can base almost all of the projected TOI on the previous year’s number. Players with fewer than 2 years of data will have more regression toward the mean than those with a full 2 years. For example, if a player had played in the N-1 and N-2 seasons, their TOI projection would be

F Proj TOI = 0.8 * N-1 TOI + 0.1 * N-2 TOI + 1.5

D Proj TOI = 0.8 * N-1 TOI + 0.1 * N-2 TOI + 2

The new binning projections work better for all players but are much better for those who only have 1 year of data. Under this new method, Seider would be projected to play 22.40 minutes, which is much closer to the 23.00 minutes he played last year.

The Projection Process

Let’s assume that we’re projecting the 2023 NHL season. Our N-1 season is 2022 which carries a weight of 6, N-2 is 2021 and carries a weight of 3, and N-3 is 2020 and carries a weight of 1. We can get our weighted stats from the following formula:

Weighted Stat N-1 = 6*Stat N-1

Weighted Stat N-2 = 3*Stat N-2

Weighted Stat N-3 = 1*Stat N-3

If a player didn’t play in a season, their weighted stat for that year would be 0. We can find our overall weighted stat by just summing up all of the weighted stats:

Total Weighted Stat = Weighted Stat N-1 + Weighted Stat N-2 + Weighted Stat N-3
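The weighting step can be sketched in a few lines of Python. This is an illustration only: `WEIGHTS` and `total_weighted` are names I made up for the sketch, and the goal totals are the ones from the Auston Matthews example later in this post.

```python
WEIGHTS = (6, 3, 1)  # weights for N-1, N-2, N-3


def total_weighted(stats):
    """Sum of weight * stat; pass None for a season the player missed."""
    return sum(w * (s if s is not None else 0) for w, s in zip(WEIGHTS, stats))


# Matthews' goals in 2022, 2021, 2020.
matthews_goals = total_weighted((60, 41, 47))
print(matthews_goals)  # 530

# Hypothetical player with only an N-1 season of 20 goals.
rookie_goals = total_weighted((20, None, None))
print(rookie_goals)  # 120
```

The same function applies to any counting stat, including total TOI, which is what later serves as the player's "sample size" in the regression step.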

Next, we find the league average for both forwards and defencemen for those seasons. We split up forwards and defencemen because they generally have different league averages, and doing so should produce more accurate forecasts. Each player will have their projections regressed towards this league average. The more a player has played, the less impactful this regression will be. For a forward, this would be their league average:

F League Avg Stat N-1 = mean(Forwards only, stat N-1)

F League Avg Stat N-2 = mean(Forwards only, stat N-2)

F League Avg Stat N-3 = mean(Forwards only, stat N-3)

Once again, if the player did not play in the season, that year’s league average would be 0 for said player. The league averages get the same 6–3–1 weights as the player’s own stats, so the total weighted league average stat would be:

Total Weighted League Avg = 6*F League Avg Stat N-1 + 3*F League Avg Stat N-2 + 1*F League Avg Stat N-3
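As a rough sketch of this step, the snippet below applies the 6-3-1 weights to the league averages as well (which is what the worked example later in the post does) and zeroes out any season the player did not appear in. The function name and the `played` flags are assumptions for illustration; the forward goal averages come from the Matthews example.

```python
WEIGHTS = (6, 3, 1)  # weights for N-1, N-2, N-3


def weighted_league_avg(league_avgs, played):
    """Sum weight * league average over the seasons the player appeared in."""
    return sum(w * avg for w, avg, p in zip(WEIGHTS, league_avgs, played) if p)


# Forward league-average goals for 2022, 2021, 2020.
f_goal_avgs = (14.50, 9.50, 12.20)

all_three = weighted_league_avg(f_goal_avgs, (True, True, True))
only_n1 = weighted_league_avg(f_goal_avgs, (True, False, False))
print(round(all_three, 1))  # 127.7
print(round(only_n1, 1))    # 87.0
```

Dropping the seasons a player missed keeps the league baseline on the same footing as the player's own weighted totals.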

With the weights found, we need to determine the projected TOI & projected GP. Here we’ll assume that the player has played in the N-1 season and N-2 season. Their TOI is calculated as follows:

F: 0.8 * N-1 TOI + 0.1 * N-2 TOI + 1.5

D: 0.8 * N-1 TOI + 0.1 * N-2 TOI + 2

These are just 2 of the potential TOI cases. In total there are 6 different bins a player can fall into for TOI projections.
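Here is a small Python sketch of the two-season TOI bin only; the other four bins are not spelled out in this post, so they are not reproduced. The function name is hypothetical, and the defenceman line uses the 0.1 coefficient on N-2 as given in the section on model changes. The inputs are Matthews' TOI per game from the example below.

```python
def proj_toi_two_seasons(position, toi_n1, toi_n2):
    """TOI-per-game projection for a player with both N-1 and N-2 seasons."""
    if position not in ("F", "D"):
        raise ValueError("position must be 'F' or 'D'")
    constant = 1.5 if position == "F" else 2.0
    return 0.8 * toi_n1 + 0.1 * toi_n2 + constant


# Matthews (F): 20.60 min/game in 2022, 21.56 min/game in 2021.
matthews_toi = proj_toi_two_seasons("F", 20.60, 21.56)
print(f"{matthews_toi:.2f}")  # 20.14
```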

GP are calculated using the following formula:

76*0.50 + 0.25*N-1 GP + 0.25*N-2 GP

This is just 1 of the 16 potential bins for the GP projections. In this case, half of the projection comes from the 76 GP anchor, so all players in this bin will be heavily regressed towards 76 GP.
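The bin above can be sketched in Python as follows. This covers only this one bin; the function and parameter names are mine, and the proration of shortened seasons to an 82-game pace follows what the Matthews example later in the post does for the 56-game 2021 season.

```python
def proj_gp_regular(gp_n1, gp_n2, sched_n1=82, sched_n2=82, anchor=76):
    """GP projection for the 'regular' bin shown above.

    sched_n1 / sched_n2 are the schedule lengths of each season, used to
    prorate games played to an 82-game pace.
    """
    pace_n1 = gp_n1 / sched_n1 * 82
    pace_n2 = gp_n2 / sched_n2 * 82
    return 0.50 * anchor + 0.25 * pace_n1 + 0.25 * pace_n2


# Matthews: 73 of 82 games in 2022, 52 of 56 games in 2021.
matthews_gp = proj_gp_regular(73, 52, sched_n2=56)
print(round(matthews_gp))  # 75
```

With these coefficients, the largest value this bin can produce is 0.5*76 + 0.5*82 = 79, which matches the 79-game cap mentioned earlier.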

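The last individual component needed is the regression to the mean weight. This number controls how strongly a player’s weighted stats are pulled towards the league average: it is added to the player’s weighted TOI in the denominator, so it acts like an extra block of weighted TOI played at the league-average rate. If it’s a forward, we use 1250; if they’re a defenceman, we use 1750.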

Regression to Mean Weight = 1250 if forward or 1750 if defenceman

We can now combine each of these components to create a non-age-adjusted projection. To complete the league average stat, we divide the weighted league average stat by the weighted league average TOI and multiply that by 1250 if a F or 1750 if D, our Regression to Mean Weight.

reg lg stat = (lg average stat/lg average toi)*(1250 or 1750)

We add the regressed league stat and the total weighted stat together and divide that value by the total weighted TOI plus the regression to mean weight. Multiplying this per-minute rate by the proj GP and proj TOI gives us the non-age-adjusted projected stat:

Non Age Adj Stat = [(reg lg stat + tot weighted stat)/((1250 or 1750) + tot weighted toi)]*GP*TOI
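Putting the pieces together, the formula can be sketched as a single Python function. The names are hypothetical, and the inputs are rebuilt from the per-season figures in the Matthews example further down, so small rounding differences from the hand calculation are expected.

```python
def non_age_adj_stat(w_stat, w_toi, lg_w_stat, lg_w_toi,
                     position, proj_gp, proj_toi):
    """Regressed per-minute rate times projected GP and TOI per game."""
    reg_weight = 1250 if position == "F" else 1750
    # League-average production over reg_weight minutes of weighted TOI.
    reg_lg_stat = lg_w_stat / lg_w_toi * reg_weight
    rate = (reg_lg_stat + w_stat) / (reg_weight + w_toi)
    return rate * proj_gp * proj_toi


# Matthews, weights 6-3-1 applied to the 2022, 2021, 2020 seasons.
w_goals = 6 * 60 + 3 * 41 + 1 * 47            # 530
w_toi = 6 * 1504 + 3 * 1121 + 1 * 1468        # total minutes, from rounded season TOI
lg_w_goals = 6 * 14.50 + 3 * 9.50 + 1 * 12.20
lg_w_toi = 6 * 965 + 3 * 680 + 1 * 858        # forward league-average TOI per season

goals = non_age_adj_stat(w_goals, w_toi, lg_w_goals, lg_w_toi,
                         "F", proj_gp=75, proj_toi=20.14)
print(round(goals))  # 55
```

Because the rate is per minute of weighted TOI, the same function works for any counting stat once its weighted totals are supplied.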

Lastly, we have the age adjustment, which is a simple linear model. It assumes that players continue improving at a constant rate until they’re 28 and begin to decline at a constant rate starting at 29. It is assumed that players decline slower than they grow. For players younger than 28, we use the following formula

Projected Stat = [(28 - age)*0.008 + 1]*Non Age Adj Stat

And for players older than 28 we use

Projected Stat = [(28 - age)*0.004 + 1]*Non Age Adj Stat
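The two age formulas can be folded into one small helper. This is a sketch with a hypothetical function name; at exactly age 28 both formulas give a multiplier of 1, so it does not matter which branch handles it.

```python
def age_multiplier(age):
    """Linear age adjustment: +0.8% per year under 28, -0.4% per year over 28."""
    rate = 0.008 if age < 28 else 0.004
    return 1 + (28 - age) * rate


print(f"{age_multiplier(25):.3f}")  # 1.024
print(f"{age_multiplier(28):.3f}")  # 1.000
print(f"{age_multiplier(32):.3f}")  # 0.984

# A 25-year-old with a non-age-adjusted projection of 55 goals.
adjusted = 55 * age_multiplier(25)
print(round(adjusted))  # 56
```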

With the age adjustment applied, the projections are complete. There is so little year-over-year change in average stats that I don’t believe it’s 100% necessary to rebaseline the results.

For a concrete example, let’s create Auston Matthews’ goal projection for the 2023 season.

We begin with the stats we will need for projecting Auston Matthews’ 2023 goal total:

2022: Goals = 60, GP = 73, TOI = 1504, TOI.GP = 20.60

2021: Goals = 41, GP = 52, TOI = 1121, TOI.GP = 21.56

2020: Goals = 47, GP = 70, TOI = 1468, TOI.GP = 20.97

His weighted totals are then:

Weighted Goals = 360 + 123 + 47 = 530

Weighted TOI = 9023 + 3363 + 1468 = 13854

The league average totals (F) are:

2022: Goals = 14.50, TOI = 965, TOI.GP = 14.80

2021: Goals = 9.50, TOI = 680, TOI.GP = 14.93

2020: Goals = 12.20, TOI = 858, TOI.GP = 14.77

The league average weighted totals are:

Weighted League Avg Goals = 87 + 28.5 + 12.2 = 127.7 ≈ 128

Weighted League Avg TOI = 5790 + 2040 + 858 = 8688

And his projected GP & TOI are:

Proj TOI = 0.8*20.60 + 0.1*21.56 + 1.5 = 20.14

Proj GP = 76*0.50 + 0.25*73 + 0.25*[(52/56)*82] = 75

The 2021 season was shortened to 56 games, so his 52 GP that year are prorated to an 82-game pace.

The non age adjusted projection for Matthews’ goals is:

[(530 + (128/8688)*1250)/(1250 + 13854)]*20.14*75 = 55

We should note that Matthews’ goals are being regressed towards league average by only 8.3%, since 1250/(1250+13854) ≈ 0.083. This is because Matthews has played quite a bit over the past 3 seasons.
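The regression share is easy to compute for any player, as in this short Python sketch. The function name is my own, and the second player is a made-up forward with a single 1000-minute season, included only to show how much stronger the pull is with less data.

```python
def regression_share(weighted_toi, position="F"):
    """Fraction of the projected rate that comes from the league average."""
    reg_weight = 1250 if position == "F" else 1750
    return reg_weight / (reg_weight + weighted_toi)


matthews_share = regression_share(13854)
print(f"{matthews_share:.3f}")  # 0.083

# Hypothetical forward with one N-1 season of 1000 minutes (weight 6).
one_season_share = regression_share(6 * 1000)
print(f"{one_season_share:.3f}")  # 0.172
```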

Next we calculate the age adjustment:

Proj Goals: 55*[(28 - 25)*0.008 + 1] = 56

Proj TOI: 20.14*[(28 - 25)*0.008 + 1] = 20.62

Thus, according to the Marcels we can expect Matthews to score 56 goals in 75 games played while averaging 20.62 minutes per game.

Error Checking

Any model must be evaluated to determine how it performed. To check the model’s adequacy, I’ve decided to use the Mean Absolute Error, MAE, and the Coefficient of Determination, R². MAE was selected over an alternative like RMSE because the goal is to measure how close the projections are to the actual results. In my opinion, there’s no reason to penalize larger errors more than smaller ones.
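For readers who want to run the same kind of check on their own projections, here is a minimal pure-Python sketch of both metrics. The rows are toy numbers invented for illustration, not real projections, and R² is computed as 1 - SS_res/SS_tot, which is an assumption on my part about the definition (the squared correlation is another common choice and can give different values).

```python
def mae(projected, actual):
    """Mean absolute error between projections and results."""
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(actual)


def r_squared(projected, actual):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(projected, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Toy rows: (projected goals, actual goals, actual GP). Not real data.
rows = [(30, 35, 80), (20, 18, 70), (10, 12, 45), (25, 20, 60), (5, 0, 8)]

# Keep only players with more than 15 GP.
kept = [(p, a) for p, a, gp in rows if gp > 15]
proj = [p for p, _ in kept]
act = [a for _, a in kept]

print(mae(proj, act))                  # 3.5
print(round(r_squared(proj, act), 3))  # 0.798
```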

Each season that had a full 82-game schedule was used, and a player must have played more than 15 games to be included. First the MAE:

MAE for all 82 GP seasons

The MAE for most categories falls within a relatively stable range. There are some outliers, such as points or PIM, but this is due to how the NHL game has changed over time. One thing to note is that the FOW error among players who actually take faceoffs will likely be higher than what is shown, as defencemen will almost always have an MAE of around 0, which pulls the overall figure down.

Next are the R² values for each season:

R² for all 82 GP seasons

The R² values were also in a relatively stable range, the exceptions being GP and PIM. This is not unexpected, as these are the two most volatile stats in the data set. I should also note that while the GP R² number may look underwhelming, in my previous research an MLR approach produced an R² of only about 0.05.

Conclusion

Overall, I believe that the newer version of the “Hockey Marcels” does an excellent job as a base-level model. It’s simple, efficient, and straightforward to create. A more tuned model should be able to beat it; if that model can’t, it indicates that some tweaks are needed.

Over the past 2 years, I’ve become much more comfortable with R, and I believe that my code is now at a level I feel comfortable sharing. I’ve put the code I used to create these projections here on my GitHub. The program is built using R, so having R installed along with an IDE, such as RStudio, is a prerequisite. To produce a projection, you just need to load the libraries and the function “proj_season”, then type in the year you wish to project. I did my best to comment the code, but I’ve never shared a project like this before, so I hope you can follow the logic behind some of my decisions.

If you’re just interested in the Marcel projections, you can check out this Google doc, which has all projections from 2011 to 2023. If you want to edit the sheet, just click the “make a copy” option under the File menu.

If something isn’t clear, don’t hesitate to ask, and I’ll do my best to explain it. If you want to tinker with the weights or create your own forecasting model, please feel free to use any of the code used in the Marcels as the base.
