
Advanced Hockey Stats 101: Corsi (part 1 of 4)

Here we take a closer look at Corsi and make some visualizations along the way. This also serves as a primer for future articles that examine how valid and useful hockey stats are for making machine learning-based predictions.

Christian Lee
Dec 1, 2020

There are four main categories of advanced stats freely available on hockey-reference.com. Despite the “advanced” label, they are really quite simple. Today, we will be discussing Corsi.

  1. Corsi (EV)
  2. Fenwick (EV)
  3. PDO
  4. Zone starts (EV)

First, note that some of these stats only apply during even strength (EV).

Corsi (EV)

The name itself was reportedly given to honor Jim Corsi’s mustache. The formula is as follows:

Corsi = shot attempts for - shot attempts against
C = Corsi for - Corsi against = CF - CA

Notice we are discussing shot attempts (SAT), not shots on goal (SOG). Any shot directed towards the goal counts, whether it reaches the goalie, hits the post, misses the net, or is blocked. A large, positive Corsi for team A indicates they attempted many more shots than team B, which, in theory, correlates with more offensive pressure and possession. Corsi stats are also often reported on a per-player basis, which simply tallies the SAT taken by both teams while a given player is on the ice.
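To make the counting rule concrete, here is a minimal Python sketch that tallies shot attempts for and against from a list of events. The event format, the event-type labels, and the function name `tally_shot_attempts` are assumptions made for illustration, not the schema of any real data feed.

```python
from collections import Counter

# Event types that count as a shot attempt (labels are hypothetical).
SHOT_ATTEMPT_TYPES = {"goal", "shot_on_goal", "missed_shot", "blocked_shot"}


def tally_shot_attempts(events, team):
    """Return (CF, CA) for `team` from (shooting_team, event_type) pairs."""
    counts = Counter()
    for shooting_team, event_type in events:
        if event_type not in SHOT_ATTEMPT_TYPES:
            continue  # e.g. faceoffs and hits are not shot attempts
        counts["for" if shooting_team == team else "against"] += 1
    return counts["for"], counts["against"]


# Toy even-strength events, invented for this example.
events = [
    ("A", "shot_on_goal"),
    ("A", "blocked_shot"),  # blocked attempts are credited to the shooter
    ("B", "missed_shot"),
    ("A", "goal"),
    ("B", "faceoff"),
    ("A", "missed_shot"),
]
cf, ca = tally_shot_attempts(events, team="A")
print(cf, ca)  # 4 1
```

For a per-player version, the same tally would run only over events that occurred while that player was on the ice.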

Corsi stats: CF, CA, CF% and CF% rel

CF and CA were shown in the above formula and represent SAT for and SAT against, respectively. Again, this includes blocks and misses. CF% is more interesting because it expresses CF as a share of total SAT:

CF% = CF / (CF + CA)

If a player is a major offensive threat but slacks on defense, then we can expect his CF% to hover around 50%, because the extra attempts he generates are offset by the attempts he allows. Importantly, players who contribute at both ends of the ice and boost their line, or who are dominant in one area and good enough in the other, can expect to see a CF% above 50%.
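As a quick worked example of the CF% formula, the sketch below computes it for two invented stat lines. The function name `cf_percent` and the input numbers are illustrative assumptions, not real player data.

```python
def cf_percent(cf, ca):
    """CF% = CF / (CF + CA), expressed as a percentage."""
    total = cf + ca
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * cf / total


# Invented totals: one player above 50%, one below.
print(cf_percent(550, 450))  # 55.0
print(cf_percent(480, 520))  # 48.0
```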

Finally, CF% rel is calculated as follows:

CF% rel = CF% - CFoff%
where off = player not on ice

This is a relative measure of the team’s CF% when a given player is on ice versus when he is on the bench. CF% rel can be useful for identifying “difference makers” on a team, or someone to pair an elite goal scorer alongside.
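The calculation can be sketched in a few lines of Python. The helper names and the on-ice and off-ice totals below are made up for illustration; only the formula itself comes from the definition above.

```python
def cf_percent(cf, ca):
    """CF% = CF / (CF + CA), expressed as a percentage."""
    return 100.0 * cf / (cf + ca)


def cf_percent_rel(on_cf, on_ca, off_cf, off_ca):
    """CF% rel = team CF% with the player on ice minus CF% with him off ice."""
    return cf_percent(on_cf, on_ca) - cf_percent(off_cf, off_ca)


# Invented totals: the team controls 60% of attempts with the player on the
# ice and 45% without him.
rel = cf_percent_rel(on_cf=300, on_ca=200, off_cf=900, off_ca=1100)
print(round(rel, 1))  # 15.0
```

A positive value means the team's share of shot attempts is higher with the player on the ice than without him.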

Skater CF% from 2019–2020 NHL season

Data from hockey-reference.com

Here, we see a positive correlation between CF% and points per 60. This makes sense given that higher CF% theoretically translates to more offensive pressure and therefore, more points.
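For readers who want to check a relationship like this on their own data, here is a minimal sketch of a Pearson correlation written in plain Python. The five data points are toy values invented for the example, not figures from hockey-reference.com, and the function name `pearson_r` is just a label chosen here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Toy values only: CF% and points per 60 for five imaginary skaters.
cf_pct = [45.0, 48.0, 50.0, 52.0, 55.0]
points_per_60 = [1.2, 1.9, 1.6, 2.4, 2.8]

r = pearson_r(cf_pct, points_per_60)
print(f"r = {r:.2f}")  # positive for this toy data
```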

It is worth noting that the points category includes power play goals and assists, whereas CF% does not (in the near future, I will include another figure with EV points per 60). This is a contributing factor as to why players like McDavid and Draisaitl were near the top in points per 60 but had a CF% below 50%. Draisaitl scored 44 power play points, 40% of his total, but none of the shot attempts during the man advantages were considered by Corsi. When he was on the ice during even strength play, the Oilers were actually out-attempted by their opponents. Of course, this has a lot to do with line mates.

Interestingly, there is no overlap between the top 5 skaters by CF% and points per 60, the closest being Tomas Tatar, who ranks #2 in CF% and #15 in points per 60. This leads me to the criticism of Corsi that it is not a reliable measure of offensive threat, especially at the player level. Many of the shots could be weak scoring chances and inflate counts. Or, a given player could be underachieving but his Corsi is inflated by over-performing teammates. On the other hand, a high CF% but average point total could still indicate a given player is creating a lot of opportunities for his team, but they just aren’t finding the back of the net.

Another interesting relationship is that between CF% and plus/minus. Plus/minus counts goals scored at even strength (like Corsi) or short-handed; power play goals are excluded.

Here, the correlation is still significant, although weaker than what we observed for CF% and points. There are several players with a high plus/minus but a CF% below 50%, and vice versa. This modest correlation adds to the criticism of Corsi, since more shot attempts for than against during even strength does not necessarily equate to more goals for than against. Corsi, after all, measures quantity, not quality.

Now, let’s take a look at the highest and lowest CF% rel for skaters from each team.

When a player on the right side of the figure was on the ice, his team's share of shot attempts rose the most relative to when he was on the bench. These players likely contributed a high amount of offense, shut-down defense, an unusually high number of shots, or some combination of these. On the other hand, players named on the left may be liabilities, may be stuck on bad lines, may contribute to aspects of the game that Corsi does not capture, or some combination of these.
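Picking out the highest and lowest CF% rel on each team is a simple group-and-compare step. The sketch below shows one way to do it in plain Python; the row format, the placeholder team and player names, and the function name `team_extremes` are all assumptions made for illustration.

```python
def team_extremes(rows):
    """Map each team to (lowest CF% rel player, highest CF% rel player).

    `rows` is an iterable of (team, player, cf_rel) tuples.
    """
    by_team = {}
    for team, player, cf_rel in rows:
        by_team.setdefault(team, []).append((cf_rel, player))
    # Tuples compare on cf_rel first, so min/max pick the extreme values.
    return {
        team: (min(values)[1], max(values)[1])
        for team, values in by_team.items()
    }


# Placeholder rows, not real players or values.
rows = [
    ("Team X", "Player 1", 4.2),
    ("Team X", "Player 2", -3.1),
    ("Team X", "Player 3", 0.5),
    ("Team Y", "Player 4", -1.0),
    ("Team Y", "Player 5", 6.3),
]
print(team_extremes(rows))
# {'Team X': ('Player 2', 'Player 1'), 'Team Y': ('Player 4', 'Player 5')}
```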

Team Corsi stats over the 2018–2019 season

Data from IcyData

Finally, let’s consider Corsi across entire teams. Here, we see that wins positively correlate with CF%. The range of CF% values is narrow, the biggest difference being 4.94% between the Hurricanes and Red Wings. Yet, over a season, that difference translates to hundreds of shot attempts. Notably, all 5 teams with the highest CF% made the playoffs, whereas the 5 teams with the lowest CF% did not. In an upcoming post, we will track Corsi and wins for a subset of teams to see how they trend from season to season.
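To see how a gap of a few percentage points in CF% adds up, here is a rough back-of-the-envelope sketch. The figure of 90 combined even-strength shot attempts per game is an assumption chosen purely for illustration, not a value taken from the data; the 82-game schedule is the standard NHL regular season.

```python
GAMES_PER_SEASON = 82    # standard NHL regular season
ATTEMPTS_PER_GAME = 90   # ASSUMPTION: combined EV shot attempts by both teams


def extra_attempts_for(cf_pct_gap, games=GAMES_PER_SEASON,
                       attempts_per_game=ATTEMPTS_PER_GAME):
    """Extra shot attempts for over a season implied by a CF% gap.

    If two teams see the same total number of attempts T, the difference in
    their CF is (gap / 100) * T.
    """
    total_attempts = games * attempts_per_game
    return cf_pct_gap / 100 * total_attempts


print(round(extra_attempts_for(4.94)))  # 365
```

Under that assumed pace, the 4.94-point gap works out to roughly 365 more shot attempts for over the season; a different per-game figure would scale the result proportionally.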

Summary

Hockey has an element of randomness. A puck bounces off a skate and finds the back of the net; a fluke deflection off the stick of a defenseman squeezes through the five-hole. However, Corsi, a metric of shot attempts, appears to be a good indicator of team success over the course of a season.

Christian Lee

Medical student. Computational biologist. Sport stats enthusiast.