Photo from Mike Zeisberger, NHL.com

Comparing Current NHL Superstars with NHL All-Time Greats

Christian Lee
Published in Hockey Stats
6 min read · May 25, 2021

Here we use k-means and hierarchical clustering of basic career stats to compare the top NHL skaters of all time and within the 2020–21 season. Just how good is McDavid? For now, let’s just say he is in good company…

Data

For this analysis, we consider career summary data for all NHL skaters since 1917 from NHL.com/stats. Some information, like plus/minus, is missing for earlier players, so some analyses are restricted to players with complete rows. To automate the data gathering, I scraped the tables using RSelenium and rvest. The code is available at the bottom of the article and a step-by-step tutorial of the process is available here.

I performed some basic cleaning and preparation, like averaging the stats over games played (GP), and only keeping data for players with at least 75 GP. In the end, our main data frame contained ~4500 rows of players’ career averages.
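The cleaning step described above can be sketched roughly as follows. The author worked in R; this is a Python sketch only, and the field names (`gp`, `g`, `a`) and the rows are made up for illustration rather than taken from the author's code or from NHL.com.

```python
MIN_GP = 75  # minimum games played, as stated in the article


def per_game(players, min_gp=MIN_GP):
    """Drop players below min_gp and convert career totals to per-game rates."""
    out = []
    for p in players:
        if p["gp"] < min_gp:
            continue  # too few games for a stable career average
        out.append({
            "name": p["name"],
            "gp": p["gp"],
            "g_gp": p["g"] / p["gp"],              # goals per game
            "a_gp": p["a"] / p["gp"],              # assists per game
            "p_gp": (p["g"] + p["a"]) / p["gp"],   # points per game
        })
    return out


# Made-up career totals for illustration only (not real NHL data).
toy = [
    {"name": "Player A", "gp": 100, "g": 50, "a": 75},
    {"name": "Player B", "gp": 40, "g": 30, "a": 30},   # dropped: fewer than 75 GP
    {"name": "Player C", "gp": 200, "g": 20, "a": 60},
]
rates = per_game(toy)
```

Filtering before dividing keeps players with only a handful of games from producing extreme per-game averages.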

Identifying the top 100 NHL players by points

First, to explore the data, I ranked the top 100 skaters by career points-per-game (PPG) average and visualized them in the heatmap below. Quite a few active players crack the list: (in order of appearance) McDavid, Crosby, Malkin, Ovechkin, Panarin, Kucherov, Draisaitl, P. Kane, Matthews, Stamkos, Marner, MacKinnon, Pastrnak, Backstrom, Gaudreau, Eichel and Rantanen. Many of these players are just hitting their peaks now, while others are toward the tail end of their careers. This slightly confounds the analysis: we are looking at career averages, production is expected to drop off over time, and some players may still have their best seasons ahead.

Minimum 75 GP

Only five defensemen crack the top 100: Orr, Cameron, Coffey, Potvin and Bourque. Of the remaining skaters, 50 were or are centers, 16 left wingers and 29 right wingers. As expected, Gretzky and Lemieux sit at the top of the list. In this plot, white corresponds to a value of 0.5 (half a goal or assist per game), which goes to show just how exceptional these players are. Across all players in the entire dataset, the median G/GP and A/GP are 0.13 and 0.22, respectively. The minimum value in this elite group is 0.25 (Craig Janney for G/GP).
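The ranking and the dataset-wide medians mentioned above could be computed along these lines. This is a Python sketch with made-up rates and hypothetical field names (`g_gp`, `a_gp`, `p_gp`), not the author's R code.

```python
from statistics import median


def top_n_by_ppg(rates, n=100):
    """Return the n players with the highest points per game, best first."""
    return sorted(rates, key=lambda r: r["p_gp"], reverse=True)[:n]


# Made-up per-game rates for illustration only (not real NHL data).
toy_rates = [
    {"name": "Player A", "g_gp": 0.50, "a_gp": 0.75, "p_gp": 1.25},
    {"name": "Player B", "g_gp": 0.10, "a_gp": 0.30, "p_gp": 0.40},
    {"name": "Player C", "g_gp": 0.30, "a_gp": 0.20, "p_gp": 0.50},
]

top2 = top_n_by_ppg(toy_rates, n=2)

# Medians are taken over the whole dataset, not just the top-n subset.
median_g = median(r["g_gp"] for r in toy_rates)
median_a = median(r["a_gp"] for r in toy_rates)
```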

Interestingly, we can see that some players were very much goal scorers, like Lalonde and Malone, while others, like McDavid and P. Forsberg, are or were more playmakers. Then there were other beasts, like Lemieux and Bossy, who had both the passing and scoring touch. Continuing on this thread of finding patterns across players, next we dive into unsupervised clustering.

Clustering the top 100 NHL skaters of all-time

For now, we just focus on the same two dimensions: G/GP and A/GP. Based on these two metrics, we can ask how many types, maybe even tiers, of players there are within this elite group.

k-means clustering

Comparing the within-cluster sum of squares across different numbers of clusters suggests that seven clusters fit the data well. Below, the top 20, Auston Matthews (45), and the bottom 5 players are labeled.
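Comparing the within-cluster sum of squares (WCSS) across values of k is commonly called the elbow heuristic. A minimal Python sketch of the idea follows. It is not the author's R code: the points are made up, and the deterministic farthest-first initialization is a simplifying assumption made here so the example is reproducible (many k-means implementations use random starts instead).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic init: start at the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, wcss)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


# Made-up (G/GP, A/GP) pairs forming three tight groups (not real NHL data).
points = [
    (0.1, 0.1), (0.2, 0.1), (0.1, 0.2),
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
    (0.1, 1.0), (0.2, 1.0), (0.1, 1.1),
]

wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 6)}
labels3 = kmeans(points, 3)[0]
```

Plotting `wcss_by_k` against k and looking for the point where the curve flattens is one way to pick the number of clusters. On this toy data the drop is steep up to k = 3 and small afterwards.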

Based on G/GP and A/GP, cluster 5 represents the best of the best: Gretzky, Lemieux, Orr, and McDavid. We often talk about how players today compare to those of previous generations. Here we see that McDavid truly is on pace to be one of the all-time greats. Sitting just beyond cluster 5 are names like Crosby in cluster 7 and P. Forsberg in cluster 3.

Cluster 4 is another exceptional group, composed of Dionne, Esposito and Bossy, who all had above-average G/GP and A/GP relative to the rest of the top 100. As mentioned earlier, Lalonde and Malone were goal scorers. In the map, they group together far along the X-axis to form cluster 1. Perhaps if Matthews can put together a few more Rocket Richard-winning seasons, he may just join them.

Hierarchical agglomerative clustering

Using a bottom-up clustering approach, we can see the formation of similar clusters. As k-means also captured, McDavid and Orr (far left) show very similar G/GP and A/GP numbers. In fact, they more closely resemble Forsberg than Gretzky and Lemieux, because their G/GP averages are not quite as high as those of the latter pair.

Ward’s Distance
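The bottom-up procedure with Ward's criterion can be illustrated with a naive Python sketch. It is not the author's R code and the points are made up. At each step it merges the two clusters whose union increases the within-cluster sum of squares the least. The merge cost reported here is that raw increase, and the height scale drawn on a dendrogram depends on the library's convention.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ward_linkage(points):
    """Naive agglomerative clustering with Ward's criterion.

    Returns a list of merges (members_a, members_b, cost), where cost is the
    increase in within-cluster sum of squares caused by the merge.
    """
    dims = len(points[0])

    def centroid(members):
        return tuple(sum(points[i][d] for i in members) / len(members)
                     for d in range(dims))

    clusters = {i: [i] for i in range(len(points))}  # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                # Ward merge cost: n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, ids[x], ids[y])
        cost, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), cost))
        clusters[i] = clusters[i] + clusters.pop(j)
    return merges


# Made-up (G/GP, A/GP) pairs for illustration only (not real NHL data).
points = [(0.2, 0.4), (0.2, 0.5), (0.7, 0.4), (0.7, 0.5)]
merges = ward_linkage(points)
```

On this toy input the two nearby pairs merge first at low cost, and the final merge between the two pairs is far more expensive, which is what a tall final branch in a dendrogram reflects.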

To reiterate, these analyses have focused solely on G/GP and A/GP. There are several other important and interesting categories that we consider next. However, not all players have this data available, so we will focus on the more recent 2020–21 regular season (as opposed to career numbers).

Clustering the top 100 skaters from the 2020–21 regular season across multiple categories

This figure includes short-handed points (SHP), plus-minus, shots (S), goals (G), assists (A), power-play points (PPP), points (P) and even-strength points (EVP), all averaged over games played. The clustering was also applied to both the players and metrics: from the dendrogram of rows, we see that EVP/GP and P/GP are the most similar, and closely related to PPP/GP and A/GP. Interestingly, S/GP and G/GP cluster together, then with SHP/GP and plus-minus/GP.
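One way to see why two metrics land next to each other in the row dendrogram is to check how strongly they move together across players. The Python sketch below computes z-scores and a Pearson correlation on made-up values. The article does not state whether the metrics were standardized or which distance the clustering used, so both choices here are assumptions for illustration.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a metric to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pearson(x, y):
    """Pearson correlation, computed as the mean product of z-scores."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Made-up per-game values for four hypothetical skaters (not real NHL data).
p_gp = [1.0, 1.2, 0.8, 1.5]        # points per game
evp_gp = [0.6, 0.8, 0.5, 1.0]      # even-strength points per game
shp_gp = [0.05, 0.0, 0.04, 0.01]   # short-handed points per game

r_p_evp = pearson(p_gp, evp_gp)    # strongly positive on this toy data
r_p_shp = pearson(p_gp, shp_gp)    # negative on this toy data
```

Standardizing first also keeps a metric with a larger raw scale, such as S/GP, from dominating a Euclidean distance over metrics with small values, such as SHP/GP.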

Along the X-axis are additional position annotations: 40 centers, 27 left wingers, 22 right wingers and 11 defensemen. For the most part, there do not appear to be many large clusters by position, except for one discernible cluster of defensemen (middle, dark green) characterized by high A/GP and PPP/GP, and low G/GP and EVP/GP.

The far bottom-left cluster represents the point machines, including the likes of McDavid, Draisaitl, Panarin, MacKinnon, Rantanen and Matthews. At the top-left are the penalty-kill threats, characterized by exceptionally high SHP/GP numbers: Marchand, Bergeron, Zibanejad, E. Kane, Toffoli, Aho, O’Reilly, Buchnevich, and Necas.

Conclusion

To summarize, we used k-means clustering and hierarchical agglomerative clustering to compare the top NHL players of all time (by P/GP) using G/GP and A/GP. We found that McDavid’s numbers mirror those of NHL legends like Gretzky, Lemieux and Orr. We also took a closer look at the most recent 2020–21 season and identified sub-clusters of skaters looking across eight different categories. Most notably, the majority of top defensemen grouped together based on their high A/GP and PPP/GP, and low G/GP.

Future work can certainly expand the number of categories and number of players. Corsi, Fenwick, and other advanced stats would make for an interesting analysis and would likely incorporate more defensemen into the lists. I also simply used P/GP to rank players but there are certainly other, more nuanced metrics that can be used.

Code availability

Code for scraping the data can be found here. Code for cleaning, analyzing and visualizing the data can be found here.

Christian Lee
Hockey Stats

Medical student. Computational biologist. Sport stats enthusiast.