Photo by Chris Liverani on Unsplash

Advanced Hockey Stats 101: PDO (Part 3 of 4)

Here we take a closer look at PDO and make some visualizations along the way. This also serves as a primer for future articles delving into the validity and usefulness of hockey stats to make machine learning-based predictions.

Christian Lee
Published Jan 1, 2021 · 5 min read

Agenda

  1. PDO definition
  2. PDO theory
  3. Team PDO over the 2018–2019 season

Data

The 2018–2019 advanced stats were scraped from NHL.com/stats. In this case, I had to use a CSS selector as opposed to the XPath method used in this tutorial.

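library(rvest)
# `url` is the address of the NHL.com/stats page (not shown here)
df = read_html(url) %>%
  html_nodes(css = "div[role='gridcell']") %>%  # one node per table cell
  html_text()                                   # keep only the cell text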

We also scrape the table column names instead of manually typing them.

# The column headers live in their own nodes, so they are selected separately
header = read_html(url) %>%
  html_nodes(css = "div[role='columnheader']") %>%
  html_text()
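Both calls return flat character vectors, so the cells still need to be folded back into a table. Here is a minimal sketch of that step, assuming the cells arrive row by row and that the number of headers equals the number of columns. The team codes and values below are made up so the snippet runs without hitting the website.

```r
# Toy stand-ins for the scraped vectors (team codes and values are made up)
header <- c("Team", "GP", "S%", "Sv%")
cells  <- c("AAA", "82", "9.1", "91.5",
            "BBB", "82", "7.8", "92.4")

# Assumption: cells are ordered row by row, so fill the matrix by row
stats <- as.data.frame(
  matrix(cells, ncol = length(header), byrow = TRUE),
  stringsAsFactors = FALSE
)
names(stats) <- header

# Scraped values are text; convert every column except the team name
num_cols <- setdiff(header, "Team")
stats[num_cols] <- lapply(stats[num_cols], as.numeric)

print(stats)
```

If the real page returns extra cells (row numbers, hidden columns), the column count would need adjusting before this reshape works.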

PDO definition

PDO (SPSV%, aka On-ice S%+Sv%) is the sum of shooting percentage (S%) and save percentage (Sv%) during even-strength play. Oftentimes, but not in our dataset, it is multiplied by 10 for the sake of using whole numbers.

PDO = S% + Sv%, where

S% = goals for / shots on goal for, and Sv% = saves / shots on goal against

For an individual player, PDO measures what happens while that player is on the ice. For a team, PDO is computed from the team's total shots, goals, and saves.
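To make the definition concrete, here is a small sketch of the calculation. The function name `pdo`, its arguments, and the `scale` parameter are my own for illustration, and the input numbers are made up.

```r
# Hypothetical helper: PDO from even-strength counts.
# scale = 100 reports PDO on the 100-point scale used in this article;
# scale = 1000 gives the "multiplied by 10" convention.
pdo <- function(goals_for, shots_for, goals_against, shots_against, scale = 100) {
  sh_pct <- goals_for / shots_for                # S%: share of shots for that go in
  saves  <- shots_against - goals_against        # every shot against is a goal or a save
  sv_pct <- saves / shots_against                # Sv%: share of shots against that are stopped
  scale * (sh_pct + sv_pct)
}

# Made-up example: 3 goals on 30 shots for, 2 goals allowed on 25 shots against
example_pdo <- pdo(3, 30, 2, 25)
print(example_pdo)  # 10% + 92%, so roughly 102
```

Calling `pdo(3, 30, 2, 25, scale = 1000)` reports the same result on the whole-number scale.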

What does PDO really measure?

A high PDO for a team means that a high proportion of its shots for are going in and/or a small fraction of the shots against are going in. Therefore, the higher the PDO, the better.

However, when talking about PDO, we often talk about regression to the mean: over time, a high PDO tends to fall toward the mean (100), and a low PDO tends to rise toward it. The mean is 100 because every shot on goal ends as either a goal or a save. Therefore, when both teams' shots are pooled, a game's total S% and Sv% always add up to 100. Likewise, the league-wide totals over an entire season always add up to 100.

For example, during Game 6 of the 2020 Stanley Cup Finals, the Lightning had 24 shots on goal and made 16 saves, while the Stars had 16 shots and made 22 saves. The final score was 2–0 for the Lightning.

S% total

24 shots + 16 shots = 40 shots

S% total = 2 goals / 40 shots = 5% (of shots resulted in goals)

Sv% total

16 saves + 22 saves = 38 saves

Sv% total = 38 saves / 40 shots = 95% (of shots were saved)

S% total + Sv% total = 100%
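The same bookkeeping can be checked in a few lines of R. This is just a sketch using the Game 6 numbers quoted above; the variable names are mine.

```r
# Shots on goal taken by each team, and saves made by each team's goalie
shots      <- c(lightning = 24, stars = 16)
saves_made <- c(lightning = 16, stars = 22)

# A team's goals are its shots minus the saves made by the opposing goalie
goals <- c(
  lightning = shots[["lightning"]] - saves_made[["stars"]],
  stars     = shots[["stars"]]     - saves_made[["lightning"]]
)

total_sh_pct <- 100 * sum(goals)      / sum(shots)  # share of all shots that went in
total_sv_pct <- 100 * sum(saves_made) / sum(shots)  # share of all shots that were saved

print(goals)
print(total_sh_pct + total_sv_pct)
```

Because every shot is counted exactly once as a goal or a save, the two percentages have to sum to 100 no matter what numbers go in.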

PDO and regression to the mean

Regression to the mean ultimately says that the differences between players and teams shrink over time. While it might seem that some teams and/or players are going strong and will maintain their dominant performances, statistically, their numbers tend to normalize.

This does not mean that every player and every team ends the season with a PDO of 100, only that the variation is reduced. For an NHL player to sustain a very high PDO, he and his linemates would need to remain extremely efficient in goal scoring and be consistently supported by lights-out goaltending. To have one dominant line over the course of a season is already a challenge, but to also have dominant goaltending is an anomaly.
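A quick simulation illustrates the idea. This is a toy model, not the real data: I assume 31 teams with identical talent, 25 shots each way per game, and an 8% chance that any shot is a goal, so any spread in PDO comes from luck alone. Each team's shots are simulated independently rather than as paired opponents.

```r
set.seed(42)

# Assumed toy parameters (not taken from the NHL data)
n_teams        <- 31
n_games        <- 82
shots_per_game <- 25
p_goal         <- 0.08   # every shot, for or against, has the same chance of going in

# Goals for and against per game: teams in rows, games in columns
gf <- matrix(rbinom(n_teams * n_games, shots_per_game, p_goal), nrow = n_teams)
ga <- matrix(rbinom(n_teams * n_games, shots_per_game, p_goal), nrow = n_teams)

# Season-to-date PDO for every team after g games
cum_pdo <- function(g) {
  shots  <- g * shots_per_game
  gf_tot <- rowSums(gf[, seq_len(g), drop = FALSE])
  ga_tot <- rowSums(ga[, seq_len(g), drop = FALSE])
  100 * (gf_tot / shots + (shots - ga_tot) / shots)
}

spread_10 <- sd(cum_pdo(10))  # spread across teams after 10 games
spread_82 <- sd(cum_pdo(82))  # spread across teams after the full season

print(c(after_10 = spread_10, after_82 = spread_82))
```

Even with no skill differences at all, the early-season spread is noticeably wider than the end-of-season spread, which is the pattern regression to the mean predicts.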

A special note for fans and fantasy owners

PDO and regression to the mean are worth remembering when your players have a slow start or even a slow month. Generally, their numbers will improve as pucks start bouncing their way or their teammates' way. Let this also be a warning to unsuspecting owners to avoid buying high (or a tip to ruthless owners to sell high), because a hot player's performance, more likely than not, will come back to earth.

As a result, PDO is approximately normally distributed, as seen below. There will always be players at the tails, but the bulk of skaters fall close to the mean.

Team PDO over the 2018–2019 season

Here, to visualize PDO over the course of a season, we are only going to consider team moving averages since the player game data is rather sparse.
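The exact moving-average calculation is not shown here, so the following is only a sketch of one reasonable version: a cumulative, season-to-date PDO per team. The data frame `games`, its column names, and all of its values are hypothetical.

```r
# Hypothetical per-game even-strength counts (made-up numbers)
games <- data.frame(
  team    = rep(c("AAA", "BBB"), each = 3),
  game_no = rep(1:3, times = 2),
  gf = c(4, 1, 2, 1, 3, 2),        # goals for
  sf = c(30, 25, 28, 27, 31, 26),  # shots on goal for
  ga = c(1, 2, 3, 3, 2, 2),        # goals against
  sa = c(26, 29, 30, 24, 28, 30)   # shots on goal against
)

# Make sure games are in order within each team before accumulating
games <- games[order(games$team, games$game_no), ]

# Running total of a column within each team
running <- function(x) ave(x, games$team, FUN = cumsum)

# Season-to-date PDO: cumulative S% plus cumulative Sv%
games$pdo_to_date <- 100 * (
  running(games$gf) / running(games$sf) +
  (running(games$sa) - running(games$ga)) / running(games$sa)
)

print(games[, c("team", "game_no", "pdo_to_date")])
```

Summing the counts first and dividing afterwards weights every shot equally, which avoids letting a low-shot game swing the average as much as a high-shot one. A fixed-window moving average would be a different design choice.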

Below, I colored the top and bottom three teams based on PDO after 10 games played. Additionally, the Lightning are shown in pink, as they dominated the regular season with a final record of 62–16–4. Vasilevskiy won the Vezina, and Kucherov led the league in points (128) and assists (87) and won the Hart Trophy. The Lightning ended the season with a PDO of 101.9, third highest in the league. Number one was claimed by the Islanders with a PDO of 102.2.

Overall, the trend is exactly as we described. There is a lot of variation in the beginning, but over time, team PDOs regress toward the mean. For example, the Ducks started out well but quickly fell to ~100, and the Golden Knights did the opposite. The Lightning follow a different pattern: their PDO appears to hold steady, or even climb slightly, above 100 as the season goes on.

Conclusion

PDO tells us that teams and players may have ups and downs, but over time, variation shrinks as their numbers regress toward the mean. NHL skaters and goalies are the best in the game. The differences in skill, and their distribution across teams, make puck luck a non-negligible factor and keep hockey exciting. A few right or wrong bounces can drastically change the direction of a game. With that said, there are a few players who defy the odds. Hopefully they are on your team.

What we haven’t explored here is how game PDO relates to wins or points. Intuitively, we would expect the team with the higher PDO to win the game. However, PDO could be inflated by a very high Sv% built on a large number of shots against, or by a high S% that comes from a few goals on a small number of shots for. Consider an extreme scenario where team A scores 1 goal on 10 shots and team B scores 2 goals on 50 shots:

PDO team A: 1/10 + 48/50 = 106%

PDO team B: 2/50 + 9/10 = 94%

Based on these numbers, team B dominated the game and won, but its PDO was 12 percentage points lower than team A's. Therefore, PDO is heavily influenced by the number of shots, goaltending, and luck.
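The scenario can be reproduced in R as a sanity check. This is a sketch with my own variable names. The only trick is that each team's Sv% is computed from the opponent's shots and goals.

```r
# The extreme scenario from above: goals and shots on goal for each team
game <- data.frame(
  team  = c("A", "B"),
  goals = c(1, 2),
  shots = c(10, 50)
)

# With two rows, reversing the row index points each team at its opponent
opp <- rev(seq_len(nrow(game)))

game$sh_pct <- 100 * game$goals / game$shots
game$sv_pct <- 100 * (game$shots[opp] - game$goals[opp]) / game$shots[opp]
game$pdo    <- game$sh_pct + game$sv_pct

print(game)
print(game$pdo[1] - game$pdo[2])  # gap in percentage points
```

Team B wins on the scoreboard yet ends up with the lower PDO, because team A's goalie faced five times as many shots.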

While PDO is supposed to be a measure of how well a player or team is performing, if it doesn’t correlate well with other metrics, like points, the regression to the mean pattern might not be that important or informative.

Christian Lee

Medical student. Computational biologist. Sport stats enthusiast.