How Well Did Baseball-Reference’s Wins Above Replacement (WAR) Statistic Predict Wins in 2019?

David
8 min read · Jan 22, 2020


Examining the relationship between total WAR and team wins for 2019 baseball teams

Mike Trout, who is considered the best player in baseball, is consistently at or near the top of MLB in WAR

Throughout the history of baseball, various stats have been used to determine the value of a baseball player, such as Batting Average for hitters (Hits / At Bats) or Earned Run Average (ERA) for pitchers (9 * Earned Runs Allowed / Innings Pitched). However, these basic statistics only account for a portion of the overall picture of a player’s value. For example, singles, doubles, triples, and home runs are each more valuable than the last, which isn’t taken into account in batting average. Players that draw walks at a high rate add value as well. What about a player’s defense? Base running ability? Even for pitchers, ERA fails to account for the differences between ballparks (there are parks that are more and less favorable to pitchers and hitters), and only partially accounts for the skill of the defense behind the pitcher.
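As a quick illustration of the two formulas above and of what batting average leaves out, here is a minimal sketch. The stat lines are hypothetical and invented purely for the example.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Batting Average = Hits / At Bats."""
    return hits / at_bats


def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """ERA = 9 * Earned Runs Allowed / Innings Pitched."""
    return 9 * earned_runs / innings_pitched


# Hypothetical stat lines, invented purely for illustration.
singles_hitter = {"hits": 150, "at_bats": 500, "home_runs": 0}
power_hitter = {"hits": 150, "at_bats": 500, "home_runs": 40}

avg_singles = batting_average(singles_hitter["hits"], singles_hitter["at_bats"])
avg_power = batting_average(power_hitter["hits"], power_hitter["at_bats"])
era = earned_run_average(earned_runs=60, innings_pitched=180.0)

# Both hitters show the same average even though one hits for far more power.
print(f"AVG: {avg_singles:.3f} vs {avg_power:.3f}, ERA: {era:.2f}")
```

The two hitters are indistinguishable by batting average alone, which is exactly the gap a stat like WAR tries to close.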

So how is a team, award voter, or even a fan supposed to determine who the best players in the league are, especially with limited time to comb through footage of every single player? Enter WAR, a stat that attempts to measure how much more valuable a player is than a theoretical “replacement level” player across all aspects of baseball. Put another way: if a player were to get injured or leave a team, how much better off would the team have been with that player than with an arbitrary, readily available replacement? This is also why Wins Above Replacement is used more often than alternatives such as Wins Above Average: it is much easier to find a replacement level player to substitute in than it is to find an average one.

Baseball-Reference’s Graphic Explaining Replacement Level Players

Like any metric, WAR is not without its flaws, as evidenced by the several different ways there are to calculate it. Baseball Prospectus has its own version called WARP, Baseball-Reference has its own method called bWAR, and FanGraphs has yet another called fWAR. While these are among the most widely used, there are many other possible ways of calculating the stat, so no one method is clearly “correct”. bWAR was selected for the purposes of this study, but that does not make it better or worse than any other method.

Now that WAR has been introduced, how do we define mathematically what a replacement level player is? Baseball-Reference sets the replacement level at a .294 winning percentage, which is to say that a team consisting entirely of replacement level players would theoretically finish with a record of about 48–114 (in March 2013 the level was changed from a .320 winning percentage).
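The 48–114 record follows directly from the .294 figure and a 162-game schedule. A small sketch of that arithmetic (the variable names are just for illustration):

```python
GAMES_PER_SEASON = 162
REPLACEMENT_WIN_PCT = 0.294  # Baseball-Reference's current replacement level
OLD_REPLACEMENT_WIN_PCT = 0.320  # level used before March 2013

replacement_wins = REPLACEMENT_WIN_PCT * GAMES_PER_SEASON  # 47.628
wins = round(replacement_wins)
losses = GAMES_PER_SEASON - wins

# The same calculation with the older, higher replacement level.
old_wins = round(OLD_REPLACEMENT_WIN_PCT * GAMES_PER_SEASON)

print(f"Current level: {replacement_wins:.1f} wins, roughly {wins}-{losses}")
print(f"Old level: roughly {old_wins}-{GAMES_PER_SEASON - old_wins}")
```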

Great. So now that we know how a replacement player is defined, how do we calculate WAR? The main driving factor is the number of runs that a player adds (or subtracts) beyond the level of a replacement level player, which factors in both offense and defense. Baseball-Reference gives the overarching formula as:

Player Runs over Replacement = Player_runs - ReplPlayer_runs = (Player_runs - AvgPlayer_runs) + (AvgPlayer_runs - ReplPlayer_runs)

Now all this formula is doing is breaking up runs over replacement into two categories: how many more runs is a player responsible for than an average player (known as Runs Above Average), and how many more runs is an average player responsible for than a replacement level player (which does not depend on the player in question).
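To make the split concrete, here is a minimal sketch of the decomposition. The run totals are hypothetical numbers chosen for the example, not real player data.

```python
def split_runs_over_replacement(player_runs, avg_player_runs, repl_player_runs):
    """Break runs over replacement into its two pieces.

    Returns (runs above average, average-over-replacement runs).
    """
    runs_above_average = player_runs - avg_player_runs
    average_over_replacement = avg_player_runs - repl_player_runs
    return runs_above_average, average_over_replacement


# Hypothetical run values, for illustration only.
player_runs, avg_player_runs, repl_player_runs = 90.0, 70.0, 50.0

raa, aor = split_runs_over_replacement(player_runs, avg_player_runs, repl_player_runs)
total = raa + aor

# The two pieces always add back up to Player_runs - ReplPlayer_runs.
print(f"RAA = {raa}, average over replacement = {aor}, total = {total}")
```

Note that the second piece would be the same for any player compared against the same average and replacement baselines.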

Baseball-Reference details how they convert runs to wins in the following article, but essentially ~10 runs is equivalent to one win:

While baseball has several different positions, the two main types of players are pitchers and position players (everyone who isn’t a pitcher). Since the value of a pitcher is defined quite differently than that of a position player, WAR is calculated differently for the two groups as well.

For position players, there are six key determining factors in calculating WAR.

  1. Batting Runs
  2. Baserunning Runs
  3. Runs added or lost due to Grounding into Double Plays in DP situations
  4. Fielding Runs
  5. Positional Adjustment Runs
  6. Replacement Level Runs

The following link offers more information on how each of these six is calculated:

While each of these requires a lot of calculation in practice, the idea behind each of the first four should be somewhat intuitive: how well a player bats, runs the bases, avoids grounding into double plays, and fields. The fifth factor, positional adjustment runs, accounts for how difficult certain positions are to play. Catcher, for example, is a much more difficult position than first base, and since the comparison for WAR is to a replacement level player of any position, not just the same position as the current player, this is an important factor to consider. These first five components make up Runs Above Average, while the last component describes the number of runs an average player accounts for over a replacement level player (the second part of the equation above).
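Putting the six components together with the rough "~10 runs per win" conversion mentioned earlier gives a simplified picture of how a position player's WAR comes out. This is only a sketch: the component values are hypothetical, and the flat 10 runs per win is an approximation, since Baseball-Reference's actual runs-to-wins conversion is more involved.

```python
RUNS_PER_WIN = 10.0  # rough approximation; the real conversion varies

# Hypothetical component values (in runs) for one player-season.
components = {
    "batting": 30.0,
    "baserunning": 2.0,
    "double_plays": -1.0,
    "fielding": 5.0,
    "positional_adjustment": -4.0,
    "replacement_level": 20.0,
}

# The first five components make up Runs Above Average.
runs_above_average = sum(
    value for name, value in components.items() if name != "replacement_level"
)

# Adding the replacement level runs gives Runs Above Replacement.
runs_above_replacement = runs_above_average + components["replacement_level"]

approx_war = runs_above_replacement / RUNS_PER_WIN
print(f"RAA = {runs_above_average}, RAR = {runs_above_replacement}, WAR ~ {approx_war:.1f}")
```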

For pitchers on the other hand, Baseball-Reference looks at various factors, such as runs allowed, innings pitched, level of opposition, team defense, if a pitcher is a starter or a reliever, park factors, situational leverage, and again the difference between average and replacement level, and uses these factors to compute pitcher WAR. For more information on how pitcher WAR is calculated, see below:

So now hopefully the idea of bWAR has been properly illustrated. The question remains, however, how accurate it is in predicting team and player performance.

Since WAR is supposed to approximate actual wins, the league-wide totals should add up. Each of the 30 teams plays 162 games and every game produces exactly one win, so a full MLB season contains 81*30 = 2,430 total wins. Replacement level players account for approximately .294*162*30 ≈ 1,430 of those wins, which leaves roughly 1,000 WAR to be shared across all of MLB. Taking total team WAR data from Baseball-Reference, as well as expected and actual wins data from Baseball Prospectus, the following data frame can be obtained:

The Total WAR and First Order Wins categories are rounded to the nearest tenth.
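The league-wide arithmetic behind the 1,000 WAR figure can be checked in a few lines. This sketch assumes a full 162-game schedule for all 30 teams.

```python
TEAMS = 30
GAMES_PER_TEAM = 162
REPLACEMENT_WIN_PCT = 0.294

# Every game involves two teams and produces exactly one win.
total_wins = TEAMS * GAMES_PER_TEAM // 2

# Wins a league made up entirely of replacement level players would be credited with.
replacement_wins = REPLACEMENT_WIN_PCT * GAMES_PER_TEAM * TEAMS

# Whatever is left over is the WAR available to the whole league.
league_war = total_wins - replacement_wins

print(f"Total wins: {total_wins}")
print(f"Replacement wins: {replacement_wins:.2f}")
print(f"League WAR: {league_war:.2f}")
```

The exact result is a little over 1,000, which matches the rounded figure used above.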

Here, First Order Wins refers to expected wins computed from actual runs scored and runs allowed using the pythagenpat method. For more information on how that is calculated, see below, but the basic idea is to examine the number of games that a team should in theory have won based on its runs scored and allowed rather than the number of games it actually won.
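For a rough feel of how pythagenpat works, here is a minimal sketch of the commonly cited form of the formula, in which the exponent depends on the run environment. The constant 0.287 is an assumption (values around 0.28 to 0.29 are typically quoted, and Baseball Prospectus may use a slightly different one), and the team totals are hypothetical.

```python
def pythagenpat_win_pct(runs_scored, runs_allowed, games, exponent_constant=0.287):
    """Expected winning percentage from runs scored and runs allowed.

    The exponent adapts to the scoring environment:
        x = ((RS + RA) / G) ** exponent_constant
        W% = RS**x / (RS**x + RA**x)
    exponent_constant is an assumed value, not taken from the article.
    """
    x = ((runs_scored + runs_allowed) / games) ** exponent_constant
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)


# Hypothetical team totals, for illustration only.
pct = pythagenpat_win_pct(runs_scored=800, runs_allowed=700, games=162)
expected_wins = pct * 162

# A team that scores exactly as many runs as it allows is expected to go .500.
even_pct = pythagenpat_win_pct(runs_scored=700, runs_allowed=700, games=162)

print(f"Expected W%: {pct:.3f}, expected wins: {expected_wins:.1f}")
```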

As a side note, the Detroit Tigers and the Chicago White Sox had a cancelled game that was rescheduled for after both teams were eliminated from playoff contention. That game was never played, so each of those teams played only 161 games, meaning that the total number of wins this season was actually 2,429.

After creating this data frame, regression analysis can be conducted on how accurately Total WAR represented both actual wins and expected wins in 2019.
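The regression itself is an ordinary least squares fit of wins against total WAR. Below is a minimal sketch written in plain Python rather than with any particular library, so it is not necessarily how the linked notebook does it. The data is synthetic (it is NOT the 2019 data) and is placed exactly on the theoretical line so the expected answer is known in advance.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic teams (NOT 2019 data): wins sit exactly on the theoretical line
# wins = 0.294 * 162 + WAR, so the fit should recover a slope of 1.
team_war = [10.0, 25.0, 33.0, 40.0, 55.0]
team_wins = [0.294 * 162 + war for war in team_war]

slope, intercept = fit_line(team_war, team_wins)
print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
```

With real data the points scatter around the line, and the fitted slope and intercept drift away from the theoretical values accordingly.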

For total wins, the R² value comes out to be ~0.88. The R² value is defined as follows:
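A standard way to write this definition is R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between the actual and predicted values and SS_tot is the sum of squared differences between the actual values and their mean. The sketch below assumes that standard definition and uses synthetic numbers (not the 2019 data) to show the two extreme cases.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Synthetic win totals (NOT 2019 data), used only to exercise the formula.
actual_wins = [60.0, 75.0, 81.0, 90.0, 104.0]

# A model that predicts every value exactly explains all of the variance.
perfect = r_squared(actual_wins, actual_wins)

# A model that always predicts the mean (82 here) explains none of it.
mean_only = r_squared(actual_wins, [82.0] * len(actual_wins))

print(f"perfect model: {perfect:.2f}, mean-only model: {mean_only:.2f}")
```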

An R² value of 0 therefore means that the model explains none of the variance in the response variable, while an R² value of 1 means that all of the variance in the response variable is explained by the model. For more on R², see the link below:

The y-intercept comes out to 46.8, which is close to the 0.294*162 = 47.6 wins that an all-replacement team should in theory have, and the slope comes out to 1.04, which is close to the theoretical value of 1 (since adding 1 WAR should add 1 win).
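To see how close the fitted line is to the theoretical one, the two can be evaluated side by side. This sketch uses the intercept and slope reported above for actual wins; the WAR values plugged in are just illustrative points, with 1000 / 30 standing in for a league-average team.

```python
REPLACEMENT_WINS = 0.294 * 162  # theoretical intercept, about 47.6


def theoretical_wins(team_war):
    """Theory: replacement level wins plus one win per WAR."""
    return REPLACEMENT_WINS + 1.0 * team_war


def fitted_wins(team_war, intercept=46.8, slope=1.04):
    """Best fit line reported in the article for actual 2019 wins."""
    return intercept + slope * team_war


league_average_war = 1000 / 30  # about 33.3 WAR per team

for war in (0.0, league_average_war, 60.0):
    print(
        f"WAR {war:5.1f}: theory {theoretical_wins(war):5.1f} wins, "
        f"fit {fitted_wins(war):5.1f} wins"
    )
```

Both lines put a league-average team at about 81 wins, which is what a .500 record over 162 games looks like.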

Below is the scatterplot of Actual Wins vs. Total Team WAR, along with the best fit line:

Other than a few outliers, it looks as though the best fit line fits the data pretty well.

For expected wins, the R² value comes out to be ~0.93, which is even higher than the previous R² value. The y intercept again comes out to 46.8, and the slope comes out to 1.03 which is even closer to 1 than before.

Below is the scatterplot of Expected Wins (First Order Wins) vs. Total Team WAR, along with the best fit line:

Again, other than a few outliers, it looks like the best fit line here fits the data even better than it did for actual wins. This makes sense, because expected wins remove much of the luck involved in how a team's runs happen to be distributed across individual games.

To conclude, WAR did seem to accurately predict Win-Loss record in 2019, and it predicted First Order (Expected) Win-Loss record even more accurately. While WAR is far from perfect and has its flaws, it is a statistic that can be useful in determining the value of MLB players.

Github: https://github.com/DavidKatzman/nyc-mhtn-ds-010620-lectures/tree/master/WAR_Win_Prediction

Resource for Linear Regression in Python: https://realpython.com/linear-regression-in-python/#linear-regression

More from Baseball-Reference on WAR: https://www.baseball-reference.com/about/war_explained.shtml
