OpenWAR, RunPlusMinus: A Comparison
by J.B.Moore, Ph.D., CEO RunPlusMinus Inc.
In all sports there is an ongoing debate regarding the best player: all time, this season, at position X, etc. Baseball is no exception, and the many thousands of published articles and posts are a testament to the enthusiasm that researchers, columnists and fans continue to have for this debate. Unlike basketball, soccer and hockey, in which players are continually involved on offense and defense, a baseball game consists of a series of discrete events with clear separations of players in offensive or defensive roles. This has led to many different statistics (Wikipedia describes over 117) that are measures of either offensive or defensive performance for teams or players. Several attempts have been made to develop a single measure of total performance that would provide a composite indicator of the overall contribution of a player derived from data of on-field performances. The challenges of developing a quality statistic are daunting. WAR (Wins Above Replacement) and RunPlusMinus™ are two approaches that claim to have achieved this objective. WAR and its successor OpenWAR have achieved significant acceptance in the baseball community. RunPlusMinus is the new kid on the block.
The purpose of this article is to compare the two methodologies: to show the many similarities and the significant differences, to suggest strengths and weaknesses of each, and to summarize the uses of each type of analysis. The purpose is not to suggest that RunPlusMinus is a replacement for WAR but rather that RunPlusMinus supplements and complements WAR by providing insights and uses that are not present in WAR. We begin with a brief description of each. The author apologizes for any misrepresentations or wrong interpretations of information about WAR. Please send any corrections to firstname.lastname@example.org.
Brief Descriptions of OpenWAR and RunPlusMinus
The article by Benjamin S. Baumer, Shane T. Jensen and Gregory J. Matthews, “OpenWAR: An Open Source System for Evaluating Overall Player Performance in Major League Baseball”, Journal of Quantitative Analysis in Sports, Vol. 11, Issue 2, June 2015, is a well-written academic paper that describes the methodology, assumptions and logic for calculating the WAR statistic. OpenWAR addressed two of the problems in the original WAR methodology, namely, a lack of uncertainty estimation and a lack of reproducibility. The software and data are available to the public on GitHub for those who wish to implement a personal copy of the application. The fundamental strategy is to:
1. Calculate the total run contributions of each of the four components — batting, running, pitching and fielding — for each player.
2. Subtract the average contribution of a “replacement player”.
3. Divide the result by 10 to convert runs to wins.
This WAR value is a point estimate of the number of wins a player contributes to his team. The uncertainty of these point estimates is quantified by running many simulations of the calculations using sample values of the underlying probability distributions.
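The resampling idea can be sketched in a few lines of Python. This is a minimal illustration and not the OpenWAR code: the per-play run values, the replacement-level total and the function name `war_interval` are hypothetical, and resampling whole plays with replacement (a bootstrap) is an assumption about one reasonable way to draw the samples.

```python
import random

RUNS_PER_WIN = 10  # runs-to-wins conversion quoted in the steps above


def war_interval(play_run_values, replacement_runs, n_sims=2000, seed=0):
    """Return (point estimate, lower bound, upper bound) for a WAR-like value.

    play_run_values:  hypothetical runs above average credited on each play
    replacement_runs: hypothetical runs above average of a replacement player
                      over the same playing time (usually negative)
    """
    rng = random.Random(seed)
    n = len(play_run_values)
    point = (sum(play_run_values) - replacement_runs) / RUNS_PER_WIN

    sims = []
    for _ in range(n_sims):
        # Draw a simulated "season" by resampling plays with replacement.
        resample = rng.choices(play_run_values, k=n)
        sims.append((sum(resample) - replacement_runs) / RUNS_PER_WIN)

    sims.sort()
    lower = sims[int(0.025 * n_sims)]      # 2.5th percentile
    upper = sims[int(0.975 * n_sims) - 1]  # 97.5th percentile
    return point, lower, upper


# Made-up run values for eight plays by one player.
plays = [0.45, -0.25, 1.10, -0.30, 0.05, -0.25, 0.60, -0.10]
point, lo, hi = war_interval(plays, replacement_runs=-2.0)
print(f"WAR estimate {point:.2f}, 95% interval ({lo:.2f}, {hi:.2f})")
```

The width of the interval, rather than the point estimate alone, is what lets two players be compared with some sense of whether the gap between them is meaningful.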
The underlying model of baseball events that is used in RunPlusMinus was conceived when the author was a Professor of Management Science at the University of Waterloo. The foundation of the model is that each half inning can be modeled as a sequence of events that cause a transition from one of the 24 states (outs & bases occupied) to another state. Furthermore, by assigning responsibilities to the offense and defense players involved in each transition, one can attribute a value to each player’s role which is some fraction of the overall value of the transition. Because what’s good for the offense is equally bad for the defense, the sum of the offensive players’ values must be equal and opposite to the sum of the defensive players’ values. The RPM statistic for each player in a given play is the difference between the player’s value and the historical average of all players fulfilling the same role in the transition’s starting state.
As with OpenWAR, transition values are measured in runs. Each RPM value may be positive or negative, hence the name “RunPlusMinus”. Adding standardized RPM values of each of the four components provides a composite measure of player performance called the RPM Rating. Because RPM values are additive, team performance in a game is simply the sum of the players’ RPM values for the game. Each winning team has a positive RPM total that is equal and opposite to the negative RPM total of the losing team. The acronym “CRAZI” (Comprehensive, Run-based, Additive, Zero-sum, Independent) names the five characteristics that an ideal statistic must have to measure the composite on-field performance of players. Additional information about RPM values can be found on the RunPlusMinus website and in the Medium article “The Best Baseball Statistic”.
Similarities Between WAR and RunPlusMinus
There are many similarities in the assumptions and logic used by the two methodologies. The most significant are:
1. Modeling each half inning as a sequence of transitions from the state (0 out, bases empty) to 3 out (or game over). Associated with each of the 24 states is a Runs Expected value (RE24) representing the expected number (historical average) of runs to the end of the half inning starting in the given state. The value of a transition for the team on offense is the actual number of runs scored plus the change in Runs Expected from the starting state to the finishing state. The transition value for the team on defense is minus the offense transition value. This principle is called the “conservation of runs” property in WAR and the Zero-sum property in RunPlusMinus.
2. Each methodology splits transition values into offense (batting, running) and defense (pitching, fielding). In WAR this is done using relatively sophisticated statistical tools. In RunPlusMinus, each player involved in a transition is assigned a fraction of the responsibility of the transition value depending on his role in the play. In both cases, the player’s value is compared to historical averages of that role in the given event. For example, for a strikeout, the pitcher gets 100% of the transition value and the batter -100% of the transition value; for a fly out, the pitcher may get 70% of the value, the fielder 30% and the batter -100%. Fair values of these responsibility percentages can be determined by solving a large linear program to ensure the lack of bias for or against any of the four components. The current implementation of RunPlusMinus allows the user to specify default responsibilities for each type of event and to override the default value if a player has made an outstanding play. The player’s RPM value is the fractional amount minus the historical average, resulting in a value for that player in that event that represents performance above or below average.
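The two ideas above, valuing a transition and then splitting it among the players involved, can be sketched together. This is an illustration only: the Runs Expected numbers are invented rather than real RE24 values, the function and table names are hypothetical, the responsibility shares are the ones quoted in the example above, and the final step of subtracting each role's historical average is left out.

```python
# Hypothetical Runs Expected values for a few of the 24 states,
# keyed by (outs, bases occupied). Illustrative numbers, not real RE24 data.
RUNS_EXPECTED = {
    (0, "---"): 0.50,
    (1, "---"): 0.27,
    (0, "1--"): 0.88,
    (1, "1--"): 0.52,
}

# Default responsibility shares from the article's example. Offense shares
# apply to the offense value, defense shares to its negative.
RESPONSIBILITY = {
    "strikeout": {"offense": {"batter": 1.0},
                  "defense": {"pitcher": 1.0}},
    "fly_out": {"offense": {"batter": 1.0},
                "defense": {"pitcher": 0.7, "fielder": 0.3}},
}


def transition_value(start, end, runs_scored):
    """Offense value = runs scored + change in Runs Expected."""
    end_re = 0.0 if end is None else RUNS_EXPECTED[end]  # None means 3 outs
    return runs_scored + end_re - RUNS_EXPECTED[start]


def split_transition(event, start, end, runs_scored):
    """Share one transition value among the roles involved in the play."""
    value = transition_value(start, end, runs_scored)
    shares = RESPONSIBILITY[event]
    credit = {role: share * value
              for role, share in shares["offense"].items()}
    credit.update({role: -share * value
                   for role, share in shares["defense"].items()})
    return credit


# Fly out with the bases empty: 0 out becomes 1 out, no runs score.
credit = split_transition("fly_out", (0, "---"), (1, "---"), 0)
print(credit)
print("sum of credits:", round(sum(credit.values()), 9))
```

Because the defensive shares add to 100% and are applied to the negative of the offensive value, the credits for any single play sum to zero, which is the zero-sum property described in item 1.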
Differences Between WAR and RunPlusMinus
There are several key differences between WAR and RunPlusMinus.
1. Wins and Replacement Level Players
The article that defines the OpenWAR methodology contains the following pair of sentences:
“It is desirable to calibrate our comprehensive measure of performance relative to a baseline ‘replacement level’ player. However, the definition of a replacement level player remains controversial.”
Implementations of WAR other than the non-proprietary OpenWAR define replacement level in different ways. Once defined, the WAR value (in wins) is calculated as:
(PlayerRunsAboveAverage - ReplacementPlayerRunsAboveAverage) / 10
where it is assumed that 10 runs is equivalent to 1 win. RunPlusMinus on the other hand measures performance in runs. The plus-minus values are relative to the average performance of active MLB players. That is why a plot of RPM Ratings has the classic bell shape with a mean value of zero.
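A short worked example shows how much the choice of replacement level matters. The run totals below are invented for illustration and the function name is hypothetical; only the formula and the 10-runs-per-win conversion come from the text above.

```python
RUNS_PER_WIN = 10  # conversion assumed in the formula above


def wins_above_replacement(player_raa, replacement_raa):
    """(player runs above average - replacement runs above average) / 10."""
    return (player_raa - replacement_raa) / RUNS_PER_WIN


# The same hypothetical player, 25 runs above average, measured against
# two hypothetical definitions of a replacement-level player.
print(wins_above_replacement(25.0, -20.0))  # 4.5
print(wins_above_replacement(25.0, -15.0))  # 4.0
```

A shift of five runs in the replacement baseline moves every player's value by half a win, which is why competing definitions of replacement level produce different WAR values for the same season.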
Clearly, wins are ultimately important but the conversion of runs to wins is somewhat arbitrary as is the definition of a replacement level player.
Notwithstanding the foregoing statements, RunPlusMinus calculates a “Won” and “Lost” value for each player in each game. A player gets a Win if his RPM value for the game is sufficiently above zero to cause his team’s RPM total to exceed the opponent’s team total and hence win the game. The converse holds for a Loss: if the player had been replaced by an average player, would his team have avoided losing the game? If so, his below-average performance was sufficient to cause the loss. In any game, there may be zero, one or more players who win or lose the game. RunPlusMinus also calculates an ‘MVP’ value for each player, which is the difference between the player’s total Wins and Losses. The Volatility index is the sum of the absolute values of Wins and Losses and indicates players who have hot and cold streaks as opposed to consistent performers.
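One way to read the Won/Lost rule is sketched below. It rests on a simplifying assumption that is not spelled out in the article: a team wins when its RPM total is positive, and "replacing a player with an average player" means setting his RPM for the game to zero while leaving everything else unchanged. The function names and the sample RPM values are hypothetical.

```python
def game_decisions(team_rpm):
    """Map each player to 'W', 'L' or None for one team in one game.

    team_rpm: {player: RPM value for the game}
    """
    total = sum(team_rpm.values())
    decisions = {}
    for player, rpm in team_rpm.items():
        without = total - rpm  # team total had he been exactly average
        if total > 0 and without <= 0:
            decisions[player] = "W"   # the win disappears without him
        elif total < 0 and without >= 0:
            decisions[player] = "L"   # the loss disappears without him
        else:
            decisions[player] = None
    return decisions


def mvp_and_volatility(season_decisions):
    """MVP = Wins - Losses; Volatility = Wins + Losses."""
    wins = season_decisions.count("W")
    losses = season_decisions.count("L")
    return wins - losses, wins + losses


winners = game_decisions({"A": 2.5, "B": 0.4, "C": -1.0})
losers = game_decisions({"X": -2.2, "Y": 0.8, "Z": -0.5})
print(winners)
print(losers)
print(mvp_and_volatility(["W", "W", "L", None]))
```

Under this reading, player A earns the Win because his team's total drops below zero without him, and player X takes the Loss for the mirror-image reason.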
2. Participation Levels
It is not clear to this writer how WAR accounts for the level of participation of players in the games being analyzed. For example, a player who has only one plate appearance and hits a grand slam would have a high performance rating and ranking if not adjusted for his low participation level. In RunPlusMinus, this is controlled for in two ways. First, a user can set cutoff values for games played, plate appearances and batters faced. These values exclude players who do not meet the thresholds. Second, raw RPM values are modified by a factor that gives more weight to players with high participation levels. For example, the batting RPM total of a player who has played 100 games has a smaller standard deviation than that of a player who has played only 10 games. This has the effect of reducing the uncertainty of rankings.
3. Pitcher and Batter Accountability for Events
In WAR, a batter is accountable for all the offensive good things or bad things that happen as a result of his plate appearance. Likewise, a pitcher is assumed responsible for the defense pluses and minuses for each batter faced. In RunPlusMinus, as described previously, responsibilities are partitioned among all players involved in each transition. Several distinguishable transitions with different player involvements may occur between one plate appearance and the next. For example, suppose a player hits a single to right field but is able to advance to second on a throwing error by the fielder. This is treated as two events — a single and a fielding error with appropriate RPM values for each of the two transitions.
4. Manager Performance
Aside from player substitutions, managers make decisions that affect the outcome of the game. Examples include intentional walks and defensive indifference. It is unfair, for example, to blame a pitcher for a walk that was ordered by the manager. In RunPlusMinus, managers get RPM values that measure their accountability for managerial decisions. These RPM values can be added to give a performance measurement of field managers. A future version of RunPlusMinus may extend this idea to decisions made by coaches, such as the positive or negative results of a decision to wave a runner home.
5. Composite Player Performance
Both methodologies sum the performance values of the four components. OpenWAR totals the four runs-above-average values to get a composite value before converting that value to Wins. RunPlusMinus creates the composite rating differently. This is done for three reasons:
- The participation levels of batting, running, pitching and fielding may be very different.
- The relative importance of the offense components (batting and running) differs, as does the relative importance of the defense components (pitching and fielding).
- Transition values (which are the same in both systems) are biased against pitchers since pitchers have fewer offense opportunities to offset their responsibility for runs scored.
To achieve a fair measure of total performance, RunPlusMinus converts the respective totals to neutral values using standard deviations before summing them to provide an RPM Rating.
In RunPlusMinus reports, the composite value is stated in Rating units whereas the component values are reported as values of the RPM statistic which is measured in runs.
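The standardization step can be sketched as follows. The article does not give the exact formula, so this assumes the simplest version: each component total is converted to a z-score across the players in the report, and a player's Rating is the sum of his z-scores. The function name and the sample totals are hypothetical.

```python
import statistics

COMPONENTS = ["batting", "running", "pitching", "fielding"]


def rpm_ratings(component_totals):
    """Sum standardized component totals into one Rating per player.

    component_totals: {player: {component: RPM total in runs}}
    A player may lack a component (for example, no pitching).
    """
    ratings = {player: 0.0 for player in component_totals}
    for comp in COMPONENTS:
        values = [t[comp] for t in component_totals.values() if comp in t]
        if len(values) < 2:
            continue  # nothing to standardize against
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # every player identical: component adds no information
        for player, totals in component_totals.items():
            if comp in totals:
                ratings[player] += (totals[comp] - mean) / sd
    return ratings


totals = {
    "A": {"batting": 10.0, "running": 1.0, "fielding": 2.0},
    "B": {"batting": 0.0, "running": 0.0, "fielding": 2.0},
    "C": {"batting": -10.0, "running": -1.0, "fielding": 2.0},
}
ratings = rpm_ratings(totals)
print({player: round(r, 3) for player, r in ratings.items()})
```

Dividing by the standard deviation puts batting, which spans tens of runs, and running, which spans only a few, on the same scale before they are added. That is why the composite is stated in Rating units rather than runs.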
6. Inning and Score Dependencies
Is timeliness an important factor when evaluating player performance? Many think so. They argue that a player’s ability to hit a grand slam in the 9th indicates greater ability than hitting a grand slam in the first inning and, more generally, that “clutchability” should be measured. Numerous studies have shown, however, that “clutchability” falls within the range of uncertainty associated with the probabilistic nature of baseball events. WAR incorporates factors that modify performance measures based on the inning and score differential. RunPlusMinus does not.
More generally, the principle of Independence (the ‘I’ in CRAZI) is considered a strength of the RunPlusMinus methodology. More will be said about this in the summary section of this article. Park effect is another factor that can influence performance measures of batting and pitching. It is included in WAR. RunPlusMinus will use park-dependent expected run values in a forthcoming release.
7. Position Dependencies
A player’s position on defense is found in the lineup and play-by-play data captured in each game. WAR employs a sophisticated regression model to calculate the probability of a fielder making an out for a ball hit to point (x,y) on the playing surface. This information is used to derive a measure of fielding performance. RunPlusMinus simply attributes a fraction of the transition value to participating fielders that is dependent on the event involved — fly out, double play, caught stealing, fielder’s choice, etc. RunPlusMinus reporting tools permit users to compare performances of players at one or more specific positions.
8. Suggested Salaries
RunPlusMinus includes “justified” salaries in reports showing player Ratings and rankings. “Justified” means the suggested salary is based on a player’s on-field performance and does not include factors such as box office appeal, past performance, health, contract terms, etc. These salaries: 1) satisfy the minimum salary dictated by MLB and 2) are a fraction of the total salaries of players selected for the report. In a given report the salary pool is the total salary of players selected for the report. The fraction of the pool assigned to a player reflects his Rating vs the RPM Rating of all other players in the report. Because the choice of players being reported is user-dependent, a player’s justified salary may vary from one report to the next. For example, the justified salary of a pitcher as a fraction of his team’s payroll will be different from his league-wide justified salary.
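The article states the two constraints on justified salaries but not the allocation formula, so the sketch below fills the gap with a labeled assumption: every player first receives the minimum salary, and the rest of the pool is shared in proportion to each player's Rating measured above the lowest Rating in the report. The function name, the minimum salary figure and the sample data are all hypothetical.

```python
def justified_salaries(ratings, actual_salaries, minimum_salary):
    """Redistribute the report's salary pool according to RPM Ratings.

    ratings:         {player: RPM Rating}
    actual_salaries: {player: current salary}; their sum is the pool
    minimum_salary:  floor that every player must receive
    """
    players = list(ratings)
    pool = sum(actual_salaries[p] for p in players)
    surplus = pool - minimum_salary * len(players)
    if surplus < 0:
        raise ValueError("pool is too small to pay everyone the minimum")

    # Assumption: shift Ratings so the lowest-rated player has weight zero.
    lowest = min(ratings.values())
    weights = {p: ratings[p] - lowest for p in players}
    total_weight = sum(weights.values())
    if total_weight == 0:
        # All Ratings equal: share the surplus evenly.
        return {p: minimum_salary + surplus / len(players) for p in players}

    return {p: minimum_salary + surplus * weights[p] / total_weight
            for p in players}


ratings = {"A": 2.0, "B": 0.0, "C": -1.0}
actual = {"A": 5_000_000, "B": 10_000_000, "C": 3_000_000}
salaries = justified_salaries(ratings, actual, minimum_salary=500_000)
print(salaries)
```

Running the same function on a different selection of players changes both the pool and the lowest Rating, which is why a player's justified salary varies from one report to the next.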
The WAR documentation states that WAR values can be useful for evaluating trades but does not describe how WAR values could be used to suggest salaries.
9. Proprietary Implementation
The description of the OpenWAR model was published in a reputable academic journal and is in the public domain. Open source software is available for those who wish to implement the model. Several proprietary versions of WAR exist that have subscription fees and provide supplementary benefits to users. RunPlusMinus is also available on a subscription basis and has a user interface that can be used to customize reports and override default parameters.
Uses of WAR and RunPlusMinus
A good measure of the overall performance of players is clearly useful for making roster decisions, evaluating potential trades and negotiating salaries. In this respect both WAR and RunPlusMinus provide valuable information to those involved in these discussions. Other applications of the outputs of the two methodologies include:
1. Forecasting Game Winners
RunPlusMinus lends itself to predicting the winners of future games. A forthcoming article will provide details and demonstrate the accuracy of the prediction logic. In the 2017 MLB season, the overall accuracy was 56%. Predictions include a probability value in a statement of the form “Team A will beat Team B with a probability of X%”. Converting WAR values to predictions is difficult because WAR values relate to wins in a season, not a specific game.
2. In-Game Decisions
RPM values can support in-game decision-making related to the choice of a relief pitcher, pinch hitter, sacrifices, etc. This results from the ability to select arbitrary subsets of game data to provide useful information. WAR on the other hand uses aggregate measures to derive WAR values and is therefore not as amenable to this use.
3. Fantasy Sports
Both WAR and RunPlusMinus statistics provide rotisserie league enthusiasts with valuable information for making roster choices. The suggested fair salary values from RunPlusMinus provide additional information when there are salary cap constraints.
4. Player Agents
Members of the Major League Baseball Player Agents Association can certainly make use of selected WAR and/or RunPlusMinus information when acting on behalf of their players.
5. User Controls
Numerous parameters are used when calculating WAR and RunPlusMinus values. WAR values can be recalculated as needed. In RunPlusMinus, the default values are re-evaluated each season and can be overridden by users if desired.
Summary: the CRAZI Attributes
The five “CRAZI” attributes provide a good framework to summarize the similarities and differences between WAR and RunPlusMinus. The Medium article “The Best Baseball Statistic” provides arguments why each of these attributes is a necessary characteristic of an ideal statistic. We consider each in turn.
Comprehensive. What level of detail is input to the analysis? Both systems capture details of every play in every MLB game. WAR captures ball location on hits; RunPlusMinus does not. WAR views a Plate Appearance as the smallest unit of play. RunPlusMinus breaks plays into sub-plays, each being a separate transition with RPM values being calculated for each player involved.
Run-based. Both methodologies calculate runs-above-average values for each event and values for each of the four components of performance. WAR converts these RAA values to wins using the concept of replacement players. RunPlusMinus combines component RPM values to produce a composite RPM Rating value that reflects participation levels and adjusts for anti-pitcher bias and the relative importance of each component.
Additive. WAR values can be added but have little meaning for a single game. RunPlusMinus — being more granular — generates component and composite values for each player in each game. These values can be grouped in arbitrary ways to show the performance of teams and positions over user-definable subsets of dates, game types, etc.
Zero-sum. Both the WAR and RunPlusMinus calculations use the same formula for calculating the value of a change in state, namely: the Transition value = Runs Scored + change in Runs Expected. The transition value ascribed to the defense team is the negative of the transition value assigned to the offense team. In WAR this is called the conservation of runs principle. In addition, in RunPlusMinus the sum of the responsibilities of the offense players equals 100% as does the sum of the defensive players’ responsibilities.
Independent. The principle is simple: a player’s performance measures should be independent of factors beyond the player’s control. As pointed out in Keith Law’s excellent monograph Smart Baseball, classical stats such as RBIs and Saves are dependent on factors such as position in the batting order and current score. RunPlusMinus believes that any fair measure of player performance should exclude these conditions. For example, an excellent pitcher should be rated highly even with a poor won-loss record caused by a weak offense in a batter-friendly park. Or, one might claim that relievers have a more difficult task than starters because they often enter a game in high-risk situations. Keep in mind, however, that when Expected Runs are used to value each transition, high risk also means an opportunity for greater rewards. The recent use of infield shifts to counter batter tendencies is a further example. A batter has no control over whether a shift is employed or over the positions of the outfielders. Analytics can be used to determine if a shift would reduce the effectiveness of a batter, but that is beyond a batter’s control. RunPlusMinus eliminates these context-sensitive factors when calculating RPM values. WAR includes some factors such as the status of the game, and hence WAR values are not completely independent of context.
Extensions to WAR and RunPlusMinus
The development of Statcast has advanced baseball analytics in many ways. The terabyte of data collected for each MLB game provides an incredible amount of data for investigating many aspects of player performance and reasons for that performance. Much of this data is extremely useful in understanding and improving the skills of individual players. When applied, skill improvements will show up as better performance statistics using WAR, RunPlusMinus or other tools.
Incorporating factors such as the ball & strike count or the length of leadoffs of base runners would greatly increase the state space from the widely-used 24 states. This in turn would greatly reduce the density of the state-to-state probability matrix thus adding significantly to the uncertainty of the expected runs values.
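The growth in the state space is easy to quantify. The short sketch below counts the standard 24 states and then adds only the ball and strike count; the variable names are illustrative.

```python
from itertools import product

outs = range(3)                           # 0, 1 or 2 outs
bases = list(product([0, 1], repeat=3))   # each base empty or occupied
base_states = list(product(outs, bases))  # the widely used 24 states

counts = list(product(range(4), range(3)))  # balls 0-3, strikes 0-2
extended_states = list(product(base_states, counts))

print(len(base_states), len(extended_states))            # 24 288
print(len(base_states) ** 2, len(extended_states) ** 2)  # 576 82944
```

Adding the count alone multiplies the number of states by 12 and the number of cells in the state-to-state matrix by 144, so the same set of observed plays is spread far more thinly across the cells.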
Simulations and advanced mathematical techniques such as mixed integer programming models can be used to reduce the uncertainty of point estimates and optimize batting orders, player substitutions and roster changes.
The primary purpose of WAR and RunPlusMinus is to measure performance. Both WAR and RunPlusMinus do a good job of meeting this objective. While WAR has been the go-to tool used to rate and compare players, RunPlusMinus brings a new, more granular and flexible perspective to the table that can be used alongside WAR, or in place of it in situations where WAR falls short, such as forecasting game winners, making in-game decisions, and determining appropriate salaries.
Stay tuned for our future articles and reports due out every week this season. If you want to be reminded whenever we release new content, please subscribe to our mailing list to be kept up to date!
If you have any questions, comments, requests or complaints, please feel free to add them in the comments below or to email us at email@example.com
You can learn more about the RunPlusMinus™️ statistic at RunPlusMinus.com