More on WAR

For a few years now, Bill James has had a problem with WAR. He has mostly stayed quiet on this because, well, he knows that he’s Bill James. He remembers how the people who held the power in baseball punched down hard on him as a young analyst. He has some power now, being a legend and one of Time Magazine’s most influential people and the Godfather of Moneyball and a three-time World Series winner with the Boston Red Sox. He does not want to punch down hard at the young analysts today. He absolutely wants to encourage people to advance baseball thought.

But, like I say, he has a real problem with WAR. And Thursday night, armed with strong feelings about the Jose Altuve-Aaron Judge MVP race, Bill let it rip.

Now, that article is perfectly accessible — the most underrated part of Bill is that he really is a wonderful writer — so there’s no need for me to explain it here. But I want to make a couple of points about it, points that have been bothering me for a long time, and so I will explain what I see as Bill’s biggest beef with WAR and then get to my own thing.

When Bill and I have discussed “Wins Above Replacement” the last few years, Bill has made clear that the problem he has with WAR is that it is not nearly as complex or elegant a statistic as he had assumed. He figured that because we have so much more data to work with today, and because the new analysts are so much more proficient at working with that data, the new systems would be mind-blowing in their depth and breadth.

I’ll include this quote from my story Vanguard After the Revolution.

“My math skills are limited and my data-processing skills are essentially nonexistent. The younger guys are way, way beyond me in those areas. I’m fine with that, and I don’t struggle against it, and I hope that I don’t deny them credit for what they can do that I can’t.

“But because that is true, I ASSUMED that these were complex, nuanced, sophisticated systems. I never really looked; I just assumed that the details were out of my depth. But sometime in the last year I was doing some research that relied on these WAR systems, so I took a look at them, and … they’re not very impressive. They’re not well thought through; they haven’t made a convincing effort to address many of the inherent difficulties that the undertaking presents. They tend to get so far into the data, throw up their arms and make a wild guess. I don’t know if I’m going to get the time to do better of it, or if it will be left to others, but … we’re not at anything like an end point here. I assumed that these systems were a lot better than they actually are.”

There was some backlash when Bill said that five or so years ago, but even after the backlash Bill still wasn’t ready to go into detail. He now has, and his big complaint (I hope I’m summing this up effectively) is that WAR does not connect directly to wins. The name “Wins Above Replacement” suggests that it is attached to wins but it is, in fact, attached to RUNS. The wins part is an afterthought.

Many of us have known that WAR’s connection to actual wins is tenuous, but we never thought much about it. And now that Bill brings it up … yeah, it’s actually kind of jarring.

Look: Baseball Reference WAR and Fangraphs WAR go to great care figuring out how many runs a player is worth. They calculate (in different ways) what a positional player’s value is as a hitter, as a base runner, as a fielder. They make a positional adjustment because, as mentioned here a couple of days ago, some positions are more valuable than others. They make ballpark adjustments. They make a league-wide adjustment, based on the run-scoring atmosphere of the league (1968 being very different from 1999, for example). Pitchers have their value translated to runs; Fangraphs and Baseball Reference take very different routes to the same goal of separating a pitcher from his defense. Then, yes, they again adjust for ballparks and the run-scoring atmosphere of the season.

This all takes a great deal of calculation and thought and bold viewpoints. WAR is a wonderful formula in so many ways. And when the calculations are done, we are left with a number of runs a player/pitcher is worth, a number that can then be compared with the run value of a replacement player.

And after all this very intense math, how do they get from RAR (Runs Above Replacement) to WAR (Wins Above Replacement)?

They basically just divide the total by 10.

Yep, that’s pretty much it. Well, it is a bit more complicated than that — “You simply take that sum and divide it by the runs per win value of that season to find WAR,” Fangraphs explains — but really, yeah, you mainly just divide by 10.

Aaron Judge (Fangraphs): 82.9 Runs Above Replacement, 8.2 WAR.

Jose Altuve (Fangraphs): 75.4 Runs Above Replacement, 7.5 WAR.

Joey Votto (Baseball Reference): 77 Runs Above Replacement, 7.5 WAR.

Giancarlo Stanton (Baseball Reference): 78 Runs Above Replacement, 7.6 WAR.

I think this is what Bill meant when he said, “They tend to get so far into the data, throw up their arms and make a wild guess.” Both WAR systems work so hard to determine how many RUNS a player is worth. And then, after that, the work is pretty well done. “If you had to pick one number over the history of baseball to convert runs into wins,” Baseball Reference writes, “it would be 10.”
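To see how close the conversion really is to a flat divide-by-10, here is a minimal Python sketch that backs out the runs-per-win divisor implied by the four player lines above. The function names are mine, purely for illustration; this is not either site's actual code, and because the published WAR figures are rounded, the implied divisors are only approximate.

```python
def war_from_rar(rar, runs_per_win=10.0):
    """Convert Runs Above Replacement to wins by dividing by runs per win.

    The default of 10 is the rule of thumb quoted above, not either
    site's exact season-specific divisor.
    """
    return rar / runs_per_win


def implied_runs_per_win(rar, war):
    """Back out the divisor that turns a published RAR into a published WAR."""
    return rar / war


# (RAR, WAR) pairs exactly as quoted in the article.
players = {
    "Aaron Judge (Fangraphs)": (82.9, 8.2),
    "Jose Altuve (Fangraphs)": (75.4, 7.5),
    "Joey Votto (Baseball Reference)": (77, 7.5),
    "Giancarlo Stanton (Baseball Reference)": (78, 7.6),
}

for name, (rar, war) in players.items():
    print(
        f"{name}: flat /10 gives {war_from_rar(rar):.2f} WAR, "
        f"published {war}, implied runs per win "
        f"{implied_runs_per_win(rar, war):.2f}"
    )
```

Every implied divisor lands a little above 10, which is the point: after all the work that goes into the run totals, the last step is a single division by roughly the same number for everyone.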

What’s wrong with just dividing the runs by 10? Isn’t 10 runs about what a win is worth? Yes, I believe it is in a very general way. But this gets me to something that has frustrated me for years now but I’ve never had the words to explain my gripe. Let’s see if we can find the words here.

Let’s begin by using Baseball Reference to compare the Houston Astros and New York Yankees.

The Houston Astros players, added together, are worth 53.2 wins above replacement. The position players are worth 39.8 WAR; the pitchers are worth 13.4 WAR. The Astros won 101 games in 2017, so this suggests a team of replacement players would win 48 games — 101 minus 53. That’s reasonable.

The New York Yankees players, added together, are worth, hey, what do you know, 53.2 wins above replacement. Amazing! The Yankees’ split is different though: 29.5 WAR for position players, 23.7 WAR for pitchers, but it adds up to exactly the same WAR as the Astros.

But the Yankees won only 91 games in 2017. So again, doing the math, 91 minus 53, huh, the Yankees replacement team only wins 38 games. This is not reasonable. Why are the Yankees replacement players so much worse than the Astros replacement players?*

*If you want to do something similar with Fangraphs, you can look at the Yankees and Diamondbacks. The Yankees won 91 games and were 43 wins above replacement, meaning a replacement team would win 48 games. Arizona won 93 games but were just 34 wins above replacement, meaning their replacement team would win 59 games.
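The subtraction in these comparisons is simple enough to lay out in a few lines of Python. This is only a sketch of the arithmetic above, using the win totals and team WAR figures quoted in this post; the function name is made up for illustration.

```python
def implied_replacement_wins(actual_wins, team_war):
    """Wins left over once a team's total WAR is subtracted from its actual wins."""
    return actual_wins - team_war


# (team, WAR source, actual 2017 wins, team WAR) as quoted in the article.
teams = [
    ("Astros", "Baseball Reference", 101, 53.2),
    ("Yankees", "Baseball Reference", 91, 53.2),
    ("Yankees", "Fangraphs", 91, 43),
    ("Diamondbacks", "Fangraphs", 93, 34),
]

for team, source, wins, war in teams:
    replacement = implied_replacement_wins(wins, war)
    print(f"{team} ({source}): {wins} wins - {war} WAR = "
          f"{replacement:.1f} replacement-team wins")
```

If replacement level meant the same thing for every club, these numbers would cluster. Instead the two Baseball Reference rows sit 10 wins apart and the two Fangraphs rows sit 11 apart.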

The answer, as Bill explains, is that WAR does not have anything to do with actual wins. It is about runs. The Yankees’ expected record, their Pythagorean record, based on how many runs they scored and allowed, is 100–62. The Astros’ expected record, based on how many runs they scored and allowed, is 99–63. By runs, they were the same team. And so they have the same WAR.
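For anyone who wants to check those expected records, here is a rough sketch of the Pythagorean calculation. Two assumptions to flag loudly: the run totals (Yankees 858 scored and 660 allowed, Astros 896 and 700) are not given anywhere in this post, so treat them as inputs to verify, and the exponent of 1.83 is one common choice (the classic version uses 2). The function name is just for illustration.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=1.83):
    """Expected wins from runs scored and allowed (Pythagorean expectation).

    The exponent is an assumption; 2 is the classic value and 1.83 is a
    commonly used refinement.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# (runs scored, runs allowed, actual wins). The run totals are assumed
# inputs that do not appear in the article; the win totals do.
teams = {
    "Yankees": (858, 660, 91),
    "Astros": (896, 700, 101),
}

for name, (scored, allowed, actual) in teams.items():
    expected = pythagorean_wins(scored, allowed)
    print(f"{name}: expected about {expected:.1f} wins, actual {actual}, "
          f"difference {actual - expected:+.1f}")
```

Under these assumptions the Yankees finish about nine wins under their expected total and the Astros about two over. That gap between what the runs say and what the standings say is exactly what a runs-based WAR never sees.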

But they were NOT the same team. Why don’t the Astros players have more WAR when they so clearly won more games?

This gets to the heart of my longstanding uneasiness with some of the advanced statistical thinking: I sometimes have wondered if maybe we’re so busy adjusting some stuff and dismissing other stuff as luck that we might be straying too far from what’s actually happening on the field. If we can adjust for the fact that Yankee Stadium was a great hitters park and Minute Maid was a great pitchers park, how can we not adjust for the fact that the Astros won 10 more games than the Yankees? How can we not find those 10 wins in our analysis?

A few years ago, Bill James came up with his Win Shares system, and a lot of people didn’t like it for various reasons. I don’t like ever quoting Wikipedia, but in this case I think they do a nice job expressing one of the bigger complaints about Win Shares:

“One criticism of this metric is that players who play for teams that win more games than expected, based on the Pythagorean expectation, receive more win shares than players whose team wins fewer games than expected. Since a team exceeding or falling short of its Pythagorean expectation is generally acknowledged as chance, some believe that credit should not be assigned purely based on team wins.”

There it is: Is a team winning or losing more games than expectation “chance?” I’ve always thought that’s mostly true, but I will just say: It’s a copout to just stop there. The object of baseball is to win games. Scoring runs, preventing runs, that’s all well and good. But the object is to win. Are we really ready to concede here, ready to just throw away X number of wins every year without a fight?

And even if we believe that the fight is over, even if we believe that those extra wins are chance, how can we not include chance in our stats? Look, in the end EVERYTHING IN SPORTS AND LIFE has some chance involved. We would love to adjust chance out of our baseball stats, but at some point we are altering what really happened. Maybe the Yankees “should” have won 100 games. But they did not. And to give Aaron Judge 8.2/7.2 WAR on the assumption that they did isn’t good enough. We have to be better than this.

This is especially true in this specific situation because Judge was not the same player in high leverage situations as he was the rest of the time. Again, maybe that’s chance, but it’s reality. He hit .215/.380/.380 in late and close situations, the exact situations where the Yankees underperformed in 2017. If you want to compare him to Altuve, it seems ridiculous not to point out that Altuve hit .441/.529/.661 in late and close situations. It seems ridiculous to not give Judge ANY of the culpability for the Yankees not winning as many games as the runs scored/allowed suggest they should have won. It seems ridiculous not to give Altuve any credit for the Astros outperforming their expectation.

Over the next couple of days, I’ll delve into Tom Tango’s fascinating reimagining of WAR — something he calls “The Indis.” But for now, let’s just say that I am glad Bill made his thoughts clear on what he believes is wrong with WAR. If someone would like to make their case for why WAR should not be attached to wins, I’m happy to post that here … but I have to say that until such a compelling argument is made, I think Bill is right.