Pitch Quality 2: Estimating WAR

Ethan Moore
Published in Something Tangible
7 min read · Jun 28, 2020

A while back, I created a model to estimate the quality of each pitch thrown in MLB in 2019. If you’d like, you can check out the article here. For a refresher, the model found the expected run value of a pitch by taking a weighted average of the outcomes of the 100 most similar pitches thrown during the season. This gives us an approximate answer to the question “What kinds of results would we expect each pitch to get over many samples and in a vacuum?”
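The weighted nearest-neighbor idea can be sketched in a few lines. This is a minimal illustration, not the model itself: the feature list in `FEATURES`, the standardization, and the inverse-distance weights are all assumptions, since the exact inputs and weighting scheme aren’t spelled out here. Only k = 100 comes from the description above.

```python
import math
from dataclasses import dataclass

# Hypothetical feature set: the article does not list the model's exact inputs.
FEATURES = ("velo", "spin", "h_break", "v_break", "plate_x", "plate_z")


@dataclass
class Pitch:
    velo: float             # release speed (mph)
    spin: float             # spin rate (rpm)
    h_break: float          # horizontal movement (in)
    v_break: float          # vertical movement (in)
    plate_x: float          # horizontal plate location (ft)
    plate_z: float          # vertical plate location (ft)
    run_value: float = 0.0  # observed run value of the pitch's outcome


def feature_stds(pitches):
    """Per-feature standard deviation, used to put all features on one scale."""
    stds = {}
    for f in FEATURES:
        vals = [getattr(p, f) for p in pitches]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stds[f] = math.sqrt(var) or 1.0  # avoid dividing by zero for constant features
    return stds


def expected_run_value(target, league, k=100):
    """Inverse-distance-weighted mean run value of the k pitches most similar to target."""
    stds = feature_stds(league)

    def distance(p):
        return math.sqrt(sum(((getattr(p, f) - getattr(target, f)) / stds[f]) ** 2
                             for f in FEATURES))

    # Exclude the target itself so its own outcome does not dominate the estimate.
    candidates = [p for p in league if p is not target]
    neighbors = sorted(candidates, key=distance)[:k]
    weights = [1.0 / (distance(p) + 1e-6) for p in neighbors]
    return sum(w * p.run_value for w, p in zip(weights, neighbors)) / sum(weights)
```

Standardizing first matters because spin rate (thousands of rpm) would otherwise swamp plate location (a few feet) in the distance.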

In that article, I looked at which pitchers had the best expected results on a rate basis. That is, I took all of their pitches and looked at how many runs they were expected to be worth per 100 pitches.
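The rate-basis view amounts to averaging each pitcher’s expected run values and scaling to 100 pitches. A minimal sketch, where `runs_per_100` is a hypothetical helper and the sign convention (positive means runs saved for the pitcher) is an assumption:

```python
from collections import defaultdict


def runs_per_100(pitches):
    """Expected runs per 100 pitches for each pitcher.

    pitches: iterable of (pitcher_name, expected_run_value) pairs, assumed to be
    signed so that positive values mean runs saved for the pitcher.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, xrv in pitches:
        totals[name] += xrv
        counts[name] += 1
    return {name: 100 * totals[name] / counts[name] for name in totals}
```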

But baseball isn’t played on a rate basis. In a sense, baseball is a game of counting stats. At the end of the day, the only stat that matters to teams is cumulative win total. This is why our most powerful single statistic, Wins Above Replacement, is most often expressed as a counting stat. The more you play at a high level, the more WAR you rack up. Simple.

So then, the same thing should be true of a pitch quality metric, right? Looking at whose pitches were the best on a rate basis is certainly insightful, but we could also see whose pitches contributed the most expected value to their team’s results on the field! Let’s sum these babies up. But first…

What is fWAR?

Here is a quick overview of how Fangraphs.com calculates Wins Above Replacement for pitchers (called fWAR). To quantify a pitcher’s value relative to the league, the site uses Fielding Independent Pitching (FIP). This statistic, as you likely know, is an ERA estimator. FIP does not capture what actually happened on the field; it captures what should have happened on the field. Sound familiar? This is almost exactly what my pitch quality metric does!

(Fangraphs Glossary Page if you want to learn more)

Method

So if fWAR is calculated using an earned runs allowed estimator, then could another version of WAR be calculated using a pitch quality estimator? In fWAR, the calculation starts with FIP and puts it through a mathematical blender (scaling it, adjusting for park and league, etc.) in order to answer the question “How many runs was each pitcher worth above replacement level?” This runs total is divided by that season’s runs/win constant (~10.3 in 2019) to finally answer “How many wins was each pitcher worth above replacement level?”
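Stripped of the scaling, park, and league adjustments, the last step of that chain is a unit conversion from runs to wins. In simplified form (RAR is runs above replacement and RPW is runs per win; the 30-run figure is a made-up example, and Fangraphs’ full pitcher formula has more steps than this):

```latex
\text{WAR} \approx \frac{\text{RAR}}{\text{RPW}}, \qquad \text{RPW}_{2019} \approx 10.3,
\qquad \text{e.g. } \frac{30\ \text{runs}}{10.3\ \text{runs/win}} \approx 2.9\ \text{wins}
```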

My pitch quality metric already has the units of expected runs per pitch. So if we sum up all of the expected run values for each pitch thrown by a pitcher, we get an estimate of how many runs he was worth that season! I did this, divided each pitcher’s expected run total by the 2019 runs/win constant, and voilà: a quantification of how many wins each pitcher was worth to his team in 2019 based solely on the quality of the pitches he threw.

I will call this win total Pitch Quality Wins (PQ Wins for short). Just like WAR, the more a pitcher pitches at a high level, the more PQ Wins he accumulates.
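The summing step is simple enough to show directly. In this sketch, `pq_wins_by_pitcher` is a hypothetical helper, the input is assumed to be (pitcher, expected run value) pairs signed so that positive means runs saved, and 10.3 is the approximate 2019 runs/win constant from above:

```python
from collections import defaultdict

RUNS_PER_WIN_2019 = 10.3  # approximate 2019 runs/win constant cited above


def pq_wins_by_pitcher(pitches, runs_per_win=RUNS_PER_WIN_2019):
    """Sum each pitcher's per-pitch expected run values, then convert runs to wins.

    pitches: iterable of (pitcher_name, expected_run_value) pairs, assumed to be
    signed so that positive values mean runs saved for the pitcher.
    """
    runs = defaultdict(float)
    for name, xrv in pitches:
        runs[name] += xrv
    return {name: total / runs_per_win for name, total in runs.items()}


# Hypothetical starter: 3,000 pitches worth 0.02 expected runs saved each.
print(round(pq_wins_by_pitcher([("Starter", 0.02)] * 3000)["Starter"], 2))  # prints 5.83
```

Unlike the per-100 rate, this total keeps growing with volume, which is what makes it a counting stat.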

Leaders

You may remember I provided leaderboards for the rate-based pitch quality statistic I found in my original article (again, linked here). Let’s take a look at the same thing but for PQ Wins, which again are cumulative and are on a similar scale to WAR.

Leaders for Starters (>2500 pitches)

Leaders for Relievers/Shortened Season Starters (500<pitches<2500)

You’ll notice a few things. First, the starter leaderboard is almost exactly the same as the rate-based one, but the reliever/shortened-season starter list is quite different. The more you pitch well, the more PQ Wins you accrue, so almost all of the pitchers on the second leaderboard threw close to 2500 pitches, which was the upper cutoff.
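Building the two leaderboards is a filter on pitch count followed by a sort. A sketch under stated assumptions: `split_leaderboards` and its inputs are hypothetical, and the second group’s upper bound is assumed to equal the 2,500-pitch starter threshold described as the upper cutoff.

```python
def split_leaderboards(pq_wins, pitch_counts, starter_min=2500, reliever_min=500, top_n=10):
    """Split pitchers into the two pitch-count groups and rank each by PQ Wins.

    Strict inequalities mirror the cutoffs described in the text, so a pitcher
    with exactly 2,500 pitches lands in neither group.
    """
    starters = [p for p in pq_wins if pitch_counts[p] > starter_min]
    relievers = [p for p in pq_wins if reliever_min < pitch_counts[p] < starter_min]
    return (sorted(starters, key=pq_wins.get, reverse=True)[:top_n],
            sorted(relievers, key=pq_wins.get, reverse=True)[:top_n])
```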

Also, you’ll notice that these PQ Wins values look like fWAR values! The good starters have 5–8 and the good relievers have 2–3. Seems about right, right? So how are PQ Wins and fWAR related?

Figure: PQ Wins vs. fWAR for 2019 pitchers. The line shown is x = y, not a line of best fit.

With an R² of 0.71, we can say that these two stats are pretty highly correlated. This graph is what validates my theory behind PQ Wins: summing up the run values of every pitch and dividing by a constant may seem suspicious at first, but it does a pretty good job of approximating fWAR. Of course, both of these stats are also correlated with innings pitched, as most pitcher counting stats are. That drives some of this relationship, but not all of it (PQ Wins and fWAR each have an R² of around 0.6 with IP individually).
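For readers who want to reproduce the fit statistic, here is one way to compute it, assuming the reported R² is the usual squared Pearson correlation (equivalently, the R² of a least-squares line), not a measure of fit to the plotted x = y line. `r_squared` is a hypothetical helper:

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. the R² of a least-squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)
```

Applying the same function to (IP, fWAR) and (IP, PQ Wins) pairs would give the IP comparisons mentioned above.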

Let’s take a look at some points of interest.

The two best pitchers by fWAR are also the two best pitchers by PQ Wins. That’s nice. But what about the pitchers the two metrics disagree about?
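The lists that follow come from ranking pitchers by the gap between the two metrics. A minimal sketch, assuming both metrics are stored in dicts keyed by pitcher name; `disagreements` is a hypothetical helper:

```python
def disagreements(fwar, pq_wins, n=10):
    """Rank pitchers by how much fWAR and PQ Wins disagree.

    Returns (better_fwar, better_pq, closest): the n pitchers whose fWAR most
    exceeds their PQ Wins, the n whose PQ Wins most exceed their fWAR, and the
    n with the smallest absolute gap.
    """
    diff = {p: fwar[p] - pq_wins[p] for p in fwar.keys() & pq_wins.keys()}
    ranked = sorted(diff, key=diff.get)  # most negative (PQ Wins > fWAR) first
    better_pq = ranked[:n]
    better_fwar = ranked[::-1][:n]
    closest = sorted(diff, key=lambda p: abs(diff[p]))[:n]
    return better_fwar, better_pq, closest
```

With the Kershaw figures quoted below (3.4 fWAR, 5.1 PQ Wins), his gap is about -1.7, which places him in the better-PQ-Wins group.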

Better fWAR than PQ Wins

These are pitchers who had very good results on the field in 2019 but did not score well in my pitch quality model, indicating that their pitches performed better than they should have in 2019. Recall that my pitch quality model takes stuff and command into account, as well as other factors like release point. For this reason, we can’t just say these are the guys with elite command. But this list may reveal a weakness of the pitch quality model. These pitchers may be above average at pitch sequencing or deception, or may be benefiting from a skilled pitch-framing catcher, none of which the model captures. That’s my theory, but I cannot say for sure why these pitchers outperformed their pitches last season.

Better PQ Wins than fWAR

These are pitchers who had very good pitches last year according to my model but did not see the on-field results. They could have been unlucky, poor sequencers, or throwing to poor pitch framers, or there could be a multitude of other explanations. Again, I’m not really sure what all of these guys have in common, but I am not willing to just chalk it up to luck.

The most successful player by fWAR in this group is Clayton Kershaw. He put up 3.4 fWAR and 5.1 PQ Wins. Kershaw still has elite stuff, but for whatever reason he was hit harder last year than expected. If you have a theory on what a large difference between fWAR and PQ Wins may represent, I would love to hear it.

Least difference

For fun, here are the pitchers with the smallest difference between their 2019 fWAR and PQ Wins.

On average, PQ Wins are 0.25 wins higher than fWAR, and the distribution of their difference is approximately normal with a few outliers (Lynn, Quintana, etc.). In general, though, these metrics are pretty similar for the same pitcher.
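The summary statistics above can be computed as follows. This sketch is hypothetical: `difference_summary` is illustrative, and the 2-standard-deviation cutoff for flagging outliers is an arbitrary assumption, since no rule is stated for identifying outliers like Lynn and Quintana.

```python
import statistics


def difference_summary(fwar, pq_wins, z_cut=2.0):
    """Mean and standard deviation of (PQ Wins - fWAR), plus pitchers beyond z_cut SDs."""
    shared = sorted(fwar.keys() & pq_wins.keys())
    diffs = [pq_wins[p] - fwar[p] for p in shared]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation; needs at least two pitchers
    outliers = [p for p, d in zip(shared, diffs) if abs(d - mean) > z_cut * sd]
    return mean, sd, outliers
```

The difference is taken as PQ Wins minus fWAR, so a positive mean matches the direction described above.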

Usability

I think the main use for a metric like PQ Wins is as a WAR estimator. Note that I do not view PQ Wins as an alternative to fWAR. Wins Above Replacement is a much more rigorous calculation, accounts for many more variables, and relates pitcher contribution to replacement level, which PQ Wins does not. However, all things considered, the two metrics are pretty highly correlated, meaning that players with high PQ Wins will generally also have high WAR. This is not very helpful at the MLB level: we have MLB WAR, and it works. But what about at other levels?

In college or the minor leagues, calculating fWAR can be a pain due to the necessary league, ballpark, and run environment adjustments. I think it would be much easier to replicate the PQ Wins model in college baseball (for games in the Trackman database) than to calculate WAR. This could help college programs with access to the Trackman database get a feel for which of their own pitchers and which opponents have the best stuff, and it could help teams quantify college arms to supplement their existing processes.

If it could be determined that a large difference between PQ Wins and fWAR indicates a certain trait of a pitcher’s season, that would be informative as well (similar to how a large difference between a pitcher’s ERA and FIP is helpful information).

Final Thoughts

Honestly, I just thought this was another cool application of my pitch quality model that I had not previously considered, and I wanted to share it. After analyzing this, I think the main takeaway is that WAR, a heavy-duty calculation, can be pretty closely replicated by just summing up the individual expected run values of every pitch thrown by each pitcher!

Thanks for reading and feel free to reach out with feedback or comments on Twitter @Moore_Stats.
