This is the first post of a four-part series about Pitch Quality and its applications. The subsequent articles can be found here:
Calculating the effectiveness of each and every pitch you see on TV seemed like a pipe dream when I first started thinking about it roughly three years ago. My friend and then-Pirates intern Matt Kane got me thinking about ways to tease out the good pitches from the bad in a statistically rigorous way, and I’ve been thinking about doing so ever since. I know this has been done before, but as I researched the public solutions to this question, I decided I’d still like to try it out myself and make the results public and transparent.
To put a value on every pitch, I had to start by deriving the run value (aka linear weight) of every pitch of 2019. For those unfamiliar, all you need to know is that events that benefit the offense (hits, errors, balls) have a positive run value, events that benefit the defense (outs, strikes) have a negative run value, and balls and strikes have small run values compared to batted balls.
Even though balls and strikes have run values near 0, their value can add up, as about 75% of all pitches in 2019 were a ball or strike that did not end the at-bat. That’s a lot! Take a look at the distribution of events from last season:
Each event other than a ball or strike has a run value (again, also called its linear weight) that tells us how much that event changes the number of runs a team can expect to score in the rest of the inning. Ball and strike run values are computed in a more complicated manner, based on the value of moving from the previous count to the new one. Balls are good for the offense. Strikes (and fouls) are bad for the offense.
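To make the count idea concrete, here is a minimal Python sketch, not the code behind the actual run values. It assumes you already have a table of expected runs for each count; the `COUNT_VALUE` numbers below are placeholders invented for illustration, and the function names are hypothetical.

```python
from typing import Dict, Tuple

Count = Tuple[int, int]  # (balls, strikes)

# Placeholder values for illustration only, NOT derived from 2019 data.
COUNT_VALUE: Dict[Count, float] = {
    (0, 0): 0.00, (1, 0): 0.03, (2, 0): 0.09, (3, 0): 0.20,
    (0, 1): -0.04, (1, 1): -0.01, (2, 1): 0.03, (3, 1): 0.13,
    (0, 2): -0.10, (1, 2): -0.08, (2, 2): -0.04, (3, 2): 0.05,
}


def next_count(count: Count, call: str) -> Count:
    """Return the count after a ball, strike, or foul."""
    balls, strikes = count
    if call == "ball":
        return (balls + 1, strikes)
    if call == "strike":
        return (balls, strikes + 1)
    if call == "foul":
        # A foul adds a strike unless the batter already has two.
        return (balls, min(strikes + 1, 2))
    raise ValueError(f"unknown call: {call}")


def count_run_value(count: Count, call: str, table: Dict[Count, float]) -> float:
    """Run value of a pitch that moves the count without ending the at-bat."""
    new_count = next_count(count, call)
    if new_count not in table:
        # Ball four or strike three ends the at-bat; those are valued as events.
        raise ValueError("this pitch ends the at-bat")
    return table[new_count] - table[count]


first_pitch_ball = count_run_value((0, 0), "ball", COUNT_VALUE)
first_pitch_strike = count_run_value((0, 0), "strike", COUNT_VALUE)
```

With any sensible table, the first-pitch ball comes out positive and the first-pitch strike negative, which is all the sign convention above is saying.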
The idea of this project was to take a pitch “X”, find the 100 most similar pitches to X, (weighted) average the actual run values of those 100 similar pitches, and call that the “expected run value” of pitch X. Thank goodness for the run values from earlier! (Why 100? I toyed with many values and 100 was optimal for model performance.)
For this task, I used the K-Nearest Neighbors algorithm. Many regression model types can be used to predict the run value of a pitch, but most are a black box to a non-statistician, so I opted for KNN in part due to its interpretability.
I fiddled with several combinations of variables from the public Baseball Savant Statcast data, but ultimately settled on using Release Speed, Release Point (x,y,z), H and V Movement, and Plate Location (x and z). Using all of these features gives me confidence that the model will be comparing pitches that are visually similar to an observer and to a hitter.
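For readers who want to see the mechanics, below is a minimal pure-Python sketch of a distance-weighted nearest-neighbor average. It is not the model's actual code. Several things here are assumptions: the Statcast column names in `FEATURES` are my mapping of the variables listed above, the z-score scaling and inverse-distance weights are one common choice (the weighting scheme is not spelled out in this post), and leaving a pitch out of its own neighbor set is a guess. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence

# Assumed Statcast column names for the eight features described above.
FEATURES = [
    "release_speed", "release_pos_x", "release_pos_y", "release_pos_z",
    "pfx_x", "pfx_z", "plate_x", "plate_z",
]


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score every column so mph and feet count on a comparable scale."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    spreads = [pstdev(col) or 1.0 for col in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, means, spreads)]
        for row in rows
    ]


def expected_run_value(
    index: int,
    pitches: Sequence[Sequence[float]],
    run_values: Sequence[float],
    k: int = 100,
    eps: float = 1e-9,
) -> float:
    """Inverse-distance-weighted mean run value of the k pitches nearest
    to pitches[index], excluding that pitch itself (an assumption)."""
    target = pitches[index]
    neighbors = sorted(
        (math.dist(target, other), run_values[j])
        for j, other in enumerate(pitches)
        if j != index
    )[:k]
    weights = [1.0 / (dist + eps) for dist, _ in neighbors]
    weighted_sum = sum(w * rv for w, (_, rv) in zip(weights, neighbors))
    return weighted_sum / sum(weights)
```

Looping over every pitch like this is far too slow for a full season of data, so in practice you would likely reach for a library implementation such as scikit-learn's `KNeighborsRegressor` with `n_neighbors=100` and `weights="distance"`. The loop is here only because it shows exactly what "average the 100 most similar pitches" means.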
Why would we not just average the run values for each pitcher’s pitches and call it a day? We could certainly do that. That would be a type of “results-based” analysis where we break down the results of what actually happened on the field. When we do that, we can say the following pitchers had the best results last season:
This list is similar to looking at a list of 2019 ERA leaders. These are the guys who did the best at limiting bad outcomes on their pitches. But who cares? What I want to know, and what is more helpful information going forward, is who should have done well? Like FIP for ERA, we want to know what should have happened, and that’s what my model does that simply looking at outcomes of each pitch does not.
Here is how the distribution of each pitcher’s average expected run value looked:
Pitchers below the red line were very good, those above the red line were very bad, and most pitchers were somewhere near the middle, but still below 0, which is exactly what we would expect.
Who were the pitchers on the far left with the lowest expected run values on their pitches in 2019? Take a look at the leaders and their expected run value percentile among their group:
Leaders for Starters (>2500 pitches)
Leaders for Relievers/Season-Shortened Starters (500<pitches<2500)
The lists check out! We see a lot of familiar names and some new ones, which is always good. When you tell your friends about this list, make sure to tell them that these players weren’t the best pitchers of 2019, but that they were the pitchers who threw the best pitches on average. When interpreting, remember that this is a pitch quality metric, not a player quality metric.
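If you want to build leaderboards like these yourself, the bookkeeping is a group-by, a pitch-count filter, and a rank. The sketch below is one hedged way to do it in plain Python, not the code used for the lists above. The function name is hypothetical, and the percentile definition (share of the group whose average is at least as high as the pitcher's) is my assumption, since this post does not define it.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def pitcher_leaderboard(
    pitches: Iterable[Tuple[str, float]],
    min_pitches: int,
    max_pitches: float = float("inf"),
) -> List[Tuple[str, float, float]]:
    """Rank pitchers by average expected run value, lowest (best) first.

    pitches: (pitcher_name, expected_run_value) pairs, one per pitch.
    Only pitchers with min_pitches < pitch count < max_pitches are kept.
    Returns (name, average, percentile) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # name -> [sum, count]
    for name, xrv in pitches:
        totals[name][0] += xrv
        totals[name][1] += 1

    averages = {
        name: total / count
        for name, (total, count) in totals.items()
        if min_pitches < count < max_pitches
    }
    ranked = sorted(averages.items(), key=lambda item: item[1])
    size = len(ranked)
    # Assumed percentile: 100 for the best average in the group.
    return [
        (name, avg, 100.0 * (size - rank) / size)
        for rank, (name, avg) in enumerate(ranked)
    ]
```

With this signature, the starters group would be `pitcher_leaderboard(pitches, min_pitches=2500)` and the relievers group `pitcher_leaderboard(pitches, min_pitches=500, max_pitches=2500)`.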
Having the estimated expected run value of every pitch of the 2019 season can be helpful in player evaluation, crafting pitching and offensive game plans, and identifying and developing a pitcher’s strengths and weaknesses by finding where in the zone that pitcher is likely to have success based on past similar pitches.
I want to use some concrete examples to break down what makes a pitch “good” or “bad” according to my model. Click the links to watch the videos.
Best Pitch of 2019
You’re underwhelmed. You were expecting this:
Well, here’s the thing. Bauer’s pitch is great, but no matter how insane it looks, its most similar pitches are *at best* going to be swinging strikes but are likely going to be balls. They will almost never produce an out, and that’s where pitches earn their run value, because outs are worth way more than balls and strikes.
So the profile of the kind of pitch with a very low expected run value is one that is *always* negative, whether it be a strike or an out on contact. Unfortunately for the viewer, this means a well-located fastball makes sense for the top pitch of 2019.
Worst Pitch of 2019
The worst pitch of 2019 was taken for a strike. Its run value was negative, meaning it helped the pitching team. Going from 0–0 to 0–1 is good! But its expected run value was very positive, meaning if he threw that pitch 99 more times…let’s just say Adam Eaton wants that one back. And that’s why we can say that Jose Alvarez started off the 9th with the worst pitch of 2019.
Best Pitches From the Best Pitchers
What do elite individual pitches look like?
Well located slider.
Tommy Pham singles on a sharp line drive to right fielder Josh Reddick. Kevin Kiermaier scores. Mike Zunino to 3rd.
Well located fastball. This one was actually hit! Still, 99 on the outside is not a bad pitch.
Manny Machado singles on a soft ground ball to second baseman Brian Dozier. | 06/09/2019
Well located changeup. Another hit?! That’s a tough pitch to hit, but this time it was no match for Manny “Johnny Hustle” Machado’s blazing speed.
Walker Buehler Swinging Strike to Kevin Pillar | 05/01/2019
Well located slider!
Well located curveball!
Lots of pitchers have incredible “stuff,” but it doesn’t matter much if they can’t locate it. Looks like my grandpa’s conventional baseball wisdom is actually right about this one.
Best Pitch Types
In 2019, here is a sorted list of which pitch types were the most effective in a vacuum:
If the results column is intimidating, simply don’t think too hard about it. The order is what is interesting here. Breaking balls seem to have better expected outcomes for pitchers, which is why so many players have been ramping up their usage of those pitches in recent years. The knuckleball was the only pitch you could throw in 2019 and expect the outcome to help the offense more than it helped your team. Good job Steven Wright.
Individual Best Pitch Types
Tyler Rogers Curveball
Tyler Rogers Called Strike to Jedd Gyorko | 09/29/2019
Submarine Gang Represent
Joe Smith Slider
Joe Smith Called Strike to Adam Eaton | 10/28/2019
Pooslinger Gang Represent
Tim Hill Slider
Submarine Gang Represent…okay so maybe we’re noticing a pattern. As is always a danger with sorted lists, I think all the sidearm guys may have their stats boosted due to a less variable sample of similar pitches. This could be a future adjustment to the model, but for now please let me believe that sidearmers are truly the cream of the pitching crop. In reality, the presence of Scherzer, Chapman, Glasnow, and Castillo’s signature pitches on this list again gives me confidence in the results of this model.
Here are some fun things I dug up along the way!
Worst Pitch Resulting in a HR
Walker’s 2nd home run | 07/06/2019
Christian Walker crushes his second home run of the game to deep center field, giving the D-backs an 8–0 lead in the…
Kind of surprising here that a 96mph pitch on the inner half had such a bad expected run value, but I guess Christian Walker showed us why.
Best Pitch Resulting in a HR
Again, pretty surprising that an 87mph, belt-high fastball had a good expected run value. But there’s so much going on here! First: Christian Walker hit BOTH the hardest homer and the easiest homer of 2019. What are the odds? Second: This was Kershaw’s BEST PITCH OF THE YEAR! Jeez. Third:
Best Swinging Strike
The Eduardo Rodriguez foul ball from earlier feels like cheating, so here’s the top whiff: a classic Luis Castillo changeup on the black that had Anthony Rizzo stumbling.
Worst Swinging Strike
I think it’s safe to say Aaron Hicks wasn’t anticipating a letters-high changeup on 3–1.
Thanks for following along and feel free to DM me on Twitter @Moore_Stats with any questions!