Arrows in the Quiver: Classifying Pitches by Role

Matan K
Apr 4, 2024
Image Generated by DALL-E

The desire to name and classify even the most niche minutiae into groups is an innate human tendency. In no arena is this more evident than in chess, where “a Dragon” or “a Scotch” means something very different from a mythical beast or a drink after work. The specific names may sound silly, but the groupings themselves are quite useful. After all, who wants to say “I played an e4 c5 Nf3 d6 d4 cxd4 Nxd4 Nf6 Nc3 g6 opening last weekend”? Classifications are a useful shorthand that conveys an abundance of meaning, albeit only to those who know the lingo.

While not yet on the level of chess vernacular, pitching jargon has expanded tremendously in the past few years. Gone is the simple “breaking ball,” replaced by the sweeper, death ball, gyro slider and more. While these more granular classifications may frustrate traditionalists, their utility is clear. A sweeper and a down-breaking gyro slider move very differently, so they shouldn’t be termed identically.

Classifying pitches by their movement and velocity is useful, but that isn’t the sole way to group offerings. Pitches with similar physical characteristics may differ significantly from pitcher to pitcher in usage and performance. For instance, some pitchers, such as Jakob Junis, use their slider as a primary pitch that is often thrown in the strike zone. Others may use their sliders exclusively in two-strike situations. There are numerous ways to group pitches. In this case, I chose to classify offerings with k-means clustering, based on 10 usage and performance characteristics:

  1. Usage% Ahead in the Count
  2. Usage% Behind in the Count
  3. Usage% in an Even Count
  4. Platoon Called Strike + Whiff% Split
  5. Zone%
  6. Chase%
  7. Called Strike%
  8. Whiff%
  9. Average Launch Angle Allowed
  10. Hard Hit% Allowed

These metrics were selected to run the gamut of pitch attributes, including when a given pitch is used, where it is located, its ability to garner called and swinging strikes and its performance on contact. After some testing, seven classifications (clusters) seemed to be appropriate. Here are the typical characteristics of pitches in each cluster for all 10 metrics, along with a nickname for each grouping…

Pitch Data from MLB API using baseballr
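For readers curious about the mechanics, the clustering step can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind this piece: the three-column toy table, the z-score scaling, the farthest-first starting centroids and every function name are assumptions made for the example, and the real analysis used all 10 metrics and seven clusters.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so no single metric dominates the distance (assumed step)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic starting centroids: begin at the first point, then keep
    adding the point farthest from all centroids chosen so far (assumed init)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm: assign to nearest centroid, recompute means, repeat."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids

# Made-up rows of (Zone%, Whiff%, Chase%); not real pitch data.
toy = [
    [55, 15, 22], [57, 17, 24], [53, 16, 23],  # in-zone, low-whiff pitches
    [38, 38, 35], [36, 40, 37], [40, 36, 33],  # out-of-zone, high-whiff pitches
]
scaled = standardize(toy)
labels, centroids = kmeans(scaled, k=2)
print(labels)
```

Scaling is included because the metrics live on different scales (percentages versus launch angle in degrees); without it, the metric with the largest spread would dominate the Euclidean distance.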

On a broad level, there is a clear delineation between the first four and final three clusters. The first four clusters consist of pitches that are typically thrown in the strike zone. Relatedly, they tend not to generate many whiffs, are hit relatively hard and are used most often when behind in the count. On the other hand, the pitches in the last three clusters are secondary pitches that are most often used in advantageous counts to garner whiffs and chases.

On a more granular level, finer distinctions can be made between each cluster. The Low Usage In-Zone Pitch and Wormkiller are similar, as both are typically groundball pitches that don’t generate many whiffs. However, true to its moniker, the Wormkiller has a lower average launch angle allowed and is generally a much higher usage offering. The Run of the Mill and Plus Primaries are also alike, with the Plus Primary serving as a generally more effective and higher usage offering. Amongst secondary pitches, the Elite Secondary cluster is the gold standard, with a high whiff rate and plenty of usage. The 2-Strike Whiffer is generally used as a chase pitch in advantageous counts, while the Versatile Secondary is used in all count situations as more than just a whiff-generator.

Many of these cluster descriptions may seem to fit a particular pitch type, and indeed certain clusters tend to align with particular pitches…

Tile Shading is by Column

For instance, as can be seen above, the Wormkiller consists mostly of sinkers, which of course are the primary pitch that most commonly generates grounders. However, the cutters of Emmanuel Clase and Graham Ashcraft are also classified as Wormkillers, due to their high usage rate and groundball tendencies. On the other hand, the cutters of David Robertson and Kenley Jansen are classified in the Plus Primary cluster, due to their flyball-inducing nature.

While the k-means clustering method is useful, some pitches are inherently difficult to group neatly into a category. The Elite Secondary cluster consists mainly of sliders, sweepers (ST) and other breaking pitches. However, there is also one sinker present in the cluster. That pitch is Andrew Chafin’s sinker, which does not have the characteristic high whiff rate of a pitch in the Elite Secondary cluster. It instead aligns closely with the cluster in other attributes such as Chase%, Zone% and Average Launch Angle. Clustering is a useful tool, but not a perfect one.
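The reason a pitch can land in a cluster despite missing on one headline attribute is that k-means assigns each pitch to the nearest centroid across all inputs at once. The sketch below works through that arithmetic with invented, already-standardized numbers. The centroid values and the example pitch are hypothetical and are not Chafin’s actual figures, and only four of the 10 metrics are shown.

```python
features = ["Whiff%", "Chase%", "Zone%", "Launch Angle"]

# Hypothetical cluster centers in z-score units (not taken from the article).
centroids = {
    "Elite Secondary": [1.2, 1.0, -0.8, 0.3],
    "Wormkiller": [-0.9, -0.5, 0.9, -1.2],
}

# Hypothetical pitch: whiff rate like a sinker, everything else like a secondary.
pitch = [-0.6, 1.1, -0.9, 0.2]

def contributions(point, center):
    """Per-feature squared differences; their sum is the squared distance."""
    return [(x - c) ** 2 for x, c in zip(point, center)]

dists = {name: sum(contributions(pitch, c)) for name, c in centroids.items()}
nearest = min(dists, key=dists.get)

for name, center in centroids.items():
    parts = dict(zip(features, (round(v, 2) for v in contributions(pitch, center))))
    print(name, round(dists[name], 2), parts)
print("assigned to:", nearest)
```

In this toy case the whiff gap is the largest single contribution to the distance from the Elite Secondary center, yet the total is still far smaller than the distance to the other center, so the pitch is assigned there anyway.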

Classifying pitches by their usage and performance has several potential applications. One basic use is to examine how a pitch may perform in a particular count situation based on its classification. Here is a summary table of how each cluster performs in various count situations as measured by run value per 100 pitches (where negative/blue numbers favor the pitcher)…

Negative RV = Good For Pitcher
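Run value per 100 pitches is the total run value of a group of pitches divided by the number of pitches, multiplied by 100. A minimal sketch of how such a table could be assembled from pitch-level rows follows. The rows are invented, the function names are illustrative, and the ahead/behind/even rule (a simple comparison of strikes and balls from the pitcher’s point of view) is an assumption, since the exact count definitions used for the table are not spelled out here.

```python
from collections import defaultdict

def count_state(balls, strikes):
    """Assumed rule, pitcher's perspective: more strikes than balls is 'ahead',
    more balls than strikes is 'behind', equal is 'even'."""
    if strikes > balls:
        return "ahead"
    if balls > strikes:
        return "behind"
    return "even"

def rv_per_100(pitches):
    """Aggregate (cluster, balls, strikes, run_value) rows into RV/100
    for each (cluster, count state) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of run value, pitch count]
    for cluster, balls, strikes, rv in pitches:
        cell = totals[(cluster, count_state(balls, strikes))]
        cell[0] += rv
        cell[1] += 1
    return {key: 100.0 * total / n for key, (total, n) in totals.items()}

# Made-up pitch-level rows; negative run value favors the pitcher.
rows = [
    ("Elite Secondary", 0, 2, -0.05),
    ("Elite Secondary", 1, 2, 0.01),
    ("Wormkiller", 2, 0, 0.04),
    ("Wormkiller", 3, 1, -0.02),
    ("Wormkiller", 1, 1, 0.00),
]
table = rv_per_100(rows)
for key, value in sorted(table.items()):
    print(key, round(value, 2))
```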

Some pitch clusters, such as the Elite Secondary or Plus Primary, perform well in all count situations. In sharp contrast, the Low Usage In-Zone Pitch is extremely effective as a weapon when behind in the count, but is below average at putting away hitters. A similar observation, though to a slightly lesser degree, can be made about the Versatile Secondary classification. Perhaps surprisingly for a groundball weapon, the Wormkiller is generally most effective in advantageous counts for the pitcher, more so than other primary in-zone offerings.

A further potential use of pitch classifications is analyzing (and for a pitcher themselves, constructing) an arsenal. While it’s commonplace and simpler to discuss a pitcher’s offerings in isolation, a well-constructed arsenal contains pitches that excel in specific circumstances (such as count situation, as above, or platoon factors) and also tunnel well with each other. Here are the most and least common two-pitch arsenals (first and second highest usage pitches) in 2023…

Clearly, certain arsenal combinations are far more common than others. The most popular two-pitch arsenal is the Run of the Mill Primary and Versatile Secondary, which makes up nearly 12% of all arsenals. One particularly unconventional example of this arsenal is Aaron Civale’s repertoire. Civale uses his cutter as his primary offering while his high-spin curveball acts as a versatile secondary offering. The second most common two-pitch arsenal, the Elite Secondary and Run of the Mill Primary, is favored by relief pitchers with excellent breaking balls, including Matt Brash and Lucas Sims. On the other end of the spectrum, Tanner Scott is the only pitcher whose most common pitch is classified as an Elite Secondary and whose second most common pitch is classified as a Plus Primary.
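The arsenal tally itself is simple bookkeeping: take each pitcher’s two highest-usage pitches, look up their clusters, and count the resulting pairs. A rough sketch with placeholder pitchers and invented usage rates is below. Treating the pair as ordered (most used first) and skipping pitchers with fewer than two pitches are both assumptions of this example, as are the function and variable names.

```python
from collections import Counter

def top_two_clusters(pitches):
    """Return the clusters of the 1st and 2nd highest-usage pitches.
    Each pitch is a (pitch_type, usage_share, cluster) tuple."""
    ranked = sorted(pitches, key=lambda p: p[1], reverse=True)
    return ranked[0][2], ranked[1][2]

# Placeholder pitchers with invented usage shares; not real 2023 data.
arsenals = {
    "Pitcher A": [("FF", 0.50, "Run of the Mill Primary"),
                  ("CU", 0.30, "Versatile Secondary"),
                  ("CH", 0.20, "2-Strike Whiffer")],
    "Pitcher B": [("SI", 0.60, "Wormkiller"),
                  ("SL", 0.40, "2-Strike Whiffer")],
    "Pitcher C": [("CU", 0.35, "Versatile Secondary"),
                  ("FC", 0.45, "Run of the Mill Primary"),
                  ("SI", 0.20, "Low Usage In-Zone Pitch")],
}

# Only pitchers with at least two pitches have a two-pitch arsenal.
eligible = [p for p in arsenals.values() if len(p) >= 2]
pair_counts = Counter(top_two_clusters(p) for p in eligible)
shares = {pair: n / len(eligible) for pair, n in pair_counts.items()}

for pair, n in pair_counts.most_common():
    print(pair, n, f"{shares[pair]:.0%}")
```

Filtering the same tally down to one primary cluster gives the preferred secondary offerings for that primary.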

This can be further broken down into the preferred secondary offerings for a particular primary pitch classification. The Wormkiller is typically a high-usage pitch that generates oodles of groundballs but not many whiffs. As such, it’s only natural for it to be paired with the 2-Strike Whiffer…

On the other hand, the Run of the Mill Primary is not used as often as the Wormkiller, especially in early counts. Therefore, it’s not a surprise that it is often paired with a pitch classified as a Versatile Secondary offering…

These examples are some of the myriad ways in which classifying pitches by role can be useful. Of course, the seven clusters described here are hardly sacrosanct, and classifications can vary dramatically based on the clustering method and input statistics used. Recently, stuff models (such as Stuff+, PitchingBot and others) have tremendously advanced the collective understanding of what attributes make a particular pitch effective. Understanding how to build a complementary arsenal may be a further frontier in the quest to demystify pitching.

Thank you for reading this piece. The pitch data used was from the 2023 regular season. Much of the code used in k-means clustering and formatting the tables in this piece can be found here. I can be found on Twitter for questions and comments here.
