Parameterizing umpire biases using MLB Statcast data. Part 1.

Tim Sheehan
May 2, 2024

Cognitive biases are an active area of research in psychology and economics. They take many forms, including well-known examples such as the framing effect, where equivalent information presented in positive or negative language can dramatically alter how it is interpreted. Typically, these phenomena are studied in highly artificial lab experiments on demographically homogeneous populations (undergraduate students between 18 and 22 years old). This shortcoming drove my interest in studying these biases in a much more ‘real world’ environment with loads of data. Previous work has shown that umpires exhibit the “Gambler’s Fallacy”, whereby they are more likely to call a ball after calling a strike (and vice versa). In this post, I will examine a similar phenomenon in the behavior of MLB umpires: how the current count in the at-bat affects the shape and size of the strike zone.

Dataset: Since the 2007 season, MLB has recorded and published the velocity, location, spin, and outcome of every pitch thrown. Because the exact systems used to record this information have evolved, I focus on data from the 2016 season onward. I used the Python pybaseball package to pull pitch data from the 2016–2022 seasons. The dataset natively occupied over three GB in memory, but I was able to reduce this to ~100 MB by shrinking float precision and converting the many variables that take only a few distinct values {team, inning, # of balls, ...} to a categorical type. For this analysis, I use only the x (horizontal) and z (height) positions of the pitch as it crosses the plate, the result of the pitch (ball, strike, hit, etc.), and the count of the current at-bat.
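The memory reduction described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the exact code used for the post: the helper name `shrink`, the `max_unique` threshold, and the demo columns are assumptions. With real data, the frame would come from pybaseball's `statcast(start_dt, end_dt)` rather than being generated.

```python
import numpy as np
import pandas as pd

def shrink(df, max_unique=50):
    """Downcast floats to float32 and turn low-cardinality columns into categoricals.

    `max_unique` is a hypothetical threshold chosen for this sketch.
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_float_dtype(s):
            out[col] = s.astype(np.float32)      # halve the float storage
        elif s.nunique(dropna=True) <= max_unique:
            out[col] = s.astype("category")      # small integer codes plus a lookup table
    return out

# Synthetic stand-in for a Statcast pull (column names assumed for illustration).
rng = np.random.default_rng(0)
n = 100_000
demo = pd.DataFrame({
    "plate_x": rng.normal(0.0, 0.8, n),
    "plate_z": rng.normal(2.3, 0.9, n),
    "balls": rng.integers(0, 4, n),
    "strikes": rng.integers(0, 3, n),
    "home_team": rng.choice(["BOS", "NYY", "LAD"], n),
})

before = demo.memory_usage(deep=True).sum()
small = shrink(demo)
after = small.memory_usage(deep=True).sum()
print(f"{before / 1e6:.1f} MB -> {after / 1e6:.1f} MB")
```

Most of the savings on string-like columns such as team names come from the categorical conversion, since each value is stored once and rows hold only a small integer code.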

Model: The rulebook strike zone is defined by the width of the plate and the height and stance of the batter. The called strike zone can differ substantially from this rulebook definition. The umpire's response (called ball or strike) changes smoothly and monotonically as a function of pitch location, resembling the psychometric curves studied in neuroscience and psychology. Such a curve is well described by two parameters: a point of subjective equality (PSE, the location where P(strike)=0.5) and a width that captures the uncertainty around that location. Both can be estimated by maximum likelihood, fitting a Gaussian CDF with center point μ and uncertainty σ to the calls. The fitted curve below (red) does a substantially better job of explaining calls than the “rulebook” strike zone (black).

Black dots: binned P(strike|called pitch, 1.8'<plate_z<3.1' ). Red line: Gaussian CDF fit as part of full called strike zone model. Black line: rulebook strike zone.
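As a rough illustration of the single-wall fit, the sketch below simulates calls around one edge of the zone and recovers μ and σ by maximum likelihood. Everything here is an assumption made for the example: the data are synthetic, the "true" values are invented, and the optimizer is SciPy's Nelder-Mead rather than whatever routine produced the figure.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_lik(params, x, called_strike):
    """Negative Bernoulli log-likelihood for one wall of the zone.

    P(strike | x) = Phi((mu - x) / sigma): pitches inside the wall (x < mu)
    are likely strikes, pitches outside it are likely balls.
    """
    mu, raw_sigma = params
    sigma = np.logaddexp(0.0, raw_sigma)          # softplus keeps sigma positive
    p = np.clip(norm.cdf((mu - x) / sigma), 1e-9, 1 - 1e-9)
    return -np.sum(called_strike * np.log(p) + (1 - called_strike) * np.log(1 - p))

# Synthetic called pitches near one wall (values in feet, invented for the demo).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, 20_000)
true_mu, true_sigma = 0.9, 0.15
called_strike = (rng.random(x.size) < norm.cdf((true_mu - x) / true_sigma)).astype(float)

fit = minimize(neg_log_lik, x0=[1.0, -1.0], args=(x, called_strike), method="Nelder-Mead")
mu_hat = fit.x[0]                                  # estimated PSE
sigma_hat = np.logaddexp(0.0, fit.x[1])            # estimated uncertainty
print(f"PSE = {mu_hat:.3f} ft, sigma = {sigma_hat:.3f} ft")
```

The estimated μ is the PSE: the location at which the umpire is equally likely to call a ball or a strike.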

We can extend this simple model explaining just one wall of the strike zone to a more complete description of the entire strike zone. Formally, we define

for the 4 walls, w, of the strike zone {Left, Right, Bottom, Top}. The equations for the four walls are:
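One formulation consistent with the description in this post is written out here as an assumption, not necessarily the exact form used. It treats each wall as a Gaussian CDF, shares σx between the two vertical walls and σz between the two horizontal walls, and multiplies the four wall probabilities. The assignment of μ1 through μ4 to Left, Right, Bottom, and Top is also assumed.

```latex
P(\text{strike} \mid x, z) = \prod_{w \in \{L, R, B, T\}} P_w(x, z)

P_L(x, z) = \Phi\!\left(\frac{x - \mu_1}{\sigma_x}\right), \qquad
P_R(x, z) = \Phi\!\left(\frac{\mu_2 - x}{\sigma_x}\right)

P_B(x, z) = \Phi\!\left(\frac{z - \mu_3}{\sigma_z}\right), \qquad
P_T(x, z) = \Phi\!\left(\frac{\mu_4 - z}{\sigma_z}\right)
```

Here Φ is the standard normal CDF, x is the horizontal pitch location, and z is its height as it crosses the plate.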

We can estimate the 6 free parameters {μ1, μ2, μ3, μ4, σx, σz} for a given set of called pitches. The objective is to maximize the log-likelihood of the umpire calls, and the best-fitting parameters are estimated using gradient descent. Note that we apply a softplus transformation to the σ values to ensure they remain positive. The called strike zone for left- and right-handed batters is visualized below (for the 2017 season). Note that in this visualization even the “rulebook” strike zone is wider than home plate: we are using the center location of the pitch, and only part of the baseball needs to pass over home plate for the pitch to be a strike.
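A minimal sketch of a six-parameter fit of this kind, on synthetic pitches, might look like the following. Two things here are assumptions: the four wall probabilities are multiplied to give P(strike), and the optimizer is SciPy's L-BFGS-B (with numerical gradients) swapped in for the plain gradient descent mentioned above. All parameter values are invented for the demo.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def softplus(a):
    return np.logaddexp(0.0, a)                    # log(1 + exp(a)), always positive

def p_strike(params, x, z):
    """Product of four Gaussian-CDF walls (assumed combination rule)."""
    mu_l, mu_r, mu_b, mu_t, raw_sx, raw_sz = params
    sx, sz = softplus(raw_sx), softplus(raw_sz)
    return (norm.cdf((x - mu_l) / sx) * norm.cdf((mu_r - x) / sx)
            * norm.cdf((z - mu_b) / sz) * norm.cdf((mu_t - z) / sz))

def neg_log_lik(params, x, z, y):
    p = np.clip(p_strike(params, x, z), 1e-9, 1 - 1e-9)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic called pitches; the "true" zone is invented (feet, raw sigmas pre-softplus).
rng = np.random.default_rng(2)
n = 30_000
x = rng.uniform(-2.0, 2.0, n)
z = rng.uniform(0.5, 4.5, n)
true = np.array([-0.9, 0.9, 1.6, 3.4, -1.8, -1.8])
y = (rng.random(n) < p_strike(true, x, z)).astype(float)

start = np.array([-1.0, 1.0, 1.5, 3.5, -1.0, -1.0])
fit = minimize(neg_log_lik, start, args=(x, z, y), method="L-BFGS-B")

mu_l, mu_r, mu_b, mu_t = fit.x[:4]
sigma_x, sigma_z = softplus(fit.x[4]), softplus(fit.x[5])
area = (mu_r - mu_l) * (mu_t - mu_b)               # width x height at the PSE, in ft^2
print(f"walls: {fit.x[:4].round(3)}, area = {area:.2f} ft^2")
```

Fitting the same model separately to the pitches from each count (or each batter handedness) gives one set of walls per condition, and the product of PSE width and height gives a single area that can be compared across conditions.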

Called strike zones for left- and right-handed hitters in the 2017 season are shown in red and blue, respectively. Also visualized are the width of home plate, the size of a baseball, and the rulebook strike zone. A right-handed batter in his stance is depicted. Note that the strike zone is presented from the catcher's POV. Player graphic courtesy of vecteezy.com.

Comparing strike zones across batter handedness, it is clear that pitchers are less likely to get inside calls and more likely to get outside calls. We can now visualize how the called strike zone changes with game conditions, for instance the count in the current at-bat.

Called strike zone depicted for both 0–2 and 3–0 counts, for right-handed batters only.

Alternatively, we can collapse these strike zones to their surface area and see how it changes as a function of the number of balls and strikes in the current at-bat:

Area of strike zone (height x width; ft^2) for all possible counts.

These changes are not subtle! The called strike zone for right-handed batters is 0.97 square feet (33%) larger on a 3–0 count than on an 0–2 count. There are many ways to contextualize this bias. On the one hand, we could emphasize how hopelessly subjective and fallible the umpires are: “shouldn’t MLB just switch to the automated strike calling system used in the minor leagues already?” Alternatively, we could first acknowledge that umpires are remarkably precise, with a typical σ around 60% of the width of a baseball. The biases they do exhibit are in the exact direction most outside observers would prefer: by shifting their PSE slightly, they greatly reduce the chance of high-impact mistakes (e.g., calling a true strike a ball on a 3–0 count, leading to a game-changing walk).

From a more statistical perspective, these biases may also be Bayes optimal. Umpires make decisions under substantial uncertainty: pitches are fast, leaving little time and attention for each judgement, and they move in trajectories specifically meant to mislead batters and umpires alike. Umpires are also surrounded by agents attempting to mislead them by framing pitches (catchers) or selling that a pitch was unhittable (batters). In the face of all this uncertainty, umpires can improve their estimate of P(strike | x, z) by using Bayesian inference to include the prior probability that a given pitch is a strike based on the current count.
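To make the Bayesian argument concrete, here is a small sketch of how a count-based prior could shift a call on a borderline pitch. It assumes the location-only probability behaves like a posterior under a flat 50/50 prior, so that the two sources combine by adding log-odds. The function name and the prior values are hypothetical and not estimated from the data.

```python
import numpy as np

def combine_with_prior(p_location, prior):
    """Posterior P(strike) from a location-only estimate and a count-based prior.

    Assumes `p_location` was formed under a flat prior, so the posterior
    log-odds are the sum of the two log-odds terms.
    """
    eps = 1e-9
    p = np.clip(p_location, eps, 1 - eps)
    q = np.clip(prior, eps, 1 - eps)
    log_odds = np.log(p / (1 - p)) + np.log(q / (1 - q))
    return 1.0 / (1.0 + np.exp(-log_odds))

borderline = 0.5                                          # pitch exactly on the location-only PSE
p_hitter_count = combine_with_prior(borderline, 0.60)     # hypothetical prior on 3-0
p_pitcher_count = combine_with_prior(borderline, 0.35)    # hypothetical prior on 0-2
print(p_hitter_count, p_pitcher_count)
```

Under these assumptions a truly borderline pitch is called according to the prior alone, which is equivalent to the PSE moving outward in hitter-friendly counts and inward in pitcher-friendly ones.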

We can get this prior probability by examining the average P(strike) of thrown pitches in a given count, using a fixed model fit to all pitches. Strikingly, the pattern in where pitches are thrown by count is nearly identical to the pattern in umpire strike zone bias. Put another way, in more hitter-friendly counts (e.g., 3–0), umpires are more generous to pitchers in their calls AND pitchers throw pitches that are more likely to be called a strike, independent of this umpire bias. Thus the bias exhibited by umpires may actually be optimal in that it minimizes their overall error in calls. On the manager's side, one could use these data points to argue for giving more hitters the green light in 3–0 counts, as pitchers and umpires alike are biased against giving them the fourth ball they desire.

Average model output P(strike) for model trained across all pitch counts evaluated separately for pitches thrown in each count. Note that all pitches are evaluated, not just “called” pitches used for training.
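The per-count averaging described in the caption can be sketched with a pandas group-by. The tiny frame below is invented for illustration. In practice the `p_strike` column would hold the fixed model's output for every thrown pitch, and the column names are assumptions.

```python
import pandas as pd

# Toy stand-in: one row per thrown pitch, with the fixed model's P(strike).
pitches = pd.DataFrame({
    "balls":    [0, 0, 3, 3, 3, 0],
    "strikes":  [2, 2, 0, 0, 0, 2],
    "p_strike": [0.2, 0.4, 0.7, 0.8, 0.9, 0.3],
})

# Average model P(strike) within each count: the empirical prior for that count.
prior_by_count = pitches.groupby(["balls", "strikes"])["p_strike"].mean()
print(prior_by_count)
```

Because the model is held fixed across counts, differences in these averages reflect only where pitchers choose to throw, not how umpires call those pitches.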

In this brief exploration, I used pitch location to model how the called strike zone differs from the rulebook strike zone and how the current count influences it. I plan to further explore how other features (including pitch movement, umpire identity, and more) influence umpire behavior.
