# Parameterizing umpire biases using MLB Statcast data. Part 2.

In a previous post, I described how the size of the strike zone changes as a function of the count in the current at-bat. This was achieved by fitting a separate strike-zone model to each subset of pitches. In this next post I want to additionally examine the influence of previous calls more explicitly and see how these biases vary across umpires.

To start, I designed a logistic-regression model that encodes both the count before the current pitch and the calls made on the previous n=6 pitches using one-hot vectors.
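As a rough illustration of this encoding, here is a minimal sketch. The label sets, the feature ordering, and the name `encode_pitch` are my assumptions for the example: I assume 12 (balls, strikes) count states and two call labels per lag, with any other outcome (a swing, or no earlier pitch) encoded as all zeros.

```python
from itertools import product

# Assumed label sets; not taken from the actual model code.
COUNTS = [(b, s) for b, s in product(range(4), range(3))]  # 12 (balls, strikes) states
CALLS = ["ball", "called_strike"]  # any other outcome encodes as all zeros
N_LAGS = 6


def encode_pitch(balls, strikes, previous_calls):
    """Return a flat 0/1 feature vector for one called pitch.

    previous_calls is ordered most recent first; missing lags are padded with None.
    """
    count_vec = [1 if (balls, strikes) == c else 0 for c in COUNTS]
    padded = (list(previous_calls) + [None] * N_LAGS)[:N_LAGS]
    history_vec = []
    for call in padded:
        history_vec.extend(1 if call == label else 0 for label in CALLS)
    return count_vec + history_vec


# A 1-2 count where the last pitch was a called strike and the one before a ball
x = encode_pitch(1, 2, ["called_strike", "ball"])
print(len(x))  # 12 count features + 6 lags * 2 call labels = 24
```

Each pitch then contributes one row of zeros and ones, and the model learns one coefficient per column.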

The coefficients β are estimated by minimizing *cross-entropy loss* on all called pitches. These can be converted into probabilities of a given pitch being called a strike. Similar to my earlier analysis of strike-zone areas, I find that umpires are much more likely to call a strike on a *3–0* count (three balls, no strikes) than on an *0–2* count (no balls, two strikes) (**below right**). I additionally observe that umpires tend not to repeat calls, calling a strike less often when they called a preceding pitch a strike (**below left**). These probabilities could be driven in large part by pitch selection (e.g. by the pitcher). I will need to take pitch location into account to determine to what extent this is due to a bias in umpire judgement.
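To make the fitting step concrete, here is a small sketch of cross-entropy minimization with plain gradient descent, followed by the conversion from coefficients to probabilities. The toy data, the learning rate, and the function names are made up for illustration; in practice a library optimizer would be the usual choice.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(beta, X, y):
    """Mean binary cross-entropy of labels y (1 = called strike) under beta."""
    total = 0.0
    for x_row, label in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, x_row)))
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(y)


def fit(X, y, lr=0.5, steps=2000):
    """Gradient descent on the mean cross-entropy loss."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x_row, label in zip(X, y):
            err = sigmoid(sum(b * v for b, v in zip(beta, x_row))) - label
            for j, v in enumerate(x_row):
                grad[j] += err * v / len(y)
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


# Toy data (made up): columns are [intercept, previous_call_was_strike]
X = [[1, 0]] * 10 + [[1, 1]] * 10
y = [1] * 4 + [0] * 6 + [1] * 2 + [0] * 8  # 40% strikes after a ball, 20% after a strike
beta = fit(X, y)

# Convert coefficients back into strike probabilities
p_after_ball = sigmoid(beta[0])
p_after_strike = sigmoid(beta[0] + beta[1])
```

With only these two columns, the fitted probabilities recover the raw strike rates in each group, and a negative coefficient on the history column corresponds to calling fewer strikes after a strike.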

A straightforward way to account for pitch location in our model is to use the strike-zone model I described in a previous post. For the resulting model, I *simultaneously* fit the strike-zone **and** history coefficients. The output of the strike-zone model, **p(strike|x,z)**, is an input to our logistic regression model (passed through a logit transform to maintain linearity).
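The role of the logit transform can be shown with a short sketch. The names `p_strike`, `beta_zone`, and `beta_context` are hypothetical, and `beta_context` stands in for the summed count and history coefficients that apply to a given pitch. The clipping constant `eps` is also my addition, to keep the logarithm finite.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def p_strike(p_zone, beta_zone, beta_context):
    """Combine the location-based probability with count/history terms in log-odds space."""
    return sigmoid(beta_zone * logit(p_zone) + beta_context)


# With unit weight and no context, the strike-zone probability passes through unchanged
print(p_strike(0.5, 1.0, 0.0))  # 0.5
```

Because the location term enters as log-odds, the count and history coefficients simply shift the log-odds of a strike up or down at any given location.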

This allows our history model to attribute changes in outcome to both the pitcher (pitch selection) and umpire judgement. As our logistic model now includes pitch location, I can no longer directly visualize **p(strike)**, since it depends on pitch location. Instead I will visualize the fit coefficients, **β**, along with the resulting strike zone. For this visualization, I fit a separate model to each umpire who called at least 150 games in the 2018–2022 seasons (excluding 2020; n=71 umpires).

Here, I continue to observe a similar pattern where umpires tend to avoid making the same call repeatedly, even after accounting for the location of the current pitch. This suggests a common bias originating in the umpire’s judgement. Are some umpires more susceptible to this bias than others? Condensing our history coefficients by subtracting “*called_strike*” from “*ball*”, we get a scalar estimate of the bias strength. **Across all (n=71) umpires, every single one shows a bias to alternate calls**, but with varying strength. The strength of this bias correlates significantly with overall strike-zone variance. This history bias thus resembles a strategy used by less precise umpires to reduce call variance.
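One way to compute such a scalar and relate it to zone variance is sketched below. Summing the difference over lags is my assumption about how the coefficients are condensed, and the umpire names, coefficient values, and the `zone_sd` field are made up purely for illustration.

```python
import math


def history_bias(beta_ball, beta_called_strike):
    """Sum over lags of (ball coefficient - called-strike coefficient).

    Positive values mean a strike is more likely after a ball than after a
    strike, i.e. a tendency to alternate calls.
    """
    return sum(b - s for b, s in zip(beta_ball, beta_called_strike))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up coefficients for three hypothetical umpires (most recent lag first)
umpires = {
    "ump_a": {"ball": [0.10, 0.05], "called_strike": [-0.10, -0.05], "zone_sd": 0.90},
    "ump_b": {"ball": [0.20, 0.08], "called_strike": [-0.15, -0.07], "zone_sd": 1.10},
    "ump_c": {"ball": [0.05, 0.02], "called_strike": [-0.04, -0.01], "zone_sd": 0.80},
}
biases = [history_bias(u["ball"], u["called_strike"]) for u in umpires.values()]
zone_sds = [u["zone_sd"] for u in umpires.values()]
r = pearson_r(biases, zone_sds)
```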

Last, I looked to see whether any of these terms have changed over time. I fit a separate model for each year for all umpires with at least 20 games/season. Using a mixed-effects model, I examined the effect of year after controlling for umpire identity. The main effects I observed:

- A significant reduction in *σ_x* across years (β_year=-0.002, 95% confidence interval: [-0.004, 0.000], p=0.039) but no change in *σ_y*. This suggests umpires may be getting more precise over time.
- A significant increase in *count-bias* (β_year=0.01 [0.002, 0.019], p=0.018) but no change in history-bias due to past calls.
- A large increase in the average *strike zone area* for both left- and right-handed hitters (Left: β_year=0.036 [0.021, 0.050], p<1E-6; Right: β_year=0.042 [0.027, 0.056], p<1E-7). The sheer strength of this effect could be the most interesting to investigate further. Note that a similar trend was suggested by a study performed on much older data.
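For intuition about what "the effect of year after controlling for umpire identity" means, here is a deliberately simplified stand-in. It is not a mixed-effects model: it removes each umpire's own mean (a within-umpire, fixed-effect estimator) and then takes the ordinary least-squares slope on year. The rows and the function name are made up for the example, and a real analysis would also need standard errors and p-values.

```python
from collections import defaultdict


def within_umpire_slope(rows):
    """OLS slope of value on year after removing each umpire's own mean.

    rows: iterable of (umpire_id, year, value).
    """
    by_umpire = defaultdict(list)
    for umpire, year, value in rows:
        by_umpire[umpire].append((year, value))
    sxy = sxx = 0.0
    for obs in by_umpire.values():
        mean_year = sum(y for y, _ in obs) / len(obs)
        mean_val = sum(v for _, v in obs) / len(obs)
        for year, value in obs:
            sxy += (year - mean_year) * (value - mean_val)
            sxx += (year - mean_year) ** 2
    return sxy / sxx


# Made-up rows: two umpires with different baselines but the same yearly trend
rows = [
    ("A", 2018, 3.00), ("A", 2019, 3.04), ("A", 2021, 3.12),
    ("B", 2018, 3.50), ("B", 2019, 3.54), ("B", 2021, 3.62),
]
slope = within_umpire_slope(rows)
```

Because each umpire's baseline is subtracted first, a difference in average level between umpires does not leak into the estimated yearly trend.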

In this post, I demonstrated an exponentially decaying *history-bias* and a *count-bias* observed in umpire behavior that is independent of pitch selection. This further demonstrates that umpires use a lot more than pitch location when calling pitches and that these biases are universal across the league. The strength of these biases seems to be greater in less precise umpires, suggesting they may emerge from a compensatory mechanism to reduce uncertainty. Lastly, we observe that several properties of umpire behavior are changing over time. This could be due to a variety of factors, including the fact that umpires now get periodic feedback from Statcast data.

In a future post, I plan to unpack whether these biases are perceptual (that is, due to changes in how umpires physically see the pitches) or decisional/cognitive. I also plan to quantify to what extent these biases actually impact the outcome of games.