An Introduction to Expected Weighted On-Base Average (xwOBA)

Sep 4, 2019

The Baseball Data Machine Learning team is focused on using data to tell the story of baseball and help serve baseball fans and clubs. Our team is constantly learning from the broader data science, machine learning, and sabermetric communities and will share our own experiences through this blog.

What is Expected Weighted On-Base Average (xwOBA)?

A player’s stat line often doesn’t align with our perception of his skills, a judgment often called the “eye test.” A hitter can go 0-for-4 with four line-outs or warning-track drives. On the other hand, a hitter could have a brilliant game on paper thanks to a couple of bloop hits into the outfield or misplayed balls. This is where xwOBA (pronounced “ex-woh-ba”), along with Expected Batting Average (xBA), helps us close the gap between what we see on the field and what is recorded in the box score.

xwOBA is the most notable of our three “expected” Statcast metrics as it corresponds to the all-encompassing hitting metric, Weighted On-Base Average (wOBA). In short, wOBA measures a player’s offensive value by weighting outcomes (HR, BB, 1B, etc.) by their run value. Compared to BA and SLG, wOBA more accurately represents a hitter’s contributions to run scoring and thus his overall offensive value. The main difference between xwOBA and wOBA is that we model batted balls separately from walks, strikeouts, and hit-by-pitches. We ignore actual batted ball outcomes, which are subject to factors beyond the hitter’s control, and focus on physically tangible tracked “skills” to reach a conclusion about what would have happened to balls in play under completely average MLB game conditions. This gives us and the public a level playing field on which to compare players’ contact profiles, and a useful way to compare players’ past actual and expected performances.

Calculation of xwOBA

The formula for xwOBA is:

xwOBA = (xwOBAcon + wBB × (BB - IBB) + wHBP × HBP) / (AB + BB - IBB + SF + HBP)

where xwOBAcon is the estimate for xwOBA on contact produced by the Statcast-based model and w[Stat] is the wOBA weight for non-contact outcomes.
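To make the arithmetic concrete, here is a minimal sketch of the formula. It assumes the xwOBAcon term in the numerator is the sum of the per-batted-ball estimates (so that it sits on the same scale as the counting stats), and the default weights `w_bb` and `w_hbp` are illustrative placeholders, not official values; actual wOBA weights change from season to season.

```python
def xwoba(xwobacon_sum, bb, ibb, hbp, ab, sf, w_bb=0.69, w_hbp=0.72):
    """Combine modeled contact value with walks and hit-by-pitches.

    xwobacon_sum: sum of per-batted-ball xwOBAcon estimates (assumption).
    w_bb, w_hbp: placeholder wOBA weights; real weights vary by season.
    Strikeouts are part of AB and add nothing to the numerator.
    """
    numerator = xwobacon_sum + w_bb * (bb - ibb) + w_hbp * hbp
    denominator = ab + bb - ibb + sf + hbp
    if denominator == 0:
        raise ValueError("no qualifying plate appearances")
    return numerator / denominator


# Hypothetical season line, for illustration only.
season = xwoba(xwobacon_sum=150.0, bb=60, ibb=5, hbp=5, ab=500, sf=5)
print(round(season, 3))
```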

How is it created?

We base the currently public version of xwOBAcon on three variables: exit velocity (EV), launch angle (LA), and sprint speed. xwOBAcon varies drastically and non-linearly with the first two variables, as you can see in the graphs below; however, there is a distinct pattern.

Our first release of xwOBA utilized only EV and LA. We built the first version of xwOBA with k-nearest neighbors (k-NN) regression to handle the obvious non-linearity of wOBA as seen in the plot above. k-NN takes the average wOBA value of the k closest points, measured by Euclidean distance, as the estimate for the “unknown” point. In the context of our xwOBAcon model, the tuning question is how many similar batted balls to average together to produce an xwOBAcon estimate as close to the actual wOBAcon as possible. We end up averaging the wOBA values of approximately 400 of the most similar (closest) balls in play to predict wOBAcon.
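The idea can be sketched in a few lines of plain Python. This is only an illustration of k-NN regression, not our production code: the `history` tuples are made-up data, and the distance is computed on raw mph and degrees, whereas a real model may scale the inputs first.

```python
import math


def knn_xwobacon(ev, la, history, k=400):
    """Average the wOBA values of the k batted balls nearest to (ev, la).

    history: list of (exit_velocity, launch_angle, woba_value) tuples.
    Distance is plain Euclidean distance on unscaled EV and LA (assumption).
    """
    if not history:
        raise ValueError("history is empty")
    nearest = sorted(
        history, key=lambda row: math.hypot(row[0] - ev, row[1] - la)
    )[:k]
    return sum(row[2] for row in nearest) / len(nearest)


# Toy batted balls: (EV in mph, LA in degrees, wOBA value of the outcome).
history = [
    (100, 25, 2.0), (101, 27, 2.0), (99, 24, 0.0),
    (70, -10, 0.0), (72, -8, 0.9), (68, -12, 0.0),
]
estimate = knn_xwobacon(100, 26, history, k=3)
print(round(estimate, 3))
```

With only six toy points, k has to be tiny; with a full season of batted balls, a k in the hundreds smooths out the luck in any single outcome.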

As many fans have noticed or intuitively guessed, faster players often outperformed our xwOBA estimates because we weren’t taking into account the speed of the batter, another important skill. We found that Generalized Additive Models (GAMs) helped us capture the linear effect of speed along with the highly non-linear interaction of LA and EV.

Our newest model is a combination of k-NN and GAMs. We use GAMs to model most weakly hit balls, shallow infield pop-ups, and grounders, the idea being that these are balls where speed matters. We apply our existing k-NN model to the remaining liners and fly balls (where the batter’s speed has far less impact on the outcome), only using EV and LA to estimate wOBA. You can read this post about xBA to learn more about the specifics of the new model.
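The routing between the two models can be pictured roughly as follows. Everything named here is a hypothetical stand-in: the `is_weak_or_grounder` rule, its thresholds, the dictionary keys, and both toy models are placeholders chosen for illustration, not the actual cutoffs or fitted models.

```python
def hybrid_xwobacon(ball, speed_sensitive, gam_model, knn_model):
    """Route one batted ball to the speed-aware or the EV/LA-only model.

    ball: dict with 'ev', 'la', and 'sprint_speed' keys (hypothetical schema).
    speed_sensitive: predicate deciding whether batter speed should matter.
    """
    if speed_sensitive(ball):
        return gam_model(ball["ev"], ball["la"], ball["sprint_speed"])
    return knn_model(ball["ev"], ball["la"])


# Placeholder rule and models, for illustration only.
def is_weak_or_grounder(ball):
    return ball["ev"] < 80 or ball["la"] < 10


def toy_gam(ev, la, sprint_speed):
    # Stand-in for a fitted GAM: a linear bump for faster runners.
    return 0.05 + 0.01 * (sprint_speed - 27.0)


def toy_knn(ev, la):
    # Stand-in for the k-NN estimate on well-struck balls.
    return 1.2


grounder = {"ev": 85.0, "la": -5.0, "sprint_speed": 29.0}
fly_ball = {"ev": 103.0, "la": 28.0, "sprint_speed": 29.0}
print(hybrid_xwobacon(grounder, is_weak_or_grounder, toy_gam, toy_knn))
print(hybrid_xwobacon(fly_ball, is_weak_or_grounder, toy_gam, toy_knn))
```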

xwOBA on contact (xwOBAcon) error by season

The errors from our xwOBAcon model are close to normal and centered close to 0. To give you a sense of how well our model performs, a naive prediction of wOBA using the league average wOBA (essentially blindly guessing) for the year results in a root mean square error (RMSE) between 0.51 and 0.53. We can come up with a better naive prediction by using average wOBA values by batted ball type. Using the average wOBA values for the batted ball type classifications below, we can get the error down to 0.45 RMSE, most of the way to the final model performance.
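For readers who want to reproduce this kind of comparison, the two naive baselines can be sketched as below. The batted balls and type labels are toy data, and the baselines are scored on the same data used to compute the averages, so the numbers only illustrate the mechanics and do not correspond to the RMSE values quoted above.

```python
import math
from collections import defaultdict


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def league_average_baseline(values):
    """Predict the overall mean for every batted ball."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def group_average_baseline(values, groups):
    """Predict the mean of each batted ball's own type."""
    totals = defaultdict(lambda: [0.0, 0])
    for value, group in zip(values, groups):
        totals[group][0] += value
        totals[group][1] += 1
    return [totals[g][0] / totals[g][1] for g in groups]


# Toy wOBA values on contact with hypothetical batted ball type labels.
woba = [2.0, 2.0, 0.0, 0.0, 0.0, 0.9, 0.0, 0.9]
kind = ["barrel", "barrel", "barrel", "topped", "topped", "topped", "weak", "weak"]

league_rmse = rmse(woba, league_average_baseline(woba))
group_rmse = rmse(woba, group_average_baseline(woba, kind))
print(round(league_rmse, 3), round(group_rmse, 3))
```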

The graph below shows that the model helps us improve the prediction of barrels more than any other batted ball type. After adding sprint speed to the model, we are also able to improve predictions for the poorly-topped and poorly-weak batted ball types.

However, even with these improvements, barrels and solid contact are the areas in which we still have the most uncertainty. And this makes sense. Many more factors affect the result of a well-hit ball than a ground ball to the shortstop. What if it is windy? What if it is hit to right field at Yankee Stadium? A hard-hit ball going a foot or two farther could turn a 0.000 wOBA (out) into a 2.000 wOBA (HR)!

You may also be wondering, why didn’t we include these exogenous variables? We very carefully chose variables to include based on their relation to player skill. Since we want our model to be context neutral, we want to include only aspects of the batter and pitcher that we know when the ball is hit. We first identified exit velocity and launch angle as outcomes batters can control and pitchers can induce or strategize against. Aspects that we don’t control for include the fielding team, the park effects, and the weather. These are factors over which a batter or pitcher has limited control, and thus we leave them out when we want to compare players’ xwOBA under average conditions.

Certainly, the exogenous variables are important to understand why players are under or over-performing their xwOBA. The exact attribution of error to exogenous variables is a work in progress, but below we have explored some of the error to which these factors contribute. First, let’s take a look at how our new implementation improves our error associated with speed. The old k-NN only model didn’t vary by speed, but we see a clear increase in wOBAcon for players with higher speeds. Our new model that incorporates speed generally follows the linear trend of wOBAcon by speed for these more weakly hit balls.

Let’s take a look at spray angle. We all know that it benefits players to pull hard-hit balls down the line, simply because fences are usually closer to home plate there. We see that pattern emerge on barrels and solid contact, where actual wOBA is higher than xwOBA for balls hit to RF/LF. The patterns for other hit types are a function of fielder alignment, or of distance to 1B in the case of poorly-weak hit balls. We currently do not control for spray angle since we haven’t found strong enough evidence that certain pull-oppo tendencies lead to better wOBA results. For a short illustration, check out Tom’s blog.

Higher temperatures lead to higher wOBA on balls in the air since balls tend to travel farther at higher temps. Great. Physics works!

And temperature drives another trend: error increases in the most extreme months, April and July.

We also have data for wind speed and direction. Since wind is only recorded at the beginning of the game and is more variable than temperature throughout a game, we don’t see as strong a pattern. However, we can see something of a trend: faster winds blowing in from the OF decrease wOBAcon and faster winds blowing out to the OF increase wOBAcon.

Moving on to fielding shifts, we see some slight differences in RMSE where we might expect them. xwOBAcon has higher errors on infield shifts with poorly-weak contact and higher errors on pop-ups with strategic outfield alignments.

Finally, we took a look at errors by stadium. Splitting up the batted balls into barrels and other contact, we can see there are some stadiums where we have more trouble predicting HR. These tend to be stadiums with short porches or unique aspects that make it easier or harder to hit HRs.

Below are some of the best and worst performing stadiums. Given the outfield fence alignment, it is easy to see why these plots make sense.

The Reliability and Predictive Performance of Player Level xwOBA & xwOBAcon

While model performance is important, many metrics are created to assess player performance. We must understand the advantages and shortcomings of xwOBA at the player level to recognize when and how to use xwOBA instead of other metrics.

Reliability

We first focused on determining the degree to which players maintain similar xwOBA and xwOBAcon from year to year. Statistics that demonstrate high correlations from year to year are useful in evaluating player skills since players generally tend to retain their core skills from year to year. To perform this test (and all subsequent tests) we utilized a weighted Spearman correlation with statistic X in year y on statistic X in year y+1. We used the harmonic mean of the wOBA denominator (if using wOBA) or balls in play (if using wOBAcon) for the weights.
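As a rough sketch of this test: weighted Spearman correlation has more than one definition, and the version below simply takes ordinary ranks (ties averaged) and computes a weighted Pearson correlation on them, with each player weighted by the harmonic mean of his two sample sizes. That choice, the function names, and the data are assumptions for illustration rather than the exact procedure we ran. Players with no sample in either year are assumed to be filtered out beforehand.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def weighted_pearson(x, y, w):
    """Pearson correlation with per-observation weights."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


def harmonic_mean(a, b):
    return 2 * a * b / (a + b)


def weighted_spearman(stat_y1, stat_y2, n_y1, n_y2):
    """Correlate year y with year y+1, weighting by harmonic-mean sample size."""
    weights = [harmonic_mean(a, b) for a, b in zip(n_y1, n_y2)]
    return weighted_pearson(average_ranks(stat_y1), average_ranks(stat_y2), weights)


# Toy example: four players' xwOBAcon and balls in play in two seasons.
year1 = [0.300, 0.350, 0.320, 0.400]
year2 = [0.310, 0.340, 0.330, 0.390]
bip1 = [400, 350, 120, 420]
bip2 = [380, 300, 300, 410]
rho = weighted_spearman(year1, year2, bip1, bip2)
print(round(rho, 3))
```

The harmonic mean is dominated by the smaller of the two samples, so a player with 600 balls in play one year and 50 the next counts for much less than a player with 300 in both.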

We can deduce a couple things from this table immediately. xwOBA and wOBA statistics are geared towards describing batter skill. The real value in xwOBA is xwOBAcon since BB and SO are not altered by our model. We see that xwOBAcon is the most stable statistic by far for batters which means that batters generally don’t lose their ability to hit the ball hard (or soft) and in the air (or on the ground). Since BB and SO rates are relatively stable year to year, xwOBA is not far behind.

We usually assess pitcher skill with sabermetric statistics such as FIP, xFIP, kwERA, or SIERA, and this table illustrates why. These statistics, especially K%-BB% and kwERA, focus on pitcher skills that don’t tend to change drastically year to year and thus have much higher correlations than ERA/RA9 which are influenced by fielder ability and positioning. Since BB and SO rates are so stable for pitchers, xwOBA performs similarly to FIP, and without even touching balls in play, K%-BB% has much more reliability. Interestingly, xwOBAcon is just as reliable as RA9 adding a significant amount of value above wOBAcon itself by stripping out some of the context that may change from year to year.

We also tested another novel stat using barrel percentage (see batted ball type diagram above). We suspected that barrel percentage represents the subset of the Statcast data that would prove to be the most reliable and possibly predictive of all the batted ball types. We created two metrics with barrels, both pretty self-explanatory.

It’s amazing how much information we can maintain by throwing out all the other batted ball types. For batters, Barrel% is almost as reliable a skill as xwOBAcon. For pitchers, Barrel% + BB% - K% is basically on par with (and maybe a bit better than) xFIP, which makes sense since it combines BB and SO with what is more or less an expected HR component.
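For concreteness, the two barrel metrics might be computed as in the sketch below. The denominators are assumptions made for this example (barrels per batted ball, walks and strikeouts per plate appearance), and the stat line is invented.

```python
def barrel_pct(barrels, batted_balls):
    """Share of batted balls classified as barrels (assumed denominator)."""
    return barrels / batted_balls


def barrel_bb_k(barrels, batted_balls, walks, strikeouts, plate_appearances):
    """Barrel% + BB% - K%, with BB% and K% taken per plate appearance (assumption).

    For pitchers, lower values are better: fewer barrels and walks, more strikeouts.
    """
    bb_pct = walks / plate_appearances
    k_pct = strikeouts / plate_appearances
    return barrel_pct(barrels, batted_balls) + bb_pct - k_pct


# Hypothetical pitcher season.
composite = barrel_bb_k(
    barrels=30, batted_balls=480, walks=50, strikeouts=170, plate_appearances=700
)
print(round(composite, 3))
```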

Conclusions:

xwOBA and especially xwOBAcon are reliable skills for batters.
xwOBA is just as useful for describing pitcher skill as FIP because BB and SO are baked into xwOBA.
Pitcher xwOBAcon doesn’t have much staying power year to year, but then again, it’s almost as reliable as RA9.
Barrel% is the best component of EV and LA to illustrate a pitcher’s skill.

Predictive Value

We created xwOBA with the goal of predicting outcomes of balls in play based on physical skills and outcomes tracked by our systems. While not designed to be predictive on an aggregate level, we suspect that xwOBA will be useful to predict future performance since we have stripped out some of the noisy aspects of balls in play.

Let’s first take a look at batters. You can read the first row of the table as “wOBA in the first time period has a correlation to wOBA in the second time period of x”. We are separately comparing overall performance in year Y to year Y+1, performance in road games in year Y to year Y+1, and performance in the same season’s 1st to 2nd halves.

At first glance, we can see that xwOBA is the most predictive of out-of-sample wOBA among the three stats tested above. However, looking first at full seasons year to year, xwOBA’s predictive power for wOBA isn’t much different from that of wOBA itself. Part of this phenomenon is that most players don’t change teams year to year, so there is some bias in their wOBAcon since half their games are played at a single ballpark. If we look at only away games, xwOBA performs a little better relative to wOBA, but the difference may still fall within the margin of error of approximately 0.033 that we estimate for both correlations. Amazingly, 1st to 2nd half predictability for xwOBA is almost as good as the season to season correlation, with half the sample size! When looking only at contact, we might prefer Barrel% to predict forward.

If xwOBA were designed to be predictive at the player level, different weights would be assigned to SO and BB to regress these factors going forward. It is useful to look at xwOBAcon, since it is an aggregation of a player’s balls in play, each regressed to the average outcome. However, this still doesn’t take into account the regression in skills a player may have in the future.

We included a lot of metrics in the following tests to give us an idea of how the wide range of pitcher stats perform over different environments and timeframes. We are interested in three outcomes for pitchers in the second time period: RA9, ERA, and wOBA. These stats together will give us a solid sense of how well the pitcher performed. First, it’s no surprise that xwOBA is more useful than RA9, ERA, and wOBA are in predicting themselves in subsequent time periods. In a year to year full season context, xwOBA performs about as well as FIP in predicting next season’s outcomes. Again, when we limit our test to only away games, xwOBA improves relative to FIP since it isn’t biased by home park factors.

Predictability of Pitcher Stats (Full Season)

Predictability of Pitcher Stats (Full Season Away Games)

Predictability of Pitcher Stats (1st to 2nd Half)

Our makeshift barrel metric stands out in the tables above. For pitchers, barrels prove to be a significantly stronger signal than other batted ball types. In each scenario, Barrel% + BB% - K% remains within the margin of error of SIERA’s performance on every metric, sometimes even beating it out (Note: FanGraphs doesn’t provide home/road splits for SIERA).

Conclusions and Using xwOBA

So, given all this information about xwOBA’s performance and the inherent error in the model, what should we be thinking about when using xwOBA?

Using xwOBA with Batters

1. xwOBA predicts future wOBA well with small sample sizes. If a player has had little service time or has made a major swing change this year, maybe it’s best to trust xwOBA rather than wOBA. Otherwise, xwOBA is not much better.
2. A batter won’t always improve or decline to match their xwOBA (e.g., improve when xwOBA - wOBA is positive). A couple of factors to consider with the current model when comparing a player’s xwOBA and wOBA:

Does this player play in an extreme hitter’s or pitcher’s park?
Does this player’s spray profile (pull/oppo) fit with his home park (e.g., a lefty pull hitter at Yankee Stadium)?
What part of the season are you assessing? Has there been a temperature bias?

3. Barrel% is easy to use and predictive of future success on balls in play.

Using xwOBA with Pitchers

1. xwOBA predicts future pitching outcomes as well as FIP (which is also not designed to be predictive). xwOBA is more useful when projecting for neutral environments (unknown team after free agency) and performs even better than xFIP.
2. A pitcher won’t always improve or decline to match their xwOBA. The factors above also apply to pitchers when comparing xwOBA and wOBA.
3. Barrel% is easy to use and predictive of future success on balls in play. For pitchers, Barrel% + BB% - K% works better since it incorporates the two main skills with an important signal in barrels.

Data Sources: All non-xwOBA or wOBA pitcher metrics were pulled from FanGraphs.