Do expected hitting stats predict a player’s performance the following year?

Ethan Mann
Dec 13, 2023

Examining expected batting average, expected slugging, and expected weighted on-base average to see whether they correlate with a player’s statistics the following year.

Expected hitting statistics have gained popularity among both casual and advanced fans, mainly because fans can use them without working through complex formulas themselves. Some of the most accessible ones are expected batting average (xBA), expected slugging (xSLG), and expected weighted on-base average (xwOBA). Each batted ball is assigned an xBA based on its exit velocity, launch angle, and, for certain types of batted balls, the batter’s sprint speed. Similarly, xSLG and xwOBA are assigned from the same variables. These statistics make it easy to see whether a player may be “overperforming” or “underperforming.” For example, if Jose Altuve had an xBA of .245 but a batting average of .311, you could argue that he overperformed and may regress over a larger sample. If Pete Alonso had a .346 wOBA but a .368 xwOBA, you could argue that Alonso is due for positive regression.
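
To make “overperforming” and “underperforming” concrete, here is a minimal Python sketch that compares an actual stat with its expected counterpart. The function name `performance_gap` and the `tolerance` cutoff of .020 are hypothetical choices for illustration, not a Baseball Savant definition; the inputs are the hypothetical Altuve and Alonso numbers above.

```python
def performance_gap(actual: float, expected: float, tolerance: float = 0.020) -> str:
    """Label a hitter as over- or underperforming an expected stat.

    `tolerance` is a hypothetical cutoff chosen for illustration only;
    gaps smaller than it are treated as noise.
    """
    gap = actual - expected  # positive: results beat the quality of contact
    if gap > tolerance:
        return f"overperforming by {gap:.3f} (possible negative regression)"
    if gap < -tolerance:
        return f"underperforming by {-gap:.3f} (possible positive regression)"
    return f"in line with expectations (gap {gap:+.3f})"


# Hypothetical examples from the paragraph above
print("Altuve BA vs. xBA:", performance_gap(actual=0.311, expected=0.245))
print("Alonso wOBA vs. xwOBA:", performance_gap(actual=0.346, expected=0.368))
```

The same comparison works for any actual/expected pair (BA vs. xBA, SLG vs. xSLG, wOBA vs. xwOBA), since all three are on comparable scales.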

To conduct this study, I gathered the “big three” expected statistics (xBA, xSLG, xwOBA) for all MLB players from the 2022 season, along with their actual statistics from the 2023 season. Only players with a minimum number of batted ball events (BBE) qualified in each season, which came to about 250 players. That pool shrank to 176 because players needed to meet the BBE threshold in both the 2022 and 2023 campaigns, so players with long-term injuries or suspensions, such as Fernando Tatis Jr., and players who were called up to the big leagues in 2023, such as Elly De La Cruz, were not eligible.
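
The sample-building step can be sketched as pairing each player’s prior-season expected stats with his following-season actual stats and keeping only players who qualified in both years. This is a minimal sketch under stated assumptions: the player rows are invented placeholders (real data would come from Baseball Savant), and `MIN_BBE = 150` is a hypothetical threshold, since the exact cutoff is not given here.

```python
# Invented placeholder rows: player -> season stats. None of these numbers are real.
season_2022 = {
    "Player A": {"bbe": 410, "xba": 0.268, "xslg": 0.455, "xwoba": 0.341},
    "Player B": {"bbe": 395, "xba": 0.231, "xslg": 0.380, "xwoba": 0.300},
    "Player C": {"bbe": 372, "xba": 0.249, "xslg": 0.421, "xwoba": 0.322},
}
season_2023 = {
    "Player A": {"bbe": 388, "ba": 0.281, "slg": 0.470, "woba": 0.350},
    "Player C": {"bbe": 120, "ba": 0.240, "slg": 0.390, "woba": 0.305},
    "Player D": {"bbe": 301, "ba": 0.262, "slg": 0.448, "woba": 0.331},
}

# Hypothetical qualifying threshold (assumption, not the study's actual cutoff).
MIN_BBE = 150


def build_sample(prior, following, min_bbe=MIN_BBE):
    """Pair prior-season expected stats with following-season actual stats.

    A player is kept only if he appears in both seasons AND meets the BBE
    threshold in each one.
    """
    sample = {}
    for name, prev in prior.items():
        nxt = following.get(name)
        if nxt is None:
            continue  # missed the following season entirely
        if prev["bbe"] >= min_bbe and nxt["bbe"] >= min_bbe:
            sample[name] = {"prior": prev, "following": nxt}
    return sample


sample = build_sample(season_2022, season_2023)
print(sorted(sample))
```

Player B (no following season) and Player C (too few following-season BBE) are dropped, and Player D, who appears only in the following season, is never considered because the loop starts from the prior-season pool; that last case covers both a player who missed the prior year and a new call-up.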

To start, I compared every player’s 2022 xBA with his 2023 batting average. The two showed a correlation of .52, with a few outliers in both directions. Cody Bellinger is the first point that jumps out, with an xBA of .213 in 2022 with the Los Angeles Dodgers and a batting average of .307 in 2023 with the Chicago Cubs. He is an example of a player outperforming his expected statistics, and teams may expect him to regress toward his mean, which could help explain his slow free-agent market as of this writing. His exit velocity and hard-hit rate decreased; however, his sprint speed increased.

The comparison shows that a player’s batting average tends to stay close to his 2022 xBA when that xBA sits somewhere around .250. As it approaches .300, the outcome may depend on the type of hitter. Some players routinely outperform some of their expected statistics, such as Luis Arraez, Jose Altuve, and Tim Anderson. Arraez and Altuve posted elite contact rates of 93.8% and 82.9% in 2023, respectively, compared with the Major League average of 76.4%. Anderson, on the other hand, has a below-average career contact rate of 74.8%. Does his abnormal 2023 season signal what’s to come for Tim Anderson?

A correlation of .52 indicates a moderate relationship between xBA in 2022 and BA in 2023. Is a one-year sample enough? To check, I also compared xBA in 2021 with BA in 2022. The correlation actually weakens, which leads me to believe that it is ultimately difficult to predict a player’s batting average from his expected batting average the previous year, even more so now that the 2023 rule change requires two infielders on each side of second base.
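
The correlation figures above are presumably Pearson coefficients (the measure is not named, so that is an assumption). A minimal sketch of computing one between prior-year xBA and following-year BA looks like this; the five value pairs are invented for illustration and are not the study’s data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all left unnormalized since the n's cancel
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values: prior-year xBA and following-year BA for five players.
xba_prior = [0.228, 0.251, 0.262, 0.240, 0.289]
ba_next = [0.255, 0.248, 0.270, 0.231, 0.301]
print(f"r = {pearson_r(xba_prior, ba_next):.2f}")
```

Running the same function on 2021 xBA against 2022 BA is one way to check whether the relationship holds for another pair of seasons. On Python 3.10 or later, `statistics.correlation` from the standard library computes the same Pearson coefficient.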

Now, let’s delve into the more intriguing parts of the analysis. Comparing a hitter’s expected slugging from the previous year with his actual slugging shows a stronger correlation, near .60. This makes sense, since players entering or continuing their prime years normally add strength and power. One of the biggest gainers was Ronald Acuña Jr., whose 2023 SLG was roughly .100 points higher than his 2022 xSLG. The 2023 NL MVP tore his ACL midway through the 2021 season, which resulted in an off year in his 2022 expected statistics. His maximum and average exit velocities were at their lowest points since his 2019 season, and his sprint speed dropped from the 97th percentile to the 82nd, suggesting his legs were not fully under him for the season. That matters for a player like Acuña, who uses a minimal stride and loads his legs into his swing.

Other players had fairly drastic drop-offs, such as Jose Abreu, Carlos Correa, and even Mike Trout, due to age and injuries. Abreu just played his age-36 season, and Correa lost some athleticism to ongoing foot issues, including a plantar fasciitis diagnosis. Trout is an interesting case, since he reached career lows in xSLG (.523) and SLG (.490) in 2023. Those are still elite numbers, and at 32 he remains among the game’s best hitters, but did multiple back injuries in 2022 affect him in 2023? I believe this shows that expected slugging can foreshadow a player’s slugging the following year, barring injuries.
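
Finding the biggest gainers and decliners amounts to subtracting each player’s prior-year xSLG from his following-year SLG and sorting. The sketch below assumes that definition of “gain”; the player labels and numbers are invented placeholders, not real data.

```python
# Hypothetical (prior-year xSLG, following-year SLG) pairs; not real player data.
slg_pairs = {
    "Player A": (0.447, 0.548),
    "Player B": (0.512, 0.430),
    "Player C": (0.401, 0.409),
    "Player D": (0.466, 0.392),
}


def rank_changes(pairs):
    """Return (name, change) tuples sorted from biggest gain to biggest drop,
    where change = following-year SLG minus prior-year xSLG."""
    changes = [(name, actual - expected) for name, (expected, actual) in pairs.items()]
    return sorted(changes, key=lambda item: item[1], reverse=True)


for name, change in rank_changes(slg_pairs):
    print(f"{name}: {change:+.3f}")
```

The top of the list corresponds to the Acuña-type jumps and the bottom to the Abreu, Correa, and Trout-type drop-offs; the ranking itself says nothing about why, which is where injury and age context comes in.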

As with slugging, the correlation between prior-year xwOBA and next-year wOBA is stronger, nearing .60, suggesting that you have a decent chance of predicting a player’s wOBA the following year from his xwOBA. In 2023, Giancarlo Stanton had the same wOBA as Amed Rosario, .297, and Rosario was traded for Noah Syndergaard in a “scraps for scraps” deal between the Dodgers and Guardians. Stanton’s bat speed decreased by 4 miles per hour, which may explain why he struggled against high-velocity fastballs and hit more ground balls, especially toward the end of the year. Expected weighted on-base average can be “tricky” to predict because the wOBA weights change each year. In slugging, a double always counts exactly twice as much as a single, but in weighted on-base average each type of hit carries its own weight, and expected weighted on-base average also removes defense from the equation.
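
The difference between slugging’s fixed total-base values and wOBA’s linear weights can be shown side by side. This sketch uses the common FanGraphs-style wOBA formula; the weights in `ILLUSTRATIVE_WEIGHTS` are assumptions chosen to be in a plausible range, not the official values for any season, and the stat line is hypothetical. It computes plain wOBA only; xwOBA would additionally swap each batted ball’s actual outcome for its expected value, which is not modeled here.

```python
def slugging(singles, doubles, triples, homers, at_bats):
    """SLG = total bases / at-bats; a double counts exactly twice a single."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


# ILLUSTRATIVE weights only (an assumption): real wOBA weights are
# recalculated every season, so plug in the published values for the year.
ILLUSTRATIVE_WEIGHTS = {
    "ubb": 0.69, "hbp": 0.72, "1b": 0.89, "2b": 1.27, "3b": 1.62, "hr": 2.10,
}


def woba(bb, ibb, hbp, singles, doubles, triples, homers, at_bats, sf,
         weights=ILLUSTRATIVE_WEIGHTS):
    """wOBA = (wBB*uBB + wHBP*HBP + w1B*1B + w2B*2B + w3B*3B + wHR*HR)
              / (AB + BB - IBB + SF + HBP)
    """
    ubb = bb - ibb  # intentional walks are excluded from the numerator
    numerator = (weights["ubb"] * ubb + weights["hbp"] * hbp
                 + weights["1b"] * singles + weights["2b"] * doubles
                 + weights["3b"] * triples + weights["hr"] * homers)
    denominator = at_bats + bb - ibb + sf + hbp
    return numerator / denominator


# Hypothetical stat line, not a real player's season.
line = dict(singles=95, doubles=30, triples=2, homers=25, at_bats=550)
print(f"SLG:  {slugging(**line):.3f}")
print(f"wOBA: {woba(bb=60, ibb=5, hbp=5, sf=4, **line):.3f}")
# In SLG a double is worth 2.0 singles; with these weights it is worth ~1.43.
print(f"double/single ratio in wOBA: {ILLUSTRATIVE_WEIGHTS['2b'] / ILLUSTRATIVE_WEIGHTS['1b']:.2f}")
```

Because the weights are passed in as a parameter, re-running the same stat line with a different season’s weights shows how the year-to-year recalculation alone can move wOBA.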

Overall, it is incredibly difficult to predict a player’s offensive output. A correlation around .50 is considered ‘moderate.’ However, on closer examination, expected statistics emerge as one of the better indicators of a player’s performance the following year. Naturally, there will be outliers due to injuries, stance changes, and other variables, but expected statistics remain one of the most reliable tools for prediction.
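
One way to put “moderate” in perspective is the coefficient of determination, r², which for a simple linear fit is the share of the variation in next year’s stat accounted for by last year’s expected stat. Using the correlations reported above:

```latex
\begin{aligned}
r_{\text{xBA}\to\text{BA}} &= 0.52 &&\Rightarrow\quad r^2 = 0.52^2 = 0.2704 \approx 0.27 \\
r_{\text{xSLG}\to\text{SLG}} \approx r_{\text{xwOBA}\to\text{wOBA}} &\approx 0.60 &&\Rightarrow\quad r^2 \approx 0.60^2 = 0.36
\end{aligned}
```

Under that simple linear model, prior-year xBA would account for roughly a quarter of the variation in next-year batting average, and xSLG or xwOBA for roughly a third, leaving the rest to injuries, swing changes, and the other factors discussed above.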

Data from Baseball Savant

Thank you for reading!

You can also find this on Twitter/X: @EthanMann02

University of South Carolina | Here is a link to my GitHub for my coding projects! https://github.com/ethanmmann02