Baseball’s Next Defining Innovation is Hiding in Plain Sight

Why Tracking the Path of Every Swing is a Must for MLB Teams

Ethan Moore
There and Back
16 min read · Jun 24, 2022


There have been over 2 million swings in Major League Baseball since the Statcast era began in 2015. Each one holds valuable clues that could spark the next evolution in baseball analytics. Most of that information is going completely unnoticed by tracking tech and analysts alike.

The role of analytics in MLB has been growing for several decades, but groundbreaking innovation seems to have slowed to a crawl. Major leaps forward have historically coincided with new data becoming available to teams for analysis. When asked about the future of baseball analytics, many experts cite exciting developments in biomechanics. Others echo the common refrain that “it is no longer about what data teams have, it is more about how well they can implement it on-field.” I am here to offer a different view of the next frontier for this industry, one that I think could ultimately change the way the game is understood, analyzed, and played.

In every Major League Baseball game since 2015, Statcast has recorded the position of the baseball dozens of times per second, producing terabytes of data for teams to analyze. It all gets condensed down into metrics like pitch velocity, spin rate, release point, plate location, exit velocity, launch angle, and many more. These data points have allowed us to dramatically improve our collective understanding of what is happening on the baseball field, especially in the realm of pitching. The industry’s progress as a direct result of this data being collected is undeniable.

During this time, Statcast (using Hawkeye as of 2020) has collected similarly granular information on the position of the players on the field at all times, opening new doors to even better understanding of the game in areas like defense and baserunning.

With all of this ball and player tracking data being collected, it may seem like we are measuring everything we possibly can, maximizing our analysis efforts and leaving little room for improvement. Though it may be hard to believe, we are actually missing an opportunity to track equally vast amounts of important data that could yield revolutionary insights. Nobody is talking about it.

Hitting a baseball is an incredibly complex maneuver. With professional pitchers throwing harder than ever and with increasingly nasty secondary offerings, I often wonder how anyone ever even makes contact. Of course, players mash Major League pitching every day, but exactly how do they do it?

Let’s think about hitting in the abstract. The hitter stands in the batter’s box and awaits a ball to be thrown by the pitcher. As the pitch is in the air, the brain of the hitter must predict the path of the pitch in real time. They must figure out where the ball will be, and when it will be there, in time to get their bat to that spot at that time.

This subconscious prediction informs the decision to swing or not to swing. In the baseball analytics world, we call this the “swing decision” and it is important that hitters decide to lay off pitches that are unlikely to be hit well and swing at pitches they are likely to hit well. This is an aspect of hitting that is more or less solved — we have already quantified each MLB hitter’s ability to make good swing decisions with existing data.

But if a hitter chooses to swing, we want to evaluate his ability to make desirable contact. This means consistently being able to A) not miss the ball entirely and B) hit the ball in such a way that it is likely to result in a positive offensive outcome like a double or home run. We understand how to evaluate a hitter’s results on swings using stats like contact rate and xwOBA for abilities A and B respectively. However, these stats do not quite allow us to evaluate these abilities from a process standpoint.

That said, here is my theory:

To avoid whiffs and make desirable contact, a hitter must have the ability to accurately predict the path of a ball with his bat when he swings.

When the hitter decides to swing, his brain’s subconscious prediction of the pitch’s path informs where the hitter will put his bat over the course of his swing. Some swings put the bat through the top part of the zone, some through the bottom of the zone, etc. It is this split-second prediction that determines where the bat goes, and it is why, even when the hitter swings and misses, his bat is generally in the right area.

Major League hitters are so good, according to my theory, that they are able to swing exactly through the points in space where their brain expects the pitch to be. (Think about Major Leaguers hitting off a tee. They can hit it perfectly every time as a result of building this skill over decades of practice.)

If this theory is correct so far, we are left with a fascinating idea: The path of a hitter’s bat on a swing is an accurate representation of his mind’s subconscious prediction of the ball’s path on that pitch. Said another way, bats always go where the hitter expects the ball to be. By measuring the position of the bat in physical space throughout the swing, we could take a look inside the mind of a hitter and see exactly where (and when) he was expecting the pitch to be hittable.

Though we cannot yet read the subconscious minds of baseball players or anyone else, tracking moving objects in physical space has been successful in baseball over the past decade or so. In a world where we are already tracking balls and players, why is nobody talking about the lack of bat tracking data and its vast potential?

If you had access to bat tracking data for every swing over the course of a baseball season, you would have a quantification of each hitter’s mental prediction of that pitch’s path. Now what would you do with that information? I’m glad you asked.

Whether someone is making a subconscious prediction in front of thousands of screaming fans or coding up predictions on a computer in a windowless office (like myself lol), it is always a good idea to try to figure out “How good are these predictions?” For machine learning predictions, we determine how close to the actual “correct” outcome our predictions were in hindsight. What could it look like to quantify the quality of hitter pitch path predictions?

In hitting, a “correct” prediction of the pitch’s path (according to me) would be a sweet spot contact, high exit velocity, ~30 degree launch angle home run. That is the best case scenario. Ideal contact would look something like this:

With paired bat and ball tracking data, we could recreate this kind of visual for all swings!

Then, there are degrees of quality for a hitter’s prediction with some being closer to perfect than others.

What I would want to know is: for every swing, how close did the hitter get to the best case scenario? This could mean answering

“Over the course of the swing, how close did the ball ever come to the barrel of the bat?”

Or, you could answer

“When the barrel of the bat was at its fastest, how close was the ball?”

In either case, we could quantify the relative positions of the barrel and the ball in space. That information would let us measure the quality of a hitter’s prediction directly, rather than approximating it from a result-based stat like xwOBA (which is what we currently do).
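Both questions come down to a distance calculation once the bat and ball tracks share a clock. Here is a minimal Python sketch, assuming (hypothetically) that each track arrives as a list of (x, y, z) positions sampled at the same timestamps. The data layout and function names are made up for illustration and are not any real tracking feed.

```python
import math

def closest_approach(barrel_track, ball_track):
    """Smallest barrel-to-ball distance over the swing, and the sample index where it occurs.

    Assumes both tracks are lists of (x, y, z) tuples sampled at the same timestamps.
    """
    best_distance, best_index = math.inf, -1
    for i, (barrel, ball) in enumerate(zip(barrel_track, ball_track)):
        d = math.dist(barrel, ball)
        if d < best_distance:
            best_distance, best_index = d, i
    return best_distance, best_index

def distance_at_peak_barrel_speed(times, barrel_track, ball_track):
    """Barrel-to-ball distance at the sample where the barrel is moving fastest.

    Barrel speed is estimated with a simple finite difference between consecutive samples.
    """
    best_speed, best_index = -1.0, None
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = math.dist(barrel_track[i], barrel_track[i - 1]) / dt
        if speed > best_speed:
            best_speed, best_index = speed, i
    if best_index is None:
        raise ValueError("need at least two samples with increasing timestamps")
    return math.dist(barrel_track[best_index], ball_track[best_index])

# Toy example (made-up numbers, arbitrary units)
times = [0.0, 0.01, 0.02, 0.03]
barrel = [(0, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]
ball = [(10, 0, 0), (7, 0, 0), (4, 0, 0), (1, 0, 0)]
min_distance, at_index = closest_approach(barrel, ball)
peak_speed_distance = distance_at_peak_barrel_speed(times, barrel, ball)
```

A real system would need to interpolate between samples and account for the length of the barrel, but even this crude version turns "how close did he get?" into a number.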

It has been said that “hitting is timing” and that “pitching is upsetting timing.”

By pairing bat tracking data with existing ball tracking data we could, for the first time ever (to my knowledge), quantify batter timing. It is unbelievable to me that in modern baseball analytics we cannot answer the simple question “Was the hitter early or late?” using numbers and precise measurements. That question is fundamentally important to our understanding of what happened on the field and how to make an adjustment next time.
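One way "early or late" could be turned into a number is to pick a reference plane in front of the plate and compare when the barrel and the ball each cross it. The Python sketch below assumes a hypothetical depth coordinate for both objects sampled on a shared clock. The plane, the 5 ms tolerance, and all function names are illustrative assumptions, not an established definition of timing.

```python
def crossing_time(times, depths, plane):
    """Linearly interpolate the first time a track crosses the given depth plane.

    Returns None if the track never crosses the plane.
    """
    for i in range(1, len(times)):
        d0, d1 = depths[i - 1], depths[i]
        if d0 != d1 and (d0 - plane) * (d1 - plane) <= 0:
            frac = (plane - d0) / (d1 - d0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None

def timing_offset_ms(times, barrel_depth, ball_depth, plane):
    """Barrel crossing time minus ball crossing time, in milliseconds.

    Negative means the barrel got there first (early); positive means late.
    """
    t_bat = crossing_time(times, barrel_depth, plane)
    t_ball = crossing_time(times, ball_depth, plane)
    if t_bat is None or t_ball is None:
        return None
    return (t_bat - t_ball) * 1000.0

def label_timing(offset_ms, tolerance_ms=5.0):
    """Bucket a timing offset; the tolerance is an arbitrary placeholder."""
    if offset_ms is None:
        return "unknown"
    if offset_ms < -tolerance_ms:
        return "early"
    if offset_ms > tolerance_ms:
        return "late"
    return "on time"

# Toy example (made-up numbers): depth increases toward the pitcher
times = [0.0, 0.1, 0.2]
barrel_depth = [-1.0, 1.0, 3.0]   # barrel moving forward through the plane
ball_depth = [4.0, 2.0, -2.0]     # ball moving toward the catcher
offset = timing_offset_ms(times, barrel_depth, ball_depth, plane=0.0)
verdict = label_timing(offset)
```

The choice of reference plane matters a great deal here, since ideal contact depth differs by pitch location and by hitter, so treat the single fixed plane as a simplification.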

When a hitter swings and misses, we understand that something went wrong. The swing attempt was a failure. But what, specifically, caused that misfire? I believe that not all whiffs have the same cause. Sometimes a hitter is on time but has his bat in the wrong place. That might look something like this:

A “bat placement” whiff

But sometimes, a whiff can happen when the bat is at the right place at the wrong time. That could look like this:

A “timing” whiff. Sorry Eugenio, nothing personal.

Sometimes it is a timing and a bat placement issue. That is when guys can end up on a blooper reel and have fans politely wondering “WHY WOULD YOU SWING AT THAT?!”

Understanding why a hitter whiffed could give players, coaches, and analysts valuable feedback to prevent more whiffs in the future. Imagine a new section on FanGraphs player pages with columns for Early Whiff %, On-Time Whiff %, and Late Whiff %. Or how about columns for Swung-Under Whiff%, Swung-Over Whiff%, or even Average Whiff Distance in inches for each hitter in the X, Y, and Z directions? Now imagine a sortable leaderboard for any of these stats! (I know some readers are salivating right now.)
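To make those leaderboard columns concrete, here is a rough Python sketch of how per-whiff measurements might roll up into a hitter's whiff profile. It assumes each whiff has already been reduced to a timing offset and a barrel-minus-ball miss vector at closest approach. The record layout, the 5 ms on-time window, and the column names are all hypothetical.

```python
from typing import NamedTuple

class Whiff(NamedTuple):
    timing_ms: float  # negative = early, positive = late (assumed convention)
    miss_x: float     # barrel minus ball at closest approach, inches
    miss_y: float
    miss_z: float     # negative = barrel passed under the ball

def whiff_profile(whiffs, tolerance_ms=5.0):
    """Summarize a list of Whiff records into percentage and distance columns."""
    n = len(whiffs)
    if n == 0:
        return {}
    early = sum(1 for w in whiffs if w.timing_ms < -tolerance_ms)
    late = sum(1 for w in whiffs if w.timing_ms > tolerance_ms)
    on_time = n - early - late
    under = sum(1 for w in whiffs if w.miss_z < 0)
    over = sum(1 for w in whiffs if w.miss_z > 0)
    return {
        "early_whiff_pct": 100.0 * early / n,
        "on_time_whiff_pct": 100.0 * on_time / n,
        "late_whiff_pct": 100.0 * late / n,
        "swung_under_pct": 100.0 * under / n,
        "swung_over_pct": 100.0 * over / n,
        # Average absolute miss distance along each axis
        "avg_miss_x": sum(abs(w.miss_x) for w in whiffs) / n,
        "avg_miss_y": sum(abs(w.miss_y) for w in whiffs) / n,
        "avg_miss_z": sum(abs(w.miss_z) for w in whiffs) / n,
    }

# Toy example with made-up whiffs
sample = [
    Whiff(-20.0, 1.0, 2.0, -3.0),
    Whiff(0.0, -1.0, 0.0, 3.0),
    Whiff(15.0, 2.0, -2.0, -1.0),
    Whiff(-8.0, 0.0, 4.0, -1.0),
]
profile = whiff_profile(sample)
```

Sorting hitters by any key in the returned dictionary would give the kind of leaderboard described above.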

The examples above are obviously oversimplified. Most of the time, a whiff would likely be a complex combination of these descriptors which we could never precisely describe with the naked eye, hence the need for robust tracking technology.

I would even suggest that our current quantifications of a hitter’s profile, especially with regard to his launch angle distribution (Groundball Rate, Fly Ball Rate, etc.), are just indirect measurements of his tendencies to make contact too high or too low on the baseball. Bat tracking would provide a direct measurement to tell us why a ball was hit on the ground or in the air, allowing for adjustments to be made more easily to entire batted ball profiles.

Every time a hitter swings, we could be harvesting valuable clues about why the swing got the result that it did, especially if it was a miss. That information is going uncollected and unnoticed by nearly everyone in charge of quantifying a hitter’s offensive profile (my fellow baseball analysts). As a result, our collective data-driven understanding of hitting is disturbingly incomplete.*

Consider the 5 traditional tools for position players: Hit, Power, Run, Glove, Arm. The task of quantifying two of these, Run and Arm, is essentially solved by Hawkeye measuring sprint speed and throwing velocity directly. Two other tools, Glove and Power, are quantified fairly well with metrics like Outs Above Average/Defensive Runs Saved and Exit Velocity/Launch Angle, respectively. Evaluations of the Power tool would only be helped by having a direct measurement of the speed of a hitter’s bat in-game, something that is not currently tracked league-wide but would be available if a bat tracking system were introduced.

The Hit tool, by contrast, is generally considered to be the most difficult tool to evaluate. In my experience, different organizations can even define this tool in completely different ways. How can we consistently measure something when we can’t even agree on a definition? Of course, this is also the tool that we are farthest from being able to evaluate with a direct physical measurement like sprint speed, exit velocity, or arm strength.

There is currently no process-based metric to approximate the Hit tool. Having bat tracking data (paired with existing ball tracking data) would allow the industry to get on the same page with common definitions and quantifications of one of the biggest outstanding challenges of baseball player evaluation today.

The overwhelming attitude among the analytics community seems to be that we basically understand most of what happens on a baseball field and that there are very few real opportunities for impactful new analysis left. I believe future generations of baseball analysts will look back on us and laugh at our naivete.

There is little doubt in my mind that in 10 years, we will be tracking the barrel of the bat on every swing with the consistency and precision with which we currently track every pitch, batted ball, and player. It will seem preposterous that there was ever a time we had ball tracking and player tracking data without bat tracking data.

The world if we had MLB bat tracking data

In the utopian future where this data exists, there are plenty of other applications outside of player evaluation. I won’t go into as much detail as I want to, but information on the relative position of the bat and ball at every moment opens the door for better understanding of

  • Pitch sequencing. We know that throwing a slider down and away is usually effective and we speculate that it is a good idea to throw that pitch after a high fastball. Wouldn’t you like to know for sure? Quantifying how previous pitches affect the quality of subsequent hitter predictions, and perhaps solving one of baseball analytics’ biggest mysteries, is a fascinating possibility to me.
  • Skill development. With a direct, measurable metric for a hitter’s quality of predictions, we can test different methods of improving this skill and open the door for a whole new frontier of data-driven offensive player development.
  • Understanding matchups. Information on MLB hitters’ in-game swing paths is not widespread at this point, but the Giants’ alleged strategy of matching hitter swing planes with the planes of incoming pitches could be investigated and potentially implemented more easily with this data.
  • Opportunities for improvement upon existing models. My time in MLB front offices taught me that I am not a “model error minimizing” type of data scientist and likely never will be. However, many baseball analysts care very much about making their models as predictive as possible. If even a few of the new metrics to come from bat tracking data are useful, adding them to existing models could greatly improve their predictive power and reduce prediction error.

If you have read this far, you may have ideas of your own about this topic. Whether you agree with my opinions or not, we can agree that tracking this information would open up a whole world of scientific exploration to prove or disprove anyone’s hitting hypotheses.

There is so much more juice left for baseball analysts to squeeze. There is so much we still don’t know and our job is not done. We just need the data (and maybe some imagination).

Though I have been developing this idea since early 2018, I first heard that bat tracking may become a reality in a 2019 article in The Athletic announcing Hawkeye as the league’s new on-field tracking system.

The article includes a letter to MLB teams from Chris Marinak, MLB’s executive vice president, strategy, technology and innovation where he says of Hawkeye:

“We expect this next generation system to significantly improve the accuracy and precision of ball and player tracking and unlock new tracking opportunities like bat swing path tracking and player limb tracking.”

He later mentioned that the “swing path tracking” feature would be “released over time” indicating it would not be included in the 2020 launch of the Hawkeye system league-wide.

That was essentially the last I heard publicly about bat tracking becoming a reality in MLB.

Before the 2021 season, I assumed this feature had been rolled out and was available for MLB front offices to begin digging into. The opportunity to work with this data and race with the other 29 teams to develop a competitive advantage was honestly a big reason why I was excited to be joining a front office as an employee for the first time in my career. That was my ultimate ambition, and it was one I could not achieve anywhere else but with an MLB R&D department.

Unfortunately, as of mid-2022, it appears that Hawkeye’s bat tracking feature has yet to be rolled out. This means nobody (to my knowledge) is currently in possession of the kind of paired bat and ball tracking data described in this article, more than three years after it was first mentioned publicly. What’s worse, it appears this data is not likely to be delivered to front offices any time soon. (important update in footnotes)**

The lack of opportunity to analyze this data was one of the reasons I decided to pursue work in other industries. As a result of that decision, I am comfortable finally sharing this information with the community:

I believe that the teams who can best utilize bat tracking data when it becomes available have the chance to gain a major competitive advantage over their competition. In my mind, it is the highest upside opportunity for teams that I am aware of at this time.

I once heard a quote attributed to Astros GM James Click:

“80% of baseball happens in the strike zone.”

I tend to agree with the quote and have to wonder, is 80% of baseball analysis related to what is happening in the strike zone? In my opinion, the answer is “obviously not.” There is a misalignment.

I believe that the insights waiting to be extracted from a paired bat and ball tracking dataset will improve our understanding of the batter/pitcher interaction significantly, helping us to find value in an under-studied part of the game: what is happening in the strike zone. The leap in knowledge may even be greater than when ball tracking data was introduced and utilized, which brought us to the industry’s current state: a deep analytical understanding of pitching compared to a far less developed analytical understanding of hitting.

Teams are relying on analytical information in decision making more than ever, but the majority of tracking data available to them is related to the ball/players and therefore related to pitching/defense. In my opinion, this information imbalance is absolutely contributing to the rise in strikeouts and disappearing offense that characterizes the modern era of baseball (and that the league so desperately wants to change).

It is my hope that Hawkeye is spending its time prioritizing accuracy and precision in the development of this new feature. Like with all existing tracking data, any bat tracking information must be highly accurate to be useful. This is likely the main reason for the delay of this feature’s rollout, if I had to guess.***

Even though the data is not currently available, I believe club R&D departments could benefit from thinking about how bat tracking data could be used to their advantage so that they can hit the ground running when it is finally delivered. If a team was really serious about developing a competitive advantage, they could even consider creating a method of collecting paired bat and ball tracking data in-house and reap the rewards of the new dataset before any other team can get off the starting blocks.

Opportunities for significant and sustainable competitive advantages do not come around very often in baseball but for me, that is what analytics is all about. In this space, I love chasing the kind of ideas that could be worth dozens of wins before everyone else catches up. I believe studying the game at a fundamental level is the best way to cultivate those big ideas. We undeniably benefit from tracking the ball and the players, and we will undeniably benefit from tracking the bat. The question is, who will be prepared when the time comes to utilize the new data?

We do not need to accept that hitting analytics will always lag behind. We do not need to accept that we can only think about problems for which we currently have data. We do not need to accept that baseball analytics is now only about making incremental improvements to existing processes. Let’s think bigger.

Baseball, and baseball analytics, will look very different in 10 years. Previous periods of analytical change allowed the Astros, Cubs, and Dodgers to rise to periods of sustained dominance. Teams should always be hunting opportunities for them to do the same the next time around.

Our inability to simultaneously track the path of the bat and the ball is an immense missed opportunity to improve our understanding of the game. What are we going to do about it? The first step is to give the problem and the solution the attention they warrant. And if any teams need help with taking that step, my DMs will be open. :)

Thank you to Robert Riggins for informing aspects of this theory over many years of great discussions.

Footnotes:

*If there is interest, I may put out another article at a later date that gets into the weeds of what this data might actually look like and which specific metrics I would want to analyze in addition to the raw data.

** Edit: Since publication, I have learned that most (if not all) teams are receiving bat tracking data from Hawkeye. However, it seems that it is being delivered as raw data that requires each team to make a significant investment of time and resources to convert it into a more usable format. Teams must individually decide whether or not to prioritize this task over everything else on their plate. Until they do, it is unlikely that this data will be very actionable for them. With this new understanding of the situation, it seems there is not so much a race to acquire bat tracking data, but to find and use the insights hidden within.

***Some readers may be wondering about Blast Motion, the current industry leader in tracking the path of bats during swings. I want to clarify that I only see value in tracking bat position when paired with ball position data. Blast Motion tracks the position of the bat but not the ball. Unfortunately, because this data must be extremely precise, I believe it to be nearly impossible to pair Blast Motion bat data with Hawkeye ball data. I encourage any brave souls to try it out but, from experience, this Herculean task may not be worth the effort.

If Blast Motion bat tracking data is truly not pairable with ball tracking data, it is rendered worthless for providing the kind of insights I envision coming from Hawkeye paired bat and ball tracking.

Follow me on Twitter @Moore_Stats to be updated about future posts!
