Using Data Science to Analyze if MLB Players Should Bunt to Beat the Shift

When an MLB player bunts to beat the defensive shift, who really wins? The offense or the defense? I use data analysis and statistical probability tools to answer this important question.

Andrew Fenichel

Published in Sports X Analytics · 7 min read · Aug 31, 2020

A Seattle Mariners player attempting a bunt

Bryce Harper’s bunt is at 2:36

On August 12th, 2020, Philadelphia Phillies outfielder Bryce Harper bunted for a hit against the shift.

Now, this wasn’t the first time Harper had done this. In fact, since 2019, the former MVP has bunted for a hit six times. But it left me wondering: should a hitter as good as Harper really be giving himself up for a potential bunt single when he could have hit a home run on that same pitch? Who really wins when Harper does this, the Phillies or the shift?

Let’s use math to find out.

RE24 Run Expectancy Matrix (via FanGraphs)

To figure out whether bunting against the shift is a smart play, we need to calculate the expected runs when a player swings away and when a player bunts for a hit. The starting point is the RE24 run expectancy matrix, which shows the expected number of runs scored between a given point and the end of an inning based on the overall run environment, the number of outs, and the placement of the baserunners, assuming a league average pitcher on the mound and a league average batter in the box. Because the matrix assumes an average batter, I had to create a formula to adapt it to a specific hitter.

Then, to calculate expected runs given a specific batter at the plate, I used the following formula: (the probability that the player reaches first safely via walk, hit by pitch, or single × the RE24 of that future situation) + (the probability that the player hits a double × the RE24 of that future situation) + (the probability that the player hits a triple × the RE24 of that future situation) + (the probability that the player hits a home run × the RE24 of that future situation) + ((1 minus the player’s on-base percentage) × the RE24 of that future situation).

It sounds complicated, but it’s not: it’s essentially just combining the probabilities of everything that could happen in that specific plate appearance. So for Bryce Harper versus righties, if his walk + hit by pitch + single rate is 0.28, his double rate is 0.05, his triple rate is 0.002, his homer rate is 0.04, and his on-base percentage is 0.375, then his expected runs with no outs and the bases empty versus a right-handed pitcher is 0.49, which is 5.1% higher than league average (0.461).
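Here is a minimal Python sketch of that swing-away formula for the bases-empty case. Only the 0.461 value (bases empty, no outs) comes from the article; the other run expectancy values are illustrative placeholders that should be replaced with the published matrix. The names `RE24`, `run_exp`, and `xruns_swing_bases_empty` are hypothetical. The sketch also assumes that a solo home run is worth the run it scores plus the run expectancy of the bases-empty state that follows, which the formula above does not spell out. Because the rates in the example are rounded, the result will not necessarily reproduce the article's figure exactly.

```python
# Run expectancy by (base state, outs). Only 0.461 (bases empty, no outs) is
# quoted in the article; every other value is an illustrative placeholder.
RE24 = {
    ("empty", 0): 0.461, ("empty", 1): 0.243, ("empty", 2): 0.095,
    ("1st", 0): 0.831, ("1st", 1): 0.489, ("1st", 2): 0.214,
    ("2nd", 0): 1.068, ("2nd", 1): 0.644, ("2nd", 2): 0.305,
    ("3rd", 0): 1.426, ("3rd", 1): 0.865, ("3rd", 2): 0.413,
}


def run_exp(state, outs, table=RE24):
    """Look up run expectancy; three outs end the inning, so it is 0."""
    return 0.0 if outs >= 3 else table[(state, outs)]


def xruns_swing_bases_empty(p_first, p_double, p_triple, p_hr, obp, outs):
    """Expected runs for one swing-away plate appearance, bases empty."""
    return (
        p_first * run_exp("1st", outs)          # walk, hit by pitch, or single
        + p_double * run_exp("2nd", outs)
        + p_triple * run_exp("3rd", outs)
        # Assumption: the homer scores 1 run, then the bases are empty again.
        + p_hr * (1.0 + run_exp("empty", outs))
        # Any other result is treated as an out with the bases still empty.
        + (1.0 - obp) * run_exp("empty", outs + 1)
    )


# The rounded Harper-versus-righties rates quoted in the text.
harper_xruns = xruns_swing_bases_empty(
    p_first=0.28, p_double=0.05, p_triple=0.002, p_hr=0.04, obp=0.375, outs=0
)
print(round(harper_xruns, 3))
```

Extending this to situations with runners on base would need extra assumptions about how far each runner advances on each type of hit, which the article does not specify.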

Next, I needed to recreate this matrix for bunting against the shift. However, no player has enough bunt attempts versus the shift to provide a meaningful sample, so I instead calculated expected runs for “the league average bunter vs the shift”. According to FanGraphs, on a given pitch, a league average bunter has a 49.6% chance of putting the ball in play, and once a bunt vs the shift is in play, the batter has a 58.1% chance of reaching safely. So using basic probability, if you give a hitter 3 strikes to work with, the chance of getting at least one bunt in play is 1 - (1 - 0.496)^3 ≈ 0.872, and the probability of reaching first via bunt is roughly 0.872 × 0.581 ≈ 0.507. Then, similar to the swing-away formula, you multiply that probability by the RE24 run expectancy of the resulting situation, and that’s the expected runs.
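The bunt arithmetic can be checked with a short Python sketch. The 49.6% and 58.1% rates are the FanGraphs figures quoted above; the function names are hypothetical. One assumption goes beyond the text: `xruns_bunt` also credits the run expectancy of the situation after a failed bunt, by analogy with the out term in the swing-away formula. Pass `re_failure=0.0` to get the hit-only version described in the paragraph.

```python
def p_reach_by_bunt(p_in_play=0.496, p_safe=0.581, attempts=3):
    """Probability of reaching base when the hitter gets `attempts` tries."""
    # Chance that at least one of the attempts puts the ball in play.
    p_any_in_play = 1.0 - (1.0 - p_in_play) ** attempts
    # Once the bunt is in play, the batter reaches safely with p_safe.
    return p_any_in_play * p_safe


def xruns_bunt(p_reach, re_success, re_failure=0.0):
    """Expected runs from a bunt attempt.

    re_success: run expectancy after the batter reaches base.
    re_failure: run expectancy after a failed bunt (assumed extra term).
    """
    return p_reach * re_success + (1.0 - p_reach) * re_failure


p = p_reach_by_bunt()
print(round(p, 3))  # 0.507, matching the figure in the text
```

Treating the three attempts as independent is a simplification; it ignores how the count or the defense's positioning might change after a failed attempt.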

As a case study for bunting against the shift, I created a data set using the 10 most shifted pure lefty and righty hitters in 2019 according to Baseball Savant, minimum 250 plate appearances. The data set includes a good balance of elite MVP-level hitters, like Cody Bellinger and Kris Bryant, and below-average hitters, like Mike Zunino and Chris Davis.

I calculated each hitter’s expected runs with 0, 1, or 2 outs, facing a lefty or a righty pitcher, in every base situation except those with a runner on third. I excluded those for two reasons. One, teams often will not use a standard shift with a runner on third, especially versus lefties, instead keeping the third baseman near third base to keep the runner from taking too large a lead. Two, I do not have enough data to accurately predict what would happen to the runner on third on a bunt. The runner could score, get tagged out, or stay at third; who knows.
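To make the scope concrete, the grid of situations described above can be enumerated in a few lines of Python. The state labels are hypothetical names chosen for this sketch, not identifiers from the original data set.

```python
from itertools import product

OUTS = (0, 1, 2)
PITCHER_HANDS = ("L", "R")
ALL_BASE_STATES = (
    "empty", "1st", "2nd", "3rd",
    "1st_2nd", "1st_3rd", "2nd_3rd", "loaded",
)

# Drop every state with a runner on third (bases loaded includes one).
BASE_STATES = tuple(
    s for s in ALL_BASE_STATES if "3rd" not in s and s != "loaded"
)

# One row per (outs, pitcher hand, base state) combination for each hitter.
situations = list(product(OUTS, PITCHER_HANDS, BASE_STATES))
print(len(situations))  # 3 outs x 2 hands x 4 base states = 24
```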

So…what’s the answer? Is it better to bunt or swing? Well, it depends.

The number one trend to note is that with no outs, there are very few players who are good enough to justify not bunting, especially when the pitcher has the split advantage. Outside of Texas Rangers OF Joey Gallo, who crushed lefties in 2019, the only player who should swing with no outs versus a same-handed pitcher is Cody Bellinger with a man on first. However, as you can see, the expected runs from bunting drop way off with one and two outs. For example, with the bases empty and two outs, the only players who shouldn’t swing away regardless of pitcher are Mike Zunino, Chris Davis, Curtis Granderson, Brandon Belt, and Matt Carpenter.

The main reason for this is that while bunting has a higher expected on-base percentage than swinging away, it has a much narrower range of outcomes, and therefore a lower chance for explosive scoring (think triples and home runs). And with fewer outs to work with, it’s harder to string together multiple hits in a row, so the potential for a home run becomes far more important.

In this line graph showing xRuns for RH hitters versus RHP with the bases empty, the opacity of each line represents that player’s home run rate. Players with darker lines have higher xRuns than the league average bunter with one and two outs, while Mike Zunino (light blue), who hit only seven homers in 188 plate appearances versus righties in 2019, doesn’t come close.

I could make a multi-hour video breaking down every single data point, so if you’re interested in a specific player or game situation, feel free to DM me on Twitter (@Andrew_Fenichel) with your question and I’ll get back to you.

The situations in which Phillies OF Bryce Harper should swing and bunt (number is xRuns swinging away)

Finally, back to Bryce Harper. Based on his 2019 stats, Harper should swing in these situations (see left). So that bunt attempt on August 12th was a great decision by Harper: he was facing a left-handed pitcher with a man on first and no outs, and he increased the Phillies’ expected runs by 0.015 by bunting instead of swinging.

But what if Harper is a better-than-average bunter? Over the last two seasons, he’s actually six for nine bunting for a hit, and while this is a small sample, it suggests that Harper is better than average. We can find the equilibrium point for any hitter, the point at which the expected runs from bunting and from swinging away are equal, which tells us how good a bunter a hitter needs to be in order to justify bunting instead of swinging. To calculate this equilibrium point, we use old-fashioned algebra, setting xRunsBunt = xRunsSwing and then solving backwards for the bunt success probability.

Here’s an example: versus lefties with 2 outs and the bases empty, Harper would need to have a 67.81% successful bunt probability to justify bunting instead of swinging.
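Solving xRunsBunt = xRunsSwing for the bunt success probability can be sketched as below. This assumes the bunt's expected runs take the form p × (run expectancy after reaching) + (1 - p) × (run expectancy after failing); the second term is an assumption of this sketch, and with two outs it is zero because a failed bunt ends the inning. The function name and the input numbers in the usage lines are hypothetical and are not Harper's actual values.

```python
def break_even_bunt_probability(xruns_swing, re_success, re_failure=0.0):
    """Bunt success probability at which bunting equals swinging away.

    Solves p * re_success + (1 - p) * re_failure = xruns_swing for p.
    """
    if re_success <= re_failure:
        raise ValueError("reaching base must be worth more than failing")
    return (xruns_swing - re_failure) / (re_success - re_failure)


# Hypothetical inputs, for illustration only.
p_two_outs = break_even_bunt_probability(xruns_swing=0.15, re_success=0.20)
p_no_outs = break_even_bunt_probability(
    xruns_swing=0.50, re_success=0.80, re_failure=0.25
)
print(round(p_two_outs, 4), round(p_no_outs, 4))
```

A result above 1 means no bunter could be good enough to justify bunting in that situation, and a result below 0 means bunting beats swinging even if it never works.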

To bunt or not to bunt: that is the question. And while I began to answer that question today, the puzzle is definitely not solved. There are other parts of this equation that I have not investigated yet, such as how good the players on deck and in the hole are, how good the pitcher is, whether the ballpark favors hitters or pitchers… the list goes on. Stay tuned for part two.

Follow Andrew Fenichel on Twitter: @Andrew_Fenichel

Connect with Andrew Fenichel on LinkedIn: Andrew Fenichel
