In-depth Analysis of the Major League Baseball Shift: Effectiveness and Implications

Ethanweber
Dec 16, 2023

Abstract:
This report presents a comprehensive analysis of Major League Baseball (MLB) data spanning two decades to investigate the effectiveness of defensive shifts. The study utilizes a combination of clustering, regression, and other analytical models to draw conclusions on whether the shift strategy employed by teams provides a tangible advantage. The findings suggest that while some players exhibit marginal success against the shift, there is no substantial evidence supporting the overall effectiveness of the shift for defenses.

Introduction:
If you are unfamiliar with the shift in MLB, it is a strategy in which the defense intentionally crowds one side of the field on the assumption that the batter tends to hit to that side more often than not. It is generally used against left-handed hitters who tend to hit the ball to the pull side (right field). The downside of the shift is that it leaves the other side of the field wide open. Whether the shift actually benefits the defense has long been debated, and I wanted to find an answer to that question. The shift dates back to the 1920s, but for this project I use data from the start of the 2020 season to the end of the 2023 season. Over the past two decades, defensive shifts in Major League Baseball have become a prevalent strategy that teams use to gain a competitive edge. This report examines MLB data and statistics to evaluate the impact and success of defensive shifts on the field. My target audience is MLB coaches and scouting departments, as well as fans who can gain insight from this work. By fully understanding the shift and weighing its pros and cons, teams can better prepare for success in the major leagues.

Methodology:
The research process involved collecting and analyzing 23 years' worth of MLB data, including player performance metrics and the defensive strategies used across the league. The primary analytical tools in this study are clustering and regression models, which together offer a multifaceted approach to understanding the dynamics of defensive shifts.

Data Collection and Cleaning:
2020 saw a spike in the use of the shift, and I think it is beneficial to use data that is as current as possible in an ever-changing sport like baseball. For data collection, I scraped data from a website called FanGraphs, which has a collection of all-time Major League Baseball data in the form of various splits. I used its splits tool, which let me create custom reports by combining splits of various metrics. After some time experimenting with the splits, I formed my preliminary dataset. This is not a pre-filtered dataset but rather a collection of all Major League Baseball data in which I chose the variables and applied splits to narrow it down slightly. It took some time to decide which variables to include in this project. First and foremost, I only took data where the shift was being used. There are traditional and non-traditional shifts in baseball, and I had the option to choose between the two. I ended up including both, because both encompass the idea of moving the defense to better defend the batter. I also only took data where the ball was put in play. Strikeouts, walks, intentional walks, and hit-by-pitches never put a ball in front of the fielders, so they give no usable information about the shift, and I filtered them out. As a result, player averages are higher than traditional batting averages, since I am essentially looking at batting average on balls in play (BABIP). Strikeouts are a big part of the game, so this eliminates a big chunk of the data. My data consists of three files that I combined into one large dataset. Each contains 2,710 entries, and the files are 456 KB, 229 KB, and 391 KB, respectively. The first file has the more recognizable statistics, like batting average, slugging percentage, OPS, and runs created.
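To make the effect of dropping strikeouts concrete, here is a small sketch comparing traditional batting average (H / AB) with BABIP as FanGraphs defines it, (H - HR) / (AB - K - HR + SF). The season line is made up purely for illustration. Note that the standard BABIP formula also removes home runs, so averages from a "balls in play" split that still counts home runs may be BABIP-like rather than identical to it.

```python
def batting_average(hits, at_bats):
    """Traditional batting average: H / AB."""
    return hits / at_bats


def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


# Hypothetical season line, invented for illustration only.
H, HR, AB, K, SF = 150, 25, 550, 120, 5
print(f"AVG   = {batting_average(H, AB):.3f}")     # AVG   = 0.273
print(f"BABIP = {babip(H, HR, AB, K, SF):.3f}")    # BABIP = 0.305
```

Removing the 120 strikeouts from the denominator is what pushes the number up, which is why averages computed only on balls in play run higher than familiar batting averages.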
The second and third files include batted-ball statistics such as FB% (fly ball percentage), IFFB% (infield fly ball percentage), IFH% (infield hit percentage), and BUH% (bunt hit percentage), among others. They also contain data on the direction of each batted ball, so I know whether it was hit into the shift (the pull side) or to the opposite field, as well as statistics on how hard the ball was hit. The dataset was carefully curated to capture player statistics, game outcomes, and instances of defensive shifts, and I am confident that all of the relevant performance indicators were included. This extensive dataset laid the foundation for a comprehensive exploration of the shift's impact on player performance. For the rest of the data cleaning I used Python and SQL. First, I combined the data, using SQL joins to connect the three tables on PlayerID. Next, I removed any N/A values using pandas, drawing on skills from this class and previous coding classes. I still had a lot of data and wanted more representative statistics, so I also set a minimum of 20 plate appearances against the shift. This eliminated about 10% of the data, which I determined was not useful. Lastly, I removed columns that were unnecessary given the splits I chose. These include strikeout percentage and walk percentage, which appear in the dataset but are 0% for every player because I am only looking at balls in play.
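The cleaning steps above (join on PlayerID, drop missing values, apply the 20 plate appearance minimum, drop constant columns) can be sketched as follows. This is a minimal, dependency-free sketch using Python's built-in sqlite3 module in place of the SQL and pandas tools described; the table names (standard, batted_ball, direction), column names, and sample rows are all hypothetical stand-ins for the FanGraphs exports, not the actual data.

```python
import sqlite3

MIN_PA_VS_SHIFT = 20  # minimum plate appearances against the shift, as in the text


def build_dataset(conn):
    """Join three hypothetical split tables on PlayerID and clean the result."""
    # Inner joins keep only players present in all three tables.
    query = """
        SELECT s.PlayerID AS PlayerID, s.Name AS Name, s.PA AS PA,
               s.BIP_AVG AS BIP_AVG, s.K_pct AS K_pct,
               b.FB_pct AS FB_pct, b.IFFB_pct AS IFFB_pct,
               d.Pull_pct AS Pull_pct, d.Oppo_pct AS Oppo_pct
        FROM standard AS s
        JOIN batted_ball AS b ON b.PlayerID = s.PlayerID
        JOIN direction AS d ON d.PlayerID = s.PlayerID
        WHERE s.PA >= ?
        ORDER BY s.PlayerID
    """
    cursor = conn.execute(query, (MIN_PA_VS_SHIFT,))
    columns = [desc[0] for desc in cursor.description]
    rows = [dict(zip(columns, values)) for values in cursor.fetchall()]
    # Drop rows with any missing (NULL) value, analogous to pandas' dropna().
    rows = [row for row in rows if all(v is not None for v in row.values())]
    # Drop columns holding one value for every player (e.g. K% is 0 on balls in play).
    constant = {c for c in columns if len({row[c] for row in rows}) <= 1}
    return [{c: v for c, v in row.items() if c not in constant} for row in rows]


# Tiny in-memory example with invented players.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE standard (PlayerID INTEGER, Name TEXT, PA INTEGER, BIP_AVG REAL, K_pct REAL);
    CREATE TABLE batted_ball (PlayerID INTEGER, FB_pct REAL, IFFB_pct REAL);
    CREATE TABLE direction (PlayerID INTEGER, Pull_pct REAL, Oppo_pct REAL);
    INSERT INTO standard VALUES
        (1, 'Player A', 120, 0.310, 0.0), (2, 'Player B', 15, 0.400, 0.0),
        (3, 'Player C', 80, 0.290, 0.0), (4, 'Player D', 60, 0.270, 0.0);
    INSERT INTO batted_ball VALUES (1, 0.35, 0.08), (2, 0.40, 0.10), (3, NULL, 0.12), (4, 0.30, 0.09);
    INSERT INTO direction VALUES (1, 0.45, 0.22), (2, 0.50, 0.20), (3, 0.48, 0.25), (4, 0.55, 0.18);
""")
dataset = build_dataset(conn)
for row in dataset:
    print(row)
```

In this toy data, Player B falls below the plate appearance minimum, Player C is dropped for a missing fly ball rate, and the all-zero K_pct column is removed, leaving Players A and D.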

Clustering Analysis:
Clustering was used to group players based on their performance against the shift. This approach helped identify patterns and distinguish between players with varying degrees of success against the defensive strategy. My goal with the clustering was to see whether some players have greater success against the shift than the rest of the league. I used k-means clustering, which let me choose the number of clusters, to get a baseline of what these clusters look like and what insights they could provide. The preliminary clusters were built only on batting average on balls in play (BABIP), and they showed that a select group of players has more success against the shift. However, most of these players had fewer plate appearances than the rest of the data. I see two possible explanations: either these players were good enough against the shift that teams stopped using it against them, or the sample size is too small. After this realization, I used Python to raise the minimum to 30 plate appearances and then increased the number of clusters, which produced more distinct and useful clusters that I could apply to other meaningful statistics in my dataset.
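The clustering step on a single statistic can be sketched as a plain Lloyd's k-means over BABIP, after applying the 30 plate appearance minimum. This is an illustrative, standard-library-only sketch, not the exact code used here; the player names and numbers are hypothetical, and the deterministic initialization (centroids spread across the sorted values) is an assumption chosen so the example is reproducible.

```python
from statistics import mean

MIN_PA = 30  # raised plate appearance minimum described in the text


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on a single feature; returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic start: spread the k initial centroids across the sorted values.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    return labels, centroids


# Hypothetical (name, plate appearances vs. shift, BABIP vs. shift) rows.
players = [
    ("Player A", 250, 0.285), ("Player B", 310, 0.292), ("Player C", 180, 0.301),
    ("Player D", 45, 0.360), ("Player E", 60, 0.372), ("Player F", 12, 0.500),
    ("Player G", 220, 0.279),
]
qualified = [p for p in players if p[1] >= MIN_PA]  # Player F is filtered out
labels, centroids = kmeans_1d([p[2] for p in qualified], k=2)
for (name, pa, babip), label in zip(qualified, labels):
    print(f"{name}: PA={pa}, BABIP={babip:.3f}, cluster={label}")
print("centroids:", [round(c, 3) for c in centroids])
```

Raising k or clustering on several statistics at once follows the same idea; for multiple features, scikit-learn's KMeans (sklearn.cluster.KMeans) is a common choice, ideally after standardizing each statistic so that no single scale dominates the distances.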

Regression Analysis:
Regression models were used to examine the relationship between defensive shifts and offensive performance. The analysis considered several factors, including how hard the ball was hit, the game situation, and the result of the at-bat, to uncover nuanced relationships within the dataset. This gave a good indication of whether the shift actually stops batters from getting on base and how it affects other statistics like OPS and WAR. I found that on hard-hit balls, batters produced slightly worse statistics against the shift than without it, meaning the shift did take away some hits. However, the effect was not significant overall, because weakly hit balls produced a higher OPS and WAR when the shift was on. With one side of the field wide open, batters don't necessarily need to hit the ball hard to succeed if they hit it to the opposite field.
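The pattern described above, where the shift hurts hard-hit balls but helps weakly hit ones, can be illustrated with a simplified regression. In this sketch, which is not the model used in this analysis, a per-batted-ball hit indicator (1 for a hit, 0 for an out) stands in for OPS and WAR, and a least-squares line is fit on a 0/1 shift indicator separately for hard and weak contact. Fitting within each contact bucket is equivalent to one linear model with a shift × hard-contact interaction. The batted balls are invented to mirror the described pattern, not real data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical balls in play: (contact quality, shift on? 1/0, hit? 1/0).
balls_in_play = [
    ("hard", 1, 0), ("hard", 1, 1), ("hard", 1, 0), ("hard", 1, 1),
    ("hard", 0, 1), ("hard", 0, 1), ("hard", 0, 0), ("hard", 0, 1),
    ("weak", 1, 1), ("weak", 1, 0), ("weak", 1, 1), ("weak", 1, 0),
    ("weak", 0, 0), ("weak", 0, 0), ("weak", 0, 1), ("weak", 0, 0),
]

effects = {}
for contact in ("hard", "weak"):
    x = [shift for c, shift, _ in balls_in_play if c == contact]
    y = [hit for c, _, hit in balls_in_play if c == contact]
    slope, intercept = fit_line(x, y)
    effects[contact] = slope
    # With a 0/1 predictor, the intercept is the no-shift hit rate and the
    # slope is the change in hit rate when the shift is on.
    print(f"{contact}: no-shift hit rate {intercept:.2f}, shift effect {slope:+.2f}")
```

A negative slope for hard contact alongside a positive slope for weak contact is the kind of offsetting result that would leave the shift's net effect small.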

Overall Results:
This extensive analysis revealed several insights into the impact of defensive shifts on player performance. While some players demonstrated marginal success against the shift, the overall evidence did not support the notion that defensive shifts provide a substantial advantage for teams. Certain players navigated the shift successfully, as shown by sustained high performance metrics. These players were generally proficient at hitting the ball to the opposite field, where the shift's weakness lies. However, this success was not consistent across all players, suggesting that individual skill and adaptability play a significant role.
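One simple way to check the link between opposite-field hitting and success against the shift is the Pearson correlation between each player's opposite-field rate and BABIP against the shift. This is a minimal sketch with hypothetical players; a value of r near +1 would be consistent with the claim, although correlation alone does not establish cause, and a real check would use the full qualified player pool.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical players: (opposite-field %, BABIP against the shift).
players = [(0.15, 0.270), (0.18, 0.281), (0.22, 0.295), (0.27, 0.310), (0.31, 0.322)]
oppo_pct = [p[0] for p in players]
babip_vs_shift = [p[1] for p in players]
r = pearson_r(oppo_pct, babip_vs_shift)
print(f"r = {r:.2f}")
```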

Clustering Patterns:
Clustering analysis identified distinct groups of players with similar responses to the shift. This segmentation highlighted the variability in how different players fare against the defensive strategy, emphasizing the need for a nuanced approach to defensive tactics.
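Describing what sets each cluster apart usually comes down to averaging key statistics within each cluster. The sketch below assumes cluster labels have already been produced (for example, by a k-means run); the column names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def profile_clusters(rows, labels, stats):
    """Return {cluster label: {stat: cluster mean, ..., "n": cluster size}}."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    summary = {}
    for label in sorted(groups):
        members = groups[label]
        profile = {stat: mean(row[stat] for row in members) for stat in stats}
        profile["n"] = len(members)
        summary[label] = profile
    return summary


# Hypothetical players and cluster labels.
rows = [
    {"PA": 250, "BABIP": 0.285, "Oppo_pct": 0.17},
    {"PA": 310, "BABIP": 0.292, "Oppo_pct": 0.19},
    {"PA": 45, "BABIP": 0.360, "Oppo_pct": 0.28},
    {"PA": 60, "BABIP": 0.372, "Oppo_pct": 0.30},
]
labels = [0, 0, 1, 1]
summary = profile_clusters(rows, labels, ["PA", "BABIP", "Oppo_pct"])
for label, profile in summary.items():
    print(label, {k: round(v, 3) for k, v in profile.items()})
```

A profile like this makes it easy to spot, for instance, a high-BABIP cluster that also has far fewer plate appearances, which is the kind of sample-size caveat that matters when judging success against the shift.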

Regression Findings:
Regression models showed limited correlation between the use of defensive shifts and overall offensive performance. While factors such as the game situation and ball speed off the bat influenced outcomes, the impact of the shift itself appeared to be less significant.

Conclusion:
In conclusion, the analysis of over two decades of MLB data suggests that, despite occasional success by individual players against the shift, there is no substantial evidence that defensive shifts are effective overall for teams. The nuanced nature of player responses, as revealed through clustering and regression analyses, underscores the need for a dynamic, player-specific approach to defensive strategy rather than shifting simply because the batter is left-handed.

Implications and Audience:
The findings of this study have implications for both individual teams and the league as a whole. Teams may need to reconsider the blanket application of defensive shifts and explore approaches tailored to individual player tendencies, and this starts with each team's coaches and scouts. The league could also use these insights to refine rules that maintain a balance between offense and defense. This has already begun: starting with the 2023 season, MLB rules require two infielders on each side of second base, with all four infielders on the infield dirt, when the pitch is released.

Limitations and Future Research:
It’s essential to acknowledge the limitations of this study, including the dynamic nature of the game and the potential evolution of defensive strategies. Future research could explore real-time player adjustments to shifts, incorporate additional contextual factors, and consider the impact of evolving player skill sets over time. This report contributes to the ongoing discussion of defensive strategies in Major League Baseball, providing insights into the nuanced relationship between shifts and player performance. As the game continues to evolve, a data-driven understanding of these dynamics becomes increasingly crucial for teams seeking a competitive edge on the field.
