Let’s Find Out Why MLB Pitchers Give Up Runs!

James Droter
INST414: Data Science Techniques
Sep 15, 2024

Question, Stakeholders, Decisions

As of writing, the Baltimore Orioles just wrapped up a 3-game skid against the Boston Red Sox and Tampa Bay Rays in which they were outscored 21–3. Not exactly what you would call ideal. I want to draw your attention to one of the latest games they played in particular: Monday, September 9th, against the Red Sox. The Orioles deployed 5 pitchers, with 4 of the 5 giving up at least 2 earned runs (ER) and 2 of them giving up 4 ER each. The Red Sox, in contrast, used only 3 pitchers, who allowed 0, 1, and 2 runs respectively.

On its own, this game against Boston has a minuscule effect on the overall statistical landscape of the season for Baltimore, or for any of the other 29 teams. With 162 games to be played, there is a lot of data to paint a picture of the season. Why do I even bring up this stretch of games, then? It leads to the question posed in this article: “Why are MLB pitchers giving up runs?” It is a question shared by all stakeholders: fans, teams, owners, and the league itself. This article aims to increase the understanding of why runs appear on the scoreboard, help fans understand why their favorite team’s pitchers sometimes suck, and help organizations set expectations with the use of Baseball Savant metrics.

Data, Cleansing, and Collection

Baseball is one of the most quantifiable sports being played. Everything can be measured, tracked, assessed, and analyzed to a degree unmatched by the other sports in the “Big 4” (NFL, NBA, NHL, and MLB). Baseball Savant got its start in 2015 when MLB introduced Statcast, a new way to track and analyze both player and ball movements, which today relies on 12 “Hawk-Eye” high-speed cameras in each of the 30 Major League ballparks. It has revolutionized the way the sport is viewed. Baseball Savant, the host of MLB’s Statcast metrics, contains data ranging from traditional counting stats like Games Played, Plate Appearances, and Home Runs to camera-derived tracking stats that form the foundation for measurable analytics and for predicting players’ future performance.

In addition to hosting metrics, Baseball Savant lets users download its statistical leaderboards in .csv format for exploratory data analysis to their heart’s content. Using a combination of the site’s checkbox selection tool and dropdown menus, I selected pitchers with a minimum of 400 batters faced, to match league-leader qualifications. One potential challenge users could face when making these selections is the difference between Statcast and basic metrics, as they carry separate weights of importance. I chose three Statcast predictive metrics, Launch Angle (LA), Exit Velocity (EV), and Hard Hit Percentage (HH%), along with one sabermetric, On-Base + Slugging (OPS). Together, these stats help measure the outcomes achievable on a baseball diamond.
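As a rough sketch of that selection step in pandas, the snippet below reads a leaderboard export and keeps pitchers with at least 400 batters faced. The column names (`player_name`, `pa`, `era`, and so on) and the inline sample rows are placeholders invented for illustration, not the actual Baseball Savant export or the code behind this article; a real download would be read by passing its file path to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for a Baseball Savant leaderboard export.
# Column names and values are made up for illustration only.
SAMPLE_CSV = """player_name,pa,era,launch_angle_avg,exit_velocity_avg,hard_hit_percent,ops
Pitcher A,712,3.10,12.4,88.1,36.5,0.640
Pitcher B,405,4.85,14.0,90.2,42.1,0.780
Pitcher C,250,2.95,9.8,87.5,33.0,0.610
Pitcher D,530,,11.1,89.0,39.4,0.720
"""

MIN_BATTERS_FACED = 400  # threshold stated in the article


def load_qualified_pitchers(csv_source, min_bf=MIN_BATTERS_FACED):
    """Read a leaderboard CSV and keep qualified pitchers with complete rows."""
    df = pd.read_csv(csv_source)
    # Keep only pitchers who faced enough batters to qualify.
    df = df[df["pa"] >= min_bf]
    # Drop rows with missing values, since a correlation needs complete pairs.
    df = df.dropna()
    return df.reset_index(drop=True)


qualified = load_qualified_pitchers(io.StringIO(SAMPLE_CSV))
print(qualified[["player_name", "pa", "era"]])
```

In this toy sample, Pitcher C falls below the 400 batters faced cutoff and Pitcher D is dropped for a missing ERA, leaving two rows.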

Data Analysis

According to the MLB rulebook there are 23 ways to make an out, with only 8 possible ways to reach first base. That would seemingly tilt the odds in favor of pitchers when it comes to preventing runners from crossing home plate. Let’s take a look at some of the stats.

Of the stats covered in this article, Launch Angle is arguably the easiest concept to understand. LA represents the vertical angle at which the ball leaves the bat after contact. Ground balls are categorized as below 10°, line drives as 10–25°, and fly balls as 25° and above. In isolation, line drives and fly balls are the more dangerous types of contact, as they carry a higher batting average because fielders are often unable to make a play on them. I will be performing four correlation tests using Pandas, Matplotlib, SciPy, and Seaborn. A correlation test between LA average and ERA is shown below:

R = 0.104

With a correlation of R = 0.104, a pitcher’s average launch angle allowed has almost no linear relationship with ERA. Whether batters tend to pop the ball up or skid it into the ground tells us very little about a pitcher’s earned run average.
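For readers who want to try a test like this themselves, here is a minimal sketch using SciPy’s `pearsonr` on a pandas DataFrame. The helper name, the column names, and the six data points are assumptions made up for illustration; they are not the real leaderboard values and will not reproduce the R reported above.

```python
import pandas as pd
from scipy.stats import pearsonr

# Made-up numbers for illustration; not real pitcher data.
pitchers = pd.DataFrame({
    "launch_angle_avg": [9.5, 11.0, 12.5, 14.0, 15.5, 17.0],
    "era": [3.9, 3.2, 4.4, 3.5, 4.1, 3.6],
})


def correlation_with_era(df, metric, target="era"):
    """Return Pearson's r and its two-sided p-value for metric vs. target."""
    r, p_value = pearsonr(df[metric], df[target])
    return float(r), float(p_value)


r, p_value = correlation_with_era(pitchers, "launch_angle_avg")
print(f"R = {r:.3f}, p = {p_value:.3f}")
```

The same helper could be called with a different `metric` column for each of the other stats, and a scatter plot with a fitted line (for example, Seaborn’s `regplot`) is one way to show the relationship alongside the number.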

Exit Velocity is the other half of the foundation for predictive/expected stats. EV is defined as the speed at which the ball travels off the bat after contact. The harder a ball is hit, the less time it spends in the air or on the ground before reaching a fielder, and the more challenging it is for fielders to make a play. A correlation test between EV average and ERA is shown below:

R = 0.287

With a correlation of R = 0.287, the speed of the ball off the bat has a weak positive relationship with a pitcher’s earned run average. It is reasonable to see EV have a higher correlation than LA; in isolation, LA is an empty stat.
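One way to make “weak” concrete is to square R: for a simple straight-line relationship, R² is the share of the variation in one variable that a linear fit on the other accounts for. The short sketch below applies that to the two R values reported so far; the function name is just an illustration.

```python
def variance_explained(r):
    """Square a Pearson correlation to get R^2, the share of variance explained."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# R values reported in the article for each metric against ERA.
reported = {"Launch Angle": 0.104, "Exit Velocity": 0.287}

for metric, r in reported.items():
    print(f"{metric}: R = {r:.3f}, R^2 = {variance_explained(r):.1%}")
```

By that reading, average launch angle accounts for roughly 1% of the variation in ERA in this sample, and average exit velocity for roughly 8%.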

Hard Hit Percentage rounds out the Statcast metrics and is very similar to Exit Velocity, as both deal with the speed of the ball off the bat. What separates the two is that HH% counts only batted balls hit at 95 mph or more, expressed as a share of all batted balls. This distinction in quality of contact can help decipher which players consistently make hard contact in comparison to players who consistently make weak contact. A correlation test between HH% and ERA is shown below:

R = 0.335

With a correlation of R = 0.335, HH% approaches a moderate correlation with ERA. Considering the Statcast metrics so far, it is plausible that HH% is the most correlated of the group: more consistent hard contact will lead to more hits in the field.
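To see how HH% can differ from average EV, consider the sketch below. It computes both from a list of exit velocities, using the 95 mph threshold described above. The two pitchers and their batted balls are invented for illustration.

```python
HARD_HIT_MPH = 95.0  # threshold stated in the article


def average_exit_velocity(exit_velocities):
    """Mean exit velocity (mph) over all batted balls."""
    return sum(exit_velocities) / len(exit_velocities)


def hard_hit_percentage(exit_velocities, threshold=HARD_HIT_MPH):
    """Percentage of batted balls hit at or above the threshold."""
    hard = sum(1 for ev in exit_velocities if ev >= threshold)
    return 100.0 * hard / len(exit_velocities)


# Made-up batted balls allowed by two hypothetical pitchers.
steady = [88.0, 89.0, 90.0, 91.0, 92.0]          # everything clustered near 90 mph
boom_or_bust = [75.0, 78.0, 97.0, 99.0, 101.0]   # either weak contact or rockets

for name, evs in [("steady", steady), ("boom_or_bust", boom_or_bust)]:
    print(f"{name}: avg EV = {average_exit_velocity(evs):.1f} mph, "
          f"HH% = {hard_hit_percentage(evs):.1f}")
```

Both hypothetical pitchers allow the same 90.0 mph average exit velocity, yet one allows no hard-hit balls and the other allows 60% of them, which is the kind of difference an average alone would hide.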

On-Base + Slugging is the last of the four metrics covered and the only sabermetric, from “sabermetrics,” a term coined by Bill James for the empirical analysis of baseball. OPS adds the rate at which a player reaches base (on-base percentage, league average .312) to the player’s total bases per at-bat (slugging percentage, league average .400), which rewards extra-base hits. OPS is a highly referenced stat in player comparison, as it encompasses all aspects of what a hitter is supposed to accomplish. A correlation test between OPS and ERA is shown below:

R = 0.867

With a correlation of R = 0.867, there is a strong correlation between a pitcher’s allowed OPS and their ERA. The more often hitters reach base against a pitcher, and the more extra-base hits they collect, the more danger there inevitably is on the base paths.
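For reference, OPS is simply on-base percentage plus slugging percentage. The sketch below spells out the standard formulas for both pieces and adds up the league averages quoted above; the batting line used as an example is made up.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG = total bases / at-bats, so extra-base hits count for more."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp, slg):
    """On-Base + Slugging."""
    return obp + slg


# League averages quoted in the article.
league_ops = ops(0.312, 0.400)
print(f"League-average OPS: {league_ops:.3f}")

# Made-up batting line allowed by a hypothetical pitcher.
obp = on_base_percentage(hits=130, walks=50, hit_by_pitch=5, at_bats=500, sac_flies=5)
slg = slugging_percentage(singles=85, doubles=25, triples=2, home_runs=18, at_bats=500)
print(f"OBP = {obp:.3f}, SLG = {slg:.3f}, OPS = {ops(obp, slg):.3f}")
```

Because both components grow whenever a hitter reaches base or takes extra bases, OPS allowed sits much closer to the events that actually produce runs than the contact-quality metrics do.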

Limitations, Summary

Despite the amount of public-facing resources MLB has to offer, teams still keep plenty of data and tools private to protect their methodologies. This leaves less for the consumer to analyze.

Although only one of the four analyzed metrics showed a strong correlation with ERA, the takeaway should not be “Wow, these stats really are useless!” A huge part of baseball comes down to luck: many players will perform above their expected stats and, on the flip side, many will underperform them. It is just how the game rolls. If there is one lesson to walk away with after reading, it is this: if your team’s starting pitcher is rolling out with a low ERA and a high opponents’ OPS allowed, buckle up. You’re in for some regression.
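As one possible way to act on that lesson, the sketch below fits a straight line of ERA on OPS allowed and flags pitchers whose ERA sits well below what their OPS allowed would suggest. The least-squares fit, the 0.50-run margin, and the sample pitchers are all assumptions chosen for illustration; this is not the method used for the analysis in this article.

```python
def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_candidates(pitchers, margin=0.50):
    """Names of pitchers whose ERA is at least `margin` runs below the fitted line."""
    slope, intercept = fit_line(
        [p["ops"] for p in pitchers],
        [p["era"] for p in pitchers],
    )
    flagged = []
    for p in pitchers:
        expected_era = slope * p["ops"] + intercept
        if expected_era - p["era"] >= margin:
            flagged.append(p["name"])
    return flagged


# Made-up pitchers; Pitcher E pairs a low ERA with a high OPS allowed.
sample = [
    {"name": "Pitcher A", "ops": 0.600, "era": 3.0},
    {"name": "Pitcher B", "ops": 0.650, "era": 3.5},
    {"name": "Pitcher C", "ops": 0.700, "era": 4.0},
    {"name": "Pitcher D", "ops": 0.750, "era": 4.5},
    {"name": "Pitcher E", "ops": 0.780, "era": 3.2},
    {"name": "Pitcher F", "ops": 0.800, "era": 5.0},
]

print(regression_candidates(sample))
```

The margin is arbitrary and would need tuning against real data; a smaller value flags more pitchers, a larger one only the most extreme cases.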

This GitHub Link will provide the code utilized for the analysis in this article
