History of Baseball Statistics
Baseball has undergone many changes since its first professional league was founded in the late 1800s. One such change is the use of statistics, such as the traditional triple-slash line, as the means by which player performance is recorded and evaluated. The analytical study of these statistics, dubbed sabermetrics by one of its ‘fathers’, Bill James, has evolved with the game itself and found a unique place within America’s greatest pastime.
Its applications have been ever-expanding in the technology-laden society of today. For example, as video analysis of player performance allows us to track data that could not be captured in the past, coaches can develop young players more efficiently and productively. The advent of advanced sabermetrics allows scouts to assess players and their future potential more accurately when evaluating and recruiting. The internet puts a plethora of statistics within a click’s reach, educating more fans about the intricacies of the game. Front offices make use of sabermetrics when signing free agents or trading for players.
With such a huge impact in various fields of the industry, baseball statistics is integral to understanding the sport today. Therefore, to look into the development of baseball statistics is to look into the development of the game itself.
The development of sabermetrics into the influential concept that it is today took place in two steps: breaking away from traditional statistics and collecting big data from modern technological developments.
Finding Flaws in Traditional Statistics
Baseball statistics have existed for as long as the game itself. However, early sabermetricians, including James, identified flaws in traditional statistics and set out to rectify them in order to gain a better understanding of player performance and value.
Such flaws existed in statistics such as the triple-slash line, one of the most well-known and long-standing statistics in baseball. It is known to be an aggregate of a batter’s hitting abilities summarized into three statistics: batting average, on-base percentage, and slugging percentage, in that order. Its prominence was a result of its simplicity– it was easy to understand and thus more approachable to fans.
The most prominent of the three components of the triple-slash line is the batting average, which is the number of hits divided by the number of at-bats and is interpreted as the chance of getting a hit in any given at-bat.
However, batting averages are not always directly comparable. For example, what does a batting average of .250 mean? Can it be concluded with certainty that a batter boasting a batting average of .300 is better? Such a comparison, although it seems logically sound, is not necessarily appropriate, since the batter with the .300 average may have had considerably fewer at-bats. For example, a player with 6 hits in only 20 at-bats has a .300 average. Based on this number alone, however, we cannot conclude that he is better than another player with 25 hits in 100 at-bats, whose batting average is .250. With so few at-bats, the .300 player will likely regress toward his true ‘mean’ ability as he steps up to the plate more times.
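The sample-size problem can be made concrete with a short sketch. It uses the two stat lines from the paragraph above; the binomial standard error is an assumption added here for illustration (it treats every at-bat as an independent trial with a fixed hit probability), not something the original analysis specifies.

```python
import math

def batting_average(hits, at_bats):
    """Hits divided by at-bats."""
    return hits / at_bats

def standard_error(hits, at_bats):
    """Binomial standard error of a batting average.

    Assumption for illustration: at-bats are independent trials
    with a constant probability of a hit.
    """
    p = hits / at_bats
    return math.sqrt(p * (1 - p) / at_bats)

# The two players from the text: 6-for-20 and 25-for-100.
for hits, at_bats in ((6, 20), (25, 100)):
    avg = batting_average(hits, at_bats)
    se = standard_error(hits, at_bats)
    print(f"{hits}-for-{at_bats}: AVG {avg:.3f}, standard error {se:.3f}")
```

Under this simplified model, the uncertainty around the 20 at-bat average is more than twice that of the 100 at-bat average, which is why the .300 figure alone says little.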
Also, the era in which a player played must be considered. Although a .250 batting average may sound subpar, the figure was substantially better than the league average in the 1968 season, when the average batting average in the American League was only .230. In fact, measured against the league-average batting average of his own season, a .250 batter in 1968 performed about as well, by batting average alone, as a .329 batter in 1930. If these two starkly different numbers mean the same thing, what use is the batting average as a means of comparing two players’ abilities or performance?
Another statistic found in the triple-slash line is the slugging percentage. Denoted as SLG, slugging percentage is a batter’s total bases divided by his number of at-bats. Total bases is the total number of bases that a player has ‘touched’, or gone through, on his hits. It can be calculated by taking the numbers of singles, doubles, triples, and home runs, multiplying each by its base value (one, two, three, and four, respectively), and adding the results.
Although SLG may seem like a better evaluation of a player than the batting average because it incorporates power, it faces one crucial limitation in its logic: it assumes that a double is worth two singles, that a triple is worth three singles, and that a home run is worth four singles. However, one home run in four at-bats is worth considerably less than four singles in four at-bats, since the four singles have a higher run potential. While the batter with the home run may have driven in one or two runs, he has also made three outs. The batter with four singles, on the other hand, gives his team a chance of scoring a run every time he comes to the plate.
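A minimal sketch of the slugging calculation shows the limitation directly. The function names are illustrative; the two stat lines are the ones compared above, one home run in four at-bats against four singles in four at-bats.

```python
def total_bases(singles, doubles, triples, home_runs):
    """Weight each hit type by its base value (1, 2, 3, 4)."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs

def slugging(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at-bats."""
    return total_bases(singles, doubles, triples, home_runs) / at_bats

one_home_run = slugging(0, 0, 0, 1, at_bats=4)   # one homer, three outs
four_singles = slugging(4, 0, 0, 0, at_bats=4)   # four singles, no outs

print(f"One home run in 4 AB:  SLG {one_home_run:.3f}")
print(f"Four singles in 4 AB:  SLG {four_singles:.3f}")
```

Both lines produce the same slugging percentage even though one batter made three outs and the other made none, which is exactly the information SLG discards.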
Flaws exist not only in traditional batting statistics, but also in traditional pitching statistics. The win-loss record, also known as the W-L record, has long been under scrutiny because it depends on a pitcher’s team rather than on his true abilities. Statistics such as the earned run average, abbreviated to ERA, have only been disputed by insiders since the sabermetric age began.
Sabermetricians have shied away from ERA because of its variability. One of the more groundbreaking recent theories in sabermetrics is the Defense Independent Pitching Statistics (DIPS) theory, which suggests that once a ball is in play, a pitcher has very little to do with the outcome of the play. ERA attempts to separate the runs for which the pitcher is responsible from the runs that are the defense’s fault, but, as DIPS theory suggests, a pitcher’s responsibility is hard to assess, so the distinction falls short of its goal. Runners who score because of defensive errors are clearly not the pitcher’s responsibility. However, there are often less clear-cut plays that are not defensive errors by definition but that a more capable defense would have made. Because ERA fails to account for such plays, it is not an accurate measure of pitcher performance.
Traditional defensive metrics such as the error also have fundamental issues that keep them from accurately measuring player defense. Defensive aptitude has long been measured through fielding percentage, the percentage of total fielding opportunities that a fielder converts without an error. However, errors are assigned subjectively by the scorekeeper. In other words, because the scorekeeper judges whether or not a misplayed batted ball is an error, the “error” itself is susceptible to variability.
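To see how much the scorekeeper’s judgment can move the number, here is a small sketch using the standard fielding percentage definition, (putouts + assists) / (putouts + assists + errors). The stat line is hypothetical and exists only to illustrate the effect of rescoring a few plays.

```python
def fielding_percentage(putouts, assists, errors):
    """Share of fielding chances converted without an error."""
    chances = putouts + assists + errors
    return (putouts + assists) / chances

# Hypothetical infielder: 100 putouts, 200 assists, 10 errors.
strict_scorer = fielding_percentage(100, 200, 10)

# Same fielder, but a more lenient scorekeeper rules three of those
# misplays as hits, so they no longer count as chances at all.
lenient_scorer = fielding_percentage(100, 200, 7)

print(f"Strict scorer:  {strict_scorer:.3f}")
print(f"Lenient scorer: {lenient_scorer:.3f}")
```

The fielder’s play on the field is identical in both cases; only the scoring decisions differ.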
As sabermetricians began to realize that traditional statistics such as the triple-slash line were inadequate in evaluating players, they worked on devising new models. These models seek not only to better evaluate an individual player’s contributions to his team, but also to measure or project a team’s performance.
Born in 1881, Branch Rickey was a baseball executive best known for signing Jackie Robinson and breaking the long-standing color barrier in baseball. In addition to bringing the first African-American player into MLB’s fold, Rickey also devised a formula that assesses a team’s efficiency. Denoted as G, it represents the difference between a team’s ability to score runs and its ability to prevent runs. Both abilities are determined by summing individual statistics such as walks, hits, runs, earned runs, strikeouts, etc., and the formula looks like this:
While the formula may look daunting at first, it simply adds up the factors that lead to runs scored and subtracts the factors that lead to runs allowed.
With this equation, Rickey stripped the game down to its most fundamental aspect: the number of runs. In the end, the team that scores more runs than it allows its opponents will win more games in a season. Also, by looking at the aggregate of individual statistics, Rickey’s team efficiency formula controls for confounding factors and leaves no room for lurking variables.
The idea of correlating runs to wins was also adopted by Bill James, who came up with a formula now known as the Bill James Pythagorean:
Win% = (Runs Scored)^2 / [(Runs Scored)^2 + (Runs Allowed)^2]
This simple formula predicts a team’s win-loss percentage at most points during a season with strikingly high fidelity. It looks at two metrics only, runs scored and runs allowed, and is consistent with the aforementioned logic of using runs as the unit for measuring performance. The equation also helps predict whether a team will regress or improve in the future by comparing its expected win-loss percentage to its actual win-loss percentage. For instance, if the former is greater than the latter, the team can be expected to perform better in the future.
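The comparison between expected and actual win-loss percentage can be sketched as follows. The sketch assumes the commonly cited form of the Pythagorean formula with an exponent of 2; the team totals are hypothetical and chosen only to illustrate the comparison.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    Assumes the commonly cited form RS^x / (RS^x + RA^x) with x = 2.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# Hypothetical team: 800 runs scored, 700 allowed, actual record 81-81.
expected = pythagorean_win_pct(800, 700)
actual = 81 / (81 + 81)

print(f"Expected W-L%: {expected:.3f}")
print(f"Actual W-L%:   {actual:.3f}")
if expected > actual:
    print("Run differential suggests the team should improve.")
else:
    print("Run differential suggests the team may regress.")
```

A team that scores as many runs as it allows comes out at exactly .500 under this form, which is a quick sanity check on the formula.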
This logic of measuring performance in terms of runs is applicable not only in predicting overall team performance, but also in measuring an individual player’s performance. A formula that epitomizes the assessment of an individual player’s value is the Runs Created equation, Bill James’ most widely known statistic. In its basic form, the equation reads:
Runs Created = [(H + BB) x TB] / (AB + BB)
The formula is rooted in the assumption that offensive ability combines two things: an individual’s ability to get on base through walks and hits, and his ability to move runners around. Like Rickey, James measured a player’s contribution in terms of his individual batting statistics, the aggregate of which yields the total runs he provides his team.
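As an illustration, the sketch below uses the basic version of Runs Created, (H + BB) x TB / (AB + BB). James published several refinements of the statistic, so treating this basic version as the one intended here is an assumption; the player’s stat line is hypothetical.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (on-base factor x advancement factor) / opportunities."""
    on_base = hits + walks            # ability to get on base
    advancement = total_bases         # ability to move runners around
    opportunities = at_bats + walks
    return on_base * advancement / opportunities

# Hypothetical season: 150 hits, 50 walks, 250 total bases, 500 at-bats.
rc = runs_created(hits=150, walks=50, total_bases=250, at_bats=500)
print(f"Runs Created: {rc:.1f}")
```

Because the two abilities are multiplied rather than added, a player who is strong in only one of them is credited with fewer runs than a balanced player with the same totals.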
In addition to batting statistics, sabermetricians have also sought to deal with inefficiencies in traditional pitching metrics such as the ERA. For instance, sabermetrician Ted Oliver invented the Weighted Rating System in 1944. The formula is:
[(W-L% of Pitcher) - (W-L% of Team without Pitcher)] x Pitcher Decisions
[(W-L% of Pitcher) - (W-L% of Team without Pitcher)] calculates how much likelier a given pitcher is to hand his team a win than the rest of the team’s pitchers are. Multiplying this difference by the pitcher’s number of decisions gives the number of wins the pitcher accounted for beyond what the rest of the staff would have been expected to provide in the same decisions.
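A short sketch of the Weighted Rating calculation follows. The records are hypothetical: a pitcher who goes 20-10 for a team that finishes 90-72.

```python
def weighted_rating(p_wins, p_losses, team_wins, team_losses):
    """(Pitcher W-L% - team W-L% without the pitcher) x pitcher decisions."""
    decisions = p_wins + p_losses
    pitcher_pct = p_wins / decisions

    # Remove the pitcher's decisions from the team record.
    other_wins = team_wins - p_wins
    other_losses = team_losses - p_losses
    team_pct_without = other_wins / (other_wins + other_losses)

    return (pitcher_pct - team_pct_without) * decisions

# Hypothetical: pitcher 20-10 on a 90-72 team (70-62 without him).
rating = weighted_rating(20, 10, 90, 72)
print(f"Weighted Rating: {rating:+.2f} wins")
```

In this example the pitcher is credited with roughly four wins more than the rest of the staff would have been expected to earn in the same 30 decisions.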
Convergence of Technology into Baseball
Books such as The Hidden Game of Baseball, written by John Thorn and Pete Palmer, the latter of whom had pioneered the sabermetrics movement alongside Dick Cramer and James at the Statistical Analysis Committee, claim that the statistics derived by early sabermetricians such as James, Oliver, and Rickey are the solutions to issues in traditional statistics.
As the computer became more of an everyday commodity in the 1960s, however, statisticians started collecting computer-generated data that covered both a wider range of games and finer-grained portions of a single game. What people now call big data would not only help sabermetricians conduct their research on baseball statistics more accurately, but would also yield previously unfathomable data that led to new statistics. Said John Dewan, founder of Baseball Info Solutions (or BIS):
[Technology] made me want to appreciate what the best players are really worth, because the eye can be deceiving as with anything.
Indeed, while scouting may sometimes rely on subjective analysis, statistics are numbers that don’t lie: they provide objective means to evaluate players.
In his 1984 edition of the Bill James Baseball Abstract, James called for a network of enthusiastic fans to come together in building “Project Scoresheet,” a database that collects play-by-play data from every Major League game. Said James:
When Project Scoresheet is in place, all previous measures of performance in baseball will become obsolete and an entire universe of research options will fall in front of us.
Turning this idea into reality, Project Scoresheet would, by 1994, go on to collect play-by-play data for 23,000 games for a total of 1.7 million plays. With play-by-play data available in a single database, statisticians could assign a run value, or the average number of runs generated, to every play occurring on the field. In essence, this information paved the way for more accurate weighting in formulas such as Bill James’ Runs Created.
The widely acclaimed The Hidden Game of Baseball points to Linear Weights as an example of the effect big data had on baseball analytics.
By taking in data from multiple seasons, an empirical measurement of what a walk, strikeout, fly ball, or double is worth in terms of actual runs can be obtained. Linear Weights is the summation of all hitting, fielding, pitching, and even baserunning metrics of the game, weighted in terms of these run values. This summation translates to four separate equations, each of which addresses one of the four main aspects of the game of baseball: hitting, fielding, pitching, and baserunning.
To calculate the number of runs a player contributes to his team through batting:
Runs = (.46)1B + (.80)2B + (1.02)3B + (1.40)HR + (.33)(BB + HB) - (.25)(AB - H)
The equation above tells us that a single will, on average, produce .46 runs, a double .80 runs, a base on balls or hit by pitch .33 runs, etc. AB - H represents missed opportunities, since at-bats that did not result in a hit are times when batters failed to put themselves on base or move baserunners around. These include striking out, flying out, or hitting into a double play.
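The batting equation translates directly into code. The weights below are the ones quoted above; the stat line is a hypothetical season used only to show the arithmetic.

```python
def batting_runs(singles, doubles, triples, home_runs, walks, hit_by_pitch, at_bats):
    """Linear Weights batting runs, using the weights quoted in the text."""
    hits = singles + doubles + triples + home_runs
    return (
        0.46 * singles
        + 0.80 * doubles
        + 1.02 * triples
        + 1.40 * home_runs
        + 0.33 * (walks + hit_by_pitch)
        - 0.25 * (at_bats - hits)   # at-bats that did not produce a hit
    )

# Hypothetical season: 100 1B, 30 2B, 5 3B, 25 HR, 60 BB, 5 HB, 550 AB.
runs = batting_runs(100, 30, 5, 25, 60, 5, 550)
print(f"Batting runs: {runs:.2f}")
```

Because hitless at-bats carry a negative weight, the result is best read as runs relative to a baseline rather than as a raw run total.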
To calculate the number of runs a player contributes to his team through baserunning:
Runs = (0.3)SB - (0.6)CS
The uneven weighting of the stolen base and the caught stealing is due to the following reason: while a stolen base may marginally improve a team’s chance of scoring, a runner caught stealing completely squanders that potential.
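One consequence of these weights is a break-even success rate: since a caught stealing costs twice what a stolen base gains, a runner must succeed two times out of three just to contribute zero runs. The sketch below works this out from the equation above; the attempt counts are hypothetical.

```python
def baserunning_runs(stolen_bases, caught_stealing):
    """Linear Weights baserunning runs, using the weights quoted in the text."""
    return 0.3 * stolen_bases - 0.6 * caught_stealing

# Break-even rate s solves 0.3 * s = 0.6 * (1 - s), which gives s = 2/3.
break_even = 0.6 / (0.3 + 0.6)
print(f"Break-even success rate: {break_even:.3f}")

# Hypothetical runners, each with 30 attempts.
print(f"20 SB, 10 CS: {baserunning_runs(20, 10):+.1f} runs")
print(f"27 SB,  3 CS: {baserunning_runs(27, 3):+.1f} runs")
```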
To calculate the number of runs a player contributes to his team through pitching:
Runs = Innings Pitched x (League ERA / 9) - Earned Runs
The calculation for pitcher’s linear weights factors in not just the number of runs a pitcher allows, but the number of innings he pitches. This gives starting pitchers, who pitch more innings than relievers and thus generally have a higher ERA, a fair representation of their value to a team.
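The effect of innings on the pitching formula can be sketched with two hypothetical pitchers in a league whose ERA is assumed to be 4.50. None of these numbers come from the original; they are chosen so that the starter has the higher ERA of the two.

```python
def pitching_runs(innings_pitched, earned_runs, league_era):
    """Runs saved relative to a league-average pitcher over the same innings."""
    league_runs = innings_pitched * (league_era / 9)
    return league_runs - earned_runs

def era(innings_pitched, earned_runs):
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched

LEAGUE_ERA = 4.50  # assumed league average for this illustration

# Hypothetical starter: 200 IP, 80 ER. Hypothetical reliever: 60 IP, 20 ER.
for name, ip, er in (("Starter", 200, 80), ("Reliever", 60, 20)):
    print(f"{name}: ERA {era(ip, er):.2f}, "
          f"pitching runs {pitching_runs(ip, er, LEAGUE_ERA):+.1f}")
```

In this example the starter has the worse ERA but is credited with more runs saved, because he sustained better-than-average pitching over far more innings.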
The formula for defense is more complicated than the other formulas mentioned previously. Since the defensive requirements are starkly different for a shortstop, catcher, first baseman, or an outfielder, different position players must be evaluated separately.
We compare the average linear weights for a certain position in the league through this formula:
where AVG. pos. lg is the average number of runs saved by the average fielder in a given position, thus calculating a team’s defensive runs saved per position:
Runs = .20(PO + 2A - E + DP) team at position - AVG. pos. lg. x (PO - K) team
To calculate an individual player’s defensive runs saved at a certain position for a certain team, multiply the respective team’s total runs saved at that position– which is calculated by the formula above– by the percentage of the team’s putouts that the player accounted for himself.
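The two defensive steps can be sketched as below. The league-average rate (AVG. pos. lg.) is passed in as a parameter rather than computed, and its value here is purely hypothetical, as are all of the team and player totals. The sketch also assumes that the player’s share is measured against the team’s putouts at that position, which is one reading of the description above.

```python
def team_defensive_runs(po, assists, errors, dp, avg_pos_lg, team_po, team_k):
    """Team runs saved at one position, per the formula in the text.

    po, assists, errors, dp: the team's totals at that position.
    avg_pos_lg: league-average rate for the position (hypothetical input).
    team_po, team_k: the team's total putouts and strikeouts.
    """
    return 0.20 * (po + 2 * assists - errors + dp) - avg_pos_lg * (team_po - team_k)

def player_defensive_runs(team_runs, player_po, position_po):
    """Scale the team figure by the player's share of putouts at the position."""
    return team_runs * player_po / position_po

# Hypothetical shortstop numbers for one team.
team_runs = team_defensive_runs(
    po=250, assists=450, errors=20, dp=100,
    avg_pos_lg=0.072, team_po=4350, team_k=1100,
)
player_runs = player_defensive_runs(team_runs, player_po=200, position_po=250)

print(f"Team runs saved at shortstop: {team_runs:+.1f}")
print(f"Player's share:               {player_runs:+.1f}")
```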
Not only did the technological revolution in baseball statistics lead to more accurate models of player performance and value, but its use of data also shed light on the previously unknown value of various unorthodox baseball strategies.
An example of the data bringing new strategies onto the baseball field is Dewan and his work at BIS. Dewan was the project manager of James’ Project Scoresheet database, and from this experience, founded BIS, a company that specialized in collecting and analyzing play-by-play data of defensive plays.
For example, Dewan’s plus-minus metric was largely derived from manual video analysis by BIS employees. Running through every defensive play in a game yields a measure of how many more batted balls a fielder managed to stop or catch than an average fielder at his position would have. Though it initially seems as subjective as the error, plus-minus is based on an algorithm that plots balls in play in a coordinate system. In sum, plus-minus allows teams to better understand how many runs a fielder has saved defensively over any given period of time.
With this data, BIS revealed the previously unquantifiable value of defensive shifts. Until BIS published research suggesting that teams should employ more shifts, defensive shifts rarely occurred.
Another example of a strategy whose value was uncovered through technological developments is pitch framing: the ability of catchers to manipulate the perceived location at which a pitch crosses the plate. After the success of James’ Project Scoresheet database and BIS’s video analysis, technology began converging with baseball at an astounding rate. Two of the more recent developments in baseball technology are PITCHf/x and Statcast, live baseball tracking technologies that were released in 2006 and 2015, respectively. Statcast, the more recent of the two, brought data such as launch angle and exit velocity, which players used to optimize their swings in the 2017 season.
PITCHf/x, by contrast, records data on every pitch, including its speed, type, location, and result. While it was initially designed for ESPN to implement a digital strike zone in its broadcasts, sabermetricians such as Dan Turkenkopf realized that this technology could be used to assess the effect, if any, of an individual catcher in inducing a strike call on a pitch that would otherwise be called a ball.
The results were astounding. As Turkenkopf stated:
I’ll be the first to admit this is a much larger effect than I expected to see. In fact, it’s so large that I have to think that there is something wrong in the analysis.
But, of course, he and others found nothing wrong. Turkenkopf had uncovered a hidden gem that is now commonly called pitch framing.
The difference in the number of runs saved or lost between the best and worst pitch-framers was unbelievable. When looking at the number of runs saved per 150 pitches in 2007, the best catcher, Gregg Zaun, saved 0.85 runs while the worst catcher, Gerald Laird, cost his team 1.25 runs. That gap adds up to a staggering difference of roughly 250 runs between the best and worst catcher over a 120-game span, which is around the average number of games an MLB starting catcher will catch in a season.
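The season-long figure follows from simple arithmetic, sketched below. The per-150-pitch values and the 120-game span come from the paragraph above; treating 150 pitches as roughly one game behind the plate is an assumption made here to connect the two.

```python
BEST_PER_150 = 0.85     # runs saved per 150 pitches (best framer)
WORST_PER_150 = -1.25   # runs saved per 150 pitches (worst framer)
PITCHES_PER_GAME = 150  # assumption: one game is about 150 pitches caught
GAMES = 120             # typical starting catcher workload from the text

gap_per_150 = BEST_PER_150 - WORST_PER_150
gap_per_game = gap_per_150 * PITCHES_PER_GAME / 150
season_gap = gap_per_game * GAMES

print(f"Gap per 150 pitches: {gap_per_150:.2f} runs")
print(f"Gap over {GAMES} games: {season_gap:.0f} runs")
```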
Conclusion
Baseball statistics have developed a great deal over the years. While early baseball statistics were only meant to recount what had happened, sabermetricians realized that statistics could be used to predict what could happen in the future. Thus, the late 1900s saw a wave of new statistics enter the game, all in an effort to make projections of player value ever more accurate. The works of sabermetricians such as James, Dewan, Rickey, and Palmer laid the foundation for teams to adopt similar ideas and strategies when constructing their rosters or making in-game decisions. Just as sabermetricians identified flaws in the then-current methods of player evaluation and constructed new statistics to address the known shortcomings, front offices exploited inefficiencies in both in-game and front-office strategies. These improvements have made baseball a better sport: not only are fans more aware of how valuable one player is relative to another, but front offices are also increasingly efficient in spending their money on players. These benefits show that studying baseball statistics and their history, and investigating ways to improve them in the future, is a necessary and right path forward in the long, century-old history of America’s pastime.