Baseball

Sam Serio
On Information Science
3 min read · Nov 9, 2017

Over the past few decades, a war has been brewing in baseball between the “nerds” and the “jocks”. This isn’t news to anyone with even a slight interest in baseball, and fans of other sports, whether basketball or football, have started to see the same conflict reach their own games. The new philosophy, built on a mix of statistics, probability and game theory, is turning sports, an age-old contest of athleticism, IQ and will, on its head.

Even though numbers are driving more decisions in football and basketball, analytics will never be as popular in those sports as it is in baseball, for a fairly simple reason. Baseball is a statistician’s dream because of its finite number of outcomes. A batter (generally) can either get a hit, strike out, or bat into an out. Runs can only be scored in certain ways. Everything on a baseball field can be more or less represented on paper. This has not only driven the future of baseball, but has also helped preserve its rich past.

Thanks to websites like http://www.retrosheet.org/, you can look up the box score of a game between the Red Sox and the Indians at Fenway on Thursday, June 9th, 1938. On that day, Jimmie Foxx hit his 18th home run off Mel Harder in the 4th inning with one runner on base and two outs. The amount of information that can be transcribed from a baseball diamond onto paper is astounding, and it allows for near-perfect records of inconsequential games played on a June Thursday 80 years ago.

What might be more exciting is how easy it is to access, retrieve and analyze all of this data. On Retrosheet, play-by-play logs are available for seasons as far back as 1921 (http://www.retrosheet.org/game.htm), and game logs exist for seasons starting in 1871 (http://www.retrosheet.org/gamelogs/index.html). For batting and pitching statistics, Sean Lahman has a massive database available in CSV or SQL. These detailed and extensive records allowed a baseball revolution to take place.
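To give a sense of how little code this kind of analysis takes, here is a minimal sketch in Python. It assumes a batting table in CSV form with one row per player, season and stint, and columns named playerID, yearID, stint, teamID, AB, H and HR. The rows, player IDs and team codes below are made up for illustration; they are not taken from any real database.

```python
import csv
import io
from collections import defaultdict

# Made-up rows laid out like a batting table: one row per player, season and stint.
# The player IDs, teams and numbers are hypothetical, not real records.
SAMPLE_CSV = """playerID,yearID,stint,teamID,AB,H,HR
examp01,1938,1,AAA,300,90,10
examp01,1938,2,BBB,200,70,8
examp02,1938,1,AAA,400,100,5
"""


def season_totals(csv_text):
    """Sum at-bats, hits and home runs per (player, season) across stints."""
    totals = defaultdict(lambda: {"AB": 0, "H": 0, "HR": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["playerID"], int(row["yearID"]))
        for stat in ("AB", "H", "HR"):
            totals[key][stat] += int(row[stat])
    return dict(totals)


def batting_average(line):
    """Hits divided by at-bats, or 0.0 when there are no at-bats."""
    return line["H"] / line["AB"] if line["AB"] else 0.0


totals = season_totals(SAMPLE_CSV)
for (player, year), line in sorted(totals.items()):
    print(player, year, line["HR"], f"{batting_average(line):.3f}")
```

With a real file, the same function could be fed the file's text in place of SAMPLE_CSV, provided the column names match.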

As the revolution continues, it is becoming the status quo for baseball teams around the country. Every team now has at least some interest in statistics and the quantitative aspects of baseball. The league-wide adoption of these ideas has driven many teams to dig deeper into the numbers. Instead of only observing the outcome of a play, there has been a push for metrics on what goes into a play: the speed of the ball off the bat, the exact position of each fielder, the rpm on a pitcher’s curveball and more. This has brought more technology into baseball, pulling the nation’s pastime into the 21st century. Statcast and PITCHf/x compile data on the movement of a pitch, the launch angle of a home run, and other factors that go into the game. This allows teams to use a player’s underlying skills to predict how many home runs he’ll hit or bases he’ll steal. This is important, since it takes a variety of outside variables out of the equation. If a player is hitting the ball with a high exit velocity and a good rate of contact but is still making outs, it could simply be bad luck that he is hitting it right at the fielders, rather than a sign that he is a bad hitter.
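The bad-luck idea can be sketched in a few lines of Python. This is only an illustration, not how any team or tracking system actually does it: the BattedBall type, the 95 mph hard-hit cutoff, the two thresholds in looks_unlucky, and the sample batted balls are all assumptions chosen for the example.

```python
from dataclasses import dataclass

# Assumption: balls leaving the bat at 95 mph or more count as "hard hit".
HARD_HIT_MPH = 95.0


@dataclass
class BattedBall:
    exit_velocity_mph: float
    was_hit: bool  # True if the ball in play went for a hit


def contact_profile(balls):
    """Return (hard-hit rate, hit rate) over a list of batted balls."""
    if not balls:
        raise ValueError("need at least one batted ball")
    hard = sum(1 for b in balls if b.exit_velocity_mph >= HARD_HIT_MPH)
    hits = sum(1 for b in balls if b.was_hit)
    return hard / len(balls), hits / len(balls)


def looks_unlucky(balls, min_hard_hit_rate=0.45, max_hit_rate=0.30):
    """Flag a hitter who makes hard contact often but rarely gets a hit.

    Both thresholds are arbitrary values picked for this sketch.
    """
    hard_rate, hit_rate = contact_profile(balls)
    return hard_rate >= min_hard_hit_rate and hit_rate <= max_hit_rate


# Hypothetical batted balls for one hitter: lots of hard contact, few hits.
sample = [
    BattedBall(101.2, False),
    BattedBall(98.7, False),
    BattedBall(88.0, True),
    BattedBall(104.5, False),
    BattedBall(97.1, True),
    BattedBall(79.3, False),
    BattedBall(99.9, False),
    BattedBall(102.4, False),
]

hard_rate, hit_rate = contact_profile(sample)
print(f"hard-hit rate {hard_rate:.2f}, hit rate {hit_rate:.2f}")
print("possibly unlucky:", looks_unlucky(sample))
```

A real model would use far more batted balls and account for launch angle, park and defense, but the comparison is the same: judge the quality of the contact separately from the result.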
