Matthew Berland
6 min read · Apr 28, 2020


Baseball (from Library of Congress: http://hdl.loc.gov/loc.pnp/pga.04028)

Good pitchers are hard to predict, and good machine learning predicts, right? Inspired by this post, we set out to see how well a simple neural network could predict the next pitch in a sequence. Our suspicion is that predicting pitches is inherently hard, since surprise and timing are what throw a batter off rhythm. That’s why the previously linked post, which correctly predicts about 50% of pitches using a decision tree ensemble model, was especially surprising to us. It turns out that, even with a lot of data and a lot of computing power, you can still only predict the next pitch about 50% of the time. That, in itself, is interesting, but maybe not as valuable as a model that captures pitching a bit more broadly. Such a model has several potential benefits:

  1. Batters would benefit from having a better guess at what comes next.
  2. Pitchers would benefit from knowing what they would most likely throw next, so that they might keep a batter surprised.
  3. People who make baseball simulations and games would benefit from having better models of who pitches what when.
  4. It is fun to play with baseball statistics.

As we are academics who, in part, study and teach both games and machine learning, it’s probably mostly #4. The fact that we are academics also means that we will have to give our (simple) method an acronym: MAPSIK (Modeling Pitching Sequences in Keras).¹
