Good pitchers are hard to predict, and good machine learning predicts, right? Inspired by this post, we set out to see how well a simple neural network could predict the next pitch in a sequence. Our suspicion is that predicting pitches is inherently hard, since surprise and timing are what get a batter off rhythm. That is why the post, which correctly predicts about 50% of pitches using a decision tree ensemble model, was especially surprising to us. It turns out that, even with a lot of data and a lot of computing power, you can still only predict the next pitch about 50% of the time. That is interesting in itself, but maybe not as valuable as a model that captures pitching a bit more broadly. Such a model has several potential benefits:
1. Batters would benefit from having a better guess at what comes next.
2. Pitchers would benefit from knowing what they would most likely throw next, so that they might keep a batter surprised.
3. People who make baseball simulations and games would benefit from having better models of who pitches what when.
4. It is fun to play with baseball statistics.
As we are academics who, in part, study and teach both games and machine learning, it’s probably mostly #4. The fact that we are academics also means that we will have to give our (simple) method an acronym: MAPSIK (Modeling Pitching Sequences in Keras).¹