Thoughts on Big Data Baseball

I finished reading Travis Sawchik’s book Big Data Baseball: Math, Miracles, and the End of a 20-Year Losing Streak, which chronicles the Pittsburgh Pirates front office’s adoption of an analytics-based approach to baseball. The book is very well written, which I attribute to Sawchik’s storytelling ability, and it’s definitely worth your time. Along the way I picked up some thoughts on the intersection of analytics and sports that I felt were important.

Here are some of my thoughts:

1. Billy Beane’s Shit Doesn’t Work Anymore:

If a team were to read Moneyball by Michael Lewis today and use the same approach that the Athletics used in 2002, it would have a hard time finding a high-OBP player at a below-market price. Nearly all 30 teams love on-base percentage, and they will pay a premium for a player who can get on base at a high clip. Any free agent with an OBP of .360 or greater is going to get paid a lot of money. The market quickly adjusted to the inefficiencies exposed in 2002, and teams have had to look for new ones.

The baseball market is fluid, and what was undervalued — even a year ago — may not be undervalued now. It might even be overvalued.

2. Ironically, defense, which Beane felt was overvalued in 2002, is what the Pirates found to be undervalued in 2013:

The main approach to rebuilding the Pirates through data was a complete investment in defense. Neal Huntington hired former Baseball Prospectus blogger Dan Fox and MIT grad Mike Fitzgerald to build out the Pirates’ own proprietary database. Its data feeds included spray charts that tracked the hit placement tendencies of every hitter in the league. This led the Pirates to rely heavily on the shift (repositioning defenders based on the likelihood that a hitter will hit a ball to a certain part of the field) and therefore to invest heavily in defense.

The strategy of shifting was fully integrated into their approach to the types of players they were putting on the field. Outside of A.J. Burnett and Gerrit Cole, their pitching staff was composed not of flame-throwers but of guys who could induce high ground ball rates. Ray Searage, the Pirates’ pitching coach, helped Jeff Locke and Charlie Morton develop a two-seam fastball that spiked their ground ball rates. When you combine pitchers with high ground ball rates and a defense that is aligned based on where a hitter is most likely to hit a ground ball, you end up with one of the best defenses in baseball.

3. Fully Integrating Your Approach is Important:

The Pirates understood that defense was undervalued in the free agent market, in terms of the number of shifts, and in terms of in-game strategy. Just as importantly, they didn’t go halfway when it came to adopting a defense-based approach. They wouldn’t have been successful had they adopted shifts and called it a day. They needed pitchers who would get the most out of their infield shifts, and they needed their infield shifts to get the most out of their pitchers. Locke and Morton aren’t particularly good pitchers, and the Pirates didn’t have Syndergaard, deGrom, and Harvey hanging out in the farm system. Nor did they have the resources to go out and sign a Justin Verlander-caliber pitcher. However, they had an approach that made their pitching good enough to get by: a lot of ground balls, shifted defenses, and a really good catcher.

Rather than spending $147 million on Zack Greinke, like the Dodgers did, the Pirates spent $17 million on Russell Martin, investing in his pitch-framing abilities in the hope that he could steal strikes for their pitchers.

The shifts, the ground ball rates, and the pitch framing indicate that the Pirates did not go halfway on their approach. They fully integrated their commitment to making it as hard as possible for their opponents to produce offensively. That’s part of the reason they’ve been successful, and I think some other clubs should take notes. Teams that are really strong in one area, such as pitching, may not have gotten the most out of their pitchers because they’ve ignored defense. I’m looking at you, New York Mets.

4. Communication Is Just as Important as the Data:

This, to me, is the most important observation that I took away from the book. It ties into an idea that I’ve been kicking around a lot over the past year about the role of data in our society. We’ve shifted towards a data-based approach to everything, and for the most part I think that’s a good thing. We use data to build better sports teams, run government and business more efficiently, and run better campaigns. Data journalism is on the rise; just look at how many people cared about FiveThirtyEight projections this election cycle versus 2012.

However, no matter how insightful and accurate your data is, I think that your communication of that data has to be twice as good. It should be just as easy for a farmer in Nebraska to understand the insights or projections that your data is producing as it is for an MIT quantitative analyst.

This is something that the Pirates struggled with and then eventually addressed. Their analysts, Fox and Fitzgerald, were steeped in the numbers their data was producing, but they realized that the Pirates’ coaching staff had trouble (1) understanding that data and (2) communicating it to players in an easy-to-digest way. The Pirates addressed this issue by investing in TruMedia’s data visualization software. Rather than providing raw data, the coaching staff was able to provide a visualization of the spray charts and communicate a more concrete reason for why the infield had to be shifted. Baseball players naturally have a higher visual IQ, and they needed a visual representation of the data rather than a numerical one to understand what the data was saying. In the players’ daily briefing, the Pirates staff made an effort to provide the data as well as an anecdote of that data in action.

I think this is a perfect example of how we need to make sure that people understand the data they are being presented with. I don’t believe that people are incapable of understanding what data is saying. I think that it sometimes needs to be communicated in an understandable and relatable way. It’s not a matter of ignorance or stupidity; it’s simply a matter of communication. This will be the great challenge of the data-driven age: making sure that our communication of data is twice as good as the data itself.