Of Baseball Analytics and Open Access: How Change Actually Happens

Open access advocates could learn a lot from baseball’s data crunchers.

As part of Slate’s 20th anniversary retrospective, Josh Levin has published a fabulous piece about how perspectives within the baseball analytics community have evolved over the past decade. Let’s say you are (inexplicably) not a baseball fan. Even so, Levin’s article is worth reading as a primer on what actually leads to change.

Open access publishing advocates, particularly the most strident and ideological among us, would benefit from a close reading of this history. Change making is hard in any domain, but it becomes easier if those seeking the change are willing to admit that they were not always right.

But before we get to open access, let’s take a closer look at baseball analytics.

Moneyball Changed Everything: Michael Lewis published Moneyball in 2003. Focusing on the Oakland A’s general manager Billy Beane, Lewis popularized the concept of sabermetrics: the use of rigorous statistics and empirical records to evaluate a player’s worth to a baseball team. Per Beane and other sabermetricians such as Bill James, a player with a low batting average who draws a lot of walks is sometimes more valuable than a player with a higher batting average. The only way to know is to look critically at the data, with no preconceptions.

To “old time” managers and scouts, this was poppycock. Data, schmata. You can tell the good players by the pep in their step, the curve of their swing and the look in their eye.

Sure, Billy Beane got the A’s pretty far one season, but they never did win the World Series, did they? (Well, at least not since 1989, long before sabermetrics came to the fore.) Baseball was doing just fine before these weird stats guys came along, and would be just fine after they left.

And so the battle between the “jocks and nerds” was born. Levin expertly describes this tension. The “nerds” struck first, launching a data-heavy site called Fire Joe Morgan (which existed from 2005–2008). This was a comical, and unsuccessful, attempt to get announcer Joe Morgan (a baseball old-timer who refused to read Moneyball) off the air.

Part of the problem was that the data men (they seem to be all male) were ahead of their time. Billy Beane’s stats-heavy style, which was once so revolutionary, is now standard practice across Major League Baseball.

Another problem was that the data crunchers were not always right. The sabermetrics community ridiculed the notion that catchers could influence an umpire’s ball/strike calls by the way they caught the ball. This was something that older players had claimed for decades, but it seemed preposterous to the proud empiricists. The old players were right. New higher-powered cameras, which came along in 2006, showed clearly that catchers who fooled the umpire saved their team a half-run per game.

Setbacks like this moderated the ardor of the data men. Today they are much more likely to acknowledge that not everything can be explained by the numbers, because not every aspect of a sport can be measured. Intangibles play a part, after all. Bill James “groans whenever he hears people discount leadership or team chemistry or heart because they cannot find such things in the data. He has done this himself in the past … and regrets it.” In other words:

“There are more things in heaven and earth, Horatio,

Than are dreamt of in your philosophy.”

This evolution does not mean that data analysts have no role — rather, it points toward the advantages of humility as they continue to define this role. It also does not mean the traditionalists were completely right to reject the upstarts. Statistics such as Wins Above Replacement do provide meaningful and actionable data that all general managers can use.

As Levin points out, the true distinction in this controversy is not the concocted battle between “jocks” and “nerds” but the one between the curious and the incurious. The baseball establishment has learned how to use sabermetrics to make decisions, and the sabermetricians have come to terms with the limits of their techniques. Both developments required curiosity, openness, and a willingness to re-assess fixed notions.

Which is how change actually happens. And which brings us to the advocacy for open access.

Open Access Publishing: The logical case for immediate open access remains as strong as ever. Digital production and publication are cheap (unlike print, which requires shipping and physical storage); the authors and reviewers of scholarly papers are unpaid, which means those costs are minimal as well; and there is a vital public interest in having immediate access to the latest scholarship. Immediate open access serves the interests of scholars and of society, and is now easy to achieve.

So, in 2005 (the same year Fire Joe Morgan went online) I found myself joining many other open access advocates in proclaiming that the end of traditional publishers was nigh. Publishers were antique intermediaries, print-era fossils carried forward into our brave new online world. It was only a matter of time before all authors made their papers seamlessly available, either through their websites or their university’s digital repositories. Just as data was going to transform baseball, the web was going to transform publishing.

This is not at all what has happened. The most powerful publishers have only further consolidated their position since 2005, and can now count on both subscription/license fees and direct author/funder payments. Journals with high impact factors command deep loyalty among scholars, who are completely insulated from the cost of obtaining those journals but do need to impress their tenure committees. Scholars want to be read widely, sure…but even more importantly they want to earn prestige within their tight disciplinary communities. And those communities, generally, value established titles more than openness.

None of the above means that open access advocates were, or are, wrong in their core critique. It does point to a naivete about the human and cultural factors underlying academic publishing, which is steeped in tradition.

Another weakness of the open access critique is that it assumes that publishers have built no useful digital infrastructure, when in fact the CrossRef system that links articles together is highly valuable. This kind of utility may not be sexy, but it is essential to online research and is the kind of thing that scholars would never have built themselves. Researchers are good at many things, but building the support posts for the scholarly conversation is not one of them.

Just as the data analysts were overconfident that sabermetrics would transform baseball overnight, open access advocates (me included) were too confident that publishing would be overturned in the blink of an eye.

But It’s Not a Good Analogy: “Yeah, yeah, yeah, Marcus. This is a bad analogy. In baseball everyone wanted to win; they just had different ideas about what made this possible. But at least everyone had the same goal. Open access advocates wanted to end traditional publishing, and of course publishers did not feel the same way. The resistance in this case was of an entirely different magnitude, so you cannot compare the two.”

True. Sort of. But let’s recall Josh Levin’s essential distinction between “curious” vs. “incurious.” The incurious open access advocate is still banging the same drum in 2016, convinced as ever of the righteousness of the cause. This is the same strategy that has failed for 11 years, or more, to dislodge traditional publishers.

On the other hand, the curious open access advocate opts to fully understand the ecosystem within which scholars operate, and to work with publishers to evolve business models toward full openness. The open access dialogue is a long-term conversation that will be filled with fits, starts and dead ends. In that way it is indeed different from the more straightforward baseball story. But curiosity, openness, and a willingness to re-assess fixed notions will carry the day here as well.