Dawn of a New ERA: Predicting Starting Pitcher Performance

Gerrit Hall
5 min read · Mar 13, 2018

Part of my series on building a bot to manage my fantasy baseball team

As a fantasy baseball novice, one big question I had was the extent to which I could trust “predictions.” Fortunately, at the start of last year’s draft I saved as many sources of predictions as possible. The intent was to check the reliability of those predictions and see whether other factors are more useful indicators. In particular, I focused on starting pitchers with an ADP in the top 1000.

Bludgeon yourself in the head with a blunt object and it might resemble an American flag

Along the x-axis are factors you might know going into the draft, primarily pre-season projections averaged from as many sources as possible, including ESPN, Yahoo!/Oath, Razzball, and Steamer. Along the y-axis are the actual season results. They are bucketed into season-long counting stats (i.e., Games, Games Started, Quality Starts…), batted ball stats (Ground Ball %, Fly Ball %…), pitching averages (Earned Run Average…), per-game averages (Quality Starts per Game Started, Innings per Game…), and miscellany (Average Draft Position, handedness and age for pre-season, Wins Above Replacement and Fantasy Value for post-season).

Takeaway 1: Counting Stats Projections are ~50% Accurate

The diagonal allows for straightforward comparison of the accuracy of pre-season predictions and post-season results. The boxed stats along the diagonal highlight the category stats in our league (IP, W, L, K, ERA, WHIP).

The blue wave in the upper left demonstrates that the counting stats are about 50% reliable. The strongest were the Quality Starts predictions, which were almost 60% correlated with the actual results. Wins and Losses fell close to but just under this mark.
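For anyone who wants to reproduce this kind of grid, here is a minimal sketch of how the correlations could be computed. It assumes a plain Pearson correlation, which may or may not be exactly what went into the chart, and the function names and the five pitchers' numbers are made up purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_grid(projected, actual):
    """Correlate every pre-season column with every post-season column."""
    return {
        (p_name, a_name): pearson(p_vals, a_vals)
        for p_name, p_vals in projected.items()
        for a_name, a_vals in actual.items()
    }

# Made-up numbers for five hypothetical pitchers, NOT the data behind the chart.
projected = {"QS": [20, 15, 12, 8, 5], "ERA": [3.0, 3.6, 3.9, 4.2, 4.5]}
actual = {"QS": [18, 10, 14, 9, 3], "ERA": [3.4, 3.2, 4.6, 4.0, 4.9]}

grid = correlation_grid(projected, actual)

# The "diagonal" pairs each projection with the actual result for the same stat.
diagonal = {stat: grid[(stat, stat)] for stat in projected if stat in actual}
for stat, r in diagonal.items():
    print(f"{stat}: projection vs. actual correlation = {r:.2f}")
```

The off-diagonal entries of `grid` are the cross-stat comparisons, for example how well a QS projection tracks actual ERA.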

If you include the undrafted pitchers (ADP ≥ 999), these counting stats rise to nearly 80% correlation. The reason is that most of the belly-itchers do in fact play very few games. Breakouts are rare, and you can get more value hunting in the lower ADP range.
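A toy example of why adding the undrafted group inflates the correlation: within the drafted pitchers the projections only loosely track the results, but once a cluster of low-projection, low-innings pitchers is added, the gap between the two groups dominates. The innings below are invented for illustration and the Pearson helper is my own assumption about how the correlation is measured.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented projected vs. actual innings pitched, NOT real pitchers.
drafted_projected = [180, 170, 160, 150, 140]
drafted_actual = [150, 190, 120, 170, 130]

undrafted_projected = [30, 20, 10, 25]
undrafted_actual = [20, 35, 5, 15]

# Correlation among drafted pitchers only.
r_drafted = pearson(drafted_projected, drafted_actual)

# Correlation once the undrafted cluster is pooled in.
r_all = pearson(
    drafted_projected + undrafted_projected,
    drafted_actual + undrafted_actual,
)

print(f"drafted only: {r_drafted:.2f}")
print(f"drafted + undrafted: {r_all:.2f}")
```

In this toy set the pooled correlation comes out far higher than the drafted-only one, even though the projections are no better at separating one drafted pitcher from another.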

Generally speaking, my interpretation is that most counting stats are a function of time on the mound. Consider how the post-season counting stats correlate against other post-season counting stats:

Note: This blue wave is not a 2018 election map

Takeaway 2: Pitching Averages are a Mixed Bag

Pitching averages were a bit trickier. Our league uses ERA and WHIP, whose projections were only 31% and 41% correlated with their actual values. This is not terrible, but it is perhaps not worth staking your season on these stats based on projections alone.

Other pitching averages also fared poorly. The FIP projections were 39% correlated with the actual results, although they ended up predicting ERA results marginally better than the ERA projections did. BABIP projections were just 20% accurate and were essentially useless at predicting anything except, curiously, ground ball rates.

On the bright side, SIERA (Skill-Interactive ERA) projections looked quite good. The SIERA projections held up not just in terms of predicting the actual outcome of the SIERA (50% correlation), but also proved a far better predictor of ERA than even the ERA projections (SIERA projections were 39% correlated with actual ERA, versus 31% correlation with ERA projections).

Not all average stats ended up so poorly correlated. The ground-ball and fly-ball rates were the most accurate predictions across the board, coming in at 69% and 72% respectively. Unfortunately, these stats proved essentially useless at predicting other stats.

Strikeouts per nine proved 58% accurate and also showed some correlation with useful stats like total strikeouts (23%), ERA (29%), and WHIP (28%). These are not terribly strong indicators, but aspiring stats geeks should give it stronger consideration.

Takeaway 3: Age Is Something But a Number

I had expected the numbers to favor younger pitchers, but veteran status is actually a slight benefit. The pitching averages showed little correlation with age, but there was in fact a slight correlation (20–30%) between age and the counting stats. The explanation is primarily that older pitchers face more batters:

Mr. Colon, well north of 40, would probably tip the scales on an age vs weight scatterplot as well

This isn’t terribly actionable, although somebody could do an interesting follow-up analysis on a few trends. To the naked eye it looks like the progression is that <24 year olds get tested, by 26 they get locked in, and those who survive into their 30s are generally reliable workhorses.

Takeaway 4: ADP Proves the Wisdom of the Crowds

Almost across the board, ADP is the best predictor of nearly every major stat. Draft position was the single best predictor of key fantasy stats including WHIP (54%), ERA (55%), Wins (57%), Quality Starts (60%), and Total Strikeouts (66%). Looking at post-season indicators, the draft position showed a whopping 81% correlation with Wins Above Replacement and a 75% correlation with Fantasy $ Value.

In other words, if you take nothing away from this article, it’s that you will likely be better served by simply autodrafting your starting pitchers than by any other strategy.

Bonus: Modeling ERA

Given the poor performance of ERA projections, I did a little regression work to see if I could come up with a better formula for modeling ERA. After a few scripts and some trial and error, here’s the best formula I could find:

The results are pretty simple to interpret. The Innings per Game term is a corrector for SPARPS (Starting Pitchers as Relief Pitchers), such as the Brad Peacocks of the world, who finished the season with a 3.0 ERA on an average of 3.8 IP / G. It wasn’t feasible to try to sort out relief innings from starting innings for this exercise, but it wasn’t terribly necessary given the low draft position of such pitchers.

Otherwise, the rule of thumb is that a starting pitcher who chews up 6 innings will have a baseline ERA of 3.3. Then roughly add 0.3 for every hundred draft positions, so for a pitcher drafted around the hundred slot you’d expect an ERA of 3.6, whereas a pitcher drafted at 300 should generate closer to 4.2.
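The rule of thumb above is easy to turn into a few lines of code. This is only a sketch of the rounded version for a pitcher averaging about 6 innings per game; it leaves out the Innings per Game term of the full regression, and the function name and parameter names are just mine.

```python
def rule_of_thumb_era(adp, baseline=3.3, per_hundred_picks=0.3):
    """Rough ERA estimate for a starter averaging about 6 innings per game.

    baseline: ERA assumed before any draft-position adjustment.
    per_hundred_picks: ERA added for every 100 draft positions.
    The Innings per Game correction is deliberately omitted here.
    """
    return baseline + per_hundred_picks * adp / 100.0

# The two examples from the text: picks 100 and 300.
for adp in (100, 300):
    print(f"ADP {adp}: expected ERA of about {rule_of_thumb_era(adp):.1f}")
```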

Checking this quick and dirty equation against 22 holdout pitchers looks good. The sum of squared residuals for our formula landed at a tidy 14.37, whereas the sum for the experts’ predictions was an ungainly 22.38.

The experts nailed Mr. Kershaw (bottom left), but never gave much higher than a 4.0 ERA to pitchers who would skew much higher.
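For reference, the holdout comparison boils down to a sum of squared residuals for each set of predictions. Here is a minimal sketch with three made-up pitchers; these are not the 22 holdout pitchers and will not reproduce the 14.37 and 22.38 figures.

```python
def sum_squared_residuals(predicted, actual):
    """Sum of squared differences between predicted and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

# Made-up ERAs for three hypothetical holdout pitchers.
actual_era = [2.5, 4.0, 5.0]
formula_era = [3.0, 4.0, 4.5]   # hypothetical output of a fitted formula
expert_era = [2.5, 3.8, 4.0]    # hypothetical averaged expert projections

ssr_formula = sum_squared_residuals(formula_era, actual_era)
ssr_expert = sum_squared_residuals(expert_era, actual_era)

# The lower total is the better fit on the holdout set.
print(f"formula: {ssr_formula:.2f}, experts: {ssr_expert:.2f}")
```

Squaring means one big miss costs more than several small ones, which is why projections that never stray far above a 4.0 ERA get punished when a pitcher blows up.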
