Predicting Game Winners: Challenges and RunPlusMinus Results

If You’re So Smart, Why Ain’t You Rich?

J.B.Moore, Ph.D
Aug 2, 2018 · 11 min read

The RunPlusMinus™ statistic claims to be the best measure of on-field player and team performance. See the article “The best baseball statistic” for reasons why.

If this is true, it should be useful in predicting the winner of future games based on teams’ previous values of the RPM statistic. This assertion is based on the fact that the team that wins a game always has a higher RPM value than its opponent. This article describes the challenges of predicting MLB game winners and the nature, method and quality of game-winner predictions using RPM values. Games in the 2018 MLB season up to the All-Star break were used to verify the claim that RPM game-winner predictions are accurate. Specifically, 56.7% of predicted game winners were correct.

“How Did You Shoot?”

Characteristics of a Prediction

A prediction is accurate if the prediction turns out to be true. It is easy to make predictions that are 100% accurate. For example, a weather forecast that states “This year there will be some rain, some sun, some hot days and some cold days” is 100% true. Likewise, a baseball forecast that “Some teams will win and some will lose” will also be accurate.

Precision is a measure of the preciseness of a prediction. The prediction “The Yankees will win their fair share of games” lacks precision because it does not define the meaning of “fair share”. The prediction “The Dodgers will beat the Red Sox in 6 games in the 2018 World Series” is more precise. And the prediction “The Braves will have a walk-off win in next Saturday’s game in the bottom of the 9th with the bases loaded and a wild pitch” is very precise. The more precise a prediction is, the less likely it is to be accurate because it limits the possible events that would make the prediction true.

Prediction Quality

We have all heard about the accuracy of political polls expressed as “this poll is accurate within 4% 19 times out of 20”. Most people would have trouble explaining what that means. It is simply a way of quantifying the uncertainty of the prediction. More simply, predictions about the outcome of a baseball game can be expressed using probabilities, as in “there is a 60% chance the Cards will beat the Reds the next time they play each other”. Probabilities are often expressed as odds, but the odds must state whether they refer to winning or losing. In horse racing, the odds refer to losing. A horse at 2-to-1 is expected to lose 2 out of 3 times if, hypothetically, the race were repeated 3 times. If the horse wins, the profit is $2 for a $1 bet; equivalently, the probability of losing is 2/(2+1), or about 67%. The RPM predictions of game winners, which we will publish weekly, state the winning team both as a probability and with equivalent odds of winning.
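The odds arithmetic above can be sketched in a few lines of Python. This is only an illustration of the 2-to-1 example; the function names are ours and are not part of any RPM tooling.

```python
from fractions import Fraction


def losing_probability(odds_against: int, odds_for: int = 1) -> Fraction:
    """Probability of losing implied by 'odds_against-to-odds_for' odds."""
    return Fraction(odds_against, odds_against + odds_for)


def odds_against_from_win_probability(p_win: float) -> float:
    """Odds against winning (expressed 'to 1') implied by a win probability."""
    if not 0 < p_win <= 1:
        raise ValueError("p_win must be in (0, 1]")
    return (1 - p_win) / p_win


# The 2-to-1 horse from the text: it loses 2 times out of every 3.
p_lose = losing_probability(2, 1)
print(f"P(lose) = {float(p_lose):.1%}, P(win) = {float(1 - p_lose):.1%}")
```

Going the other way, a team given a 60% chance of winning corresponds to odds against of roughly 0.67-to-1.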

One final comment about the quality of predictions that involve probabilities. If a prediction includes the phrase “40% of the time”, then, provided the event occurs multiple times, the prediction is accurate if the predicted event did occur 40% of the time and inaccurate if it occurred noticeably more or less often than 40% of the time.
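One way to check this kind of accuracy is to group predictions by their stated probability and compare each group with how often the predicted event actually happened. The sketch below assumes the predictions are available as (probability, outcome) pairs; the function name, the 10-point grouping and the sample data are illustrative, not taken from the RPM results.

```python
from collections import defaultdict


def calibration_table(predictions, bin_width_pct=10):
    """Group (p_win, won) pairs by predicted probability.

    Returns {bin lower edge in percent: (number of games, observed win rate)}.
    """
    bins = defaultdict(lambda: [0, 0])  # lower edge -> [games, wins]
    for p_win, won in predictions:
        lower = (round(p_win * 100) // bin_width_pct) * bin_width_pct
        bins[lower][0] += 1
        bins[lower][1] += int(won)
    return {lower: (n, wins / n) for lower, (n, wins) in sorted(bins.items())}


# Made-up sample: five "40%" predictions and three "62%" predictions.
sample = [(0.40, True), (0.40, False), (0.40, False), (0.40, True), (0.40, False),
          (0.62, True), (0.62, True), (0.62, False)]
for lower, (games, rate) in calibration_table(sample).items():
    print(f"{lower}-{lower + 10}%: {games} games, event occurred {rate:.0%} of the time")
```

In this sample the "40%" predictions are accurate in the sense described above, because the event occurred in 2 of the 5 games.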

The RPM-based predictions that are described in the following paragraphs state the probability of a team being a winner in a particular game. A chart presented later shows the quality of the RPM predictions.

Predicting the Winner of Baseball Games — The Challenges

RPM Predictions

Team History. How did a team perform immediately preceding the game in question? How many preceding games should be used? Our analysis shows that the most accurate results are achieved by using each team’s RPM values in the four weeks immediately preceding the game being predicted. For example, preceding the Cubs-Mets game on June 1, 2018, the Cubs’ average RPM totals for the 7, 14 and 28-day periods were 2.92, 1.59 and 2.04 RPMs respectively. The corresponding averages for the Mets were -2.6, -0.58 and -0.80 RPMs. The history component used in the prediction calculations was the 28-day difference, namely (2.04 - (-0.80)) or 2.84 RPMs. When combined with the starting pitcher information and the Mets being the home team, the predicted probability that the Cubs would win was 71.8%. The Cubs did win, 7 to 3.
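As a rough sketch of the history component, the trailing average and the 28-day difference could be computed as follows. The data layout (a list of dated per-game team RPMs) and the function names are assumptions made for illustration; only the 2.04 and -0.80 averages come from the example above.

```python
from datetime import date, timedelta


def trailing_average(games, game_day, window_days):
    """Mean team RPM over games played in the window_days before game_day.

    games is a list of (date, team_rpm) pairs; game_day itself is excluded.
    """
    start = game_day - timedelta(days=window_days)
    values = [rpm for day, rpm in games if start <= day < game_day]
    return sum(values) / len(values) if values else 0.0


def history_component(team_avg, opponent_avg):
    """Difference between the two teams' trailing RPM averages."""
    return team_avg - opponent_avg


# 28-day averages quoted for the Cubs-Mets game on June 1, 2018.
cubs_edge = history_component(2.04, -0.80)
print(f"Cubs history edge: {cubs_edge:.2f} RPMs")

# Made-up game log showing how the window length changes the average.
log = [(date(2018, 5, 30), 3.0), (date(2018, 5, 20), 1.0), (date(2018, 4, 1), 10.0)]
print(trailing_average(log, date(2018, 6, 1), 28))
```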

Starting Pitchers. Starting pitchers’ performances definitely influence which team wins. Pitchers’ batting RPMs were not included, to make comparisons between the AL and NL more equitable. We use the average of the pitcher’s RPMs in the 3 games preceding his next start. The average RPM value for all starting pitchers in 2018 was -0.4 RPMs. The negative value results from the fact that a pitcher typically doesn’t have equal offense and defense involvement in plays. Using an overall league average value of -0.4 RPMs, a good starting pitcher’s average of -0.1 RPMs means his contribution to his team’s average RPMs would be (-0.1 - (-0.4)) or +0.3, thus increasing his team’s probability of winning.
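The starting-pitcher adjustment described above amounts to a small calculation, sketched here. The -0.4 league average and the 3-start window are from the text; the function name and the fallback of 0.0 for a pitcher with no recent starts are our assumptions.

```python
LEAGUE_AVG_STARTER_RPM = -0.4  # 2018 average for all starting pitchers (from the text)


def starter_adjustment(recent_start_rpms, league_avg=LEAGUE_AVG_STARTER_RPM, n_starts=3):
    """Pitcher's average RPM over his last n_starts, relative to the league average."""
    recent = recent_start_rpms[-n_starts:]
    if not recent:
        return 0.0  # assumption: treat an unknown starter as league average
    return sum(recent) / len(recent) - league_avg


# A good starter averaging -0.1 RPMs over his last three starts.
print(f"{starter_adjustment([-0.1, -0.1, -0.1]):+.1f} RPMs")
```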

Home Field Advantage. Statistically, the average home team’s RPM value is greater than the average RPM value of the visiting team. The difference in these values is the third component used when forecasting RPM values of opposing teams.

Adding the three values (team history, starting pitcher and home team advantage) for each team results in a pair of predicted RPM values, one for each team. The team with the higher value is the favorite to win the game. The difference between the two values is converted to a probability of winning as described in the following section.
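Putting the three components together might look like the sketch below. The class and field names are illustrative. The history values are the Cubs and Mets 28-day averages quoted earlier, while the starter and home-field numbers are placeholders, since the article does not publish them for this game.

```python
from dataclasses import dataclass


@dataclass
class TeamInputs:
    history_rpm: float         # trailing 28-day average team RPM
    starter_adjustment: float  # starter's recent average minus league average
    home_adjustment: float     # home-field component (0.0 for the visiting team)

    def predicted_rpm(self) -> float:
        return self.history_rpm + self.starter_adjustment + self.home_adjustment


def predicted_margin(team: TeamInputs, opponent: TeamInputs) -> float:
    """Positive when `team` is the favorite."""
    return team.predicted_rpm() - opponent.predicted_rpm()


# History values from the June 1, 2018 example; the other numbers are placeholders.
cubs = TeamInputs(history_rpm=2.04, starter_adjustment=0.3, home_adjustment=0.0)
mets = TeamInputs(history_rpm=-0.80, starter_adjustment=0.0, home_adjustment=0.2)

margin = predicted_margin(cubs, mets)
favorite = "Cubs" if margin > 0 else "Mets"
print(f"Favorite: {favorite}, margin {abs(margin):.2f} RPMs")
```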

Park effect has not been used to predict winners. The reason is that any game played in a specific park provides equal opportunities to the opposing teams to take advantage of a park’s bias toward or against batting and/or pitching performances. Team RPM values combine offense and defense performance in a single number. In a park that yields a lot of home runs, the batting RPMs of each team may increase but the defense/pitching RPMs of each team will decrease since offense RPMs plus defense RPMs is always zero. However, the batting component of individual player RPMs may be affected by the venue. For the interested reader the article Park Effect Impact describes Park Effect and how it affects player and team performances in the 2018 season.

Converting RPM Values to Probabilities

The values in the first row and 3 leftmost columns have the following interpretation. Suppose the difference in opposing teams’ RPMs, calculated from the teams’ RPMs in the preceding 28 days, the probable starting pitchers and which team is at home, is between 0.1 and 0.3 (the value in row 2). Then: 1) this difference occurs in 1.5% of games and, 2) in those games the team with the larger RPMs won 50.8% of the games. A second example: from the 3rd group of columns we see that a difference of between 7.0 and 7.5 net RPMs has a win probability of nearly 94%.
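The conversion is essentially a table lookup on the size of the RPM difference. The sketch below contains only the two rows quoted in the text (with "nearly 94%" entered as 0.94); the full table is not reproduced here, so any other difference returns None rather than a guessed value. The names are illustrative.

```python
# (low, high, favorite's historical win rate); only the rows quoted in the text.
RPM_DIFFERENCE_BINS = [
    (0.1, 0.3, 0.508),
    (7.0, 7.5, 0.94),  # "nearly 94%" in the text
]


def favorite_win_probability(margin, bins=RPM_DIFFERENCE_BINS):
    """Win probability for the team with the larger predicted RPM.

    Returns None when the difference falls outside the rows known here.
    """
    size = abs(margin)
    for low, high, probability in bins:
        if low <= size < high:
            return probability
    return None


print(favorite_win_probability(0.2))
print(favorite_win_probability(-7.2))  # the sign only says which team is favored
print(favorite_win_probability(3.0))   # not covered by the two quoted rows
```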

As you would expect, the distribution of RPM differences is fairly uniform, and large RPM margins result in a high probability of winning.

How Accurate are RPM Predictions?

For each game, we calculated a predicted probability of each team winning the game. Ignoring these probabilities for a moment, the predicted winner matched the actual winner 56.7% of the time. We believe this fact strongly supports the claim that RPM predictions of game winners are accurate because they are correct more than 50% of the time.

The chart below shows the results when the games in the preceding 7, 14, and 28 days of play are used in calculations. As can be seen, a 4-week history period gives the highest prediction accuracy. Specifically, the 57.1% value in the bottom right corner means that when winning probabilities between 48% and 52% are ignored, the RPM-based prediction was correct for 57.1% of games. If RPM differences close to 0 are included the accuracy decreases to 56.5%. Our analysis has shown that using more than a 28-day history does not increase prediction accuracy. This makes sense because the effect of roster changes and “streakiness” decreases over time.
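The accuracy figures above, with and without the near coin-flip games, could be computed along these lines. This is a sketch with made-up data; the input layout (home-team win probability plus the actual result) and the function name are assumptions.

```python
def pick_accuracy(games, exclude_band=None):
    """Share of games in which the predicted favorite won.

    games: iterable of (p_home_win, home_won) pairs.
    exclude_band: optional (low, high) range of probabilities to ignore,
    for example (0.48, 0.52) to drop near coin-flip games.
    """
    correct = total = 0
    for p_home_win, home_won in games:
        if exclude_band and exclude_band[0] <= p_home_win <= exclude_band[1]:
            continue
        picked_home = p_home_win > 0.5  # an exact 50% is treated as an away pick
        total += 1
        correct += int(picked_home == home_won)
    return correct / total if total else None


# Made-up sample of four games.
sample_games = [(0.60, True), (0.51, False), (0.45, False), (0.70, False)]
print(pick_accuracy(sample_games))
print(pick_accuracy(sample_games, exclude_band=(0.48, 0.52)))
```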

For a given game, what is the meaning of the percent chance of winning? There are two parts to the answer. First, one would expect that a prediction such as “Team A will beat Team B with a probability of 60%” is “stronger” than “Team A will beat Team B with a probability of 52%”. This simply means that if you compared the results of 100 games in which the prediction probability was 60% with 100 games in which the prediction probability was 52%, then the “60%” games will have had more correct predictions than the “52%” games. In other words, the higher the prediction probability, the higher the expected percent of correct winners. The chart below shows this for 2018 predictions of regular season games up to July 17th. On each game day, predictions were made for each of the (1 to 15) games. We calculated what percent of the games on that date were predicted correctly. Then we calculated how many days we had a given accuracy. For example, the chart data shows that our accuracy was between 60% and 69% on 37 days. It also shows that our accuracy was less than 10% on 1 day and greater than 80% on 5 days.
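The day-by-day tally described above can be sketched as a two-step count: accuracy per game day, then the number of days falling in each accuracy range. The input layout, the 10-point ranges and the sample data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def daily_accuracy_histogram(results, bucket_pct=10):
    """Count game days by the percent of predictions that were correct.

    results: iterable of (game_day, prediction_correct) pairs.
    Returns a Counter mapping bucket lower edge (percent) to number of days;
    a perfect day is counted in the top bucket.
    """
    per_day = defaultdict(lambda: [0, 0])  # day -> [games, correct picks]
    for day, correct in results:
        per_day[day][0] += 1
        per_day[day][1] += int(correct)

    histogram = Counter()
    for games, hits in per_day.values():
        pct = 100 * hits / games
        bucket = min(int(pct // bucket_pct) * bucket_pct, 100 - bucket_pct)
        histogram[bucket] += 1
    return histogram


# Made-up results for three game days.
sample_results = [("2018-06-01", True), ("2018-06-01", True), ("2018-06-01", False),
                  ("2018-06-02", False), ("2018-06-03", True)]
for bucket, days in sorted(daily_accuracy_histogram(sample_results).items()):
    print(f"{bucket}% and up: {days} day(s)")
```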

Overall, 56.5% of the predictions were correct.

Wagering on Game Winners

Conclusions

  • The inherent uncertainty of forecasting (Yogi Berra: “The Future Ain’t What It Used To Be”) coupled with the variability of MLB team performances means that predicting the winner of MLB games with a high degree of accuracy is very difficult.
  • RPM-based predictions of game winners are based on recent team performances, probable starting pitchers and the home field advantage. The results support both the prediction methodology and, more important, the mathematical assumptions and model that underlie the values of the RPM statistic. In spite of the overall accuracy of 56%, there are days on which every prediction will be wrong, and days on which every prediction will be correct. We are confident that our overall accuracy will remain significantly above 50% and will provide statistics that justify this claim.

Until next time…

If you have any questions, comments, requests or complaints, please feel free to add them in the comments below or to email us at info@runplusminus.com

You can learn more about the RunPlusMinus™️ statistic at RunPlusMinus.com

RunPlusMinus

The best baseball stat
