Predicting Game Winners: Challenges and RunPlusMinus Results
If You’re So Smart, Why Ain’t You Rich?
The RunPlusMinus™ statistic claims to be the best measure of on-field player and team performance. See the article “The best baseball statistic” for reasons why.
If this is true, it should be useful in predicting the winners of future games from teams’ previous RPM values. This assertion is based on the fact that the team that wins a game always has a higher RPM value than its opponent. This article describes the challenges of predicting MLB game winners, and the nature, method and quality of game-winner predictions made with RPM values. Games in the 2018 MLB season up to the All-Star break were used to test the claim that RPM game-winner predictions are accurate: 56.7% of predicted game winners were correct.
“How Did You Shoot?”
Tom Watson, the famous golfer, was in the players’ lounge at the end of a round when a newcomer to the tour walked in. Tom asked him, “How did you shoot?” The rookie replied, “On the first hole, I just missed the green but got down in two; I birdied the second hole but had a double on the third because of a bad lie.” He continued describing his round in great detail until Tom interrupted him: “I simply asked, ‘How did you shoot?’” The message here is that forecasting the winner of a baseball game does not require an explanation of why; only the prediction matters.
Characteristics of a Prediction
A prediction has two important characteristics — accuracy and precision.
A prediction is accurate if the prediction turns out to be true. It is easy to make predictions that are 100% accurate. For example, a weather forecast that states “This year there will be some rain, some sun, some hot days and some cold days” is 100% true. Likewise, a baseball forecast that “Some teams will win and some will lose” will also be accurate.
Precision is a measure of how specific a prediction is. The prediction “The Yankees will win their fair share of games” lacks precision because it does not define the meaning of “fair share”. The prediction “The Dodgers will beat the Red Sox in 6 games in the 2018 World Series” is more precise. And the prediction “The Braves will have a walk-off win in next Saturday’s game in the bottom of the 9th with the bases loaded and a wild pitch” is very precise. The more precise a prediction is, the less likely it is to be accurate, because it narrows the range of events that would make the prediction true.
Prediction Quality
If the “what” being predicted can only occur once, the accuracy of the prediction can only be determined in hindsight, and the prediction is either right or wrong. However, if one is making a statement that has multiple elements, such as predicting the 10 players that will have the highest batting averages, one can be partially correct. Likewise, if the forecast involves a set of events, a prediction may be right about some of the events and wrong about others. In this case, the accuracy of the prediction can be expressed as a percentage.
We have all heard the accuracy of political polls expressed as “this poll is accurate within 4%, 19 times out of 20”. Most people would have trouble explaining what that means; it is simply a way of quantifying the uncertainty of the prediction. Predictions about the outcome of a baseball game can be expressed more simply using probabilities, as in “there is a 60% chance the Cards will beat the Reds the next time they play each other”. Probabilities are often expressed as odds, but the odds must state whether they refer to winning or losing. In horse racing, the odds refer to losing. A horse that is 2-to-1 is expected to lose 2 out of 3 times if, hypothetically, the race were run 3 times; that is, its probability of losing is 2/(2+1), or about 67%. If the horse wins, the profit is $2 for each $1 bet. The RPM predictions of game winners, which we will publish weekly, give each predicted winner’s chances both as a probability and as the equivalent odds of winning.
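The arithmetic connecting odds and probabilities can be sketched in a few lines of Python. This is a minimal illustration, assuming odds are written as whole-number pairs; the function names are hypothetical and not part of any RunPlusMinus tool.

```python
from fractions import Fraction


def prob_from_odds(a: int, b: int) -> Fraction:
    """Odds of a-to-b on an outcome -> probability a / (a + b).

    For horse-racing odds, which refer to losing, this is the probability of losing.
    """
    return Fraction(a, a + b)


def odds_from_prob(p) -> tuple:
    """Probability p (strictly between 0 and 1) -> reduced odds (a, b), i.e. p / (1 - p)."""
    p = Fraction(p).limit_denominator(1000)  # tame float inputs such as 0.6
    ratio = p / (1 - p)
    return ratio.numerator, ratio.denominator


# A 2-to-1 horse (odds against) loses with probability 2/3, about 67%.
print(prob_from_odds(2, 1), float(prob_from_odds(2, 1)))
# A 60% chance of winning corresponds to odds of winning of 3-to-2.
print(odds_from_prob(0.6))
```

Using exact fractions keeps results like 2/3 and 3-to-2 from picking up floating-point noise.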
One final comment about the quality of predictions that involve probabilities. If a prediction says an event will happen “40% of the time”, then, provided the situation is repeated many times, the prediction is accurate if the event does occur 40% of the time and inaccurate if it occurs more or less often than that.
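One common way to check this kind of accuracy is a calibration table: group forecasts by their stated probability and compare each group’s stated probability with how often the event actually happened. The sketch below assumes forecasts are available as (probability, outcome) pairs; the function name and data are made up for illustration.

```python
from collections import defaultdict


def calibration_table(predictions, bin_pct=10):
    """Group (predicted_probability, event_happened) pairs into probability
    bins and return the observed frequency of the event in each bin."""
    counts = defaultdict(lambda: [0, 0])  # bin start (%) -> [events, trials]
    for p, happened in predictions:
        pct = round(p * 100)  # e.g. 0.41 -> 41
        start = min(pct // bin_pct * bin_pct, 100 - bin_pct)
        counts[start][0] += int(happened)
        counts[start][1] += 1
    return {(start, start + bin_pct): events / trials
            for start, (events, trials) in sorted(counts.items())}


# Hypothetical: five events each forecast at roughly 40%; two of them happened.
forecasts = [(0.40, True), (0.41, False), (0.43, False), (0.45, True), (0.48, False)]
print(calibration_table(forecasts))
```

Here the 40% to 50% bin shows an observed frequency of 0.4, so those forecasts would count as accurate in the sense described above.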
The RPM-based predictions that are described in the following paragraphs state the probability of a team being a winner in a particular game. A chart presented later shows the quality of the RPM predictions.
Predicting the Winner of Baseball Games — The Challenges
There are many challenges facing anyone who wants to predict the winner of a baseball game with any degree of accuracy. Foremost is the fact that the top teams typically win only two-thirds of their games; on any given day, any team can beat any other team. Second, factors such as home field, player injuries, the venue, weather conditions, the starting pitcher and the lineup can affect the outcome. Taking some or all of these factors into account in a quantitative way is a herculean task. Most forecasters use some sort of points system to arrive at their predictions. As explained below, RPM predictions of game winners rely on three factors: past performance, probable starting pitchers and which team is at home. Their quality is expressed as the percentage of correct predictions on a given date.
RPM Predictions
RunPlusMinus predictions of game winners are based on values of the RPM statistic. An RPM value is calculated for every player’s involvement in every play in every game. The value is positive if the player performs above average in that situation and negative if the player performs below average. RPM values are additive, meaning that a player’s RPM values can be added to give a total plus-or-minus for each player in each game. A team’s RPM total in a game is simply the sum of its players’ RPM values in that game. Because the offense and defense RPM values in every play sum to zero, the team that wins a game will have a positive RPM total and the losing team will have an equal and opposite total. League standings are highly correlated with team RPM totals.
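The additivity described above can be illustrated with a small aggregation sketch. The play records, player names and RPM values are invented, and the record layout is an assumption; only the summing logic and the zero-sum property come from the text.

```python
from collections import defaultdict

# Hypothetical play-by-play records; the player names and values are invented.
# In every play, the offense and defense RPM values are assumed to sum to zero.
plays = [
    {"play": 1, "player": "H1", "team": "HOME", "rpm": 0.5},
    {"play": 1, "player": "A_P", "team": "AWAY", "rpm": -0.5},
    {"play": 2, "player": "A1", "team": "AWAY", "rpm": 0.25},
    {"play": 2, "player": "H_P", "team": "HOME", "rpm": -0.25},
    {"play": 3, "player": "H1", "team": "HOME", "rpm": 0.75},
    {"play": 3, "player": "A_P", "team": "AWAY", "rpm": -0.5},
    {"play": 3, "player": "A2", "team": "AWAY", "rpm": -0.25},
]


def sum_rpm(plays, key):
    """Total RPM values grouped by any field, e.g. 'player', 'team' or 'play'."""
    totals = defaultdict(float)
    for row in plays:
        totals[row[key]] += row["rpm"]
    return dict(totals)


by_play = sum_rpm(plays, "play")      # each play should net to zero
by_player = sum_rpm(plays, "player")  # each player's plus-or-minus for the game
by_team = sum_rpm(plays, "team")      # winner positive, loser equal and opposite
print(by_play, by_player, by_team)
```

Because every play nets to zero, the two team totals are always equal and opposite.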
Team History. How did a team perform immediately before the game in question, and how many preceding games should be used? Our analysis shows that the most accurate results are achieved by using each team’s RPM values in the four weeks (28 days) immediately preceding the game being predicted. For example, before the Cubs-Mets game on June 1, 2018, the Cubs’ average RPM totals for the 7-, 14- and 28-day periods were 2.92, 1.59 and 2.04 RPMs respectively. The corresponding averages for the Mets were -2.60, -0.58 and -0.80 RPMs. The history component used in the prediction calculations was the 28-day difference, namely 2.04 - (-0.80) = 2.84 RPMs. Combined with the starting pitcher information and the Mets being the home team, this gave a 71.8% probability that the Cubs would win. The Cubs did win, 7 to 3.
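The history component can be sketched as a look-back average followed by a difference. This is a minimal sketch, assuming the 28-day figure is an average per game of the team’s RPM totals and that a team with no games in the window is treated as league average (0.0); the function name and the game log are hypothetical. The Cubs and Mets averages are the article’s.

```python
from datetime import date, timedelta


def history_average(games, game_date, days=28):
    """Average per-game team RPM total over games played in the `days` days
    before `game_date` (the game itself is excluded).

    `games` is a list of (date, team_rpm_total) pairs. Returning 0.0 when no
    games fall in the window is an assumption made for this sketch.
    """
    start = game_date - timedelta(days=days)
    window = [rpm for d, rpm in games if start <= d < game_date]
    return sum(window) / len(window) if window else 0.0


# The article's 28-day averages before the Cubs-Mets game on June 1, 2018.
cubs_28, mets_28 = 2.04, -0.80
history_component = cubs_28 - mets_28
print(round(history_component, 2))

# Hypothetical game log showing how the look-back window is applied.
log = [(date(2018, 5, 31), 3.0), (date(2018, 5, 20), 1.0), (date(2018, 4, 1), 10.0)]
print(history_average(log, date(2018, 6, 1)))          # April 1 falls outside 28 days
print(history_average(log, date(2018, 6, 1), days=7))  # only May 31 counts
```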
Starting Pitchers. Starting pitchers’ performances clearly influence which team wins. Pitchers’ batting RPMs were excluded to make comparisons between the AL and NL more equitable. We use the average of the pitcher’s RPMs in the 3 games preceding his next start. The average RPM value for all starting pitchers in 2018 was -0.4 RPMs. The negative value results from the fact that a pitcher typically doesn’t have equal offense and defense involvement in plays. Using this league average of -0.4 RPMs, a good starting pitcher with an average of -0.1 RPMs contributes -0.1 - (-0.4) = +0.3 RPMs to his team’s predicted value, increasing his team’s probability of winning.
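The pitcher adjustment above can be sketched as follows. The -0.4 league average and the three-start window come from the article; the function name, the per-start input list, and the choice to return 0.0 for a pitcher with no history are assumptions for illustration.

```python
LEAGUE_AVG_STARTER_RPM = -0.4  # 2018 average for starting pitchers, from the article


def pitcher_adjustment(recent_starts, league_avg=LEAGUE_AVG_STARTER_RPM, n=3):
    """Average of the pitcher's last `n` starts minus the league average.

    `recent_starts` holds per-start RPM values (batting excluded), oldest first.
    Returning 0.0 (league average) when there is no history is an assumption.
    """
    last = recent_starts[-n:]
    if not last:
        return 0.0
    return sum(last) / len(last) - league_avg


# A "good" starter averaging -0.1 RPMs over his last three starts.
print(round(pitcher_adjustment([-0.5, -0.1, -0.1, -0.1]), 2))
```

Only the last three starts count, so the older -0.5 start has no effect; the result is the +0.3 from the example in the text.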
Home Field Advantage. Statistically, the average home team’s RPM value is greater than the average RPM value of the visiting team. The difference in these values is the third component used when forecasting RPM values of opposing teams.
Adding the three values (team history, starting pitcher and home field advantage) gives a predicted RPM value for each team, so each game has a pair of predicted values. The team with the higher value is the favorite to win the game. The difference between the two values is converted to a probability of winning as described in the following section.
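Combining the three components can be sketched like this. The article does not publish its home field advantage value, so `HOME_ADVANTAGE = 0.25` is purely hypothetical, as are the pitcher adjustments in the usage example; the Cubs and Mets history values are the article’s 28-day averages.

```python
HOME_ADVANTAGE = 0.25  # hypothetical value; the article does not publish its figure


def predicted_rpm(history_avg, pitcher_adj, is_home, home_advantage=HOME_ADVANTAGE):
    """Sum of the three components: team history, starting pitcher, home field."""
    return history_avg + pitcher_adj + (home_advantage if is_home else 0.0)


def favorite(team_a, team_b):
    """Each team is (name, predicted_rpm). Returns (favorite_name, rpm_difference).

    Ties go to team_a, an arbitrary choice for this sketch.
    """
    (name_a, rpm_a), (name_b, rpm_b) = team_a, team_b
    if rpm_a >= rpm_b:
        return name_a, rpm_a - rpm_b
    return name_b, rpm_b - rpm_a


# Cubs at Mets: 28-day averages from the article; pitcher values are invented.
cubs = ("Cubs", predicted_rpm(2.04, 0.3, is_home=False))
mets = ("Mets", predicted_rpm(-0.80, -0.2, is_home=True))
print(favorite(cubs, mets))
```

The returned difference is the input to the conversion step described in the next section.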
Park effect has not been used to predict winners. The reason is that any game played in a specific park gives the opposing teams equal opportunities to take advantage of the park’s bias toward or against batting and/or pitching performances. Team RPM values combine offense and defense performance in a single number. In a park that yields a lot of home runs, the batting RPMs of each team may increase, but the defense/pitching RPMs of each team will decrease, since offense and defense RPMs always sum to zero. However, the batting component of individual player RPMs may be affected by the venue. For the interested reader, the article “Park Effect Impact” describes park effect and how it affected player and team performances in the 2018 season.
Converting RPM Values to Probabilities
History provides the answer. For every completed game we know the RPM values of each team, and we know the outcome of the game. Sometimes the team with the higher RPM average preceding the game is the winner and sometimes it is the loser. This means that, for a given difference in RPMs, we can calculate the percentage of games in which the team with the higher value won. Eureka, we have probabilities! The chart below shows a subset of the correspondences between RPM differences and winning percentages.
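The “history provides the answer” step amounts to binning past RPM differences and counting wins in each bin. A minimal sketch, assuming each completed game is reduced to a (difference, favorite won) pair; the function names, bin edges and game data are hypothetical.

```python
from bisect import bisect_right


def win_rate_by_difference(games, edges):
    """Build an empirical lookup table from completed games.

    games: (rpm_difference, favorite_won) pairs, where rpm_difference is the
           pre-game difference and favorite_won says whether the team with the
           higher predicted value actually won.
    edges: sorted bin edges; bins are half-open [lo, hi).
    Returns {(lo, hi): (share_of_games, win_rate)} for non-empty bins.
    """
    counts = [[0, 0] for _ in range(len(edges) - 1)]  # [wins, games] per bin
    for diff, favorite_won in games:
        i = bisect_right(edges, abs(diff)) - 1
        if 0 <= i < len(counts):
            counts[i][0] += int(favorite_won)
            counts[i][1] += 1
    table = {}
    for lo, hi, (wins, n) in zip(edges, edges[1:], counts):
        if n:
            table[(lo, hi)] = (n / len(games), wins / n)
    return table


def win_probability(table, diff):
    """Look up the historical win rate for a new game's RPM difference."""
    for (lo, hi), (_share, rate) in table.items():
        if lo <= abs(diff) < hi:
            return rate
    return None  # difference outside the observed range


# Made-up history, purely to show the mechanics.
history = [(0.5, True), (0.6, False), (1.5, True), (1.2, True)]
table = win_rate_by_difference(history, edges=[0.0, 1.0, 2.0])
print(table)
print(win_probability(table, 1.7))
```

Each table entry carries the same two numbers discussed in the next paragraph: how often a difference occurs, and how often the favorite won at that difference.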
The first row of the 3 leftmost columns has the following interpretation. If the difference between opposing teams’ predicted RPMs, calculated from the teams’ RPMs in the preceding 28 days, the probable starting pitchers and which team is at home, is between 0.1 and 0.3, then: 1) this difference occurs in 1.5% of games, and 2) in those games the team with the larger RPMs won 50.8% of the time. A second example: the 3rd group of columns shows that a difference of between 7.0 and 7.5 net RPMs has a win probability of nearly 94%.
As you would expect, the distribution of RPM differences is fairly uniform, and large RPM margins result in a high probability of winning.
How Accurate are RPM Predictions?
MLB games in the 2018 season preceding the All-Star break were used to determine the accuracy of RPM-based predictions. Together with predictions for more than 2,400 regular-season games in 2017, this set of games provided an excellent set of data for testing the accuracy of RPM-based predictions.
For each game, we calculated a predicted probability of each team winning. Setting these probabilities aside for a moment, the predicted winner matched the actual winner 56.7% of the time. We believe this strongly supports the claim that RPM predictions of game winners are accurate, because they are correct more than 50% of the time.
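Whether an accuracy like 56.7% is meaningfully above 50% depends on how many games were predicted. A standard check is a one-sided binomial test against a coin flip; the sketch below uses the normal approximation. The 1,000-game count is hypothetical, since the exact number of 2018 games is not stated here, and the function name is illustrative.

```python
import math


def accuracy_vs_coin_flip(correct, total):
    """One-sided normal-approximation test of 'accuracy is only 50%'.

    Returns (observed accuracy, z-score, approximate one-sided p-value).
    """
    p_hat = correct / total
    z = (p_hat - 0.5) / math.sqrt(0.25 / total)  # 0.25 = 0.5 * (1 - 0.5)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_hat, z, p_value


# Hypothetical sample size: 567 correct picks out of 1,000 games.
print(accuracy_vs_coin_flip(567, 1000))
```

With a sample of that size, a 56.7% hit rate sits more than four standard errors above 50%; a much smaller sample would give a far weaker result.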
The chart below shows the results when games in the preceding 7, 14 and 28 days of play are used in the calculations. As can be seen, a 4-week history period gives the highest prediction accuracy. Specifically, the 57.1% value in the bottom right corner means that when games with winning probabilities between 48% and 52% are ignored, the RPM-based prediction was correct for 57.1% of games. If these near-zero RPM differences are included, the accuracy decreases to 56.5%. Our analysis has shown that using more than a 28-day history does not increase prediction accuracy. This makes sense: because of roster changes and “streakiness”, older games say less about how a team is playing now.
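The two accuracy figures above (near coin flips excluded versus included) can be computed with a filter before scoring. A minimal sketch, assuming each prediction is stored as the probability that one designated team wins plus whether it did; the function name and the five games are made up.

```python
def pick_accuracy(predictions, exclude=(0.48, 0.52)):
    """Share of correct picks.

    predictions: (p_team_a, team_a_won) pairs; team A is picked when p_team_a >= 0.5
    (sending exact 50% toss-ups to team A is an arbitrary choice for this sketch).
    exclude: (low, high) band of probabilities to skip, or None to keep every game.
    """
    kept = [(p, won) for p, won in predictions
            if exclude is None or not (exclude[0] <= p <= exclude[1])]
    if not kept:
        return float("nan")
    correct = sum((p >= 0.5) == won for p, won in kept)
    return correct / len(kept)


# Made-up predictions for five games.
preds = [(0.70, True), (0.60, False), (0.51, False), (0.49, False), (0.30, False)]
print(pick_accuracy(preds, exclude=None))  # every game counts
print(pick_accuracy(preds))                # near coin flips (48% to 52%) skipped
```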
For a given game, what is the meaning of the percent chance of winning? There are two parts to the answer. First, one would expect the prediction “Team A will beat Team B with a probability of 60%” to be “stronger” than “Team A will beat Team B with a probability of 52%”. This simply means that if you compared the results of 100 games with a predicted probability of 60% against 100 games with a predicted probability of 52%, the “60%” games would have more correct predictions than the “52%” games. In other words, the higher the predicted probability, the higher the expected percentage of correct winners.
The chart below shows the daily accuracy of 2018 predictions for regular-season games up to July 17th. On each game day, predictions were made for each of that day’s games (between 1 and 15). We calculated the percentage of games on each date that were predicted correctly, then counted how many days had a given accuracy. For example, the chart data shows that our accuracy was between 60% and 69% on 37 days. It also shows that our accuracy was less than 10% on 1 day and greater than 80% on 5 days.
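The day-by-day tally described above can be sketched as a two-step count: accuracy per date, then days per 10-point accuracy bucket. The input layout, function name and three days of results are assumptions for illustration.

```python
from collections import Counter, defaultdict


def daily_accuracy_histogram(results):
    """results: (game_date, prediction_was_correct) pairs, one per game.

    Returns a Counter mapping each accuracy bucket (0 for 0-9%, 10 for 10-19%,
    ..., 90 for 90-100%) to the number of days whose accuracy fell in it.
    """
    per_day = defaultdict(lambda: [0, 0])  # date -> [correct, games]
    for day, correct in results:
        per_day[day][0] += int(correct)
        per_day[day][1] += 1
    buckets = Counter()
    for correct, games in per_day.values():
        pct = 100 * correct // games
        buckets[min(pct // 10 * 10, 90)] += 1
    return buckets


# Made-up results for three game days.
results = [("2018-07-01", True), ("2018-07-01", True), ("2018-07-01", False),
           ("2018-07-02", True),
           ("2018-07-03", False), ("2018-07-03", False)]
print(daily_accuracy_histogram(results))
```

Days with only one or two games can only land in a few buckets (0%, 50% or 100%), which is one reason daily accuracy swings so widely around the overall figure.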
Overall, 56.5% of the predictions were correct.
Wagering on Game Winners
The second part of the answer about the percent probability of winning has to do with wagering on game outcomes. Clearly, the payoff for equal-sized wagers should depend on the probability of the outcome: bets with higher probabilities, or equivalently better odds of winning, should pay less than bets with lower probabilities of winning. For example, winning a bet on a team that has a 40% chance of winning (odds of winning 2-to-3) will provide a $3 profit for each $2 bet. If the probability of winning is 70%, the odds of winning are 7-to-3, and a winning bet pays a profit of $3 for each $7 wagered. At fair odds, an overall RPM accuracy of 56% means that a person who bet $1 on the favorite will make a profit of 44/56, or about 79 cents, if the favorite wins. On the other hand, a winning $1 bet on the underdog will make a profit of 56/44, or about $1.27. The foregoing assumes there is no administrative fee assessed for making the wager.
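The payoff arithmetic in this section follows one rule: at fair odds with no fee, a winning $1 bet on an outcome with probability p earns a profit of (1 - p) / p. The sketch below reproduces the four examples; the function name is hypothetical.

```python
from fractions import Fraction


def fair_profit_per_dollar(p_win):
    """Profit on a winning $1 bet at fair odds (no fee): (1 - p) / p."""
    p = Fraction(p_win).limit_denominator(1000)  # tame float inputs such as 0.56
    return (1 - p) / p


for p in (0.40, 0.70, 0.56, 0.44):
    profit = fair_profit_per_dollar(p)
    print(f"P(win) = {p:.0%}: profit {profit} per $1, about ${float(profit):.2f}")
```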
Conclusions
- There are two important attributes of prediction statements — accuracy and precision. Predictions of game winners can be expressed as probabilities or as odds of winning.
- The inherent uncertainty of forecasting (Yogi Berra: “The Future Ain’t What It Used To Be”) coupled with the variability of MLB team performances means that predicting the winner of MLB games with a high degree of accuracy is very difficult.
- RPM-based predictions of game winners are based on recent team performance, probable starting pitchers and home field advantage. The results strongly support both the prediction methodology and, more importantly, the mathematical assumptions and model that underlie the RPM statistic. In spite of the overall accuracy of 56%, there will be days on which every prediction is wrong and days on which every prediction is correct. We are confident that our overall accuracy will remain significantly above 50%, and we will provide statistics that justify this claim.
Until next time…
Stay tuned for our future articles and reports, due out every week this season. If you want to be reminded whenever we release new content, please subscribe to our mailing list to be kept up to date!
If you have any questions, comments, requests or complaints, please feel free to add them in the comments below or to email us at info@runplusminus.com
You can learn more about the RunPlusMinus™ statistic at RunPlusMinus.com