Goal Difference or Goal Average? Comparing Tie-Breakers in League Soccer

TwoThreeFive
Feb 16, 2022


Introduction

In 1888, twelve of England’s most prominent professional football clubs agreed to arrange home-and-away fixtures amongst themselves each season. The driving purpose of this exercise seems to have been not competition but collaboration: by ensuring that each member of this self-selected elite would be able to host a minimum number of games against opposition good enough to attract the public, the clubs in question could help to secure one another financially. Nevertheless, the balanced schedule, with every team playing every other team once at home and once away, lent itself to the development of a fair ranking system. Roughly halfway through its first season, the Football League decided to award teams points for their performances: 2 for a win and 1 for a draw. In 1981, the system was amended to award a winning team 3 points.

All well and good, but how to rank teams that finished the season with the same number of points? For the first few years of the competition, this didn’t matter; but in 1894-’95, issues of promotion and relegation forced the League to confront the question. Goal average, the ratio of goals scored to goals allowed, was thereafter used as a tie-breaker until 1976, when it was replaced by goal difference. Commentary on the distinction between the two discriminators has often focused on the difference in the incentives each offers the teams subject to it. Whereas goal difference treats every goal as equally valuable, a goal scored being exactly as good as a goal not conceded, goal average weights changes in its numerator and denominator differently, depending on which is currently the larger. This means that under the goal average system, a team that scores more goals than it concedes is better off scoring one goal fewer and conceding one fewer than scoring one more and conceding one more. Although the opposite is true for a team which scores fewer than it concedes, it came to be believed that goal average in general incentivised defensive play, and the switch to goal difference was motivated largely by a desire to reverse this.
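The arithmetic is easy to check. The short calculation below uses made-up totals for a team that scores more than it concedes; the comparison holds for any such team.

```python
# A minimal sketch, with made-up totals, of the asymmetry described above:
# for a team that scores more than it concedes, "one fewer scored, one fewer
# conceded" beats "one more scored, one more conceded" on goal average,
# even though goal difference cannot tell the two scenarios apart.

goals_for, goals_against = 60, 40   # hypothetical season totals, F > A

tighter = (goals_for - 1) / (goals_against - 1)   # 59/39 ~ 1.513
looser = (goals_for + 1) / (goals_against + 1)    # 61/41 ~ 1.488

print((goals_for - 1) - (goals_against - 1),
      (goals_for + 1) - (goals_against + 1))      # 20 20: identical goal difference
print(round(tighter, 3), round(looser, 3))        # 1.513 1.488: goal average prefers the tighter games
```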

I do not know the extent to which this belief was correct. Personally, I see no definite reason why the choice of tie-breaker should be expected to make much difference to how teams play. Above all, a team in League competition plays to earn as many points as it can, not to satisfy a secondary criterion which may or may not be relevant. Furthermore, it is at least theoretically possible that such incentives as are offered by goal average would serve to counterbalance the general tendency for strong teams to play offensively and weak teams defensively. It would be an interesting study, but it is not the focus of this essay. The data required for such a study are easily available, but they are not, as far as I am aware, available in an Excel-friendly format. I could, if I wanted to, type all the numbers in by hand; but I’m too lazy to do that. In any case, there is another question which, to my mind, is at least as interesting and far more fundamental to the issue of which tie-breaker a league ought to prefer.

If a secondary metric is needed to separate teams that have done equally well on a primary metric, that primary metric being the sole separator wherever possible, the most logical question to ask of the secondary metric is how reliably it serves as a proxy for the primary. Points are the League’s primary metric. Teams are ranked by their success in what they are trying to do, and what they are trying to do is earn points. When they cannot be separated by points, the logical thing to do is to separate them according to whatever secondary performance metric most strongly correlates with points.

From this perspective, there are theoretical and empirical reasons for suspecting that the decision to replace goal average with goal difference was an unwise one. The fact that goal average gives different marginal values to goals in attack and defence is probably a strength rather than a weakness, reflecting the very real difference that exists within a game. In terms of expected points, a leading team probably has more to gain from saving a goal than from scoring a goal, while the opposite is true for a trailing team. Different versions of Bill James’ Pythagorean Expectation formula, which is functionally equivalent to goal average in soccer, have been shown to correspond strongly with winning percentage in all four of the most popular team sports in North America.
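The equivalence is easy to see: the basic Pythagorean formula, F^2 / (F^2 + A^2), depends on F and A only through their ratio, so ranking teams by it and ranking them by goal average amount to the same thing. The sketch below checks this numerically with invented totals; the exponent of 2 is Bill James’ original, and soccer-specific versions use other exponents, but the point holds for any positive exponent.

```python
# Sketch of the "functionally equivalent" claim: the Pythagorean estimate
# F^k / (F^k + A^k) depends on F and A only through the ratio GR = F / A,
# so it ranks teams exactly as goal average does. The (F, A) pairs below
# are invented for illustration.

def pythagorean(f, a, k=2):
    return f**k / (f**k + a**k)

def from_goal_ratio(gr, k=2):
    # The same quantity rewritten as GR^k / (GR^k + 1)
    return gr**k / (gr**k + 1)

for f, a in [(60, 40), (45, 45), (30, 55)]:
    print(f, a, round(pythagorean(f, a), 4), round(from_goal_ratio(f / a), 4))
# The last two columns agree for every pair, and both rise strictly with F/A.
```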

Method

The website fbref.com has made the final league tables for every season of the Premier League since its inception in 1992 available for download in an Excel-friendly format. With every team’s win, draw and loss totals for a season included along with its points tally in that season’s table, I was able to calculate how many points it would have earned under the old 2-point system, as well as to eliminate irrelevant points deductions from the data; and with goal difference broken down into its constituent parts of goals for and goals against, it was easy to compute goal averages for each team in each season. The formulae for both tie-breakers are listed below, with the initials GD indicating goal difference and GR signifying goal ratio, another term for goal average. F and A are a team’s total goals for and against respectively.

(1) GD=F-A

(2) GR=F/A

Competition points under the modern 3-point system are scored according to equation 3, where W and D refer to games won and drawn. Equation 4 gives the points a team would have earned under the old 2-point system.

(3) P3=3W+D

(4) P2=2W+D
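For what it’s worth, the bookkeeping looks roughly like this in pandas; the column names and numbers below are my own invention rather than FBref’s exact headers.

```python
import pandas as pd

# Hypothetical extract of a final league table; real FBref exports may use
# different column names. W, D, GF, GA = wins, draws, goals for, goals against.
table = pd.DataFrame({
    "Squad": ["Team A", "Team B", "Team C"],
    "W":  [23, 21, 11],
    "D":  [9, 9, 12],
    "GF": [68, 73, 45],
    "GA": [33, 26, 52],
})

table["GD"] = table["GF"] - table["GA"]    # equation (1)
table["GR"] = table["GF"] / table["GA"]    # equation (2)
table["P3"] = 3 * table["W"] + table["D"]  # equation (3)
table["P2"] = 2 * table["W"] + table["D"]  # equation (4)

print(table)
```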

With all these variables calculated, I was almost ready to evaluate the strength of the relationships between the quantities defined in equations 1 and 2 and those defined in equations 3 and 4. The commonest way of doing this is to calculate the Pearson product-moment correlation coefficient, r.

(5) r=C(X, Y)/[S(X)S(Y)]

Here, the two variables are denoted X and Y, with S(X) and S(Y) their respective standard deviations. The numerator, C(X, Y), is the covariance between them, a measure of how much they vary with one another. The covariance is calculated according to equation 6, where E is the expectation operator.

(6) C(X, Y)=E(XY)-E(X)E(Y)
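As a sanity check of equations 5 and 6, here is a small numerical example with arbitrary made-up data; NumPy’s built-in Pearson calculation should agree with the hand-rolled version.

```python
import numpy as np

# Arbitrary illustrative data.
x = np.array([10.0, 4.0, -3.0, 15.0, 1.0])
y = np.array([70.0, 52.0, 40.0, 81.0, 49.0])

# Equation (6): C(X, Y) = E(XY) - E(X)E(Y), using population moments.
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)

# Equation (5): r = C(X, Y) / [S(X) S(Y)], with population standard deviations.
r = cov_xy / (np.std(x) * np.std(y))

print(round(r, 4))
print(round(np.corrcoef(x, y)[0, 1], 4))  # NumPy's built-in Pearson r agrees
```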

Pearson’s formula, however, rests on the assumption that the relationship between X and Y is linear and unbounded, and that each is a continuous variable. In the context of this study, where X is a football team’s goal average or goal difference and Y is its points tally, the formula is of questionable applicability. Goal averages are continuous variables, but goal differences are not, and neither are points totals. There is no a priori reason to assume that the relationship between points and goals is linear over a season any more than there is to assume it is linear during a single match, and it certainly cannot be unbounded. Notwithstanding disciplinary deductions, such as the 3 points forfeited by Middlesbrough in 1996-’97 for failing to fulfil a fixture at Blackburn, no team can score negative points under the present system. Neither, in a 38-game season with a maximum of 3 points per game, can any team earn more than 114 points, irrespective of how many goals it scores or how few it concedes. Any model which can predict either of these outcomes is, to say the least, mathematically inelegant.

To overcome this difficulty, I used the Spearman rank correlation coefficient, ρ, which is the Pearson coefficient applied not to the raw variables themselves but to their ranks within the sets from which they are derived.

(7) ρ=C[R(X), R(Y)]/(S[R(X)]S[R(Y)])

For example, in the 1997-’98 season, Manchester United scored more goals and conceded fewer than any other team, giving them the highest goal difference and the highest goal average. Yet they finished second in the league table, earning one point fewer than Arsenal by both the 2-point and 3-point systems. For that season, a rank correlation of goal average or goal difference with points gives them an R(X) of 1 and an R(Y) of 2. Where teams are tied in an X or Y category, they are each given an R(X) or R(Y) equivalent to the average of their ranks. For the 2011-’12 season, in which Manchester City beat Manchester United to the championship on goal difference, both teams were awarded an R(Y) of 1.5.
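The ranking-and-correlating step is simple enough to sketch; the snippet below uses scipy, whose spearmanr and rankdata functions both give tied values the average of their ranks, and the numbers are invented for illustration.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Invented goal differences and points totals for a six-team league.
gd = np.array([45, 30, 30, 5, -10, -25])
points = np.array([82, 83, 70, 55, 48, 40])

# Ranks, with ties given the average of the positions they span.
# rankdata ranks ascending, so the best team gets the largest rank; the
# correlation is unaffected as long as both variables are ranked the same way.
print(rankdata(gd))      # the two teams tied on 30 share rank (4 + 5) / 2 = 4.5
print(rankdata(points))

rho, _ = spearmanr(gd, points)   # Pearson r applied to those ranks
print(round(rho, 3))
```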

This method makes no assumptions about the distributions of the X and Y variables or the relationship between them save that the relationship must be monotonic. In plain English, this means that there must be no point at which the direction of the change in Y corresponding to an increase or decrease in X changes, although the magnitude of the change might. An extra goal scored or not conceded may be worth more or less to a team in expected points depending on its current goal difference or average; but there must be no point at which extra goals, having increased a team’s expected points tally up to that point, start to decrease it. This seems like a reasonable assumption.

This does not mean that the Spearman coefficient is perfect for my purposes. Unless at least one of the variables is normally distributed, the standard z-test for significance of coefficients and for the differences between them will be inaccurate, even though the coefficients themselves are accurate enough. In this case, neither X nor Y can be normally distributed, because neither can be independent. In a closed competition, one team’s goal difference or goal average is an exact function of all the others, as every goal scored by one team is conceded by another. The same can be said for P2; and although a 3-point system makes the interrelationship between teams’ points tallies more complicated, it remains true that one team’s win is another team’s loss. Given the limited time and technology at this researcher’s disposal, permutation tests are out of the question. Nevertheless, a less rigorous comparison of correlation coefficients may still have something to teach us.

Results and Conclusion

Table 1 is a correlogram showing the Spearman rank correlation coefficients between primary and secondary performance metrics, indicated respectively by the headings of columns and subcolumns, for every completed Premier League season. Rows represent seasons, expressed as the year in which they ended. For each season, the highest correlation between a secondary performance metric and a given primary metric is highlighted. Table 2 shows a summary of the findings to be gleaned from Table 1.

Table 1: Correlogram of Primary and Secondary Performance Indicators

Table 2: Summary of Key Findings

As one would expect, both secondary performance indicators show a high correlation with points irrespective of which scoring system is used. Even the lowest correlation between a secondary and a primary indicator, that which existed between goal average and P2 in 1992-’93, is greater than 0.8, meaning that even then the former variable still explained more than 64% of the variation in the latter. If the z-test were valid, this would translate to a z-statistic of 4.458286, more than enough to be statistically significant at the 1% level. Indeed, it would be significant at the 0.01% level. This was a highly unusual season in which Norwich City finished in 3rd place, high enough to qualify for continental competition, in spite of their having conceded more goals than they scored.

On average, goal average correlated more strongly with points than goal difference did under the existing 3-point system, whereas the opposite would have been true under a 2-point system. However, the difference is very small in both cases and unlikely to be significant in either: under a 2-point system it is discernible only at the third decimal place, and when wins are weighted at three points one has to round ρ to four places to find it. Other findings suggest that goal difference may be marginally the more reliable indicator of performance. Within the dataset, goal difference had the highest minimum value of ρ, the smallest range and the smallest interquartile range under both P3 and P2, and it out-correlated goal average in 18 of 29 seasons under both scoring systems.
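For completeness, the kind of summary shown in Table 2 is straightforward to reproduce once the per-season coefficients are to hand; the figures below are placeholders rather than the real values from Table 1.

```python
import pandas as pd

# Placeholder per-season Spearman coefficients; the real values are in Table 1.
rho = pd.DataFrame({
    "GD_vs_P3": [0.95, 0.97, 0.96, 0.98],
    "GR_vs_P3": [0.94, 0.98, 0.95, 0.97],
})

summary = rho.agg(["mean", "min", "max"]).T
summary["range"] = summary["max"] - summary["min"]
summary["IQR"] = rho.quantile(0.75) - rho.quantile(0.25)
print(summary.round(4))

# Number of seasons in which goal difference out-correlates goal average:
print((rho["GD_vs_P3"] > rho["GR_vs_P3"]).sum())
```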

Based on this study, it cannot be concluded that either tie-breaker is a better or worse indicator of a team’s performance than the other. Replacing goal average with goal difference does not seem to have definitely improved the sporting fairness of the league table, but neither does it seem to have worsened it. Theoretically speaking, goal difference does at least have the advantage of being certain to exist: it is robust to the remote but real possibility that a team may go a whole season without conceding a goal, in which case its goal average would be undefined. Practically speaking, such problems could be resolved easily enough, but the solutions would be mathematically messy.

References

Pythagorean expectation — Wikipedia

Premier League Seasons | FBref.com

