Statistical Significance of The Curse of Béla Guttmann
How many times in your professional life have you cursed your employer? How many times has the curse come true? How many times has it come back to haunt you? Have you ever heard of someone who cast a curse on his employer that has held for more than 56 years and counting? For those who haven’t, here is the story of a famous football club, a legendary manager and an even more legendary curse.
Béla Guttmann was a Hungarian footballer who later took up football management. Known for his attacking style of play, he was part of the subculture of football thinkers who laid the early foundations of Total Football. He is also credited with mentoring the great Eusébio. However, throughout his career, he was never far from controversy. Widely travelled as both a player and a coach, he rarely stayed at a club longer than two seasons, and was quoted as saying “the third season is fatal”. He was sacked by Milan while they were top of the table. Yet one story about Guttmann surfaces whenever a European cup final comes around, pushing all his other achievements out of the spotlight.
Guttmann took over as manager of the Portuguese club Porto in the 1958/59 season. When he arrived, the Portistas were trailing their O Clássico rivals Benfica by five points. Under him, they closed the deficit and claimed the title. Following his title success he made a stunning switch to Benfica, the team he had just denied the title. If moving across Portugal to Porto’s bitter rivals was not shocking enough, the uncompromising Guttmann sacked no fewer than twenty senior players from the Benfica squad, replacing them with members of the youth team. He also signed a previously unknown player from Mozambique, Eusébio. So there it was: a team full of youth players and an unknown teenager from rural Africa. One can only speculate about what the board and fans thought of the radical Hungarian who was controlling their club’s future. Despite the pure insanity of the situation, Guttmann had got everything spot on. He led his Benfica side to two league titles in 1959/60 and 1960/61 as well as a Portuguese Cup in 1961/62. He also took Benfica to historic back-to-back European Cups in 1961 and 1962, beating Barcelona 3–2 at the Wankdorf Stadium in Bern and Real Madrid 5–3 in Amsterdam’s Olympisch Stadion. All of this came at a time when Real Madrid were dominating European football.
Following the 1962 final, he approached the Benfica board and asked for a pay rise. Breaking the Madrid stranglehold on Europe and making Benfica the dominant force in Portugal was almost certainly enough to warrant a pay rise or at least a generous bonus. He was rebuffed: the board pointed out that nowhere in his contract did it state that he was owed such remuneration for success. Guttmann was furious and stormed out of the club. What exactly was said we will never know, but the popular myth suggests he uttered these words.
“Not in a hundred years from now will Benfica ever be European champion.”
This happened in 1962. We are now in 2019. In these 57 years, Benfica have reached 8 European finals and finished as runners-up every single time, most recently losing on penalties to Sevilla in 2014. The fans strongly believe that it is the curse of Béla Guttmann that is denying them European glory, just as Liverpool fans believe next year will be theirs. But are they right to believe so? Let’s use some statistics to help us.
Hypothesis testing is a widely used statistical technique for deciding whether there is enough evidence to conclude that a phenomenon exists. For example, consider that we have a coin. Argh! This coin toss example is a bigger curse than the curse of Guttmann. But we have to live with it, so we will. We know that a fair coin, when tossed, lands on heads 50% of the time and tails 50% of the time. Let’s toss the coin in our hand 100 times. It lands on heads 53 times and tails 47 times, so we got heads 53% of the time. Can we conclude that our coin isn’t fair? No, because it could have been just chance that we got more heads than tails. Hypothesis testing guards against being fooled by chance: we reject the default assumption only when a result at least as extreme as ours would occur less than 5% of the time if that assumption were true.
The null hypothesis here is that the coin is fair. To decide whether to reject the null hypothesis, we calculate something called a Z score. But before calculating the Z score, let’s understand the standard normal distribution. It is a normal distribution with a mean of 0 and a standard deviation of 1.
From the curve of this distribution, we can see that about 95% of the area falls within 2 standard deviations of the mean. The so-called “z-score” tells us how many standard deviations away from the mean our sample is. If it is over 2 or under -2, it lands in a region that holds less than 5% of the area under the curve. Values in that range are uncommon when the null hypothesis is true, so we can reject the null hypothesis.
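The “95% within 2 standard deviations” rule of thumb can be checked numerically. The snippet below is a minimal sketch using only Python’s standard library; the helper name normal_cdf is made up for this illustration and is not something the article defines.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area within 2 standard deviations of the mean (the rule of thumb).
within_two_sd = normal_cdf(2.0) - normal_cdf(-2.0)

# Area within 1.96 standard deviations, the cutoff usually quoted for 95%.
within_196_sd = normal_cdf(1.96) - normal_cdf(-1.96)

print(f"within 2 SD:    {within_two_sd:.4f}")
print(f"within 1.96 SD: {within_196_sd:.4f}")
```

The exact 95% cutoff is closer to 1.96 than to 2, which is why 2 is best read as a convenient approximation.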
The Z score is given by the formula Z = (p - 0.50) / sqrt(0.50 * (1 - 0.50) / N)
p is the observed proportion of heads and N is the number of trials. Plugging in the values from our observation, we get the Z score as
(0.53 - 0.50) / sqrt(0.50 * (1 - 0.50) / 100) = 0.6. This is well under 2 standard deviations. The probability of getting a score at least this far from 0, in either direction, under the standard normal distribution is over 54%, which is much higher than our 5% threshold. So we cannot reject the null hypothesis that this is a fair coin. The case would have been different if we had got 75 heads in 100 trials. The Z score, in that case, would be
(0.75 - 0.50) / sqrt(0.50 * (1 - 0.50) / 100) = 5. The probability of getting a score this extreme under the standard normal distribution is about 0.00006%. With such a low probability, we could have concluded that it isn’t a fair coin.
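The two coin toss calculations can be reproduced with a few lines of Python. This is a sketch under the article’s setup (100 tosses, fair-coin null of 0.50, two-sided test); the function names one_proportion_z and two_sided_p are hypothetical helpers introduced here.

```python
import math

def one_proportion_z(heads: int, tosses: int, p0: float = 0.5) -> float:
    """Z score of the observed share of heads against the null value p0."""
    p_hat = heads / tosses
    standard_error = math.sqrt(p0 * (1 - p0) / tosses)
    return (p_hat - p0) / standard_error

def two_sided_p(z: float) -> float:
    """Probability of a standard normal value at least |z| away from 0."""
    return math.erfc(abs(z) / math.sqrt(2.0))

for heads in (53, 75):
    z = one_proportion_z(heads, 100)
    p_value = two_sided_p(z)
    verdict = "reject" if p_value < 0.05 else "cannot reject"
    print(f"{heads} heads: Z = {z:.2f}, p = {p_value:.7f} -> {verdict} fair coin")
```

With 53 heads the p-value sits far above 5%, while with 75 heads it is vanishingly small, matching the two conclusions in the text.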
So these are the basics. The same kind of test can be used wherever two groups are compared on their conversions or behaviour. It is widely used in A/B testing, where two groups are exposed to two different treatments and their behaviour is monitored. One is called the control group and the other the treatment group. Suppose you have developed a website for your street football team. It is a single-page website where you ask people to fund a new ball, as the ball you are currently playing with has outlived its purpose.
You decide to use a picture of the ball in its current state as the background, to win the sympathy of visitors and get them to fund your new ball. You have two pictures, and you are not sure which one will garner more sympathy.
You decide to conduct A/B testing. You expose 50% of the visitors to a landing page with the ball on the left as background and 50% of the visitors to a landing page with the one on the right as background. You want to see which one brings in more conversions. A conversion is when a visitor decides to donate a micropayment of 5 Rupees (Indian currency). You run the test for a week and you collect a total of Rs 230. That is still Rs 220 short of the money required to buy a new football and Rs 270 short of the website hosting charges. Along with it, you also get data about conversions from the two groups. The control group, which was exposed to the landing page with the football on the left as background, had 103 visitors, of whom 17 micro-funded your new football. The treatment group, which was exposed to the landing page with the football on the right as background, had 111 visitors, of whom 29 micro-funded your new football.
You choose to ignore the emails you got at your registered mail id from street footballers in the neighbouring streets asking for the ball on the left, claiming it can be used for a few more decades before it’s laid to rest. Now you want to assess which picture of your football garners more sympathy. Here the null hypothesis is that both groups are similar in their conversions.
Conversion of Control Group = Conversion of Treatment group (or)
Conversion of Treatment Group (minus) Conversions of Control Group = 0
Here the Z score is calculated from the difference in their conversion rates. The formula for the Z score is given by
Z = (P - Pc) / sqrt( P*(1-P)/N + Pc*(1-Pc)/Nc )
where N is the sample size of the experimental treatment and Nc is the sample size of the control treatment. P and Pc are the conversion rates of the treatment and control groups respectively, which are 0.261 (29/111) and 0.165 (17/103). Now you want to confirm whether this difference in conversions happened just by chance or whether there is a significant difference in how people see those footballs. Let’s calculate the Z score first.
Here the Z score is 1.735. Unlike the coin toss example, where we took into account both tails of the standard normal distribution, here we consider only one tail, the right one, to show that the conversion rate in the treatment group is greater than that in the control group. If less than 5% of the area lies to the right of our Z score, we can reject the null hypothesis and conclude that the ball on the right garners more sympathy and hence more conversions. This is called the one-sided Z test.
The Z score is greater than 1.645, the cutoff that leaves 5% of the area in the right tail, and the probability of getting a Z score of at least 1.735 is 4.1%. This is less than the 5% threshold, so we can reject the null hypothesis, declare the ball on the right the winner and use it as the background image for 100% of users.
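The A/B test numbers can be checked with a short script. This is a sketch, assuming the Z score uses a separate variance term for each group (the form that reproduces the quoted 1.735); the names two_proportion_z and right_tail_p are hypothetical helpers, not part of any library.

```python
import math

def two_proportion_z(conv_t: int, n_t: int, conv_c: int, n_c: int) -> float:
    """Z score for the difference between treatment and control conversion rates."""
    p_t = conv_t / n_t  # treatment conversion rate
    p_c = conv_c / n_c  # control conversion rate
    standard_error = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return (p_t - p_c) / standard_error

def right_tail_p(z: float) -> float:
    """Area of the standard normal curve to the right of z (one-sided test)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Treatment: 29 of 111 visitors converted. Control: 17 of 103 converted.
z = two_proportion_z(29, 111, 17, 103)
p_value = right_tail_p(z)

print(f"Z = {z:.3f}, one-sided p = {p_value:.3f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

Some textbooks pool the two groups into a single conversion rate when estimating the standard error. That variant would give a slightly different Z score, so the unpooled form here is a choice made to match the article’s figure.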
Now coming back to our story, let’s use the same type of test to calculate the statistical significance of the curse. Let’s gather a few numbers for that. Here is the timeline of events
Here there are two distinct groups based on time: before the curse and after the curse. Before the curse, in 59 years of existence (n1), Benfica were European champions twice (p1 = 2/59). After the curse, in 57 years (n2), they haven’t been European champions even once (p2 = 0/57). Here the null hypothesis is that the curse had no effect: given their history, it’s not unusual that they haven’t been European champions even once since the curse. It’s just chance.
But first, to test the null hypothesis, we must calculate the Z score. Plugging in the values, the Z score is calculated as
Z = abs(p1 - p2) / sqrt( p1*(1-p1)/n1 + p2*(1-p2)/n2 )
= abs(0.0339) / sqrt( 0.0339*(0.9661)/59 + 0*(1)/57 )
= 0.0339/0.02356
= 1.439
The probability of observing a Z score as big as 1.439 is around 7.5%, which is greater than 5%. Hence the null hypothesis can’t be rejected. Therefore the curse of Béla Guttmann is not statistically significant!
As far as this test can tell, it is just by chance that Benfica haven’t won a European title since the curse. If we repeat this 43 years later and Benfica still haven’t managed to win, the curse will remain statistically insignificant, because the second term under the square root will always be zero no matter how large n2 gets.
abs(0.0339) / sqrt( 0.0339*(0.9661)/59 + 0*(1)/100 )
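The claim that waiting longer changes nothing can be verified directly. The sketch below applies the same formula to 57 and to 100 trophy-less years; curse_z and right_tail_p are hypothetical helper names chosen for this example.

```python
import math

def curse_z(titles_before: int, years_before: int,
            titles_after: int, years_after: int) -> float:
    """Z score comparing the title rate before the curse with the rate after it."""
    p1 = titles_before / years_before
    p2 = titles_after / years_after
    standard_error = math.sqrt(p1 * (1 - p1) / years_before
                               + p2 * (1 - p2) / years_after)
    return abs(p1 - p2) / standard_error

def right_tail_p(z: float) -> float:
    """Area of the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z_2019 = curse_z(2, 59, 0, 57)   # 57 years without a title
z_2062 = curse_z(2, 59, 0, 100)  # the full hundred years without a title

print(f"after 57 years:  Z = {z_2019:.3f}, p = {right_tail_p(z_2019):.3f}")
print(f"after 100 years: Z = {z_2062:.3f}, p = {right_tail_p(z_2062):.3f}")
```

Because p2 is zero, the term p2*(1-p2)/n2 vanishes for every n2, so both runs give the same Z score and the same p-value above 5%.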
The case would have been different if Béla Guttmann had cast this curse after winning three European titles. Imagine a hypothetical scenario: instead of asking for a pay rise after winning the title the second time, he stays with Benfica for one more season and manages to win one more European title.
The new numbers are
p1 = 0.05 (3/60)
p2 = 0 (0/56)
n1 = 60
n2 = 56
Replugging the numbers into the equation, we get the Z score to be
abs(0.05) / sqrt( 0.05*(0.95)/60 + 0*(1)/56 )
The Z score is about 1.78, and the probability of observing a Z score that large is about 3.8%, which is less than the threshold.
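To compare the real history with the hypothetical one side by side, here is a small sketch. It relies on the observation that with zero titles after the curse (p2 = 0) the formula reduces to Z = p1 / sqrt(p1*(1-p1)/n1); the scenario labels and function names are made up for this illustration.

```python
import math

def z_with_no_titles_after(titles_before: int, years_before: int) -> float:
    """Z score when p2 = 0, so only the pre-curse term remains under the root."""
    p1 = titles_before / years_before
    return p1 / math.sqrt(p1 * (1 - p1) / years_before)

def right_tail_p(z: float) -> float:
    """Area of the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# (titles before the curse, years before the curse)
scenarios = {
    "actual: 2 titles in 59 years": (2, 59),
    "hypothetical: 3 titles in 60 years": (3, 60),
}

results = {}
for label, (titles, years) in scenarios.items():
    z = z_with_no_titles_after(titles, years)
    results[label] = (z, right_tail_p(z))
    verdict = "significant" if results[label][1] < 0.05 else "not significant"
    print(f"{label}: Z = {z:.3f}, p = {results[label][1]:.3f} -> {verdict}")
```

One extra title is enough to push the p-value from above 5% to below it, which is the whole difference between the two scenarios.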
Béla Guttmann’s curse would have been statistically significant if he had managed to win one more European title before deciding to ask for a pay rise. But since “the third season is fatal”, it did not let him do so. When Guttmann cast a curse on Benfica, statistics cast a reverse curse on him.
“Not in a hundred years from now will your curse be statistically significant.”
However statistically insignificant the curse may be, the next time you ask for a pay rise for working hard and your boss doesn’t approve, do narrate to him the story of Béla Guttmann and Benfica because