Post referendum, whither the UK economy?
Pun intended. The FT offered an overview on 17 December of different estimates of the economic impact of Brexit. That is, comparisons of the actual path that the UK economy has followed since the Brexit referendum with the path it would have followed if there hadn’t been a Leave vote. The difference between these two paths can then be attributed to the Leave vote and impending Brexit, and labelled as the cost or benefit of Brexit so far.
But how can anyone tell what path the UK economy would have followed in different circumstances? Obviously one easy way is simply to project forward on the back of a previous trend. But that raises the question of what the appropriate trend is. The FT article started with its own calculation based on the assumption that, if the referendum had gone the other way, GDP growth would have been at its very long-term 1955–2016 average.
I’ve got a number of issues with this illustration. Firstly, although it is expressed in real terms (i.e. inflation-adjusted), the units are £ rather than percentage growth. A straight-line increase in £ in fact represents a diminishing growth rate, as each repeated increase forms a smaller part of an ever-larger denominator. So what this chart rather obscures is that GDP growth had been decreasing before the referendum. Secondly, the average growth rate from 1955 to 2016 (a period that includes multiple recessions) was more than double the average for 2006–2016, and higher even than that for 2010 to 2016, which obviously included no recession at all.
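A minimal sketch of that first point, with made-up figures (not the FT’s): a fixed annual increase in £ terms on a growing base is a straight line in £ but mechanically a declining rate in %.

```python
# Made-up figures: a fixed £40bn rise each year on a growing base.
gdp = 1600.0       # hypothetical GDP level, £bn
increment = 40.0   # the same absolute increase every year

rates = []
for year in range(1, 6):
    rates.append(increment / gdp * 100)  # growth as % of the current level
    gdp += increment

print([f"{r:.2f}%" for r in rates])  # the rate shrinks year on year
```

So a chart that shows GDP rising in a straight £ line is, in growth-rate terms, showing steady deceleration.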
If we do look at the rate of GDP growth post recession, we can see that a counterfactual in which growth returned to its 1955–2016 average means assuming that a Remain vote in the referendum would have led not just to a jump in GDP growth, but to a sudden reversal of its most recent trend. This seems a bit hard to explain when a Remain vote would have meant retaining a status quo in line with expectations (and with no post-referendum emergency interest rate cut). It seems even harder to justify saying that this is likely to have happened *based on historical growth patterns*, because even if a long-term average has some sort of mean-reverting predictive value for the future, it only predicts a sudden move at any particular point in time for lovers of hockey-stick charts. In fact, rather than declining further, GDP growth has been broadly stable since the referendum at around the 2010–2016 post-recession average. So extrapolation of a previous trend, particularly a long-term one, doesn’t necessarily provide a very good counterfactual, especially for short-term change, precisely because of the hockey stick it requires.
A much better counterfactual than extrapolation of previous trend would be something that had matched the evolution of the thing I was interested in. For example if GDP growth in the UK had exactly matched GDP growth in the Eurozone up to the point of the referendum, then any subsequent divergence could be put down to the effect of the referendum. This is theoretically much better because I am comparing the UK with other real observations in the present, and if for example there were a global slowdown, that should affect the counterfactual in a similar way to the UK, whereas it would have no effect whatsoever on an extrapolation of previous trend.
But finding a useful ‘counterfactual’ means finding something that is both the same at the point of departure from the status quo and that got to be the same at the point of departure by the same route. Let’s have a homely analogy …
Say I’m running 20 laps of a running track with a number of other people and wondering whether an energy drink halfway through will boost my performance relative to theirs. Over the most recent lap three others have been running exactly alongside me. So, using this as my benchmark lap, I gulp down my energy drink; but at the same time a gusty wind gets up, and over the next lap one of the others keeps pace and two fall behind. Does this mean that I’ve done better than if I hadn’t had the drink? It’s hard to say. My lap time was actually worse, but that can be explained by the gusty wind. As all of us would have been equally hit by the wind, then although my absolute performance was worse, the impact of the drink can perhaps be assessed by comparing my relative performance to the people who had been running at the same pace as me during the previous lap, and on that basis it looks like a success.

However, if I look at earlier laps, I notice that the two people who fell behind were rather slower runners than me, who on my benchmark lap had both put in a much faster time than they usually do, at a pace they were never going to be able to sustain. So I can’t really conclude much on this evidence. What I really need as my ‘counterfactual’ is someone who had been putting in the same times as me lap by lap from the start, so that I could see whether our performance diverged after I had had my drink (assuming of course that he or she didn’t have one).

But no-one’s times exactly matched mine lap for lap up to the point I had my drink. However, setting my computer to work, I can see that there are two other runners whose lap-by-lap times, when weighted and averaged, do match mine up to that point. One of them was running 5 to 10 seconds a lap faster than me and the other 2 to 3 seconds slower.
If I average their times for each lap, giving the second runner twice the weighting of the first (because her times are closer to mine), the resulting lap-by-lap times more or less exactly match mine. If I apply the same weighting and averaging to their subsequent lap times, this is my synthetic counterfactual, and the assumption is that my own subsequent lap times would have been the same as this imaginary runner’s, because they had been the same up to that point. So I estimate the effect of my energy drink by comparing my actual lap times after the drink with the lap times of the synthetic me.
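The arithmetic can be sketched with hypothetical lap times (the numbers below are invented to fit the description above; only the weighting logic matters):

```python
# Hypothetical lap times in seconds (invented for illustration).
my_lap = 100.0
fast_runner = 94.0   # runs 6 s/lap faster than me
slow_runner = 103.0  # runs 3 s/lap slower than me

# Weight the slower runner twice as heavily as the faster one.
w_fast, w_slow = 1 / 3, 2 / 3
synthetic_me = w_fast * fast_runner + w_slow * slow_runner
print(synthetic_me)  # ~100.0, matching my actual lap time
```

If the match holds lap after lap before the drink, the same weighted average of the two runners’ post-drink laps stands in for the me-without-the-drink.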
Now, in just the same way that I wasn’t able to find a runner whose performance had exactly matched mine pre-drink, so it is impossible to find an economy that exactly matched the UK economy in the quarters or years pre-referendum. But a synthetic counterfactual to the UK economy up to the point of the referendum can be constructed in the same way, by weighting and averaging the performance of a selection of other economies just as I weighted and averaged the performance of other runners to match my own up to the point of taking my energy drink. I can then compare the UK’s subsequent economic performance to the performance of this weighted basket of other economies to estimate the effect of the referendum vote. This weighted basket is the ‘synthetic UK’.
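The weight-finding step can itself be sketched as a small optimisation problem. The series below are toy numbers, not real GDP data, and a coarse grid search stands in for the proper estimator used in the academic literature: choose non-negative weights summing to one that minimise the pre-event gap between the treated series and the weighted donor average.

```python
import itertools

import numpy as np

# Toy pre-event series (invented numbers, not actual GDP data).
uk = np.array([100.0, 102.0, 104.0, 106.0])
donors = np.array([
    [98.0, 101.0, 104.0, 107.0],    # donor A: steeper path from a lower base
    [104.0, 104.0, 104.0, 104.0],   # donor B: flat path
    [100.0, 103.0, 106.0, 109.0],   # donor C: steeper path from the same base
])

# Coarse grid search over non-negative weights that sum to 1.
best_w, best_err = None, np.inf
grid = np.linspace(0, 1, 101)
for w1, w2 in itertools.product(grid, grid):
    if w1 + w2 > 1:
        continue
    w = np.array([w1, w2, 1 - w1 - w2])
    err = float(np.sum((uk - w @ donors) ** 2))
    if err < best_err:
        best_w, best_err = w, err

print("weights:", best_w.round(2), "pre-event error:", round(best_err, 4))
```

Whatever weights make the pre-event fit tight then define the synthetic series going forward; the point of the rest of this post is that a tight fit alone says nothing about whether those weights make economic sense.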
The FT article refers to just such a comparison, carried out by a team of economists at different reputable universities who I’ll refer to as Born et al. Their write-up of their initial findings got a fair bit of headline coverage about ‘the true cost of Brexit’ and it’s certainly worth a close look at what they did because, as above, this is theoretically a powerful tool for estimating what that cost might be as time goes on. Here’s their chart showing the real UK economy diverging from their synthetic UK (here called its doppelganger) and falling behind it.
But, as I always ask, how meaningful is this? 83% of the weighting in the basket that makes up this synthetic UK is accounted for by four countries: 15% for Canada, 24% for Hungary, 25% for Japan and 19% for the USA. Just for ‘thought experiment’ purposes, look at % nominal GDP growth since 1995 in $ from the same OECD data used to construct the synthetic: the USA tracks the UK quite closely over the whole period until the UK suffers more of a recessionary drop, and so does Canada until 2014, but Japan moves ever further away to the downside from the get-go and Hungary diverges strongly to the upside after 2000.
Japan’s economy was thirty times the size of Hungary’s at the outset, and is still around twenty times its size. Thus these two very differently-sized economies, moving increasingly away from the UK in opposite directions, can be made to bring the counterfactual closer to the UK only by giving them high and essentially identical weights in the counterfactual. And it seems that the only reason they are given these weights is because that’s what makes the UK and counterfactual lines fit. It isn’t claimed that anything in the synthetic or its construction has any explanatory power — indeed the need to give equal weighting to giant Japan and tiny Hungary seems necessarily to imply that it doesn’t. Now one might say that that’s how synthetic counterfactuals work, but I’m not sure that this is so.
The paper references previous academic research by Abadie et al, the progenitors of the synthetic counterfactual, who constructed a synthetic West Germany so that they could estimate the economic effect of German re-unification. Abadie et al made clear that they thought constructing a synthetic counterfactual wasn’t simply a question of using an algorithm to weight and average different selections of countries mechanistically until a fit is obtained between the real and the synthetic. Indeed, they said
Constructing a donor pool of comparison units requires some care. First, units affected by the event or intervention of interest or by events of a similar nature should be excluded from the donor pool. In addition, units that may have suffered large idiosyncratic shocks to the outcome of interest during the study period should also be excluded if such shocks would have not affected the treated unit in the absence of the treatment. Finally, to avoid interpolation biases, it is important to restrict the donor pool to units with characteristics similar to the treated unit. Another reason to restrict the size of the donor pool and consider only units similar to the treated unit is to avoid overfitting. Overfitting arises when the characteristics of the unit affected by the intervention or event of interest are artificially matched by combining idiosyncratic variations in a large sample of unaffected units. The risk of overfitting motivates our adoption of the cross-validation techniques applied in the empirical section below.
In constructing their synthetic Germany, Abadie et al gave weights of significance to five countries: Austria 42%, USA 22%, Japan 16%, Switzerland 11% and Netherlands 9% (though different versions of their paper use slightly different numbers). Intuitively, Austria, Switzerland and the Netherlands (which account for 62% of the weighting) are very likely to be ‘like’ each other, and like Germany too, as all border on Germany and their economies have been highly connected for a long time (Germany was Austria’s main trading partner even before Austria joined the EU). And empirically, the cross-validation, looking at six key measures, found…
The synthetic West Germany is very similar to the actual West Germany in terms of pre-1990 per capita GDP, trade openness, schooling, investment rate, and industry share. Compared to the average of the OECD countries, the synthetic West Germany also matches West Germany much closer on the inflation rate.
So having identified a weighted mix of countries that appeared to make a synthetic that fitted the real, they checked that a range of different economic characteristics all matched up too. In my running example, the equivalent would be seeing how the components of the synthetic me matched the real me for age, gender, experience, size etc.
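A sketch of that kind of check, with invented covariate values and weights (none of these numbers come from Abadie et al): compute the weighted average of each donor characteristic and see whether it lands close to the treated unit’s.

```python
import numpy as np

# Invented donor characteristics (columns might be e.g. trade openness,
# years of schooling, investment rate). Not Abadie et al's actual data.
covariates = {
    "A": np.array([60.0, 11.0, 22.0]),
    "B": np.array([40.0, 9.0, 18.0]),
    "C": np.array([55.0, 10.5, 21.0]),
}
weights = {"A": 0.5, "B": 0.2, "C": 0.3}   # hypothetical fitted weights
treated = np.array([54.0, 10.3, 20.8])     # the treated unit's characteristics

# The check: the weighted donor average should land close to the treated unit
# on each characteristic, not just on the outcome series that was fitted.
synthetic = sum(w * covariates[k] for k, w in weights.items())
for name, s, t in zip(["openness", "schooling", "investment"], synthetic, treated):
    print(f"{name}: synthetic {s:.2f} vs treated {t:.2f}")
```

If the weighted mix matches the treated unit on the outcome but not on these characteristics, that is a warning sign that the outcome fit is a numerical coincidence.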
Unfortunately the Born paper does not seem to report the results of any such cross-validation. But using Hungary and Japan as the largest-weighted countries (as above, outliers of some magnitude in opposite directions) certainly looks to me like comparison units that are “artificially matched by combining idiosyncratic variations”, as the differences between Hungary and Japan derive from entirely different causes. In Hungary’s case, the fall of the Iron Curtain and then joining the European Union pushed its economy ever upwards from a very low base, while bubble-bursting and subsequent stagnation pushed Japan down from a high base. These are idiosyncratic shocks because they applied separately to Hungary and to Japan; moreover, neither of them applied to the UK. This raises the question of how similar either can be to the UK as the treated unit. Thus ‘splitting the difference’ between them, which is essentially what the algorithm does, cannot in any way explain (let alone predict) the UK path.

Indeed, the only way in which the authors identify Hungary as ‘like’ the UK in their paper is that it is outside the euro area. But this can’t possibly be meaningful, firstly because for much of the period Hungary was outside the euro area simply because it was outside the EU, and hence quite unlike the UK, which was within the EU throughout. One of the authors expanded on this in the FT article by saying
like the UK, Hungary is a European economy and integrated into the production chains, but remained outside the eurozone with a floating exchange rate and therefore could use monetary policy more aggressively after the crisis
But this doesn’t seem to fit the facts: Hungary arguably used monetary policy quite unlike the UK, actually increasing rates as an immediate response to the crash and then making later and relatively far smaller reductions. In fact the UK’s monetary policy was much more like Euro-area monetary policy.
But more importantly, even if Hungary did share this ‘non-€’ or ‘free monetary policy’ characteristic with the UK, Hungary’s economic performance over the period was not actually anything like the UK’s; rather, as above, its divergent performance is needed to balance out Japan’s divergence in the opposite direction, so the characteristic it shares with the UK seems completely irrelevant.
On the same theme, as a further control, Abadie et al successively left out the highly-weighted countries that contributed most to the synthetic Germany. As they put it, with my emphasis …
Here we iteratively reestimate the baseline model to construct a synthetic West Germany omitting in each iteration one of the countries that received a positive weight in Table 1. By excluding countries that received a positive weight we sacrifice some goodness of fit, but this sensitivity check allows us to evaluate to what extent our results are driven by any particular control country.
So they left out, one at a time, the Netherlands, Austria, the USA and so on. However, the Born et al paper does not reestimate its baseline by omitting its own highly-weighted countries (i.e. Hungary, Japan etc) but instead omits different countries, like France, that have virtually zero weight in the synthetic UK. It is unsurprising that this makes virtually no difference to their results, so it is quite uninformative about the extent to which the results are driven by any particular one of the highly-weighted countries, and it doesn’t prompt any reflection on how plausible those weights are. Back to my running example: if dropping one highly-weighted component of ‘synthetic me’ made a big difference, the examination this prompted might reveal that he was much older than me and so more likely to fade as the race went on, thus artificially or unfairly depressing ‘synthetic me’ and making it look as though my energy drink was more effective than it really was.
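That leave-one-out check can be sketched too. The series are toy numbers, and scipy’s non-negative least squares stands in for the paper’s actual estimator (both are assumptions of mine, not their method): drop each donor in turn, re-fit, and see how much the pre-event fit deteriorates.

```python
import numpy as np
from scipy.optimize import nnls  # non-negative least squares, a stand-in estimator

# Toy pre-event series (invented, not actual GDP data).
uk = np.array([100.0, 102.0, 104.0, 106.0])
donors = {
    "A": np.array([98.0, 101.0, 104.0, 107.0]),
    "B": np.array([104.0, 104.0, 104.0, 104.0]),
    "C": np.array([100.0, 103.0, 106.0, 109.0]),
}

def fit_error(pool):
    """Sum of squared pre-event residuals from a non-negative fit on `pool`."""
    X = np.column_stack([donors[k] for k in pool])
    _, rnorm = nnls(X, uk)  # nnls returns (weights, residual 2-norm)
    return rnorm ** 2

baseline = fit_error(list(donors))
# Leave-one-out: re-fit with each donor excluded in turn.
for dropped in donors:
    err = fit_error([k for k in donors if k != dropped])
    print(f"without {dropped}: error {err:.4f} (baseline {baseline:.4f})")
```

In this toy setup dropping donor B wrecks the fit while dropping A or C barely matters, which is exactly the signal that the result leans on B. The informative version of the check is the one that drops the heavyweights, not the zero-weight countries.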
In summary, the construction of a synthetic counterfactual must be something of an art as well as a science. Having identified algorithmically a mathematical fit, thought needs to be given to why it is a fit and if sense-checks (using empirical evidence) can’t explain the fit in a fairly comprehensive manner, then there is no very good reason to expect the future performance of the synthetic to be a reasonable guide as to what would have happened to the real in the absence of an intervening event of interest, whether referendum or energy drink.
In conclusion, is a splitting of the difference between the economic performance of Japan and Hungary since the referendum, bundled up with the economic performance of the USA and Canada, likely to reflect what would have happened to the UK economy if the referendum result had been different? I can’t really see any reason to think so, and so have to take the ‘cost of Brexit so far’ calculated on this basis with a large pinch of salt.
I’ve said as much to the authors directly, and they reasonably enough pointed out that their paper is still at the ‘discussion paper’ stage and thanked me for helpfully drawing their attention to the possibility of further or better robustness checks. Unfortunately, the somewhat tentative nature of many initial academic research findings is not something that headline-writers tend to dwell on (FT honourably excepted)!