Empirical Techniques in Economics: Much Ado About Nothing

It’s pretty common to see people talk about the ‘empirical turn’ in economics: new datasets and the methods used to analyse them have become so advanced that economics is becoming more science-like as a result. While much has been written criticising the methods themselves, not enough has been written about how even in the cases where they seem to work, they typically add little to our understanding. Researchers have wasted a surprising amount of intellectual effort by focusing on ways to utilise these techniques in such situations. What’s more, the fact that there are so many documented issues with them means that in some cases these methods could be obscuring more transparent empirical observations, actively leading us to the wrong conclusions.

I was first exposed to a version of this argument when reading a paper by Clint Ballinger which discusses this point in the context of development and cross-country regressions. Ballinger uses the example of a researcher who wants to estimate the effect of the ‘Ghent system’, where unions administer unemployment insurance, on union membership. He contrasts the following regression estimate from 17 countries:

Union Density = 0.47*(Ghent) + 0.28*(Left Government) - 0.34*(Log of Potential Membership)

…with this table:

It is not clear what the regression adds in this context: both the table and the regression clearly show that the Ghent system is associated with higher union membership, though neither can give us a causal interpretation. The precision conferred by the point estimates is meaningless for such a small sample and across such disparate ‘data points’ as different countries.

In fact, Ballinger points out that the table is actually a better way of looking at things. Firstly, unlike regression it does not impose any arbitrary assumptions about the data. Secondly, it is far more transparent, intelligible to virtually anyone with an elementary grasp of statistics (if you think this is true of regression analysis then you need to teach econometrics to undergraduates). Thirdly, it reveals far more information than the regression, suggesting targets for future research that are obscured by the regression analysis: why is membership so comparatively low in the Netherlands and so high in Ireland? Why is Belgium a little lower than the Scandinavian countries?

Criticisms of cross-country regressions are not new, but Ballinger’s argument suggests a simple path forward, and this extends readily to modern empirical techniques used in microeconometrics. A prime example is Regression Discontinuity Designs, or RDDs. These exploit a threshold which creates a discontinuity in treatment based on an otherwise continuous variable. For example, children whose grade is above a certain number might be put into a higher tier class: even though the grades are continuous, assignment to a class is not. By looking at individuals just above and just below the cut-off, who have been put into different classes but are practically the same in terms of ability, RDD can attribute all of the difference in outcomes (such as subsequent grades, or earnings) to the ‘treatment’ of being in the better class.
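To make the mechanics concrete, here is a minimal sketch of a local linear RDD on simulated data. Everything in it is an assumption made for illustration: the cutoff of 50, the true jump of 5, the bandwidth of 10 and the function names are all hypothetical, and real applications involve bandwidth selection and inference that this ignores.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rdd_estimate(running, outcome, cutoff, bandwidth):
    """Fit one line on each side of the cutoff and return the gap at the cutoff."""
    # Centre the running variable so each intercept is the fitted value at the cutoff.
    below = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff - bandwidth <= r < cutoff]
    above = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_above - intercept_below


def simulate(n=5000, cutoff=50.0, true_jump=5.0, seed=0):
    """Hypothetical data: a smooth trend in the running variable plus a jump at the cutoff."""
    rng = random.Random(seed)
    running = [rng.uniform(0, 100) for _ in range(n)]
    outcome = [20 + 0.3 * r + (true_jump if r >= cutoff else 0.0) + rng.gauss(0, 2)
               for r in running]
    return running, outcome


running, outcome = simulate()
estimate = rdd_estimate(running, outcome, cutoff=50.0, bandwidth=10.0)
print(round(estimate, 2))  # should land close to the simulated jump of 5
```

Note that a scatter plot of the same simulated data would show the jump at 50 just as plainly, which is the point at issue here.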

While this reasoning is fine as it goes, I would argue that in most cases any statistically detectable effect will be visible from a simple graph. Consider a paper published in one of the top ranked economics journals which looks at voter enfranchisement in Brazil. Filling out papers to vote resulted in a large number of invalid votes by illiterate voters, so machines were installed which used pictures instead. These were rolled out gradually, starting with municipalities with populations of more than 40,500 in 1998 — giving us our discontinuity — before being rolled out to all municipalities in 2002. The paper looks first at the effect this had on voting levels, and then goes on to see if it impacted subsequent health outcomes.

Is this an interesting, socially relevant question? Absolutely! However, consider the following figure from the paper:

It’s pretty clear from the blue line that the voting machines had a substantial impact on vote validity, and it’s not clear what the RDD adds except being a robustness check. It could be claimed the RDD is somehow more statistically ‘rigorous’, but both the descriptives and the RDD ultimately have to make the unverifiable (albeit sensible in this case) assumption that nothing else relevant suddenly changed in Brazil at that time.

The over-focus on applying a fancy method means that a lot of the paper is devoted to the methodology, rather than to further exploring simple descriptives (a la the table above), which would potentially reveal some interesting irregularities — for example, were the machines unexpectedly more or less effective in some districts, and why? Equivalent descriptives are not made available for the health outcomes in the paper, but I would hazard a guess you could glean much from these alone, too.

Another example is a recent job market paper on how statutory retirement ages act as a ‘reference point’, inducing people to retire even though there is not necessarily a financial incentive or legal obligation to do so. Once more, the paper uses a relatively advanced technique (‘bunching’, which estimates how much larger the sudden jumps are than a smooth distribution would predict) on German data, where there is a lot of variation in pension programs, to determine whether this is what is actually happening. But once more, there is no need to:

The huge spike in retirement at the three statutory ages is visible to anybody with eyes. Now, it may be argued that there is other stuff going on at the time, but again this is true whether bunching analysis is being used or not — the paper has to provide contextual evidence and compare instances with other financial incentives to retire to instances without them, which can also be done using graphs such as Figure 4:

The left two graphs show statutory ages, while the right-hand ones show pure financial incentives — it is clear that the former effects are far larger.
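For readers unfamiliar with bunching, the core idea can be sketched in a few lines. This is a deliberately crude version and not the paper’s method: published bunching estimators typically fit a polynomial counterfactual to the bins outside an excluded window, whereas this sketch simply interpolates between the two neighbouring ages. The retirement counts are made up for illustration.

```python
def excess_mass(counts, threshold):
    """Relative excess of the threshold bin over a counterfactual built from its neighbours.

    Simplifying assumption: the counterfactual is the average of the two
    adjacent bins, not a fitted polynomial.
    """
    counterfactual = (counts[threshold - 1] + counts[threshold + 1]) / 2
    return (counts[threshold] - counterfactual) / counterfactual


# Hypothetical number of retirements at each age, with a spike at 65.
retirements_by_age = {62: 420, 63: 440, 64: 460, 65: 1500, 66: 500, 67: 520}

spike = excess_mass(retirements_by_age, 65)
print(spike)  # excess retirements at 65 as a multiple of the counterfactual
```

With a spike this large, the estimate mostly puts a number on what the histogram already shows.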

One could defend the more precise results obtained via bunching because the paper goes on to use them to simulate the effects of increasing the retirement age to 66. But I would tend to distrust this exercise: existing retirement ages, especially 60 and 65, are long established norms, and changing them won’t necessarily replicate the current ‘reference point effect’ at the new age (hello Bob Lucas). Thus, the precision offered by the fancy methods is not much to get excited about.

It is also worth pointing out that even without the graphs the findings of these papers are largely predictable and intuitive. Yes, increased union benefits are associated with higher union membership. Yes, making voting much easier for people increases the amount they vote (the effect on health outcomes is admittedly less obvious to a casual observer, but as the paper notes there were good contextual reasons to expect them — that’s actually why he chose health in the first place). Yes, retirement ages serve as a social norm even for people for whom they are not legally binding and where there are no financial incentives. Writing long papers for top journals using these methods is a distraction from engaging with more pressing issues (or with more relevant aspects of the same question).

So maybe we don’t gain much from these techniques. But are they actively problematic? Well, they could be. Though this probably isn’t the case in the above papers, sometimes the use of statistical methods can smuggle in assumptions and hide inconvenient facts, producing results that are at odds with more transparent descriptive observation. As two recent papers illustrate (one by Alwyn Young on Instrumental Variables and one by John Ioannidis on empirical economics in general), these methods usually throw up as many problems as they purport to solve. (I’m not going to discuss the critiques of the various methods themselves in detail here, but check out Angus Deaton on RCTs, where I also had a go; the famous Edward Leamer paper, and Leamer again on the ‘credibility revolution’; and Swann and Achen on regression.)

A recent example which may have led to false conclusions was this study of the recent increase in the Seattle minimum wage, which found a resultant decline in total employee compensation. The authors use Difference-in-Differences (DiD), which compares Seattle to surrounding areas which did not have the minimum wage increase. They acknowledge there are issues with basic DiD, in particular the assumption that the different areas would have changed in the same way in the absence of the policy change, and so go on to use more complex methods, finding an overall decrease in payroll due to a fall in hours worked (with no effect on employment).
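The basic DiD calculation is simple enough to sketch. The following uses simulated data with made-up numbers; it is not the Seattle study’s data or its more complex estimator, and the group levels, common trend and effect size are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def did(treated_pre, treated_post, control_pre, control_post):
    """Basic difference-in-differences: change in the treated group minus change in the control."""
    return (mean(treated_post) - mean(treated_pre)) - (mean(control_post) - mean(control_pre))


def simulate(true_effect=-1.0, common_trend=3.0, n=2000, seed=1):
    """Hypothetical outcomes where both areas share a trend and only one is treated."""
    rng = random.Random(seed)

    def draw(level):
        return [level + rng.gauss(0, 1) for _ in range(n)]

    treated_pre = draw(20.0)
    control_pre = draw(15.0)
    treated_post = draw(20.0 + common_trend + true_effect)
    control_post = draw(15.0 + common_trend)
    return treated_pre, treated_post, control_pre, control_post


groups = simulate()
did_estimate = did(*groups)
raw_change = mean(groups[1]) - mean(groups[0])  # simple before/after change in the treated area

print(round(raw_change, 2))    # positive: the outcome rose in the treated area
print(round(did_estimate, 2))  # negative: relative to the control, it rose by less
```

The two numbers have opposite signs by construction, and which one to believe depends entirely on whether the control area really shares the treated area’s trend. In the simulation that holds by assumption; in real data it cannot be verified.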

But as Tom Walker at Econospeak pointed out, simple descriptives in Table 3 of the paper show there was actually an annual increase of 18% in compensation of employees in the couple of years after the minimum wage rose, a substantial rise. Now, I understand the issue of confounding factors and I’m not saying I’m sure who is right here. However, I would conjecture that given the issues with statistical methods, as well as the opacity of econometric research, there is no a priori reason to expect the more sophisticated analysis to be producing the right conclusion. I am more inclined to believe, based on easily observable facts, that the minimum wage did not cause any obvious issues with compensation/employment in Seattle — I might add that this could have something to do with the fast growth in Seattle over the period, and that minimum wages might be more problematic in slower growing areas (a possibility the chosen methods, as far as I’m aware, are unable to account for).

I’m not saying there are no cases where econometric methods work — random utility models spring to mind as a successful example — but what I am claiming is that a myopic obsession with these methods could be doing more harm than good. Delving further into the descriptives and the qualitative context can often tell us more about what’s actually going on in the world — the techniques can serve as a robustness check, but no more. Giving up the pretension that these advanced methods always make economics more rigorous could make papers more accessible to other disciplines and to the public, while freeing up researchers to ask more interesting questions, even if they don’t permit the use of the latest techniques.