Gambling and Noise
I have long had a model of how a mediocre baseball team can set up its rotation to maximize its chances in a short series. The logic runs somewhat contrary to the usual ideas in economic theory about attitudes towards risk: basically, the worse a team is, the better it stands to do if it tries to maximize the variance, that is, if it takes as much of a chance as it can.
A simple example illustrates this. Suppose one team is very good and one team is mediocre. They play one game to determine their fate. For the sake of convenience, let’s assume that the good team always gives up exactly 2 runs a game, i.e. zero variance. On the other hand, the bad team gives up, on average, 4 runs a game. However, the bad team can strategize by adjusting its variance. It is elementary to see that, if the variance is 0, the bad team always loses, 2–4. But we can imagine a scenario where the bad team either gives up 0 runs or 8 runs, with exactly equal probability. The mean is still 4, but now the bad team gives up 0 runs exactly half the time and thus wins half the time. The catch is that, as long as the runs given up are distributed symmetrically around the mean, this is the best that the bad team can do: giving up 0 and 8 with equal probability also maximizes the variance, given that you cannot give up negative runs.
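To make the arithmetic concrete, here is a minimal Python sketch of the one-game example. The function name `summarize` and the two in-between strategies are made up for illustration, and a tie (giving up exactly 2 runs) is assumed not to count as a win.

```python
def summarize(runs_allowed, runs_scored=2):
    """Mean, variance, and win probability for the bad team.

    runs_allowed maps each possible number of runs given up to its probability.
    runs_scored is what the bad team scores, fixed at 2 because the good team
    always gives up exactly 2 runs. A tie is assumed not to count as a win.
    """
    mean = sum(r * p for r, p in runs_allowed.items())
    variance = sum(p * (r - mean) ** 2 for r, p in runs_allowed.items())
    win_prob = sum(p for r, p in runs_allowed.items() if r < runs_scored)
    return mean, variance, win_prob


# Hypothetical strategies, all with a mean of 4 runs given up.
strategies = {
    "always 4": {4: 1.0},
    "2 or 6": {2: 0.5, 6: 0.5},
    "1, 4 or 7": {1: 0.25, 4: 0.5, 7: 0.25},
    "0 or 8": {0: 0.5, 8: 0.5},
}

for name, dist in strategies.items():
    mean, variance, win_prob = summarize(dist)
    print(f"{name:10s} mean={mean:.1f} variance={variance:.2f} P(win)={win_prob:.2f}")
```

Of these four, the only strategies that ever win are the ones that put some weight below 2 runs, and the 0-or-8 split has both the highest variance and the highest win probability.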
This is hardly a new discovery: the formal work by people like Stiglitz that showed how a perfect market for credit is logically impossible is predicated on this realization, at least in part (and the tradition goes back to pioneering work on information in economics by Hayek and Keynes). The borrowers can always “win” by taking more risk, as long as they are not playing with their own money, and the poorer the borrower, the more they stand to win by taking even bigger risks, even though they will, on average, lose on their bet. In a sense, this also underscores the logic of “institutionalization” and “bureaucracy,” whose role is largely oriented towards reducing noise and making the process and the output predictable: the “better,” “stronger,” and more “established” players are going to win more often than not anyway, under any circumstance, but the more orderly the rules of the game are and the less noisy the outcome, the more consistently they stand to win. If the game is contested unpredictably, on unexpected playing fields, then every so often something odd and unpredictable happens and, improbable as it may be, the weaker but wackier team wins.
This, in an odd and roundabout way, is linked to my two earlier posts about selection effects and the misleading implications of experiments. The first point, originally raised by Gabriel Rossman, is that the conditioning introduced by the instrument is interesting beyond merely accounting for “causality.” To reuse the radio example, what the use of geological conditions interfering with radio signals as an instrument accomplishes is simply to show the dampening effect on the presumed link that exists between unobserved socio-political variables and the policy-election correlation in the unnatural absence of radio. The awkwardly reversed use of the language is intentional: one could just as easily say that the absence of radio causes the dampening of the policy-electoral linkage as that the presence of radio causes the reverse. All it means is that the effect is conditional and bidirectional, rather than “causal,” although, in many contexts, including this one, the “causal” language is the more natural interpretation. One thing that motivates my thinking is that I am interested in the conditional variance more than in the conditional mean: the conditional mean would tell us if there is a measurable treatment effect, while the conditional variance would tell us if it is consistent (not in the statistical sense, but in the common linguistic sense, i.e. do you always get the effect?). In the contest between the “strong” team and the “bad” team, does the presence of institutionalization as an instrument lead to a more “consistent” (again, not in the statistical sense) set of outcomes? (And, while we are at it, what do we mean by “consistent”?)
More to follow when I have my train of thought better fleshed out…
We might consider the way “causality” is captured by instruments as something like the following (essentially reprising the usual diagrams in the way I feel comfortable with).
The population consists of multiple types that may not be observable; let’s call them T and H for convenience.
In state 1, some process X exists that affects T and H differently — say, at different probabilities (as per selection effect).
X: T → T’ at p = pt
X: H → H’ at p = ph << pt.
So the effect of X on both T and H should be T’ - H’ (or, if the probability of the effect, p, is small, p*(T’ - H’) + (1-p)*(T - H)), but simply examining the outcomes gives us [pt*T’ + (1-pt)*T] - [ph*H’ + (1-ph)*H], which, for increasingly large |pt - ph|, will be farther and farther from the T’ - H’ that we seek.
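A small numerical sketch of the state-1 comparison follows. All the values (T, H, T’, H’, pt, ph) are invented for illustration, and the function name `observed_contrast` is hypothetical; the point is only to show how the observed difference drifts as ph falls away from pt.

```python
def observed_contrast(T, T_after, H, H_after, pt, ph):
    """[pt*T' + (1-pt)*T] - [ph*H' + (1-ph)*H]: what the raw outcomes show."""
    return (pt * T_after + (1 - pt) * T) - (ph * H_after + (1 - ph) * H)


# Made-up numbers: both types start at 10; X moves T to 14 and H to 12.
T, H = 10.0, 10.0
T_after, H_after = 14.0, 12.0
pt = 0.9

target = T_after - H_after  # the T' - H' we are after
for ph in (0.9, 0.5, 0.1):
    seen = observed_contrast(T, T_after, H, H_after, pt, ph)
    print(f"pt={pt} ph={ph} observed={seen:.2f} target={target:.2f}")
```

With equal probabilities (0.9 each) the observed gap is 1.8, which is the p*(T’ - H’) + (1-p)*(T - H) case; at ph = 0.1 it is 3.4, well away from the target of 2. In this setup the extra distortion relative to the equal-probability case works out to (pt - ph)*(H’ - H), so it grows in step with the gap between the two probabilities.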
An “instrument” transposes the interaction to a different setting, say state 2, where T and H are affected by a (potentially) slightly different process, X2. In principle, state 2 could simply be a subset of state 1 where X is “known” to operate differently from how it does in state 1 minus state 2, i.e. the rest of state 1. If the instrument is “perfect,” the probabilities at which T and H are affected in state 2, pt2 and ph2, will be exactly the same. If the instrument is merely “good,” only |pt2 - ph2| << |pt - ph| would hold. The process X2 would give us the following:
X2: T → T’’ at p = pt2
X2: H → H’’ at p = ph2.
So we wind up seeing [pt2*T’’ + (1-pt2)*T] - [ph2*H’’ + (1-ph2)*H]. This will give us something close to T’’ - H’’, which will tell us that something affects H and T differently, if pt2 and ph2 are close to each other AND reasonably close to 1. But this does not “really” tell us how the reaction would play out in state 1: unless we can assume that X2 and X (call it X1, the process in state 1) are, again, fairly similar, the estimated effect in state 2 cannot be applied directly to state 1. It would only tell us that the effect is not just driven by the selection effect.
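The same kind of made-up arithmetic can be run for state 2. In this sketch the values of T’’, H’’, pt2 and ph2 are all assumptions chosen for illustration, as is the function name; X2 is deliberately given a different effect from X, to show what the instrument does and does not recover.

```python
def observed_contrast(base_t, after_t, base_h, after_h, p_t, p_h):
    """Difference in average outcomes when each type is affected with its own probability."""
    return (p_t * after_t + (1 - p_t) * base_t) - (p_h * after_h + (1 - p_h) * base_h)


T, H = 10.0, 10.0      # unaffected outcomes (hypothetical)
T1, H1 = 14.0, 12.0    # T', H' under X in state 1 (hypothetical)
T2, H2 = 13.0, 12.5    # T'', H'' under X2 in state 2 (hypothetical)

state1 = observed_contrast(T, T1, H, H1, p_t=0.9, p_h=0.1)    # strong selection
state2 = observed_contrast(T, T2, H, H2, p_t=0.95, p_h=0.9)   # a "good" instrument

print(f"state 1 observed: {state1:.2f}  (T' - H' = {T1 - H1:.2f})")
print(f"state 2 observed: {state2:.2f}  (T'' - H'' = {T2 - H2:.2f})")
```

Here the state-2 figure (0.6) sits close to T’’ - H’’ (0.5), so selection is not doing the work there, but neither number tells us that T’ - H’ in state 1 is 2.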
Of course, this is not just a statistical problem: it is a common problem in actual lab settings, too. Something “works” (or doesn’t) in vitro, but it doesn’t (or does) in vivo. What matters is the difference between X1 and X2, rather than simply that there is, in a fairly small subset of cases which in practice may not be easily identifiable, evidence that the difference in effect seen for experimental and control sets is not just the product of the selection effect (what commonly passes as “causality” in many applications). For all practical purposes, what we really want to know is not “causality” per se, but the difference between the states in which the reactions take place.
I’m going back to my chemistry roots again: I’ve just described, in slightly different words, how catalysts work, as facilitators of (some subset of) reactions that rarely take place without their presence. While the presence of catalysts might indicate that, indeed, carbon monoxide and nitric oxide do and can react to form carbon dioxide and nitrogen, using it as a tool to investigate “causality” seems a bit of a misuse of the conceptual framework. All that the reaction in the presence of, say, palladium shows us is that the reaction 2CO + 2NO → 2CO2 + N2 happens, or can happen, somewhere, somehow (hardly evidence of “causality”). It does not show whether it happens without palladium (maybe), in the presence of nickel (not really), or in the presence of platinum (yes); more accurately, the reaction provides very little information on how much of the reaction one should expect to take place in these different settings (information that is more important and relevant than the “causality” that isn’t quite). (NB: In this sense, the causality obsession has a vague analogue in the p-value obsession: the assumption of an “exactly zero” effect for which the p-value is computed is a strawman, as Frank Harrell points out. It is far more useful to calculate the probability that the effect is outside some useful reference value. To continue with the catalyst analogy, it’s not so much that the catalytic converter reaction takes place at a rate of exactly 0 in the absence of a catalyst, only that the rate is so low that it is uninteresting, while, in the presence of certain materials, it takes place at meaningful rates. A different critique of the p-value obsession, in terms of the contrast between “statistical” and “substantive” significance, was offered by McCloskey and others in the past, from the economics perspective. This critique essentially combines the two, in the sense that substantive and statistical significance are two sides of the same coin and cannot be considered separately.
In the context of “causality,” the critique is twofold. First, what we are using as tools to investigate “causality” are not exactly investigating causality, but conditional effects, which may or may not be causal. Second, the conditional effects, whether they capture the causality effectively or not, are far more interesting and useful, but we lose track of them because we are looking too hard for “causality.”)
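As a rough sketch of the contrast drawn in the note above, the following compares a p-value against exactly zero with the probability that the effect exceeds a reference value. The estimate, standard error and reference value are invented, and the second calculation assumes a normal approximation with a flat prior, which is a simplifying assumption for this sketch rather than anything Harrell or McCloskey prescribe.

```python
from statistics import NormalDist

estimate = 0.5    # hypothetical estimated effect
std_error = 0.2   # hypothetical standard error
reference = 0.4   # hypothetical smallest effect worth caring about

std_normal = NormalDist()

# Two-sided p-value against the strawman of an exactly-zero effect.
z = estimate / std_error
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# Probability the effect exceeds the reference value, treating the effect as
# Normal(estimate, std_error). This is an assumption (flat prior, normal
# approximation), not a general recipe.
prob_above_reference = 1 - NormalDist(estimate, std_error).cdf(reference)

print(f"p-value vs zero:      {p_value:.4f}")
print(f"P(effect > {reference}):    {prob_above_reference:.4f}")
```

With these numbers the p-value against zero is about 0.012, comfortably “significant,” while the probability that the effect clears the reference value is only about 0.69.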
Statistical investigation of “causality” is growing in importance, but the more I think about this, the less apparent it seems to me that it is necessarily capturing “causality” per se, although it does recreate, up to a point, certain aspects of laboratory experiments. But, going by the analogy of chemical reactions, that is not quite a search for “causality” as such. It is, more properly, a search for mechanisms: something closely related to causality, but subtly different, and, I think, much more applicable in a broader array of settings, but only if we think about the problem a bit differently.