# Gambling and Noise

I have long had a model of how a mediocre baseball team can set up its rotation to maximize its chances in a short series. The logic runs somewhat contrary to the usual ideas in economic theory about attitudes toward risk: basically, the worse a team is, the better it stands to do by maximizing variance, that is, by taking as many chances as it can.

A simple example illustrates this. Suppose one team is very good and one team is very mediocre, and they play one game to determine their fate. For the sake of convenience, let’s assume that the good team always gives up exactly 2 runs a game, i.e. zero variance. The bad team, on the other hand, gives up 4 runs a game on average, but it can strategize by adjusting its variance. It is elementary to see that if the variance is 0, the bad team always loses, 2–4. But imagine a scenario where the bad team gives up either 0 runs or 8 runs, with exactly equal probability. The mean is still 4, but now the bad team gives up 0 runs exactly half the time, and thus wins half the time. The catch is that this is the best the bad team can do: giving up 0 and 8 with equal probability also maximizes the variance, given that you cannot give up negative runs.
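The arithmetic above is easy to check by simulation. This is a minimal sketch of the one-game scenario: the good team always allows 2 runs (so the bad team always scores 2), and the bad team wins whenever it allows fewer than 2. The function name and the Monte Carlo setup are mine, not the author's.

```python
import random

def bad_team_wins(runs_allowed_options, trials=100_000, seed=0):
    """Monte Carlo estimate of the bad team's win probability.

    The good team always allows exactly 2 runs, so the bad team
    always scores 2 and wins whenever it allows fewer than 2.
    """
    rng = random.Random(seed)
    wins = sum(rng.choice(runs_allowed_options) < 2 for _ in range(trials))
    return wins / trials

# Zero variance: always allows 4 runs, so it always loses 2-4.
print(bad_team_wins([4]))        # 0.0
# Maximum variance with mean 4: allows 0 or 8 with equal probability.
print(bad_team_wins([0, 8]))     # ~0.5
```

The mean is identical in both calls; only the spread changes, and the win probability jumps from zero to roughly one half.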

This is hardly a new discovery: the formal work by people like Stiglitz showing that a perfect market for credit is logically impossible (a tradition that goes back to pioneering work on information in economics by Hayek and Keynes) is predicated, at least in part, on this realization. Borrowers can always “win” by taking more risk, as long as they are not playing with their own money, and the poorer the borrower, the more they stand to gain by taking even bigger risks, even though, on average, they will lose the bet. In a sense, this also underscores the logic of “institutionalization” and “bureaucracy,” whose role is largely oriented toward reducing noise and making both the process and the output predictable: the “better,” “stronger,” and more “established” players will win more often than not under any circumstances, but the more orderly the rules of the game and the less noisy the outcome, the more consistently they win. If the game is contested unpredictably, on unexpected playing fields, something odd and unpredictable happens, and every so often the weaker but wackier team wins.
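The borrower's incentive can be sketched the same way. Under limited liability the borrower keeps max(outcome − debt, 0), so a mean-preserving increase in spread raises the borrower's expected payoff even when the project loses money on average. All the numbers here are made up for illustration only.

```python
def borrower_expected_payoff(spread, debt=100.0, mean=95.0):
    """Expected payoff to a limited-liability borrower when the
    project outcome is a two-point, mean-preserving spread.

    Losses below the debt level fall on the lender, not the borrower.
    """
    outcomes = [mean - spread, mean + spread]
    return sum(max(o - debt, 0.0) for o in outcomes) / len(outcomes)

# Mean project value (95) is below the debt (100): a losing bet on average.
print(borrower_expected_payoff(spread=0))    # 0.0: the safe project never pays off
print(borrower_expected_payoff(spread=50))   # 22.5: risk creates option value
```

The expected total value of the project never changes; only the borrower's slice of it grows with the variance, which is exactly the "win by taking more risk" logic above.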

More to follow when I have my train of thought better fleshed out….

PS.

We might consider the way “causality” is captured by instruments as something like the following (essentially reprising the usual diagrams in a way I feel comfortable with).

The population consists of multiple types that may not be observable; let’s call them T and H for convenience.

In state 1, some process X exists that affects T and H differently, say, at different probabilities (i.e., a selection effect).

X: T → T’ at p = pt

X: H → H’ at p = ph << pt.

So the effect of X on both T and H should be T’ - H’ (or, if p(effect) is small, p*(T’-H’) + (1-p)*(T-H)), but simply examining the outcomes gives us [pt*T’ + (1-pt)*T] - [ph*H’ + (1-ph)*H], which, as |pt - ph| grows, will be farther and farther from the T’ - H’ that we seek.
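The gap between what we want and what we observe is easy to make concrete. These outcome values and probabilities are hypothetical numbers of my own choosing, picked only to illustrate the algebra above.

```python
# Hypothetical outcome values for each type, before and after X acts.
T, T_prime = 10.0, 14.0    # type T: untreated vs. treated outcome
H, H_prime = 8.0, 11.0     # type H: untreated vs. treated outcome
pt, ph = 0.9, 0.1          # X reaches T far more often than H (ph << pt)

true_effect = T_prime - H_prime   # the T' - H' contrast we seek
observed = (pt * T_prime + (1 - pt) * T) - (ph * H_prime + (1 - ph) * H)
print(true_effect, observed)      # 3.0 vs. 5.3
```

With |pt - ph| = 0.8, the naive comparison of outcomes (5.3) is nowhere near the contrast of interest (3.0); shrinking the gap between pt and ph shrinks the discrepancy.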

An “instrument” transposes the interaction to a different setting, say state 2, where T and H are affected by a (potentially) slightly different process, X2. In principle, state 2 could simply be a subset of state 1 where X is “known” to operate differently than it does in the rest of state 1. If the instrument is “perfect,” the probabilities at which T and H are affected in state 2, pt2 and ph2, will be exactly the same. If the instrument is merely “good,” only |pt2 - ph2| << |pt - ph| would hold. The process X2 would give us the following:

X2: T →T’’ at p = pt2

X2: H → H’’ at p = ph2.

So we wind up seeing [pt2*T’’ + (1-pt2)*T] - [ph2*H’’ + (1-ph2)*H]. If pt2 and ph2 are close to each other AND reasonably close to 1, this will give us something close to T’’ - H’’, which tells us that something affects H and T differently. But this does not “really” tell us how the reaction would play out in state 1: unless we can assume that X2 and X1 are, again, fairly similar, the estimated effect in state 2 cannot be applied directly to state 1. It only tells us that the effect is not just driven by the selection effect.
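Continuing with the same kind of made-up numbers, this sketch contrasts the two states: a large selection gap in state 1 versus a "good instrument" in state 2 where pt2 and ph2 are nearly equal and close to 1. The function and all values are illustrative assumptions, not anything from real data.

```python
def observed_contrast(Tp, T, Hp, H, pt, ph):
    """The contrast actually visible in the data: each type's
    outcome averaged over whether the process reached it."""
    return (pt * Tp + (1 - pt) * T) - (ph * Hp + (1 - ph) * H)

# State 1: big selection gap (ph << pt), so the observed contrast
# sits far from the true treated-vs-treated difference of 3.0.
print(observed_contrast(14, 10, 11, 8, pt=0.9, ph=0.1))    # 5.3
# State 2 ("good instrument"): pt2 and ph2 nearly equal and near 1,
# so the observed contrast lands close to T'' - H''.
print(observed_contrast(14, 10, 11, 8, pt=0.95, ph=0.95))  # 2.95
```

Note what the instrument buys us and what it doesn't: state 2 recovers something close to the within-state-2 effect, but nothing in the calculation links it back to state 1 unless X1 and X2 are assumed similar.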

Of course, this is not just a statistical problem; it is a common problem in actual lab settings, too. Something “works” (or doesn’t) in vitro but doesn’t (or does) in vivo. What matters is the difference between X1 and X2, rather than simply the evidence, available only in a fairly small subset of cases that in practice may not be easily identifiable, that the difference in effect seen between the experimental and control sets is not just the product of a selection effect (what commonly passes for “causality” in many applications). For all practical considerations, what we really want to know is not “causality” per se, but the difference in the states in which the reactions take place.

I’m going back to my chemistry roots again: I’ve just described, in slightly different words, how catalysts work, as facilitators of (some subset of) reactions that rarely take place without their presence. While the presence of a catalyst might indicate that, indeed, carbon monoxide and nitric oxide do and can react to form carbon dioxide and nitrogen, using it as a tool to investigate “causality” seems a bit of a misuse of the conceptual framework. All that the reaction in the presence of, say, palladium shows us is that the reaction 2CO + 2NO → 2CO2 + N2 happens, or can happen, somewhere somehow (hardly evidence of “causality”). It does not show whether the reaction happens without palladium (maybe), in the presence of nickel (not really), or in the presence of platinum (yes); more accurately, it provides very little information on how much of the reaction one should expect to take place in these different settings, which is the more important and relevant information than the “causality” that isn’t quite causality.

(NB: In this sense, the causality obsession has a vague analogue in the p-value obsession: the assumption of an “exactly zero” effect, for which the p-value is computed, is a strawman, as Frank Harrell points out. It is far more useful to calculate the probability that the effect lies outside some useful reference value. To continue with the catalyst analogy, it’s not that the catalytic-converter reaction takes place at exactly zero rate in the absence of a catalyst, only that the rate is so low as to be uninteresting, while, in the presence of certain materials, it takes place at meaningful rates. A different critique of the p-value obsession, in terms of the contrast between “statistical” and “substantive” significance, was offered by McCloskey and others in the past, from the economics perspective. The present critique essentially combines the two, in the sense that substantive and statistical significance are two sides of the same coin and cannot be considered separately. In the context of “causality,” the critique is twofold. First, the tools we are using to investigate “causality” are not exactly investigating causality, but conditional effects, which may or may not be causality. Second, the conditional effects, whether or not they capture causality effectively, are far more interesting and useful, but we lose track of them because we are looking too hard for “causality.”)
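Harrell's point about reference values can be illustrated with toy numbers (mine, not his): contrast the p-value against the strawman "exactly zero" null with the probability that the effect exceeds a threshold anyone would actually care about. Treating the estimate as a normal posterior is a rough shortcut, assumed here purely for the sketch.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

est, se = 0.08, 0.02    # toy estimated effect and its standard error
threshold = 0.05        # smallest effect anyone would find interesting

# Two-sided p-value against the strawman "exactly zero" null.
p_zero = 2 * (1 - normal_cdf(abs(est) / se))
# Probability the effect exceeds the substantive threshold,
# treating the estimate as a normal posterior (a rough shortcut).
p_substantive = 1 - normal_cdf(threshold, mu=est, sigma=se)
print(p_zero, p_substantive)
```

The first number answers a question nobody asked (is the rate exactly zero?); the second speaks directly to whether the "reaction" runs at a meaningful rate.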

Statistical investigation of “causality” is growing in importance, but the more I think about it, the less apparent it seems to me that it necessarily captures “causality” per se, although it does recreate, up to a point, certain aspects of laboratory experiments. By the analogy with chemical reactions, though, that is not quite a search for “causality” as such. It is, more properly, a search for mechanisms: something closely related to causality, but subtly different, and, I think, much more applicable in a broader array of settings, but only if we think about the problem a bit differently.
