Information Leakage Can Be Measured at the Source

Allison Bishop
Published in Proof Reading
Jun 20, 2023

TL;DR: Proof has put out a new whitepaper on defining and measuring information leakage. The main idea is that information leakage can be detected and controlled in behavior patterns directly, rather than looking only at its noisy impact on price. This is joint work with Arthur Américo, Paul Cesaretti, Garrison Grogan, Adam McKoy, Robert Nicholas Moss, Lisa Oakley, and Mohammad Shokri, many of whom are part-time quantitative researchers with Proof. Stay tuned for ways that this will be incorporated into our products!

Every now and then, mathematicians get to see themselves portrayed on the big screen. We get Ian Malcolm’s plausible recitations of chaos theory and inability to stand still while waving a flare, Katherine Johnson’s calm displays of overwhelming competence, and Russell Crowe being … well, Russell Crowe, but with math!

But data science has yet to achieve such iconic moments. I have this fantasy that someday a movie will show a gangster pressing a suspected rat against the wall, saying: “Ever since you joined, Tony, busts have been up 10%!” Tony shoots back: “Yeah, but if you look at the distribution of busts over the preceding 5-year period, there have been many examples of busts being up 10%, and nobody got blamed for those!” The gangster lets Tony go, but warns him: “OK, I’ll let it slide for now, but I’m updating my Bayesian prior on you!”

This probably won’t happen, but it kind of should. Statistical reasoning is relevant to a lot of decisions we make in our daily lives, but humans are notoriously bad at applying it properly (statisticians included). Especially fraught is the assignment of blame or attribution. Is Tony really an informant, or is it a coincidence? If what’s happening now is something that happened reasonably often before Tony, we can’t have much confidence either way.

An easier question, though, is what a real informant should do to stay under the radar. They shouldn’t, for example, trigger a pattern of events that has never happened before, unless it’s the final showdown at the end of the movie. Before that point, they should try to nudge things to make their preferred outcomes more likely, but only subtly so.

The decisions facing an informant in this scenario are somewhat analogous to the decisions faced by an institutional trader who wants to buy or sell a large quantity of stock. The actions they take will potentially push up certain metrics — like causing more volume, more quoting at the bid, etc. If an adversary who is observing market activity records a measurement that is relatively unlikely to occur otherwise, they may be able to infer the presence of a large buyer/seller in the market. Once a large buyer/seller is detected, other traders may push prices around to the detriment of the exposed trader.
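To make the adversary's inference concrete, here is a minimal sketch in Python, assuming (purely hypothetically) that per-interval volume is Poisson-distributed with a somewhat higher rate when a large buyer is active. The rates, the prior, and the Poisson model itself are illustrative assumptions, not measurements from real markets or from the whitepaper.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # Probability of observing k events under a Poisson(lam) model.
    return lam ** k * exp(-lam) / factorial(k)

def posterior_big_trader(volume, prior=0.1, lam_normal=50, lam_big=70):
    # Bayes' rule: the adversary's updated belief that a big trader is
    # present, given one observed volume count and the two hypothetical
    # volume models above.
    p_big = poisson_pmf(volume, lam_big) * prior
    p_normal = poisson_pmf(volume, lam_normal) * (1 - prior)
    return p_big / (p_big + p_normal)

# A modestly elevated volume shifts the adversary's belief;
# a strongly elevated one can be nearly conclusive.
print(posterior_big_trader(60))
print(posterior_big_trader(80))
```

The point of the sketch is that the adversary's confidence is driven entirely by how different the two volume distributions are, which is exactly the quantity the trader can try to control.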

This is one example of information leakage and how it can hurt institutional investors. Institutional traders are rightfully wary of such phenomena, but their suspicions and measurements of information leakage often revolve around price: “we did X and then the price went up! Did we leak too much information?” While such questions are very natural (price is, after all, the thing we care about!), they are not necessarily the best way to detect and avoid unnecessary information leakage. There are many things that cause prices to go up and down, so price is a very noisy metric. Looking at price alone also means that we cannot pre-emptively catch dangerous leakage before it is exploited by other traders.

If we want to detect and measure leakage at its source, we can instead think like an adversary. What kinds of things might we look for if we are trying to detect large buyers or sellers in the market? We might look, for instance, for unusual levels of volume, imbalances between the NBB and NBO, or the telltale signature of an aggressive router returning over and over again.

For whatever we might measure as an adversary, there is some general distribution to what our measurement yields in ordinary market conditions. We might think of this distribution as “the world without the big trader.” [Aside: this isn’t quite true. It’s more like “the world with the average number of big traders in it.”] Now, when a particular big trader is active in a symbol, the distribution of the measurement may change.

We might think of this new distribution as “the world with the big trader.” The adversary’s goal here may be to find a measurement for which these worlds are as distinguishable as possible, at least on some range of values, so they can trigger an action in one case more than the other.
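One standard way to quantify how distinguishable the two worlds are is the total variation distance, which equals the best advantage any detector can achieve from a single observation. A sketch, using hypothetical binned measurement distributions (the bins and probabilities are made up for illustration):

```python
def total_variation(p, q):
    # Total variation distance between two discrete distributions:
    # the best possible advantage of any detector that decides
    # "big trader present" from one draw of the measurement.
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

# Hypothetical distributions of some binned measurement (e.g. quote imbalance).
world_without = {"low": 0.5, "mid": 0.4, "high": 0.1}
world_with    = {"low": 0.3, "mid": 0.4, "high": 0.3}

print(total_variation(world_without, world_with))  # 0.2
```

A distance of 0 means the worlds are indistinguishable from this measurement; a distance of 1 means one observation gives the adversary away completely.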

Stepping back to the institutional trader’s point of view, we want to avoid this kind of trap. So what can we do? Well, we might try to keep the adversary guessing by keeping the probability of each outcome relatively close between the two worlds.

If we’re able to do this, it means that anything the adversary might do based on this measurement in the world with our trading was going to happen with a similar probability in the world without our trading. In other words, we are protecting ourselves from bad things happening solely or mostly because of our activities.
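One way to formalize “relatively close” is a multiplicative bound in the style of differential privacy: require that no outcome’s probability changes by more than a factor of e^ε between the two worlds. A hypothetical check (the distributions and the ε threshold are illustrative, not taken from the whitepaper):

```python
import math

def max_log_ratio(p, q):
    # Worst-case absolute log-likelihood ratio over shared outcomes:
    # a small value means no single observation is strong evidence either way.
    return max(abs(math.log(p[x] / q[x])) for x in set(p) & set(q))

def within_leakage_bound(p, q, eps):
    # True if every outcome's probability differs between the two worlds
    # by at most a factor of e**eps.
    return set(p) == set(q) and max_log_ratio(p, q) <= eps

world_without = {"low": 0.50, "mid": 0.40, "high": 0.10}
world_with    = {"low": 0.45, "mid": 0.40, "high": 0.15}

print(within_leakage_bound(world_without, world_with, eps=0.5))  # True
print(within_leakage_bound(world_without, world_with, eps=0.3))  # False
```

The appeal of a multiplicative bound is that it limits how much any downstream action’s probability can be amplified by our presence, no matter what rule the adversary uses on this measurement.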

Using this approach, we can identify some metrics that adversaries might use, measure general market distributions for them, and then try to design our trading activities to keep these distributions within specified bounds. We should not expect to anticipate all of the metrics that an adversary might use, but certainly controlling our leakage on some fronts is better than controlling it on none. This approach can be complementary to approaches based on price, as it can be pre-emptive rather than reactive, and also can reduce noise by considering less noisy metrics than price.

The details of this are quite technical, but you can see some proof-of-concept examples in our new whitepaper. We also have some ideas for how this can be incorporated into products:

· Pretrade analytics: If pretrade models are being used to decide, for example, how to break up a large trade over more days to avoid large anticipated costs, it may be useful to additionally consider how the accumulated leakage over days can be controlled in distributional terms. Especially if we are modeling leakage through features that are less noisy than price, there is reason to believe that such multi-day calculations could be more stable and meaningful than extending price-based models across days.

· Coordinated trading across symbols: a framework for measuring joint leakage of trading activity across several orders at once could be used to monitor accumulating leakage in real time, and we could re-budget across symbols dynamically as they trade. This could operate, for example, as an overlay over trading algorithms that operate only within each symbol. The overlay could adjust the parameters of the underlying individual orders to stay within overall leakage goals.

· TCA: One could also apply a version of this framework after the fact and compare measurement distributions resulting from past trading behavior to comparable distributions in the market generally. It would be interesting to see, for example, if orders that ended up violating generous leakage bounds experienced worse pricing than comparable orders that did not. This could be evidence that the metrics being tracked do result in exploitable leakage.
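For the TCA idea, the after-the-fact comparison could be any two-sample test between the measurement distribution observed during our trading and a market-wide baseline. A sketch using the two-sample Kolmogorov–Smirnov statistic (the data below is made up purely for illustration):

```python
def ks_statistic(sample_a, sample_b):
    # Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    # the two empirical CDFs. Larger values suggest the order's measurement
    # distribution drifted away from the market-wide baseline.
    xs = sorted(set(sample_a) | set(sample_b))
    def cdf(s, x):
        return sum(v <= x for v in s) / len(s)
    return max(abs(cdf(sample_a, x) - cdf(sample_b, x)) for x in xs)

our_fills = [5, 6, 7, 8, 9]     # hypothetical per-interval volumes while trading
baseline  = [1, 2, 3, 4, 5, 6]  # hypothetical market-wide baseline
print(ks_statistic(our_fills, baseline))
```

In the comparison the post describes, one could bucket past orders by whether such a statistic exceeded a generous threshold and then check whether the high-leakage bucket experienced systematically worse pricing.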

We’ll be working on developing these potential applications, and would love to hear any feedback in the meantime. Or any data science movie ideas.
