The Counterbalance and Benefits of Machine Learning

Jordan Kyriakidis
Published in Geek Culture
7 min read · Jun 23, 2021


Machine learning is here to stay; its use will continue to spread through every sector of our infrastructure. It will also be largely invisible, and the danger is that our reliance on it will become invisible too. Some people welcome this with open arms; others consider it the harbinger of our eventual Armageddon. The truth, as with many complex and difficult matters, is that both sides are correct.

The interesting question, to my mind, is not whether this ought to happen, but what we shall do about it: how can we both rely on machine learning and limit its influence, and where should we do so?

To even begin answering such questions, I think it helps to understand when a machine learning approach is most useful and when it is not. For the cases where it is not, the answer often lies not in the status quo, but in a replacement or complementary technology¹ that admits the benefits while preventing the disasters.

Machine Learning is great when you care about what, not why

A great, possibly innocuous, application of machine learning is in recommender systems. These are engines, built on lots (a lot!) of data, that give you sometimes magical suggestions on what book to read or what show to watch. For systems like these, the most important metric is whether the recommendation is a good one. What sorcery went on behind the scenes matters less² (to you) than the efficacy of the results: why the system gave me these recommendations is less important than whether they are good.
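To make the "what, not why" point concrete, here is a minimal sketch of an item-based recommender. The users, titles, and ratings are invented for illustration; real engines are far more sophisticated, but the shape is the same: similarity scores in, a suggestion out, and no explanation anywhere.

```python
from math import sqrt

# Toy user-item ratings (0 = unrated). All names and numbers are invented.
ratings = {
    "alice": {"dune": 5, "foundation": 0, "hyperion": 0},
    "bob":   {"dune": 4, "foundation": 5, "hyperion": 1},
    "carol": {"dune": 1, "foundation": 1, "hyperion": 5},
}

def cosine(u, v):
    """Cosine similarity between two rating vectors (dicts keyed by item)."""
    dot = sum(u[k] * v[k] for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(user, ratings):
    """Suggest the unrated item favoured by users most similar to `user`.

    Note what is absent: any model of *why* a book is good. The engine
    only exploits correlations between rating patterns."""
    me = ratings[user]
    others = [v for k, v in ratings.items() if k != user]
    scores = {}
    for item in me:
        if me[item] != 0:
            continue  # only score items the user has not yet rated
        # Weight each neighbour's rating by their similarity to `user`.
        num = sum(cosine(me, o) * o[item] for o in others)
        den = sum(cosine(me, o) for o in others)
        scores[item] = num / den if den else 0.0
    return max(scores, key=scores.get) if scores else None

print(recommend("alice", ratings))  # -> foundation
```

Alice's tastes track Bob's (both love Dune), so Bob's enthusiasm for Foundation wins out, purely by correlation of rating patterns.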

Another example comes from a professor we were working with some time ago, when we were considering how we might use machine learning to enhance some features of QVscribe. The TL;DR version: we wanted to use machine learning to analyze the text of requirements documents or RFPs³ and evaluate the risk of the project going over budget or over schedule. He mentioned a possibly related project he had previously worked on with a student. They were trying to analyze the annual reports of public companies to predict whether the stock price would rise or fall relative to the previous year. You can imagine that many companies have a good idea whether they'll be in better or worse shape in the coming year and, in their annual report, will always want to sound positive. But you can also imagine that there are clues a well-trained (machine-learned!) eye can detect.

Now, if I'm a betting man and all I care about is prediction, this sounds great! But for assessing risk in a requirements document or an RFP, simply predicting disaster is not sufficient, even if the predictions are correct. To be useful, we need to say, at a minimum, why the document is poor, and ideally we ought to propose corrective actions. For this, machine learning is less useful.

Typically, in natural language processing tasks, the "clues" are things like the relative groupings of words, or even particular syllables that appear together at the end of one word and the beginning of the next. In other words (pun intended, ahem), machine learning often uncovers and exploits correlation to make decisions or predictions. Many times this is exactly the right thing to do. Other times, particularly when causal relationships are what you seek, machine learning is less beneficial.
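A caricature of that annual-report predictor makes the limitation vivid. The word list and "reports" below are entirely invented (a real system would learn its features from data), but the shape of the reasoning is faithful: count surface clues, emit a prediction, and never explain why.

```python
# Toy correlation-based text predictor. HEDGE_WORDS is a hypothetical,
# hand-picked feature set standing in for whatever a trained model
# would actually latch onto.
HEDGE_WORDS = {"challenging", "headwinds", "uncertainty", "restructuring"}

def predict_stock_direction(report: str) -> str:
    """Predict 'down' if the report leans on hedging vocabulary."""
    words = (w.strip(".,;:") for w in report.lower().split())
    hedges = sum(1 for w in words if w in HEDGE_WORDS)
    return "down" if hedges >= 2 else "up"

print(predict_stock_direction(
    "Despite market headwinds and continued uncertainty, we remain optimistic."
))  # -> down
```

The prediction may even be right, yet nothing in the function can tell an author which sentence to fix, which is exactly why this style of analysis falls short for requirements documents.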

Correlation vs Cause-and-Effect

Here’s another, more mundane, example to distinguish causality from correlation. I want to know if it’s going to rain today. To do so, I look out my window at the people walking on the busy city street.⁴ I notice that whenever I see people wearing rubber boots or holding umbrellas, rain is much more likely, so I grab my raincoat whenever I see umbrellas and rubber boots outside my window. All is good. But if what I really care about is the cause of rain, I shouldn’t conclude that people wearing rubber boots cause rain. The two events are definitely, and strongly, correlated, but to understand what causes rain I need something much more. I need a model of rain, deeper concepts, and relationships between those concepts. I need meaning.
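The umbrella story can be run as arithmetic. With a handful of invented daily observations, the correlation between umbrella sightings and rain is nearly perfect, and yet the number itself carries no causal direction at all:

```python
from math import sqrt

# Hypothetical daily observations: umbrellas counted from the window,
# and whether it rained (1) or not (0). The data is invented.
umbrellas = [12, 1, 15, 0, 9, 2, 14, 1]
rained    = [1,  0, 1,  0, 1, 0, 1,  0]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(umbrellas, rained)
print(f"correlation: {r:.2f}")  # strongly positive: great for prediction
# ...but the coefficient is symmetric and model-free. It is equally
# "evidence" that rain causes umbrellas and that umbrellas cause rain.
```

Grabbing the raincoat on a high `r` is perfectly rational; explaining rain requires the model, concepts, and relationships the statistic does not contain.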

In short, it’s ontology and semantics on the one hand vs statistics and machine learning on the other. Uncovering causal relationships is far more difficult than uncovering correlations. Particularly when the data is unstructured, deducing causal effects — your RFP has high risk of project failure because of these particular problems — is still largely in the domain of human expertise. Tools are getting better, but it’s a much more difficult — and more valuable — problem to solve.

Machine learning is dangerous when why is at least as important as what

Machine learning needs data. A lot of data. It needs data because it is essentially a statistical exercise. A lot of data, a lot of computational power, and the task is to find patterns. Find the patterns, and you have predictive capability.

But if the data is skewed (and for anything of any worth, data is almost always skewed and messy), then that skew is amplified and baked into the results of machine learning. This can be bad precisely because the reasons why a machine learning algorithm predicts something are not generally known. Data is often skewed because values are inadvertently introduced into the corpus: unspoken, implicit assumptions that may be obviously true in some contexts but demonstrably false in others.

So, for example, an algorithm may try to predict where crime will take place, but what it actually predicts may be where arrests will likely occur. Crimes and arrests are not the same thing at all. Predicting crime may have the semblance of objectivity, but conditioning those predictions on arrest data has all sorts of social prejudices and asymmetries baked in. And that can make a terrible problem even worse. Again, ontology vs statistics. Values matter.

Value-free is meaning-free

An objection might be that we need to create a “good” corpus of learning data. But this is extremely difficult to do; it is not known (yet?) how to do so in general. And often it is not even the right thing to do — or as physicists are fond of saying, “it’s not even wrong.” Values create biases; they favour one thing over another and elevate certain subjective criteria over others. But values are not bad. Values are necessary. Without values, there really is no meaning. If we value meaning, we need values. This is an important point that I think too often goes ignored.

If all you care about is correlation — what things appear together, as opposed to why things appear together — then you can rely on a purely statistical analysis of a “good” data set, where by “good” we mean a data set free from human values or biases. Achieving this is difficult but critically important, because any biases (values) that are included will be amplified by the technology. That is a kind of corruption, and it is almost always bad.

Alternatively, if you want to uncover causal relationships, if you are after why instead of what, then you really need a model, a theory of the world. And today, at least, computers are bad at building models; humans are quite good at it. When you build models, you absolutely want values. You want to say that certain things are to be valued over others. The great successes of physics over the last century (and they truly have transformed society) stem from our having become quite good at building models of the world. We choose certain items, or degrees of freedom, as important, and others we ignore or marginalize. We do this on purpose and very explicitly.

All things in moderation

The key is to see machine learning as neither wholly good nor wholly bad: a power tool that can be put to good use, but not a universal salve. You need a counterbalance.

The counterbalance is a kind of computational absolutism. There are certain things the system must never do. There are certain situations that, if ever encountered, must result in very specific behaviour, within a very specific time frame. In short, the machine must do the things it is supposed to do, and must not do the things it is not supposed to do.
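As a toy sketch of the idea (the controller, limits, and names here are hypothetical, not drawn from any real system), the absolutist rules can live in a small deterministic guard that sits outside the learned component and has the final word:

```python
# Hypothetical speed controller: the learned policy proposes, a
# deterministic guard disposes. The invariant is enforced regardless
# of what the untrusted ML output happens to be.

SPEED_LIMIT = 100.0  # the system must never command a speed above this

def ml_policy(sensor_reading: float) -> float:
    """Stand-in for a learned controller; its output is untrusted."""
    return sensor_reading * 1.5  # could be anything

def guarded_policy(sensor_reading: float) -> float:
    """Apply the hard rules no matter what the learned policy proposes."""
    command = ml_policy(sensor_reading)
    if command > SPEED_LIMIT:
        command = SPEED_LIMIT  # clamp: the "must never do" rule wins
    if command < 0:
        command = 0.0          # likewise, never command a reverse speed
    return command
```

A runtime clamp like this is only the crudest form of the idea; formal methods aim to prove, before deployment, that no execution of the system can violate such properties at all.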

This is a very different approach from machine learning, and it is complementary to the more common neural nets. In the safety-critical community, and to some extent the semiconductor industry, it goes by the name of Formal Methods.

In the next post, I will delve into this complementary approach, in which certain properties and behaviours are explicitly mandated (or expressly prohibited). I’ll attempt to describe how the two approaches together are really what is needed to build autonomous systems that are safe, secure, and robust.

[1] Another very important axis along which to counterbalance machine learning is the social and social-policy axis. I won’t discuss it here; much is already being written about it by very qualified people.

[2] I don’t mean issues of data privacy here; those are always very important indeed. Rather, I mean that the particular features the system used to produce the recommendation are of secondary importance to the recommendation itself.

[3] Request for Proposals: a tendering document, sometimes very detailed, outlining what a government or large company wants to do or buy.

[4] Actually, outside my window are a forest and a river, but still.