A Model Is a View of the World

Daniel Shank
Talla
Nov 14, 2018 · 5 min read

With machine learning and AI becoming increasingly common in tech and the business world in general, we are starting to hear the word “model” thrown around a lot. New machine learning models often hit the news accompanied by fanfare and amazement at their capabilities; we talk about training models, improving them, and developing new ones. But what is a model? Despite being one of the key concepts in statistics and machine learning, models are poorly understood by most people. From context we can see that models are used to predict things, generate images, and make decisions. If we understand what a model actually is, then even without knowing much about statistics we can guess when one is going to fail and make better use of it.

Simply put, a model is a view of the world. At its core, a model makes assumptions about data and about how different variables relate to each other. An example is the preconception that the sun rises every morning and heats the earth. That is a kind of model, one that relates the temperature of the earth to the presence of the sun in the sky. In that view of the world, the light of the sun and the heat on the earth are connected. The model can be expanded and made more complicated, so that the amount of light, the cloud cover, the humidity, and other factors all combine to heat the earth by a particular amount.

A model in a data science and industry context might assume that monthly page views on a website are related to a variety of other factors, such as the occurrence of an email campaign, the quantity of sales calls, and the amount spent on paid search advertising. In one variant of this model, views are related to these inputs (or features, in machine learning parlance) through a weighted linear relationship. Every sales call might result in 0.2 website views on average, whereas paid search gives 0.1 views per dollar. These numbers are part of the model: flexible components called parameters. A more fixed aspect of the model is the assumption that every feature affects the result linearly, independent of every other feature. In this view there is no synergy between an email marketing campaign and a sales call; each contributes in its own simple way, independent of all other factors.
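To make this concrete, here is a minimal sketch of such a linear model in Python. The first two weights are the illustrative numbers from above; the email-campaign weight is an invented placeholder, and in practice all three would be fit to historical data.

```python
# A minimal sketch of the linear page-view model described above.
# The weights are the model's parameters; the email-campaign weight
# here is purely hypothetical, for illustration.

def predict_monthly_views(sales_calls, paid_search_dollars, email_campaigns):
    """Predict monthly page views as a weighted sum of independent features."""
    views_per_call = 0.2       # parameter: views per sales call (from the text)
    views_per_dollar = 0.1     # parameter: views per paid-search dollar (from the text)
    views_per_campaign = 50.0  # assumed parameter, invented for illustration
    return (views_per_call * sales_calls
            + views_per_dollar * paid_search_dollars
            + views_per_campaign * email_campaigns)

# Each feature contributes independently: the model assumes no synergy
# between, say, an email campaign and a sales call.
print(predict_monthly_views(sales_calls=100, paid_search_dollars=500, email_campaigns=1))
# 0.2 * 100 + 0.1 * 500 + 50.0 * 1 = 120.0
```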

Making and validating model assumptions is actually the principal job of science. That the speed of a falling object depends not on its weight but only on how long it has been falling is an assumption of the model that describes classical mechanics. We can test this assumption by dropping objects of different weights and measuring them. Like our simple linear model of page views, this model has flexible components that need to be determined, in this case the acceleration due to gravity, which says by how much a falling object speeds up each second. Our end goal is to accurately describe and predict how different kinds of falling objects act under various circumstances.
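As a sketch of what determining that flexible component looks like, the snippet below fits the acceleration from drop measurements, assuming the standard model distance = ½ · g · t². The data are fabricated for illustration, noisy values near what g ≈ 9.8 m/s² would produce.

```python
import numpy as np

# Fabricated measurements: distance fallen after t seconds, with a
# little noise around d = 0.5 * g * t**2.
t = np.array([0.5, 1.0, 1.5, 2.0, 2.5])    # seconds
d = np.array([1.3, 4.8, 11.2, 19.5, 30.9])  # meters (noisy)

# Under the model d = 0.5 * g * t**2, least squares gives g directly:
x = 0.5 * t**2
g_hat = np.sum(x * d) / np.sum(x * x)
print(f"estimated g ~ {g_hat:.2f} m/s^2")  # close to 9.8 for this fake data
```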

We can also think of a model as a machine that spits out predictions given certain inputs. The hope is that our model’s predictions will line up with what we see in reality. But this objective is often difficult to pursue, even with a fairly simple model. To further complicate matters, there are many models that we know are fundamentally “wrong” but that give very good results in practice.

Logistic regression (also called the “maximum entropy” model) is one method for performing text classification in Natural Language Processing. It is a model similar to the weighted linear relationship we used to describe website views. Instead of ad campaigns, however, in NLP we often use the presence of words (or combinations of words) as features to produce a prediction. We can often ignore the order of words completely and consider only the frequency with which they occur, and still produce a reasonable prediction. Sports terminology, for example, is a good predictor that a document is about sports. This holds even though we can find clear counterexamples: the sentence “This is not about football” is liable to be classified as sports-related even though it self-evidently is not. Though language is complicated, we can get close to our objective by assuming a simple model and making it fit as best we can.
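Here is a minimal sketch of such a bag-of-words classifier using scikit-learn. The tiny training set is invented for illustration, and with so little data the exact output isn’t guaranteed, but it shows how word order disappears entirely from the model’s view of the world.

```python
# A minimal sketch of bag-of-words text classification with logistic
# regression. The toy training sentences and labels are invented for
# illustration; real systems train on thousands of documents.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "the quarterback threw a touchdown in the football game",
    "the striker scored a goal late in the match",
    "the senate passed the budget bill today",
    "the central bank raised interest rates",
]
labels = ["sports", "sports", "other", "other"]

# CountVectorizer discards word order -- only word frequencies remain.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# "football" carries weight toward "sports", and the negation is
# invisible to the model, so this is likely to come back as "sports".
print(model.predict(["This is not about football"]))
```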

This is why the “why” question can be so hard to answer when it comes to machine learning. Logistic regression is a wrong but simple model; there are also many wrong and complex models. These models can still be accurate, and we put our faith in statistics when we use them to make predictions. Opinions differ on how much of modeling is pure statistics or mathematics, and how much is explaining the things we see. In the sciences, understanding data is approached through finding mechanisms of action: stories that describe the moving parts of a system and can also predict its behavior. If we can verify how all the inputs of a system relate to each other, we have all the more evidence for what the ultimate result will be.

All models are views of the world, but a view may be overly simplistic or utterly crazy. Deep neural networks, now a dominant paradigm in machine learning, also contain a view of the world, but a strange one in which the relationship between your input and your output depends on millions of parameters. You might hear someone say “the model thinks…” when describing the predictions of a statistics- or machine-learning-based system. This isn’t entirely wrong, but you have to remember that the point of view such a system takes can be completely alien to our own. Even systems whose predictions are as good as or better than a human’s don’t usually break down the problems they solve in the same way we do. And even when they can be explained, the view of the world such a system takes is often at least subtly different from what we think of as obvious reality. We judge machine learning systems not only by the results they produce, but also by how much sense the algorithm makes. In the end we can always be results-oriented: a crazy model that works is still a model that works; we should just have realistic expectations of what the model can really explain.

