Machine Learning: The Real “Theory of Everything”

Himang Sharatun
Published in SkyshiDigital
Feb 1, 2018

Have you ever watched “The Theory of Everything”? It’s a biographical film about Stephen Hawking and his journey to discover a formula that could explain every single phenomenon in the universe. If you haven’t watched it, you should; it’s a cool movie. Don’t worry, it’s not as nerdy as it sounds, so even if you aren’t nerd enough to understand Hawking’s books, you can still enjoy it. While I was watching it, my imagination ran wild and I noticed something interesting: the similarity of intuition between Hawking’s “Theory of Everything” and machine learning. In this article I would like to explain the basics of machine learning from the “Theory of Everything” perspective.

In the movie, Hawking believes that the universe works according to certain rules or formulas that humans don’t know yet, due to the limitation of observable phenomena, and that every known formula is just a small part that can be used to approximate “The Theory of Everything”. From my perspective, this belief explains how machine learning ideally works.

Machine learning works by approximating a formula (the theory of everything) that could explain every single output for every input (the universe) by deriving a formula (a small part of it) from our dataset (observable phenomena).

Suppose, for example, that we have a dataset of stock prices for a certain company over the last quarter. Some people might think that stock prices are random and that it is impossible to find a pattern to explain and predict them, even with help from the best economists on earth. But if we apply Hawking’s perspective to stock price analysis, we would never say that there is no pattern, or that it is random; we would say that there is a pattern, but that it is humanly impossible to observe. Since it’s humanly impossible, this is the time for humans to step back and let the machine do its job, which is to find that humanly unobservable pattern. In machine learning, the machine finds the pattern through an iteration of trial and error called training. To put it simply, training is just a process of matching candidate patterns against the dataset, stopping when a pattern matches the data well enough. Figure 1 illustrates this pattern matching over three iterations.

Figure 1: Pattern matching process in ML
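To make the brute-force version of this loop concrete, here is a minimal sketch in Python. Everything in it is invented for illustration (the toy price data, the linear candidate pattern, the iteration count): it simply proposes random patterns and keeps whichever one matches the dataset best, much like the random iterations in Figure 1.

```python
import random

# Toy dataset: (day, price) pairs for one quarter; numbers are made up
data = [(1, 10.0), (2, 10.8), (3, 11.9), (4, 13.1), (5, 13.8)]

def error(a, b):
    """Mean squared error of the candidate pattern price = a * day + b."""
    return sum((a * day + b - price) ** 2 for day, price in data) / len(data)

best_a, best_b = 0.0, 0.0
best_err = error(best_a, best_b)

# Trial and error: each iteration proposes a random pattern and keeps it
# only if it matches the dataset better than the current best one.
for _ in range(10_000):
    a, b = random.uniform(-5, 5), random.uniform(-20, 20)
    err = error(a, b)
    if err < best_err:
        best_a, best_b, best_err = a, b, err

print(f"best pattern: price ~ {best_a:.2f} * day + {best_b:.2f} (error {best_err:.3f})")
```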

At this point, you might wonder how the pattern (the graph) in the first iteration changes into the pattern in the second, and so on. In the picture above, the pattern changes randomly each iteration for the sake of simplicity. Unfortunately, in a real implementation, changing the pattern randomly is not a reliable practice: even though each step takes little computation, it might take a significant amount of time just to find the best-matching pattern for even simple data. If, in his research, Stephen Hawking had just made up random formulas and tested them against every phenomenon, he might not have found the formula he was looking for even if he lived for hundreds of years. Fortunately, our Stephen Hawking is smart and time-efficient, so in his research he followed certain guidance and rules to extrapolate the formula. This guidance is called the update rule in machine learning. The implementation of an update rule ensures that every iteration produces a pattern that matches our dataset better than the previous one, by minimizing the convergence error.
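As a contrast to the random search above, here is a minimal sketch of one common update rule, plain gradient descent, applied to the same toy linear pattern. The learning rate and step count are made up for illustration; the point is that each step nudges the parameters in the direction that reduces the error, instead of guessing blindly.

```python
# Same toy dataset as before
data = [(1, 10.0), (2, 10.8), (3, 11.9), (4, 13.1), (5, 13.8)]

a, b = 0.0, 0.0          # initial "pattern": price = a * day + b
learning_rate = 0.01     # step size, chosen by hand for this toy problem

for step in range(5_000):
    # Gradients of the mean squared error with respect to a and b
    grad_a = sum(2 * (a * day + b - price) * day for day, price in data) / len(data)
    grad_b = sum(2 * (a * day + b - price) for day, price in data) / len(data)
    # Update rule: move the parameters against the gradient,
    # so each iteration matches the dataset better than the last
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

print(f"fitted pattern: price ~ {a:.2f} * day + {b:.2f}")
```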

Just because the convergence error is very small, or even 0, it doesn’t mean that our model can predict future stock prices with 100% accuracy

What is convergence error? Well, it is just a fancy way of describing the accuracy of our pattern compared with the dataset. If the convergence error is big, it means our model fails to explain most of the dataset; if it’s small, it means our model is able to explain most of the dataset. It’s worth noting that just because the convergence error is very small, or even 0, it doesn’t mean that our model can predict future stock prices with 100% accuracy. A small convergence error only means that our model will accurately predict the past stock price for any input INSIDE the dataset. So if we give it an input OUTSIDE the dataset, our model might not be able to predict it accurately. So what’s the point of machine learning if it can only accurately predict data inside the dataset, not the unknown data that we’re trying to predict? Let’s take a step back and apply the “Theory of Everything” perspective. Remember that, indeed, no currently known formula can explain all phenomena the way the “Theory of Everything” would, but since we are limited by observable phenomena and can’t formulate the theory of everything yet, the known formulas are enough and useful in the real world. Likewise, even though our machine learning model can only accurately predict the data inside the dataset, that’s enough, because it is an approximation of the stock prices’ “theory of everything” and is able to predict some future stock prices.
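Here is a toy sketch of that gap between matching the dataset and predicting beyond it, assuming NumPy is available. A degree-4 polynomial passes through all five training points, so its convergence error is essentially zero, yet its prediction outside the dataset can be wildly off.

```python
import numpy as np

# Same toy dataset: days and prices for one quarter (invented numbers)
days = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
prices = np.array([10.0, 10.8, 11.9, 13.1, 13.8])

# A degree-4 polynomial through 5 points matches them exactly,
# so the convergence (training) error is essentially 0
coeffs = np.polyfit(days, prices, deg=4)
train_preds = np.polyval(coeffs, days)
print("convergence error:", np.mean((train_preds - prices) ** 2))

# But extrapolating to day 8, OUTSIDE the dataset, the same model
# predicts a price far below the obvious upward trend in the data
print("prediction for day 8:", np.polyval(coeffs, 8.0))
```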

If we want to increase the accuracy of our model on inputs outside the dataset, what we need to do is increase the quality and quantity of the dataset used in the training process, so the machine can get a closer approximation of the stock prices’ “theory of everything”. Treat the dataset like a physicist treats observable phenomena. Just as the more physicists observe a phenomenon, the more they understand the universe and are able to derive a formula from that understanding, so the more data we use in training, the more our machine can understand and the better the model it builds. A good ML model is one created from the combination of a good dataset and a small convergence error. A good dataset with a big convergence error is as terrible as a bad dataset with a small convergence error, because in both scenarios the machine will not be able to create a general pattern (model) that can be applied to unknown data. A small convergence error can be achieved through proper technique, chosen by understanding your needs, resources and dataset. A good dataset, meanwhile, is one that represents the real environment the model needs to face, so you need to consider not only the quantity of the data but also its quality, such as variance, label proportionality (for classification problems), etc.
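As one concrete example of such a quality check, here is a small sketch that inspects label proportionality for a classification problem. The labels here are hypothetical stand-ins; in practice you would load your own dataset.

```python
from collections import Counter

# Hypothetical classification labels, e.g. whether a stock went up or down
labels = ["up", "up", "up", "up", "up", "up", "up", "up", "down", "down"]

counts = Counter(labels)
total = len(labels)
for label, count in counts.items():
    print(f"{label}: {count / total:.0%}")

# If one class dominates (here "up" is 80%), a model can reach a small
# convergence error just by always predicting the majority class, yet
# still fail on the minority class in the real environment.
```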

Machine learning is a very vast ocean to sail, and it is getting vaster over time thanks to the amount of research on the subject. What I present here is just a simple picture of ML using the “Theory of Everything” as an analogy. If you want to learn more about machine learning, here are several learning resources that I recommend:

  1. Andrew Ng’s Machine Learning course
  2. Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron
  3. Recommender Systems Handbook by Ricci et al. (if you want to dive deeper into recommender systems)

If you have further questions or want to discuss machine learning, you can send me an email at himang@skyshi.io, or preferably come to my office at PT Skyshi Digital Indonesia, since then I would have a justifiable excuse to procrastinate :). See you.
