3 hints AI gives us about life

How to perform stochastic gradient ascent towards your goals

Shu Ishida
Tech to Inspire
7 min read · Jan 30, 2021


While maths, algorithms and data science can seem abstract and indifferent to our day-to-day worries and woes, I find it amusing how they can sometimes provide a perfect analogy to the problems and lessons we encounter in life. Here, I selected three observations that we can make from practices in machine learning and planning, which may give us hints on how we, in turn, can navigate the complexities of life to reach a near-optimal solution.

(Disclaimer: I have no intention of claiming that I know how to navigate life effectively — on the contrary, my weights are under-tuned and my gradients are steep, so please enjoy this just as a casual read! Also, I am using the term “AI” fairly loosely here.)

An objective function with optimisable parameters (Source: Andrew Ng, Coursera)

1. Learn from the past, but don’t get caught up in it.

After all, machine learning is all about learning from data — data that has been collected in the past. Unlike expert systems, which have built-in, fixed conditions for decision making and cannot self-improve, systems that employ machine learning can continually learn and improve with more data, adapting to possible changes in the environment or the application domain.

Analogously, we humans also learn from past experiences. Through trial and error, we collect experiences, update our knowledge and beliefs about the world to best explain past observations, and adapt our behaviour policy accordingly to maximise our expected future outcome.

We learn history because there are recurring patterns in historical events and humans tend to make similar mistakes, so we might be able to harvest valuable insight by studying them. In reinforcement learning (RL), one effective technique for improving performance is “experience replay”, in which the algorithm keeps past experiences in a memory bank and keeps retraining on them so that what has been learnt isn't forgotten.
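As a rough sketch of the idea (my own illustration with made-up names, not code from any particular library), a replay buffer can be as simple as a fixed-size memory that we sample from at random:

    import random
    from collections import deque

    class ReplayBuffer:
        """Minimal experience replay: store past transitions and retrain on random samples of them."""
        def __init__(self, capacity=10000):
            self.memory = deque(maxlen=capacity)  # the oldest experiences are eventually forgotten

        def add(self, state, action, reward, next_state):
            self.memory.append((state, action, reward, next_state))

        def sample(self, batch_size=32):
            # revisit a random mix of old and new experiences so earlier lessons aren't overwritten
            return random.sample(self.memory, min(batch_size, len(self.memory)))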

So we know that learning from the past is useful and important. However, learning from the past alone is not enough. That is the biggest distinction between supervised learning methods, which passively train on a fixed collection of data, and RL, which actively collects new experiences. An effective RL algorithm should have a healthy mix of trying things that worked before (exploitation) and trying out new things (exploration). It is like trying out different dishes at a restaurant. On every visit, you would include some dishes that you know you like, so that you are guaranteed an overall satisfying meal, but you would also order something different in the hope of discovering a hidden gem to add to your list of favourites. This is actually a Multi-Armed Bandit (MAB) problem, which is a subset of RL.

Not relying entirely on past experience is important for another reason: things can change over time, and we have to adapt our behaviour to the new environment. In the context of RL and MAB, this is called a non-stationary reward problem. Maybe the chef of the restaurant has changed, in which case we can no longer rely on our past experiences.
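To make the restaurant analogy concrete, here is a minimal epsilon-greedy bandit sketch (a toy example of my own, not from the article): most of the time we reorder our estimated favourite (exploitation), occasionally we try a random dish (exploration), and a constant step size keeps the estimates adaptable if the chef, and therefore the rewards, change over time.

    import random

    n_dishes = 5
    value = [0.0] * n_dishes   # estimated enjoyment of each dish
    epsilon = 0.1              # fraction of visits on which we explore
    step_size = 0.1            # constant step size: recent visits count for more (handles non-stationary rewards)

    def choose_dish():
        if random.random() < epsilon:
            return random.randrange(n_dishes)                    # exploration: order something new
        return max(range(n_dishes), key=lambda d: value[d])      # exploitation: order the current favourite

    def update(dish, enjoyment):
        # nudge the estimate towards the latest experience
        value[dish] += step_size * (enjoyment - value[dish])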

We humans are heavily governed by our past experiences. Sometimes we get caught up in our past, perhaps because of a single traumatic incident, and it affects every decision we take from then on. However, that cautious approach might prevent us from discovering a way forward, trapping us in a local optimum. Whilst we cannot and should not explore the entire state space, parts of which have severe consequences with no path leading back, it is good to get out of our comfort zone, maintain a healthy balance between exploration and exploitation, and not overfit to our past experiences but learn a more general policy instead.

2. A greedy algorithm isn’t always optimal, but with a good heuristic function, you can navigate efficiently.

Have you ever regretted your choices? It is easy to be wise in hindsight, since we now have information that wasn't available at the time. Looking back, it seems so obvious what the correct decision would have been.

But is that really the case? Unless you are on your deathbed (in which case you probably wouldn't be reading this anyway), you still have a long time to make up for your past mistakes. In fact, we never know when it is time to say, “in hindsight, this was the correct thing to do”. Time goes on, what seemed right can turn out to be wrong, unforeseen opportunities may arise, and what seemed like a big mistake can help us avoid bigger failures, provided that we learn from our mistakes.

There are two classes of planning problems: one where the environment is fully known, and the other where it is only partially known. In our lives, we always have to deal with the latter. We never know whether job opportunities will remain open, whether stock prices will go up or down, whether the feeling you have towards that person is (or remains) mutual, or whether life will go back to normal any time soon.

In a partially known setting, the best we can do is make the best possible decision given what is known at present and what we estimate will happen in the future. A good heuristic function will guide us to make decisions that are good in the long run and to avoid unwanted consequences, even when other “greedy” decisions with immediate rewards might seem more appealing in the short term. Every failure and mistake we make, every unfortunate circumstance, gives us an opportunity to reconsider and update our heuristics, giving us a better chance of making a better decision in the future.
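In planning terms, this is the difference between a purely greedy choice and a search guided by a heuristic. Below is a minimal A*-style sketch on an abstract graph (my own illustration with made-up names): the priority combines the cost paid so far with a heuristic estimate of the cost still to come, rather than looking only at the cheapest next step.

    import heapq

    def heuristic_search(start, goal, neighbours, heuristic):
        """A*-style search: expand the option with the lowest (cost so far + estimated cost to go)."""
        frontier = [(heuristic(start), 0, start, [start])]
        visited = set()
        while frontier:
            _, cost, node, path = heapq.heappop(frontier)
            if node == goal:
                return path, cost
            if node in visited:
                continue
            visited.add(node)
            for nxt, step_cost in neighbours(node):
                # rank by long-run promise, not just by the immediate step cost
                new_cost = cost + step_cost
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt, path + [nxt]))
        return None, float("inf")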

What is so miraculous about humans (and other species that communicate via language) is that we can transfer our knowledge and wisdom with words, allowing others to update their heuristic functions without having to go through the same unpleasant experiences themselves. This is somewhat analogous to parallel policy updates in reinforcement learning, where many agents are spawned, each collecting experiences individually and then communicating how the policy should be updated.

So don’t regret your past choices. Instead, use those gradients to update your policy so that you won’t make the same mistakes, and maybe that will guide you to a state that is much better than where you would have ended up otherwise.

3. Diversity and combined effort made neural networks successful.

What made neural networks so successful as a machine learning method? A lot of credit goes to the insight that deep networks can approximate more complex functions, and to the back-propagation algorithm, which made parameter updates efficient even for deeper networks. But the other dimension — the width of the network, i.e. the number of channels in which features are learnt and detected — is also important. The more channels a layer has, the more diverse the features it can recognise. Each channel is different from the others. They diversify so that, together, they can cover the entire spectrum of relevant features that are useful for subsequent layers.

Dropout, which randomly masks out some outputs at intermediate layers, encourages robustness in the feature representation such that, even if one feature is missing, the other features can make up for it.
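Here is a minimal sketch of what dropout does to a vector of features (my own illustration, using NumPy just for brevity):

    import numpy as np

    def dropout(features, p=0.5, training=True):
        """Randomly zero out features so that no single feature becomes indispensable."""
        if not training:
            return features
        mask = np.random.rand(*features.shape) > p   # keep each feature with probability 1 - p
        return features * mask / (1.0 - p)           # rescale so the expected activation stays the same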

We are also very diverse. We have different backgrounds, interests, opinions and lifestyles. While such differences present many challenges that we must overcome, that diversity is also our fundamental strength. Imagine how boring and dysfunctional the world would be if everyone liked to do the same job, ate the same food, lived in the same place, and shared the same beliefs (which may or may not be true). While it would be easy to satisfy everyone, there would no longer be any specialties, and it would not be possible for us as a species to explore the entire spectrum of human capabilities. The lack of diversity would also make us vulnerable to certain types of risk.

Channels in neural networks are very good at diversifying and specialising, but also at cooperating. In a typical neural network architecture, a layer receives outputs from all channels in the previous layer, and each channel decides independently how much it wants to listen to each of them. Whilst some channels might be more useful than others for certain types of input, every channel has its contribution. The advantage of diversity and cooperation is clear from the fact that ensemble methods typically perform at least as well as any of their individual components.
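As a toy illustration of this point (my own numbers, not from the article): averaging several noisy but diverse estimates of the same quantity tends to land closer to the truth than a typical individual estimate does.

    import numpy as np

    rng = np.random.default_rng(0)
    truth = 1.0
    predictions = truth + rng.normal(scale=0.5, size=5)   # five diverse, noisy estimates of the same quantity
    ensemble = predictions.mean()                          # the "ensemble" simply averages them

    print("individual errors:", np.round(np.abs(predictions - truth), 2))
    print("ensemble error:   ", round(abs(ensemble - truth), 2))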

It is worth noting here that, while these channels are diverse, they all share one thing — combined together, they work towards the single goal of optimising the objective function. They are not merely trying to be different from each other. Although they start from randomly initialised states, looking in different directions, once they start their gradient descent they all work towards the same goal, each following its own path and improving upon its speciality. I think this is a beautiful thought.

We, as a human species, have not yet agreed upon a single objective function to optimise. It is unclear whether such a function even exists in the first place. However, if it does, my hope is that we can all play our part in our own ways, using our own strengths, whilst valuing each other's contributions and accepting each other's differences, taking one small optimisation step down the gradient towards a brighter and kinder future.

Thank you for reading this article. If you liked it, please have a look at the related article below, where I translated 25 proverbs into an AI context :)


Shu Ishida
Tech to Inspire

DPhil student at the University of Oxford, researching computer vision and deep learning. Enjoys programming, listening to podcasts, and watching musicals.