A model-first mouse

Action-first vs model-first

Synthetic Intelligence

--

There is a broadly ignored aspect of reinforcement learning (and the action-first approach in general): it is fundamentally less efficient, computationally, than algorithms built on a model of the world (the model-first approach, for short; keep in mind that the word 'model' is used only in this sense from here on).

It is hard to compare the two methods directly: to do so, you first have to turn the model-first algorithm into an agent by defining a goal for it. Conversely, any action-first agent carries some representation of the world, even if it is just a simple policy describing local rules.

The easiest way to see the difference is to assume that in both cases the relevant part of the environment is fully observable, and that the action-first agent builds its model of the environment in the same representation as its competitor, only from an action-oriented point of view. Let us also assume that the model of the world has not been learned at the starting point, and that the environment is uniform, i.e. composed of similar elements everywhere (which is close enough to real-world situations).

The model-first agent starts from the simplest accessible regularities in the environment and soon has representations of its composing elements, with the goal as one feature of the environment and the path as another (if that were not the case, reaching the goal in such an environment would not be possible). Once the model is built, finding the optimal path is essentially a single planning step (the exact cost depends on the implementation details).
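To make this concrete, here is a minimal sketch in Python. The grid world, the wall layout, and the start/goal cells are my own illustrative assumptions, not something from the argument above: once the agent holds the full model of the environment, the optimal path falls out of a single breadth-first search over it, with no trial-and-error episodes at all.

```python
from collections import deque

# Illustrative environment: a small grid the agent fully knows.
# 'S' start, 'G' goal, '#' walls.
GRID = [
    "S..#....",
    ".#.#.##.",
    ".#...#..",
    ".###.#.#",
    "....#..G",
]

def find(ch):
    """Locate a labelled cell in the known map."""
    for r, row in enumerate(GRID):
        if ch in row:
            return (r, row.index(ch))

def shortest_path(start, goal):
    """Plan over the known model: plain breadth-first search on the grid graph."""
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            # Walk back through predecessors to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) \
                    and GRID[nr][nc] != "#" and (nr, nc) not in came_from:
                came_from[(nr, nc)] = (r, c)
                frontier.append((nr, nc))
    return None  # goal unreachable in this model

path = shortest_path(find("S"), find("G"))
print(f"optimal path found in one planning pass: {len(path) - 1} steps")
```

The point of the sketch is the cost profile: all the work goes into holding the map, and the plan itself is one pass over it.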

The action-first agent starts by searching for a full path to the goal. Essentially, this is a walk through the permutations of all possible actions under the environment constraints, which are numerous in any non-trivial environment. Moreover, a model of the world built this way is based only on what was explored while finding the first path, so it says nothing about whether that path is optimal. To establish optimality, the agent has to keep up the largely random search, which becomes more efficient with every attempt as its description of the environment improves, but remains a very time-intensive method.
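For contrast, here is a sketch of the action-first side on the same kind of grid. It is only one possible stand-in for the approach: tabular Q-learning with epsilon-greedy exploration, a reward of -1 per move, and hyperparameters chosen purely for illustration. The agent learns only from the transitions it happens to visit, so it needs thousands of episodes of trial and error before its greedy path stops being wasteful.

```python
import random

GRID = [
    "S..#....",
    ".#.#.##.",
    ".#...#..",
    ".###.#.#",
    "....#..G",
]
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
START, GOAL = (0, 0), (4, 7)

def step(state, action):
    """One environment transition: walls block movement, -1 per move, episode ends at the goal."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])) or GRID[r][c] == "#":
        r, c = state                          # bumped into a wall: stay in place
    done = (r, c) == GOAL
    return (r, c), (0.0 if done else -1.0), done

Q = {}                                        # (state, action) -> value, filled in only where the agent has been
alpha, gamma, eps = 0.5, 0.95, 0.1

for episode in range(2000):                   # thousands of episodes vs. one planning pass
    state, done, steps = START, False, 0
    while not done and steps < 500:
        if random.random() < eps:             # occasional random exploration
            action = random.choice(ACTIONS)
        else:                                 # otherwise act greedily on current estimates
            action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
        nxt, reward, done = step(state, action)
        best_next = max(Q.get((nxt, a), 0.0) for a in ACTIONS)
        target = reward + (0.0 if done else gamma * best_next)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (target - old)
        state, steps = nxt, steps + 1

# Greedy rollout with the learned values: how short is the path after all that search?
state, steps = START, 0
while state != GOAL and steps < 100:
    action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
    state, _, _ = step(state, action)
    steps += 1
print(f"greedy path after 2000 episodes: {steps} steps")
```

Even on a toy grid, the contrast in the amount of interaction with the environment needed before the path is good is exactly the difference described above.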

Of course, much depends on the implementation details and on the specifics of the environment and goals. One can imagine edge cases where the action-first approach is far more efficient. For scenarios close to the real world, however, the model-first approach has a clear advantage. Tellingly, the most productive ways of improving an action-first agent amount to improving its representation of the environment.

Evolution demonstrates the same point: the model-first approach has been adopted more and more widely ever since building an abstract model became possible with the mammalian neocortex.

Take two mice and make one of them action-first by removing its neocortex (hard to believe, but such experiments have actually been conducted). The downgraded mouse is in constant motion, spending a lot of energy, taking risks, and profiting only from local opportunities, while the model-first mouse, even though its model is quite modest, settles in a strategically advantageous spot, watches for opportunities and threats from there, and spends its energy mostly on updating its model of the world.

The same logic stands behind the rather advanced model of the world inside our own skulls. Even though it consumes about 20% of our energy and makes us more vulnerable, it is also the foundation of all human achievement.

--