Timeline and analysis of existing attempts at recursive self-improving (RSI) software systems

Robert Wünsche
Aug 9, 2017

Only implemented systems are covered in this article. Purely theoretical architectures such as the Gödel Machine[goedel1] are not covered because they have not been implemented.

The author assumes that the reader is familiar with the basic motivation and prospects of RSI systems.

What are Self-Improving algorithms?

Self-improving algorithms must get better at solving a problem over time[Yampolskiy1 page 3]. Common examples of this behavior are neural networks and evolutionary algorithms. Less common examples are self-improving sorting and clustering algorithms[si1].

What is Recursive Self-Improvement?

Recursive self-improvement is defined as the process by which a system gets better at getting better[Yampolskiy1 page 4].

RSI software systems can change the representations of most or all algorithms of the A(G)I system. The changed representation must include the search algorithm that searches for possible changes. The changes are made so that the performance of the agent increases by some metric (a minimal sketch of such a loop follows the list below). Metrics to measure progress can be:

  • better time/space tradeoff of the algorithm(s)
  • better compression of the representation of algorithms or data
  • higher score for task(s)
  • higher rationality of a shaped agent [omohundro1]
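The following is a minimal illustrative sketch of such a loop, not a description of any specific system below. It only shows the structural point: the system holds both its task-solving parameters and the search procedure that proposes changes, both are subject to modification, and a change is kept only if it improves a metric. All names (`System`, `propose_change`, `metric`) are hypothetical.

```python
import random

class System:
    def __init__(self):
        self.solver_params = [0.0] * 8   # stands in for the task-solving algorithms
        self.search_step = 0.5           # parameter of the search procedure itself

    def propose_change(self):
        """Propose a modified copy of the whole system, including the search procedure."""
        child = System()
        child.solver_params = [p + random.uniform(-self.search_step, self.search_step)
                               for p in self.solver_params]
        child.search_step = max(1e-3, self.search_step * random.uniform(0.5, 2.0))
        return child

def metric(system):
    # Placeholder utility: higher is better (e.g. task score, compression, speed).
    return -sum(p * p for p in system.solver_params)

current = System()
for _ in range(1000):
    candidate = current.propose_change()
    if metric(candidate) > metric(current):   # keep only changes that improve the metric
        current = candidate
```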

A common critique of all existing RSI attempts

All presented RSI systems lack a self-model, which is required for high-level RSI modifications [Eliezer1]:

EURISKO lacked what I called ‘insight’ — that is, the type of abstract knowledge that lets humans fly through the search space.

Such insight is necessary for “strong” RSI systems.

Timeline of attempts to achieve RSI

This section contains a short description of each program or algorithm. The author states his view on the limitations of the artifacts.

1976 - EURISKO

Eurisko [eu1][eu2][eu3][eu4][euRLL5][euRLL6] from Douglas Lenat was the first attempt at achieving RSI documented in the literature. The program itself is described by a centralized data-structure. Fixed slots in the data-structure allow the agent to modify itself. All elements in the data-structure encode executable code.
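A rough illustration of such a slot-based, self-modifiable representation is shown below. EURISKO itself was written in Lisp with a much richer slot vocabulary; the slot names and the mutation operator here are made up for illustration only.

```python
import random

def make_unit(name):
    # A "unit" is a bundle of named slots; some slots hold executable code.
    return {
        "name": name,
        "if_applicable": lambda ctx: ctx.get("score", 0) < 10,    # condition slot (code)
        "then_action":  lambda ctx: {**ctx, "score": ctx.get("score", 0) + 1},  # action slot (code)
        "worth": 500,                                             # numeric slot used for ranking
    }

def mutate_unit(unit):
    """A mutation that rewrites a slot of a unit; in EURISKO, heuristics could
    operate on the slots of other heuristics (and of themselves)."""
    child = dict(unit)
    child["worth"] = unit["worth"] + random.choice([-100, 100])
    return child

unit = make_unit("h1")
ctx = {"score": 0}
if unit["if_applicable"](ctx):
    ctx = unit["then_action"](ctx)
unit = mutate_unit(unit)
print(ctx, unit["worth"])
```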

RSI-related papers often refer back to EURISKO, for example [schmidhuber1 page 4] and [haase1].

Disadvantages with reasons

  • (lack of self-reflection) The agent has no way to learn which changes are beneficial and which have no effect.
  • (lack of self-reflection and self-modification) Douglas Lenat reported that the agent got stuck in local optima. Manual introspection and modification were necessary to overcome this.
  • There is no way for the agent to learn anything from any environment. The agent is just using trial and error (which is a form of induction).

1987 - CYRANO

CYRANO [haase1][haase2] from Ken Haase is based on the principles used in EURISKO, but its basic mechanism for storing and updating heuristics is different: TYPICAL [haase3] is used for both storage and inference.

1987 - Evolutionary principles in self-referential learning

Schmidhuber’s thesis is about self-referential systems.

This system from Jürgen Schmidhuber[schmidhuber1] uses “prototypical self-referential learning mechanisms” (PSAMLs) as a representation scheme for programs and data. According to Schmidhuber [schmidhuber4], it was the first credit-conserving reinforcement learning economy[schmidhuber5].

Modifications are made randomly and are guided by a global utility function.
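To make the idea of a credit-conserving learning economy concrete, here is a generic illustration in the bucket-brigade style; it is not Schmidhuber's exact scheme, and the rule names and numbers are arbitrary. Each rule holds "credit"; the rule chosen to act pays part of its credit to the rule that acted before it, and external reward is the only thing that adds credit to the system as a whole.

```python
import random

rules = {name: {"credit": 10.0} for name in ["r1", "r2", "r3"]}

def external_reward(rule_name):
    return 1.0 if rule_name == "r3" else 0.0    # toy environment: only r3 is rewarded

previous = None
for step in range(100):
    # Rules bid proportionally to their credit; higher credit wins more often.
    chosen = max(rules, key=lambda n: rules[n]["credit"] * random.random())
    bid = 0.1 * rules[chosen]["credit"]
    rules[chosen]["credit"] -= bid
    if previous is not None:
        rules[previous]["credit"] += bid        # payment flows backwards: credit is conserved
    rules[chosen]["credit"] += external_reward(chosen)
    previous = chosen

print({n: round(r["credit"], 2) for n, r in rules.items()})
```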

Disadvantages with reasons

  • The increase in complexity of the structure/algorithm is not optimal, because the search doesn’t favor solutions with low complexity.
  • The algorithm doesn’t learn. The algorithm has no way to improve its performance or the time required to find the same or a similar solution again.
  • Modifications are only guided top-down, not bottom-up (as is the case in Replicode).
  • (lack of self-reflection) The agent has no way to learn which changes are beneficial, which have no effect, or, worse, which are destructive to its performance.
  • Modifications are not tracked; they are not stored in a data-structure that would allow rolling them back. See the success-story algorithm, which implements this.
  • (self) The agent lacks a self-model.

1993 - A self-referential weight matrix

Schmidhuber showed in [schmidhuber2] that learning algorithms can “speak” about, introspect, and modify themselves. He described (and possibly implemented) a neural network which can improve its own gradient-descent-based learning algorithm.
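Below is a heavily simplified conceptual sketch of the general idea of a network that reads and writes its own weight matrix through dedicated outputs. It does not reproduce the architecture or learning rule of [schmidhuber2]; the sizes, addressing scheme, and update rule are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 4
W = rng.normal(0.0, 0.1, size=(n_out, n_in))

def step(x, W):
    y = np.tanh(W @ x)
    # Interpret part of the output as an address into W and a modification value.
    i = int(abs(y[0]) * n_out) % n_out
    j = int(abs(y[1]) * n_in) % n_in
    delta = 0.01 * y[2]
    W = W.copy()
    W[i, j] += delta          # the network modifies one of its own weights
    return y, W

x = rng.normal(size=n_in)
for _ in range(10):
    y, W = step(x, W)
print(W)
```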

1994 - Success-story algorithm

Schmidhuber’s success-story algorithm[schmidhuber3] modifies a policy (POL) between checkpoints in time. The agent has a fixed goal and a reward function which takes the current state and the goal state as function parameters.

The agent is free to make changes to the policy at any point in time. The policy changes which satisfy the success-story criterion (SSC) are kept (see the paper for details).
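Here is a rough sketch of the checkpoint bookkeeping, assuming a simplified reading of the SSC: the average reward per time measured since each retained checkpoint must be strictly increasing. The function and variable names are illustrative and not taken from [schmidhuber3].

```python
checkpoints = []   # each entry: (time, cumulative_reward_at_time, undo_info)

def ssc_holds(t, R, checkpoints):
    """Reward per time since each retained checkpoint must increase."""
    rates = [(R - R_c) / (t - t_c) for (t_c, R_c, _) in checkpoints if t > t_c]
    return all(a < b for a, b in zip(rates, rates[1:]))

def make_undo(policy):
    snapshot = dict(policy)
    def undo(p):
        p.clear()
        p.update(snapshot)
    return undo

def evaluate_at_checkpoint(t, R, policy):
    # Pop checkpoints (undoing the policy changes made since them) until the SSC holds.
    while checkpoints and not ssc_holds(t, R, checkpoints):
        _, _, undo = checkpoints.pop()
        undo(policy)
    # Record a new checkpoint for the modifications made since the last one.
    checkpoints.append((t, R, make_undo(policy)))

# Toy usage: the policy modifies itself, reward accumulates, checkpoints are evaluated.
policy = {"p": 0.0}
t, R = 0.0, 0.0
for step in range(1, 6):
    policy["p"] += 0.1            # some self-modification of the policy
    t += 1.0
    R += policy["p"]              # toy reward
    evaluate_at_checkpoint(t, R, policy)
```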

Disadvantages with reasons

  • The search is not optimal, because it doesn’t favor solutions with low complexity.
  • Modifications are unguided and the algorithm doesn’t learn. The algorithm has no way to improve its performance or the time required to find the same solution again.
  • Either the actions must not have catastrophic consequences, or the agent must be able to simulate actions in a simulated environment with a world-model. This is necessary because the agent could take actions with catastrophic consequences in the process of “playful” learning (trying out modifications to the policy).

2007 - Ikon Flux 2.0

Ikon Flux 2.0[IkonFlux2] modifies itself by rating changes (which are generated by a program represented in a uniform representation) with a metric. Only changes which improve the metric are applied. The principles of forward and backward inference are implemented; these were later used in Replicode. Forward chaining[IkonFlux2 page 8] is equivalent to program rewrites. Backward chaining[IkonFlux2 page 8] is similar to planning done by the programs. Note that Ikon Flux is not a planner; the programs are responsible for that.
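The following is a generic illustration of the two inference directions, not Ikon Flux's actual data model or syntax. Forward chaining rewrites the current fact set whenever a rule's condition matches; backward chaining decomposes a goal into subgoals until the subgoals are satisfied by known facts.

```python
# Each rule is a simple condition -> conclusion pair.
rules = [
    {"if": {"a"}, "then": "b"},
    {"if": {"b"}, "then": "c"},
]

def forward_chain(facts):
    """Apply rules to the current fact set until nothing new is produced."""
    changed = True
    while changed:
        changed = False
        for r in rules:
            if r["if"] <= facts and r["then"] not in facts:
                facts.add(r["then"])
                changed = True
    return facts

def backward_chain(goal, facts):
    """Reduce a goal to subgoals until all subgoals are satisfied by known facts."""
    if goal in facts:
        return True
    for r in rules:
        if r["then"] == goal and all(backward_chain(g, facts) for g in r["if"]):
            return True
    return False

print(forward_chain({"a"}))          # {'a', 'b', 'c'}
print(backward_chain("c", {"a"}))    # True
```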

Disadvantages with reasons

  • (self) Has no explicit self-model.
  • (attention) Lacks an attention model. This is a problem because the system can’t bias itself to favor good changes over bad ones.

2012 - Powerplay

Powerplay [powerplay1][powerplay2] from Jürgen Schmidhuber is an almost-optimal greedy approximation of his theory of artificial creativity[schmidhubercreativity1]. The algorithm is split into a solver and a problem generator. The problem generator generates novel problems which the solver then has to try to solve. Novel problems are problems which are unsolvable by the current solver. The created problems are only slightly more complicated than the most complicated solvable problem.

The algorithm can solve externally posed tasks.
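A rough sketch of the loop described above: search for a pair of (new task, modified solver) such that the modified solver solves the new task and still solves every previously solved task. The task and solver representations here are trivial placeholders, not Powerplay's actual program search.

```python
import random

def solves(solver_level, task_difficulty):
    return solver_level >= task_difficulty

solver_level = 0.0          # stands in for the solver program
solved_tasks = []           # repertoire of tasks the solver must keep solving

for _ in range(20):
    # Problem generator: propose a task just beyond the current abilities.
    new_task = (max(solved_tasks) if solved_tasks else 0.0) + random.uniform(0.01, 0.1)
    if solves(solver_level, new_task):
        continue                                  # not novel: already solvable
    # Search for a solver modification that solves the new task...
    candidate = new_task + random.uniform(0.0, 0.05)
    # ...while preserving the ability to solve all previous tasks.
    if all(solves(candidate, t) for t in solved_tasks + [new_task]):
        solver_level = candidate
        solved_tasks.append(new_task)

print(len(solved_tasks), solver_level)
```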

Disadvantages with reasons

  • The search is greedy. This is a disadvantage because the algorithm can’t narrow down the search in a top-down fashion.
  • The problem generator is the most complex piece of the algorithm and it is hard and cumbersome to engineer new problem generators. The problem generators should cover the abilities required to solve specific externally posed tasks.
  • (self) It is lacking a self-model.

2013 - Replicode

Replicode [replicode1][aera1] is a (proto) AGI architecture and framework. It is based on IkonFlux 2.0. The basic motivation behind it is to let the AI continually search for new algorithms. The search is guided by a utility-based metric.

Everything is described using one data-structure which can be modified anywhere by any part of the program.
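A toy illustration of such a uniform representation is shown below: programs and data live in the same store of objects, and any program object may rewrite any other object, including other program objects. This is not Replicode's actual syntax or semantics, just an illustration of the uniformity described above.

```python
store = {
    "fact1":   {"kind": "data", "value": 3},
    "double":  {"kind": "program",
                "run": lambda s: s["fact1"].update(value=s["fact1"]["value"] * 2)},
    "rewrite": {"kind": "program",
                # a program that modifies another program object in the store
                "run": lambda s: s["double"].update(
                    run=lambda s2: s2["fact1"].update(value=s2["fact1"]["value"] + 1))},
}

store["double"]["run"](store)    # fact1.value: 3 -> 6
store["rewrite"]["run"](store)   # replaces the behavior of "double"
store["double"]["run"](store)    # fact1.value: 6 -> 7
print(store["fact1"]["value"])   # 7
```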

Disadvantages with reasons

  • (self) It is lacking a self-model.

References
