Photo by Peter Olexa on Unsplash

Alchemy…

Abstraction in Higher Dimensions

Richard Schutte
9 min read · Aug 10, 2020


“In deep learning, there’s no data like more data. The more examples of a given phenomenon a network is exposed to, the more accurately it can pick out patterns and identify things in the real world.”

— Kai-Fu Lee

“Only by discovering alchemy have I clearly understood that the unconscious is a process and that ego’s rapport with the unconscious and its contents initiate an evolution, more precisely, a real metamorphosis of the psyche”…

— Carl Jung

alchemy

/ˈalkɪmi/

noun

the medieval forerunner of chemistry, concerned with the transmutation of matter, in particular with attempts to convert base metals into gold or finding a universal elixir [1]

verb

the process of taking something ordinary and turning it into something extraordinary, sometimes in a way that cannot be explained [2]

At the thirty-first Neural Information Processing Systems Conference in 2017 (NIPS 2017)[3], held at Long Beach, California, Ali Rahimi (previously a Google and now an Amazon Ai researcher) and Benjamin Recht (Associate Professor of Electrical Engineering and Computer Science, University of California, Berkeley) won the Test of Time Award for their Research Paper — “Random Features for Large-Scale Kernel Machines”[4].

Ali gave the acceptance Speech for the Award, titled — “Back when we were kids”[5].

The Speech starts with him rewinding the clock to NIPS 2006 and recounting a time when there was building momentum around Deep Learning’s future potential.

Both had been working on Randomised Algorithms[6], and an email exchange, once they got home from the Conference, had resulted in a breakthrough.

An idea that moved Kernel Machines[7] from a Linear to a Randomised form of computation. [Kernel Machines are a class of algorithms (heuristics) for pattern analysis whose general task is to find and study general types of relationships in datasets].

They had set out to provide a baseline benchmark for their Deep Learning ideas but could not find any code for other Nonlinear Methods to compare with their Randomised approach.

So they compared their framework and algorithms with proven reproducible baseline techniques at the time, which included Boosting[8] and Non-Accelerated Kernel Machine[9] methods.

Their new methods could take as little as four lines of MatLab[10] code.
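To give a feel for how compact the idea is, here is a minimal sketch of Random Fourier Features in Python/NumPy rather than their original MatLab; the kernel width, feature count and function name are illustrative choices, not the authors' own code.

```python
import numpy as np

# A minimal sketch of Random Fourier Features (illustrative, not the original code).
# Inner products of the mapped data approximate a Gaussian (RBF) kernel;
# `sigma` and `n_features` are arbitrary illustrative choices.

def random_fourier_features(X, n_features=300, sigma=1.0, seed=None):
    """Map X of shape (n_samples, n_dims) into a randomised feature space."""
    rng = np.random.default_rng(seed)
    n_dims = X.shape[1]
    W = rng.normal(0.0, 1.0 / sigma, size=(n_dims, n_features))  # spectral samples
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)             # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Fitting any linear model (e.g. least squares) on Z = random_fourier_features(X)
# then approximates a Kernel Machine with an RBF kernel.
```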

Ali then drops a bombshell as he recounts this period and his recollection of events in his Speech — “Now there is something a little shady about the story”…

He outlines how it was generally assumed that, in order to approximate a Kernel accurately with these Random Features, you would need tens of thousands of them, and yet in their experiments they were able to achieve good results using just a few hundred features.

In some of their experiments, they were able to achieve better results — lower test errors — with their random approximation method than with the original Kernel Machines they were trying to replicate.

A puzzling phenomenon: their Random Features Algorithms achieved better results than the benchmarks they were trying to approximate.

As recounted in the Speech, they decided to send out their Paper with this “dodginess” in it and brave the rigour of peer review.

Had they just exposed the Limits of Logic[11], deductive Cartesian reasoning, the Algorithmic Mind[12] in a world of Complexity?

Had they uncovered the Power of the Patterns[13]?

The importance of the Geometric Mind and High Dimensional Structures in prediction.

A form of computational sensemaking[14].

They set out to provide a baseline for Deep Learning, and their Random Features Theorems now provided a new nonlinear methodology that was achieving good results — the weighting of multiple variables in a neural network in order to arrive at a low loss on the training data sets.

Gradient Descent[15].
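As a rough illustration of the idea (not a description of any particular system mentioned in the Speech), here is a minimal gradient descent sketch in Python/NumPy that nudges the weights of a linear model downhill on a mean-squared-error loss; the learning rate and step count are arbitrary.

```python
import numpy as np

# A minimal sketch of gradient descent: repeatedly adjust the weights of a
# simple linear model so that its mean squared error on the training data
# shrinks. The learning rate and number of steps are arbitrary choices.

def gradient_descent(X, y, lr=0.01, steps=1000):
    """Minimise the mean squared error of a linear model y ≈ X @ w."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the loss
        w -= lr * grad                            # step downhill
    return w
```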

By the time of Ali’s Speech a decade later, the field had radically changed with new Theorems and approaches such as Random Walk[16] and Nearest Neighbour[17], creating a toolbox of reproducible heuristics — solid baselines — and the application of Machine & Deep Learning to an ever-expanding set of problems.
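To make “solid baseline” concrete, a nearest-neighbour classifier can be stood up in a few reproducible lines; the sketch below assumes scikit-learn is available, and the dataset and the choice of k=5 are purely illustrative.

```python
# A small reproducible nearest-neighbour baseline, assuming scikit-learn is
# installed; the digits dataset and k=5 are purely illustrative choices.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("k-NN baseline accuracy:", baseline.score(X_test, y_test))
```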

An explosion in potential applications of the technology — the unleashing of innovation and The Adjacent Possible[18] — from voice recognition, robots, document translation, image recognition and targeted advertising to the creation of a new type of Organisation anchored in Ai, Cognition & Intelligence.

But despite this emerging world, Ali expresses deep unease in the second half of his Speech.

Rather than see Machine Learning as the “new electricity”, he asks a different question[5].

Was Machine Learning the new Alchemy?

A reversion to the Dark Ages — a Medieval period — a pre-Scientific World — a time before Chemistry and the Age of Reason[19] — when humans attempted to convert base metals to gold and find universal elixirs.

A world anchored in mysticism and magic.

A world that once again highlighted the Human Condition and the limits of what we know.

The bounds of reductionism, cause-effect, symbolic & formal logic and deductive reasoning in a world of Complexity.

Ali, in closing his Speech, highlighted this problem — our innate desire for rigour — and the need for knowing, given that Ai was now being applied to critical systems such as healthcare.

Computers, by learning through trial and error, had become a form of Alchemy.

A black box.

Could we step beyond Alchemy and build these systems on verifiable rigorous knowledge, certainty and explanation[20]?

Explaining Machine and Deep Learning

There remains an ongoing debate across the Ai community about what it would take to achieve robust Ai.

A question that goes to the core of what Intelligence is[21].

Some believe that Intelligence will require Abstract Symbolic Algorithmic Representation Models of the World — to progress.

There are others who believe that these Perception Machines will go a long way through the power of computation, customised GPU Chips, memory, examples, data and learning to generate a range of algorithms that will be central to prediction and unbundling Complexity.

Only recently, OpenAI launched a private beta version of a powerful new neural network (~175 billion parameters) language processing model — GPT-3[22] — that has demonstrated human-like text on demand and provides a natural English language interface via an API.

It can generate songs, news stories and computer code, support web design, form the basis of chatbots etc., by synthesising vast pools of language sourced from the internet.
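In practice, that interface amounts to a simple HTTP call. The sketch below is an assumption-laden illustration based on OpenAI's later public documentation rather than the private beta described above; the endpoint, model name and response shape may differ.

```python
# An illustrative sketch of calling a hosted language model over HTTP.
# The endpoint, model name and response shape are assumptions drawn from
# OpenAI's later public documentation, not the private beta described above.
import os
import requests

response = requests.post(
    "https://api.openai.com/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "davinci",  # hypothetical model choice
        "prompt": "Write a short news story about deep learning.",
        "max_tokens": 100,
    },
)
print(response.json()["choices"][0]["text"])
```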

But are these machines intelligent [23]?

Recently the journal Science published a story titled “‘Explaining’ machine learning reveals policy challenges”[24], which illustrated that we are not much closer to understanding Machine & Deep Learning than when Ali Rahimi gave his 2017 speech.

The article identifies the requirement for political accountability & legal compliance for decisions of substance.

But can these machines exhibit a new form of Accountability and Agency?

Can everything in our Complex world be compressed to a heuristic, an algorithm or a deductive form of reasoning anchored in Formal & Symbolic Abstract Logic?

Are all our decisions anchored in a single objective function such as maximising utility[25]?

Can our Morals, Values and Meaning be codified Algorithmically, or do our interdependencies, interrelationships and context matter? — The difference between the What? and the Why? — Explanation & Meaning.

Can Trust — a confident relationship into the unknown (Uncertainty) — be codified, or is it an innate part of the Human Condition — requiring Reliability, Capability, Benevolence and Integrity — a Human Relationship?

The Power of Patterns

Despite these fundamental questions, the power of these computational machines is becoming more apparent every day.

They provide new forms of perception, prediction and computational sensemaking.

They unbundle Complexity and mitigate Uncertainty.

Recently, Santa Fe Institute President and Professor David Krakauer wrote an essential and insightful article titled — At the limits of thought[26].

It asks a fundamental question:

Will brains or algorithms rule the kingdom of Science?

The article highlights the power of these machines in providing fresh perspectives on Reality.

Through combining vast amounts of data and examples, new algorithms and heuristics can emerge.

New forms of knowledge through shaping Algorithms that process information with causal powers.

A quote from the article:

“But in an age of ‘big data’, the link between understanding and prediction no longer holds true. Modern Science has made incredible progress in explaining the low-hanging fruit of atoms, light and forces. We are now trying to understand the more complex world — from cells to tissues, brains to cognitive biases, markets to climates. Novel algorithms allow us to forecast some features of the behaviour of these adaptive systems that learn and evolve, while instruments gather unprecedented amounts of information about them. And while these statistical models and predictions often get things right, it’s nearly impossible for us to reconstruct how they did it.”…

By doing so, they have the potential to challenge our “truths” and “beliefs”.

Our existing heuristics, theories and models of the world.

A collision of the Human Mind and Machines.

Central to Karl Popper’s 20th Century Philosophy of Science[27] is the problem of demarcation, the challenge of distinguishing scientific (or empirical) theories from non-scientific theories.

Through the idea of a falsification test — an axiom that recognises that Science is an eternal search for truth — we can distinguish Scientific from Non-Scientific Theories.

These computational systems now provide us with an alternative prism from which to view Reality and apply the Falsification principle against existing Science.

Deep Learning, Machine Learning & Neural Networks have the potential to upend our current Scientific understandings of nonlinear complex dynamic systems such as Economics, Psychology, Education, Society, Innovation, Biology etc.

Collective Intelligence[28] by combining Humans with Machines and unleashing the Power of the Patterns.

An acceleration in our search for Truth in a World of Complexity.

A leap forward beyond ideology and dogma.

“Science doesn’t purvey absolute truth. Science is a mechanism, a way of trying to improve your knowledge of nature. It’s a system for testing your thoughts against the universe and seeing whether they match. This works not just for the ordinary aspects of Science, but for all of life”…

— Isaac Asimov

Footnotes:

[1] Alchemy — https://www.lexico.com/definition/alchemy

[2] Alchemy — https://www.yourdictionary.com/alchemy

[3] Neural Information Processing Systems Conference — https://nips.cc

[4] Random Features for Large-Scale Kernel Machines — https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf

[5] Ali Rahimi’s talk at NIPS (NIPS 2017 Test-of-time award presentation) — https://youtu.be/Qi1Yry33TQE

[6] Randomised Algorithms — https://www.geeksforgeeks.org/randomized-algorithms/

[7] Kernel Machines — https://en.wikipedia.org/wiki/Kernel_method

[8] Boosting as a kernel-based method — https://doi.org/10.1007/s10994-019-05797-z

[9] Non-Accelerated Kernel Machine — 0701907.pdf

[10] MatLab — https://www.mathworks.com/products/matlab.html

[11] Limits of Logic — https://medium.com/@rlschutte/the-limits-of-logic-e6c27daf7687

[12] In search of Ground Truths… — https://medium.com/@rlschutte/in-search-of-ground-truths-3817ce821572

[13] The Power of the Patterns — https://medium.com/@rlschutte/the-power-of-patterns-e1dc4c2352aa

[14] Sensemaking, the core skill for the 21st Century… — https://medium.com/@rlschutte/sensemaking-the-core-skill-for-the-21st-century-ebc8c679cfe8

[15] Gradient Descent — https://en.wikipedia.org/wiki/Gradient_descent

[16] Random Walk — https://www.techopedia.com/how-can-a-random-walk-be-helpful-in-machine-learning-algorithms/7/33166

[17] Nearest Neighbour — https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

[18] The Adjacent Possible — https://youtu.be/GJwoHyJDVLk

[19] The Age of Reason — https://www.bl.uk/restoration-18th-century-literature/articles/the-enlightenment

[20] Philosophy of Biology: Philosophical bias is the one bias that science cannot avoid — scientists also make assumptions of a non-empirical nature about topics such as causality, determinism and reductionism — https://elifesciences.org/articles/44929

[21] Foundations of Intelligence in Natural and Artificial Systems: A Workshop Report — https://arxiv.org/abs/2105.02198

[22] GPT-3 — GPT-3 Powers the Next Generation of Apps — https://openai.com/blog/gpt-3-apps/

[23] The implausibility of intelligence explosion — https://medium.com/@francois.chollet/the-impossibility-of-intelligence-explosion-5be4a9eda6ec

[24] “Explaining” machine learning reveals policy challenges — https://science.sciencemag.org/content/368/6498/1433

[25] Fitness-maximizers employ pessimistic probability weighting for decisions under risk — https://www.cambridge.org/core/journals/evolutionary-human-sciences/article/fitnessmaximizers-employ-pessimistic-probability-weighting-for-decisions-under-risk/FCF743180A566332C8AF9F7E7406AB43 — and — https://www.cambridge.org/core/services/aop-cambridge-core/content/view/FCF743180A566332C8AF9F7E7406AB43/S2513843X20000286a.pdf/fitnessmaximizers_employ_pessimistic_probability_weighting_for_decisions_under_risk.pdf

[26] At the limits of thought — Science today stands at a crossroads: will its progress be driven by human minds or by the machines that we’ve created? — https://aeon.co/essays/will-brains-or-algorithms-rule-the-kingdom-of-science

[27] Karl Popper — Philosophy of Science — https://iep.utm.edu/pop-sci/

[28] The emergence of Collective Intelligence — https://richardschutte.medium.com/the-emergence-of-collective-intelligence-1bd3f2e7a7c4



Richard Schutte

Innovation, Intrapreneurship, Entrepreneurship, Complexity, Leadership & Community. Twitter: @complexityvoid