Emergence of knowledge from an alphabet

Vishesh Bajpai
Published in Analytics Vidhya
18 min read · Dec 8, 2019

Who is this article meant for?

  1. Someone who has built a neural network but is struggling to get the right predictions; adjusting the hyper-parameters is not working, and trial and error has hit a dead end. The network seems to have hit a glass ceiling — no matter what you do, accuracy stays roughly constant across scenarios; the “hit rate” has plateaued.
  2. Anyone interested in understanding how Adaptive systems work
  3. Anyone interested in the underlying nuances of how data powered intelligence systems are designed and perhaps optimised
  4. Someone wanting to create a system that can exist and learn within a set framework of boundaries — and keep refining its awareness of the environment it is supposed to operate within, avoiding disaster(s), autonomously
  5. Anyone working on relationships between symbols, or trying to interpret a completely new set of patterns they have come up against in their work
  6. Someone wanting to learn how to break down a larger problem into smaller pieces — like creating a jigsaw from a reference picture — and eventually creating a bigger jigsaw to form a picture from unknown pieces of information.
  7. Anyone interested in the emergence of reasoning from seemingly random noise

Basic Flow is organised as follows:

  • Genesis — touches upon the basic building blocks needed to generate knowledge
  • Progress — talks about how learning happens and eventual progress towards reasoning / intelligence
  • Current Stage — of where my work is at the moment
  • Future — of where the system ought to be

This whole piece is an attempt to explore the thought process of “how to optimise”. This is not the only way, and it may not be the best way — it is one possible way. The central idea is to understand how to create a logic-based knowledge-mining system from seemingly diverse and uncorrelated sets of data.

For a bit of history: an earlier AI wave took off in the 1980s–90s; back then such systems were called “expert systems”.

Genesis: I have been dabbling with creating a few unit-level indicators for optimal decision making since 2008. It is like trying to teach a kid to write (writing comes only after reading and understanding) a single letter — you start with a large tic-tac-toe-type grid and begin drawing lines at angles with a specific curvature, eventually forming a basic unit of recognition: a letter. Something our eyes can perceive and our brain can make sense of, which is then reinforced/confirmed by another human. Let’s say this is Stage 1 — “being unconsciously incompetent”. I’ll try to build the story from the ground up.

A group of letters makes an alphabet; English has a pretty simple one. Once we have learnt a…z and A…Z, we begin to learn small combinations like “and”, “or”, “not”, “a for…” — we use a lot of rhymes to reinforce the learning. Once the nuances of words are sorted out, we proceed to a higher level of intellect: forming simple sentences, moving to larger complex sentences; relationships begin to emerge. All the while the brain is forming patterns, labels and beliefs that eventually shape an individual’s thought process — the person/entity has thus been moulded into an ideology. Just try to recollect the disbelief, and the sense of incompetence, you felt when you came across “the quick brown fox jumps over the lazy dog” for the very first time. There existed some more [hidden] meaning in the same alphabet you had practically grown up with — yet you had somehow missed something that now seems obvious. Now you are at Stage 2 — conscious but still incompetent.

I guess any parent can relate to the moment when their 2-year-old did not know whether to cry or laugh with watery eyes; it was actually a moment of intense conflict inside the brain, with one belief eventually winning over the other. (Imagine how it would be if we all cried out loud, tears gushing, out of fear… but we were supposed to be happy. Sounds absurd?)

This led me to a thought… “Control the language, and most probably you will control the thoughts — the boundaries a thought can reach.” (You cannot imagine something beyond your default language; even if you do, you struggle to explain it to your closest friend in a common language.) Mostly, our thoughts are limited by the combinatorial factors of the letters of the alphabet and the corresponding “dictionary” of “approved” words. For non-English speakers, there are quite a few words in their native language that are not even comprehensible in English, yet they convey a profound intent, a meaning. Once you have a combination of such intents — which other languages cannot interpret — you get knowledge of immense proportions (Navajo code talkers? Sanskrit shlokas? The Devanagari script has something like 25 × 25 × 25 × 25… possible words, few of which make sense in the current realm of language and communication). What if we could make a language that comprises a few letters and a lot of words — something like… zx1&bn±, as§f3h, qWe$8y — where all combinations make sense towards a pattern that can be read, interpreted/discarded, and acted upon — not by a human, but by a model, a digital model. A model that has trained itself on innumerable combinations and now knows a language no human would even want to interpret (too boring, perhaps). It is not a simple lattice-based dimension reduction or a Monte Carlo dead end. It is about a way of relating to a newly formed alphabet. It is not about rote learning a series of 101010, 1111, 1101 (if — else — elseif…) and repeating it on demand, over and over. It is about being able to label something, recall it, assess it against current prevailing circumstances and act upon it, with the intent of an eventual feedback. We are nearing Stage 3 — “being conscious, and now gaining some competence”. Syntax is now making way for semantics.
Semantics that only the model understands. There will be no nouns, verbs, adjectives, predicates. Everything will be in relation to something else, with a meaning only the model understands — for it has created its own form of relating to patterns that appear, to others, as random noise. A universe of types is beginning to form, with a certain framework in place. Something akin to discovering the symphony of correct chords being played with the right piano keys — till we have music for the ears. We already have so many forms of music. What happens when music of different forms fuses together? This abstraction is similar to knowledge gained in one domain being applied to a completely new domain.

The language will comprise symbols and patterns — which we can visualise but not fully interpret, more like hieroglyphics — ones that a machine could interpret (reduce / look up / interpret / decide / act / re-learn / re-do) in a matter of seconds. One proposed way of having a model do this is to first create a lot of noise (generate / catalyse / feature creation) and a lot of combinations (from underlying subsets of letters in the alphabet) — then have the model make a random choice. The choice will either be in favour or against, per the prevailing environmental conditions (approved by the model creator). The outcome of the choice is fed back to the model for the development of a belief-driven intent. Eventually, with a sufficiently large number of bad choices and a few good ones, the model will be able to make an educated guess when presented with a choice; now the model can maintain/run by itself if it sees a familiar pattern recur. Something similar to having an MD5 checksum for each known combination of letters — recalls then become far faster; breaking encryption (with pre-fed, stored lookups) uses a similar methodology. At this stage the model is fully conscious and competent. Stage 3 is complete. The learning rate grows exponentially hereafter; biases also start to set into the knowledge at this stage. Multiple units of intelligence begin to be discovered (from atoms towards electrons, towards quarks, bosons, hadrons etc.), which often combine in different ways to generate meaningful patterns. I am tempted to sprinkle a bit of the trinity framework here, with one set of patterns labelled the creator/catalyst, another the nurturer/maintainer, and the third the destroyer/collector.
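The choice-feedback loop above, with checksummed patterns for fast recall, can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual system: the pattern strings, the reward rule (+1 favourable, −1 unfavourable) and the in-memory belief table are all assumptions made for the sake of the example.

```python
import hashlib
import random

memory = {}  # digest -> running belief score (illustrative store)

def digest(pattern: str) -> str:
    """Checksum a pattern so recall becomes a fast lookup (the MD5 idea)."""
    return hashlib.md5(pattern.encode()).hexdigest()

def decide(pattern: str) -> bool:
    """Act on a remembered belief if the pattern is familiar,
    else explore with a random choice."""
    key = digest(pattern)
    if key in memory:
        return memory[key] > 0       # recall a learned belief
    return random.random() < 0.5     # explore: random guess

def feedback(pattern: str, outcome: int) -> None:
    """Feed the outcome of a choice back in, building belief-driven intent."""
    key = digest(pattern)
    memory[key] = memory.get(key, 0) + outcome

# with enough outcomes, the model runs by itself on familiar patterns
for _ in range(100):
    feedback("zx1&bn", +1)
feedback("qWe$8y", -1)
```

After the loop, `decide("zx1&bn")` is a pure lookup rather than a guess, which is the point of hashing each known combination of letters.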

What happens beyond Stage 3 depends on how far the model is able to explore the environment while continuing to operate within its self regulated boundaries. An element of randomness needs to be built in at certain decision points. To seek — is to learn — to learn is to create and fail, often by taking chances. It is like a recursion till the maximum depth is reached and we encounter a basic unit, once again.

Why does Stage 3 take so long compared to Stage 1 / Stage 2? Ask a venture capitalist why funding rounds see a very rapid seed round followed by Series A & B, but a large elapsed time window until Series C or later rounds occur.

Progress: Once the language is sufficiently formed and the model is able to interpret things, it will continue to strive for excellence by making a random guess once in a while, just to test the waters (do any more semantics/patterns exist?). This is akin to learning algebra in 6th grade — gaining knowledge from sources of untaken actions; nevertheless, each decision will have a consequence and the model shall learn from it, forming a sort of arsenal for later years. What if the model is working in the middle of an economic cycle and has not yet seen a recession or a boom? The beauty of such modelling is that minutes can be condensed into seconds, days into a few minutes’ worth of computational combinations. Feature engineering and dimension reduction will help simulate outcomes faster. Utmost importance should be given at this stage to not wandering away; frame-of-reference boundaries should be respected, always. There is no point running fast on a wrong map — you only get further away from the originally set goal. Noise is filtered, and only band-pass-filtered data should be stored now; in other words, start discarding more — elimination takes over from selection. This is similar to landing in a foreign country where the language everyone speaks is just gibberish to your ears — yet you filter out information for reuse later.
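The “band-pass, then store” idea can be shown as a one-liner: keep only readings inside a noise band, discard everything else before it ever reaches storage. The band limits and sample values here are illustrative assumptions, not parameters from the author's system.

```python
def band_pass(stream, low, high):
    """Elimination over selection: discard everything outside [low, high]."""
    return [x for x in stream if low <= x <= high]

# hypothetical raw readings; only the in-band signal is kept for storage
raw = [0.01, 0.4, 3.7, 0.55, 12.0, 0.48]
kept = band_pass(raw, 0.3, 0.6)
```

The design choice is that filtering happens before storage, so the memory map grows with signal rather than with noise.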

The human brain maxes out beyond a certain pattern level (information overload), and most of us have very poor recollection of repeatable patterns, especially when numbers are involved. I’m reminded of what a few of my friends say… Where the beauty of the USA ends, Canada is just beginning… When the game of a diesel car ends (RPM), the petrol car’s game begins. The moment you have a model that exists across the fabric of a digital environment, across syntactic boundaries, and the model is able to have an instant recall — assisted by years and years of map-reduce (USA — Canada), map-reduce (Canada — USA), right and wrong simulations (there exist ways in which a model can learn from the live real-world environment too, i.e. no sim mode) — it eventually would have mapped out almost all decision points (combinations of letters and right words that form a semantically correct and syntactically meaningful pattern), from which intelligence shall emerge. Yes, black swans will happen — but if anything is done within limits, it will yield appreciable results. This is Stage 4 — the model is par excellence; it is unconsciously competent; its decisions are predictable with highly accurate outcomes. It is like having an automatic recall for any situation, every time. The model would have developed an arsenal of control surfaces to fly its own plane in a medium very few understand. Similar to ailerons, flaps, slats and rudder — a fine combination of such tools under certain environmental conditions allows an (auto)pilot to control flight. Differing environmental conditions entail a new combinatorial mode — like a certain type of ego performing a different set of actions; the end goal is still a smooth flight.

Learning always has a cost associated with it — across every realm. Time is often the most linear cost associated with learning or creating anything. Computational capacity comes a close second when such problems are tackled with a certain algorithmic craft.

Some of the best brains we have today are very good at slotting information from various sources into one thought model. They gather information from multiple sources and have it stored somewhere in their minds for an almost 90%+ recall. Once they are at a problem-solving level, they are able to quickly tap into these “memory units” and interpret a meaning to form a sum of parts — parts that are from varied lines of thought. But there is still a limitation: the thoughts from which they have borrowed are still crystallised in English, the language of choice of the world. What I am trying to put forward in this section is a similar language that a model can interpret correctly and keep refining, day after day, as it encounters new information flows. Someone famous once said — “you die the day you stop learning”.

Current Stage: I am trying to (programmatically) build model(s) that will learn, map-reduce, take decisions, re-learn if need be, operate within well-defined boundaries, and ask me random questions… like: take me to July 16th 1969 — what happened on that day? I will label the information for the model for that day and tell it — this is what should have been done/looked up on that day (keeping in mind the specific purpose we are after; we are not creating a form of general intelligence in this article). This will introduce a bias in the model — but what is bias? An affinity towards something. Life, as we know it, has an affinity towards warmth and a low-energy state. The best decisions are made when the environment is conducive, with a stable noise level. Optimal entropy is a state of less conflict and immense pacification, across dimensions. Even bosons — Z, W, virtual, whatever you call them — yearn for a low-energy state. Some of the best decisions the current model has made to date came when the noise levels were in a certain sweet-spot range (filters). Map-reduce is of utmost importance when dealing with such sizes (lookups). It’s like the Standard Model — you should define the limits, replace the infinities with reasonably accurate values and carry on investigating to form new patterns, keeping in mind that you can never have it all.

I’ve tried supervised, unsupervised, and a few other modes — there is always a trade-off between approaches. Supervised labelling yields the fastest learning and the fastest outcomes, but is the most difficult to integrate into a live environment. Unsupervised ones, on the other hand, require a lot of data churn and can be left to learn on their own; some learn well, others drop out of school and never return… but when you see them take decisions, it is a goosebump moment. Unsupervised models are a black box, and all you need is an access endpoint to tap into their intelligence. The best result so far has been achieved using a sort of hybrid setup: a bit of labelling, a bit of entropy/confidence filters from unsupervised models, a bit of probabilistic chance, a bit of statistical inference, a bit of lookup, a bit of feedback, a bit of randomness, and a bit of recursion to reduce the dimensions that occur when everything gets related to everything. You can imagine each item in the previous sentence as an agent in a large swarm that influences a decision — often needed in a matter of seconds. If you can imagine all the planets spinning around our Sun, a conjunction of a few, or an opposition of a few, is a recognisable pattern with a certain effect; in large datasets these patterns occur multiple times in a day. At this stage, if you are prioritising important vs unimportant and urgent vs non-urgent, you have already hit the computational limit. You need map-reduce now, and with that, some knowledge forms begin to slip away. It is like never getting to know something that never materialised (due to not pursuing it; the reason could be anything — in this case, computational capacity — but you get the idea).
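The hybrid “swarm of agents” setup can be sketched as a set of small voters whose combined score drives the decision. Everything here is a stand-in: the four agents, their vote ranges and the threshold are illustrative assumptions, not the actual labelling, entropy or inference components of the author's system.

```python
import random
import statistics

# Each agent emits a vote in [-1, 1]; the swarm's mean drives the decision.
def label_agent(x):      return 1.0 if x in {"known-good"} else 0.0   # labelling
def entropy_agent(x):    return -0.5 if len(set(x)) > 8 else 0.5      # entropy filter
def chance_agent(x):     return random.uniform(-0.2, 0.2)             # probabilistic chance
def inference_agent(x):  return 0.3   # stand-in for a statistical score

def swarm_decision(x, threshold=0.2):
    """Combine the agents' votes; act only if the swarm leans past threshold."""
    votes = [label_agent(x), entropy_agent(x), chance_agent(x), inference_agent(x)]
    return statistics.fmean(votes) > threshold
```

The random agent keeps a deliberate sliver of exploration in every decision, echoing the “bit of randomness” in the list above, while the mean keeps any single agent from dominating.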

I can only hope this work comes to fruition. Timelines? I don’t know — it may take years. Right now, one of the serious limitations is computational capacity: an 8-core system at 2.6 GHz with 16 GB of RAM regularly heats up to 90°C in about 12 minutes; improvements in the algo are helping with CPU churn, but it will still be a limit. Not to mention the letters and words being generated, and a memory map of all of them to be kept in a sane, catalogued way — without corruption — for recall in later years. Increasing the CPU/GPU power will not help beyond a point. (We cannot speak faster than the rate at which others comprehend (verbal diarrhoea); we need to breathe between sentences. A unit-time-level latency event occurs beyond a point, especially when operating in live environments with large information flows — you cannot overclock beyond a tick-by-tick level; you start seeing the same datum, and there is nothing more to see beyond it with current technology.) I don’t know if quantum computing can help — like a thousand brains being told to solve a problem, and whoever comes back with an answer first ends the whole test, and a new test commences. This will always get the first correct answer, which in most cases is not the best answer. Will breaking the model into mini-models serve the purpose? I don’t want to do that: intelligence being a combination of unit-level memories (swarm intelligence), if we break the larger organism into smaller units we lose the collective synergy the system can achieve, and intelligence of a higher order never materialises. A cell by itself is a cell, but a group of cells in a tiger’s hind muscles is far more powerful than in a human’s thigh; conversely, a group of cells in a human’s brain is collectively superior to any known form of organic intelligence — yet. If I can draw a parallel, the cell is still the basic unit of intelligence.
Combinations of various cells (letters/patterns) in various densities create organic matter (a digital model) of varying intelligence. Transcriptome analysis is another field operating at a higher order of aggregated knowledge, and it is going to radically change our understanding of organic matter at a unit level in the coming years.

Getting the right cells in the right place for the right decision, using a relationship of letters in an alphabet, is the key. The current stage of work has an alphabet of about 40 letters, with each letter in turn created from a subset of unit-level symbol sets (of varying length, 3–5). The state of a letter can be anything (not merely 0 or 1, rather [0…1]); its inference is governed by a set of guard rails for it to become part of a complex sentence and be interpreted correctly. Guard rails (like the Oxford dictionary) are generated daily, based on the outcomes of all previous decisions. Again, a computational backlog/limit. Adding a new letter to the alphabet gets way too complicated beyond a certain maturity level of the language.
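A letter with a continuous state and daily guard rails might look like the following. The `Letter` structure, the symbol strings and the threshold values are all hypothetical, chosen only to make the description above concrete.

```python
from dataclasses import dataclass

@dataclass
class Letter:
    symbols: str   # the 3-5 unit-level symbols composing the letter
    state: float   # anything in [0.0, 1.0], not merely 0 or 1

def passes_guard_rails(letter: Letter, low=0.2, high=0.9) -> bool:
    """Guard rails (the 'dictionary'): only a well-formed, in-range letter
    may become part of a complex sentence. In the described system these
    bounds would be regenerated daily from past outcomes."""
    return 3 <= len(letter.symbols) <= 5 and low <= letter.state <= high

# a tiny illustrative alphabet; a sentence admits only letters that pass
alphabet = [Letter("zx1", 0.62), Letter("as§f3", 0.05), Letter("qWe$8", 0.81)]
sentence = [l for l in alphabet if passes_guard_rails(l)]
```

Because the guard rails are data (two floats and a length range) rather than code, regenerating them daily from decision outcomes is cheap; the expensive part, as the text notes, is adding a new letter once the language matures.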

What about platforms? TensorFlow / SageMaker and similar setups are enablers, the execution systems for intelligence generation. They are the paper on which books were printed to disseminate (and retain) knowledge. Back then, paper was the medium to share and retain knowledge. That medium changed from physical + visual to something else about a couple of decades ago. It’s digital, and its democratisation began somewhere around 2012.

I am still in an exploratory phase — learning by making mistakes and running with the flow. For the model, it is like… I will keep guiding you; you form your own patterns, your own language, and I will tell you if a choice is good or not — for the moment. Once I am gone, someone else will guide you — you shall never cease to learn. If I can draw a parallel to biological evolutionary intelligence: our bodies are the caretakers of cell-level intelligence coded at the DNA level (unit-level intelligence) — which has existed, refines itself and gets passed on to the next generation (the next caretakers) — but intelligence in itself shall never cease to learn; it will morph and emerge — it evolves. Organic cells already have a language, and it is being refined with each passing generation. The best part is that we and plants, at a cellular level, speak the same language. Ever noticed the leaves following the Sun? Ever looked a bird in the eye? Ever wondered how a fish remains stable underwater — using its gills (closed/open) it is able to maintain an equilibrium. A fish is an evolved adaptive system; so is a passenger aircraft. There is enough science to prove that fish have memory that persists, and that they behave in groups of social harmony. A type of fish exists that can see both underwater and above water simultaneously — it has four eyes.

This gets me to the last stage: Stage 5. What happens when models of such intelligence start to interact with each other? (Humans eat plants; plants take energy from the Sun.) What will be the format in which they exchange knowledge? (Cellular-level information exchange.) Will it be in plain json.dumps(dna) terms? What information will a model be interested in getting its hands on? (Eating healthy foods.) Will some models form a cabal, wherein extreme levels of intelligence are almost always guarded/obfuscated? (Superfoods.) There will come a time when Stage 5 of intelligence shall emerge — when different models speak on a common bus, like CORBA or a Kafka/Redis-powered recall hub (pardon my jargon — technology is my hammer for metaphors). It will be a universal highway, not for the exchange of bits and bytes or web pages, but for intelligence. Intelligence curated by the models, from the models, of the models — but for the eventual benefit of their creators (or caretakers). We are already beginning to see pockets of refined intelligence emerge in areas with large, formatted knowledge. Quandl comes very close to this concept of a refined knowledge exchange, primarily of financial knowledge. Planet is also evolving towards such a level. Somewhere around 2017, analysts were using satellite images sourced from various providers to estimate companies’ earnings by counting the cars parked in their parking lots. These types of intelligence-forming building blocks have already emerged and are being monetised; earlier, such projects existed within military domains, for obvious reasons.

Future: How far are we from Stage 5? My guess is that someone is already working on a blockchain-powered, universal, fingerprinted, truth-based common bus that anyone can subscribe to and get hooked onto for tapping into truthful information published back in history. In other words, a library of authentic books that anyone can refer to in order to gain knowledge. I would say… give or take 10 years from today, something of this sort should exist (if it doesn’t already). Key to the emergence of this whole ecosystem will be a commonly agreed json.dumps format for the dissemination not of data, not of information, but of refined knowledge — at scale. This is not about Artificial General Intelligence or Ethical AI — this is about a fundamental way of creating a repeatable memory unit for aggregation and emergence.
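A minimal sketch of what one fingerprinted “knowledge unit” on such a bus could look like, assuming a JSON envelope. The field names (`pattern`, `belief`, `provenance`) and the SHA-256 fingerprint scheme are purely illustrative assumptions, not a proposed standard.

```python
import hashlib
import json

def publish_knowledge(pattern, belief, provenance):
    """Wrap a refined claim in a JSON envelope and fingerprint it so any
    subscriber can verify the unit has not been altered."""
    unit = {
        "pattern": pattern,        # the model's symbol/pattern combination
        "belief": belief,          # the refined, model-curated claim
        "provenance": provenance,  # who published it and when
    }
    payload = json.dumps(unit, sort_keys=True)  # canonical form before hashing
    unit["fingerprint"] = hashlib.sha256(payload.encode()).hexdigest()
    return json.dumps(unit, sort_keys=True)

msg = publish_knowledge("zx1&bn", 0.87, "model-A/2019-12-08")
```

Sorting the keys before hashing makes the fingerprint deterministic, so two caretakers serialising the same knowledge unit always agree on its identity.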

Remember… Data is mere patterns (of letters of an alphabet), patterns become information, information turns to knowledge, knowledge is power… and power attracts power. Data was said to be the new oil — around 2016.

Technological advancement and power were greatly limited by access to advanced, refined knowledge and the technical know-how to convert an idea into reality; we are beyond that now — almost anyone with a laptop can convert their idea into a working model, quickly, aided by an immense arsenal of tools and technologies. The fittest shall not only survive, but thrive. The reaction has started, catalysts are already accelerating the process — it is just an eventuality now. The AI revolution is over… AI is now being democratised at breakneck speed; with that, competitive advantage is rapidly diminishing and new intelligence surfaces are being created (assuming the approach to creating the alphabet is taken correctly). For example: the advantage trading houses had in the 1990s (call it latency, throwing metal at a problem, whatever) is vanishing by the day — the trading landscape now has formidable retail participation, and each participant has access to an AWS/GCP/Azure-powered machine with better UI/UX tools and execution platforms than anyone could dream of as little as 5 years ago.

AI today is like the Intel MMX processor in 1999 — already a household commodity; people talk about overfitting and bias like cupcakes and candy. Just imagine what it will become by 2020!

As with any process maturity, we move from basics towards advanced and increasingly complex systems — eventually gravitating towards the realisation that a simpler workflow exists, one that provides the best bang for the buck (under prevailing systemic constraints).

By providing control surfaces beyond our normal intellect, the process outlined above will not take the human out of the equation — it will instead empower us to interact with environments and processes at a far finer level of detail and at a much larger scale. How far we want to push such a sphere of knowledge is up to us, for the larger the sphere, the bigger its contact surface with the unknown.

I hope the above writeup helps or is able to draw a parallel to someone who is trying to put an order to the chaos surrounding their creations.

Peace.

Originally published at https://www.linkedin.com.
