Why is “Learning” so Misunderstood?

Walid Saba, PhD
Published in ONTOLOGIK
Sep 1, 2022

I have written a few posts arguing that most of the important knowledge needed to build intelligent agents is not learned, because it cannot be learned differently by different agents, and it is not amenable to incremental, approximate, and individual learning from observations. It is that simple. I first wrote about this topic in "Learning is Overrated: Machine Learning vs. Knowledge Acquisition", where I discuss the difference between "knowing how" and "knowing that". Recently, I wrote a post explaining "Why Commonsense Knowledge is not (and can not be) Learned". In comments and (mostly private) messages I keep getting remarks like "but why can't that be learned?" It seems the 'folk' meaning of learning has taken hold of even the most rational of people, and the technical point I am trying to get across is still not appreciated.

The Folk Meaning of Learning

It is quite understandable for the common folk to lump all kinds of learning under the English word "learning", but in technical domains we have to be more precise in our language. We learn to play guitar, to ride a bike, to swim, to fix a TV set, etc. We also learned that addition is commutative (that is, u + v = v + u), that the intersection of two sets cannot have more elements than either set, that water boils at 100 degrees Celsius, that the circumference of a circle is 2*PI*r, etc. But the first group of things we learn is quite different from the second: we learn the first group by experience, while the second is learned by "being told" or by "deduction". Furthermore, there are a number of what I previously called universally valid cognitive templates that we do not learn at all; these are innate cognitive templates encoded in our DNA, and they simply abide by "the laws of nature" (e.g., that a physical object cannot be in more than one place at the same point in time). But let us stick to "knowing how" vs. "knowing that". Below I will reuse a diagram I used before.

Let us take an example from each branch. Let’s take “knowing how to play guitar” as an example of “knowing how” and “knowing that the circumference of a circle is 2*PI*r” as an example of “knowing that”. Now note that:

Knowing How to play guitar is individual
My cousin plays guitar and so does Carlos Santana, but (unfortunately for my cousin) he and Carlos know how to play guitar differently!
Knowing That the circumference of a circle is 2*PI*r is universal
My knowledge that the circumference of a circle is 2*PI*r cannot be different from your knowledge of the same fact: either we know this fact or we don't, but once we know it, we all know the same fact.

Knowing How to play guitar is continuous/fuzzy
My cousin learned how to play guitar gradually (incrementally), and there is no ceiling to how far that skill can be developed.
Knowing That the circumference of a circle is 2*PI*r is absolute
My knowledge of the fact that the circumference of a circle is 2*PI*r did not come gradually: I did not partially know some part of that fact, or learn that it is 85% true and then arrive at knowing that it is an absolute truth. I learned that fact once, and that it is true with 100% certainty.
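Because such facts are exact, anyone can verify them mechanically and get the same answer. A minimal Python sketch (the function name `circumference` and the sample values are mine, purely illustrative) spot-checks the facts mentioned above:

```python
import math

def circumference(r: float) -> float:
    """Circumference of a circle of radius r: 2 * pi * r."""
    return 2 * math.pi * r

# Addition is commutative: u + v == v + u, for any u and v.
u, v = 7, 12
assert u + v == v + u

# An intersection is never larger than either set.
a, b = {1, 2, 3}, {2, 3, 4}
assert len(a & b) <= min(len(a), len(b))

# The formula holds exactly, not approximately or partially.
assert math.isclose(circumference(1.0), 2 * math.pi)
```

These checks are not proofs, of course; the point is only that the answers do not vary by individual or improve incrementally.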

Knowing How to play guitar is inconsequential
The entire universe does not care if I know how to play guitar or not.
Knowing That the circumference of a circle is 2*PI*r is consequential
The universe will not be the same if the circumference of a circle is not 2*PI*r.

Knowing How to play guitar is not forgettable
Once we learn a skill, it cannot be forgotten (unless we suffer memory loss, as was rumored to have happened to Jerry Garcia, the guitarist of the Grateful Dead, who temporarily forgot how to play guitar but regained the skill when he recovered and got his memory back!)
Knowing That the circumference of a circle is 2*PI*r is forgettable
We do forget some facts, which we then need to look up again or be reminded of somehow.

The differences above should now make it clear that "knowing how" is quite different from "knowing that". This is actually an old subject that logicians and philosophers have studied for a long time (see this for some background reading, and follow the references there for more in-depth reading).

Machine Learning and Knowing-How

Machine learning (ML) has taken everyone by storm, to the point where it has become synonymous with artificial intelligence (AI). (I always find it amusing when I see someone declaring their skills by writing "AI/ML", I guess using the slash to mean "and/or", as if AI = ML.) Regardless, ML has been equated with AI because to everyone it seems so obvious: a machine that can learn will, eventually, be an intelligent machine. Well, not quite, especially if by learning we mean learning from data (i.e., learning from experience/observation), because many things we "come to know" not by experience/observation, but (i) by being told (instruction); (ii) by discovery; (iii) by deduction; or (iv) by analogy. In other words, we do need "knowledge acquisition", but statistical learning and finding patterns in data is a tiny part of our cognitive apparatus. Moreover, and more importantly, most of the things we learn from observation/experience are not knowledge that is consequential to our functioning as intelligent agents. Furthermore, as I wrote in the previous post, commonsense knowledge (e.g., if x is bigger than y and y is bigger than z, then x is bigger than z) is not (and, I repeat, cannot) be learned at all. These universally valid facts (which obey the logic of the universe) cannot be learned individually from experience/observations because we are not allowed to learn them differently.
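The transitivity example illustrates the alternative to learning from data: a universally valid template is stated once as a rule and applied everywhere. A hedged Python sketch (the function `bigger_than`, the fact encoding, and the sample animals are mine, purely illustrative):

```python
# "x is bigger than y" encoded as a set of known facts, plus the
# transitivity rule stated once, rather than induced case-by-case.
facts = {("elephant", "dog"), ("dog", "mouse")}

def bigger_than(x: str, y: str, facts: set) -> bool:
    """True if x is bigger than y, via the known facts and transitivity.

    Assumes the facts contain no cycles (a size relation never should).
    """
    if (x, y) in facts:
        return True
    # Transitivity: if x > z and z > y, then x > y.
    return any(a == x and bigger_than(b, y, facts) for a, b in facts)

# The rule yields a conclusion that was never observed directly:
print(bigger_than("elephant", "mouse", facts))  # True
```

The conclusion about the elephant and the mouse is not in the data at all; it follows from the rule, and every agent applying the rule must reach the same conclusion.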

Final Word

I hope I was able to defuzzify what the word "learning" means. This is important in AI and has very serious consequences, not least of which is being humble about claims that the dominant AI paradigm we call "deep learning" (DL) will lead to artificial general intelligence (AGI). Quite the opposite: even if DL becomes very accurate and resolves serious problems like adversarial examples, transferability, and other issues, it will end up being just a tiny fraction of what is needed to build intelligent machines, since most of what we come to know (or come equipped with) is knowledge that is not learned from data (observation/experience) through an incremental process of minimizing error. That is just one simple mechanism.

Finally, I hope I have explained the technical reason why quite a bit of knowledge is not (and cannot be) incrementally learned from observations, and I hope I will no longer get messages like "but why can't this/that be learned?", although I suspect I will still get such messages 😊

Keep AI’ing

___
https://medium.com/ontologik
