This is an excerpt from my newsletter.
Giong sgihltly off pstie wtih tihs: you’ve probably seen this experiment before, where the first and last letters of each word stay in place and the middle is scrambled. It has been around since at least 1976 and suggests that humans can read jumbled text with little effort, for a number of reasons. Humans have common sense and intuition, the latter being a strange concept based on environmental and implicit learning.
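For the curious, the scrambling itself is easy to reproduce. Below is a minimal Python sketch, not the method used in the original experiment: the function names `scramble_word` and `scramble_text` are my own, and it assumes plain ASCII words, keeping the first and last letter of each word and shuffling whatever sits between them.

```python
import random
import re


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle the interior letters of a word, keeping the first and last in place."""
    if len(word) <= 3:
        return word  # at most one interior letter, so nothing to shuffle
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text: str, seed: int = 0) -> str:
    """Scramble every run of ASCII letters; punctuation and spacing are left alone."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


print(scramble_text("Going slightly off piste with this"))
```

A shuffle can occasionally return a word unchanged, which is more likely the shorter the word, so do not expect every word to come out jumbled.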
Much technologically derived communication is text based, whether it’s code in a compiler, command line prompts, a scraper or the acquisition of encyclopedic knowledge from the web. Human communication on the other hand, or anthroposemiotics for the budding etymologists, is more multivariate: we communicate not only by text but also by voice, visuals and nonverbal cues. This allows us to communicate not only with one another but across multiple people and groups. This rich communication and coordination is the primary reason humans are at the top of the food chain: we can coordinate in groups, and share and triangulate knowledge and strategy, in a way that other primates cannot. Enough about monkeys.
Unfortunately for our micro-chipped friends, the computers, this rich tapestry of communication and understanding is currently unavailable. Whilst we’ve made incredible leaps in the last few years in machine learning within bounded problems (finite tasks, finite results), we’ve yet to see it applied to truly unbounded problems (infinite parameters, infinite potential results). One of the key difficulties is that most machine learning systems are trained within narrow problem spaces. They’re trained within narrow spaces because they lack what humans might define as simple cognition, the ability to act intuitively. Machine learning today is very good at understanding defined taxonomies, but struggles with input which has yet to be defined, e.g.:
“what colour is the sky not?”
Whilst you and I would understand this to be every colour other than blue (or grey, if you live in the UK), a computer might struggle to answer, as no one has strictly defined which colours the sky is not (maybe unsurprisingly).
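To make the gap concrete, here is a toy Python sketch. Everything in it is hypothetical: the `FACTS` store, the short `KNOWN_COLOURS` list and both function names are invented for illustration and do not describe how any real system works. A lookup over explicitly stated facts finds nothing for "is not", whereas a human-style answer needs the extra assumption that anything not stated as a sky colour is not one.

```python
# Hypothetical store of explicitly stated facts: (subject, relation, value).
FACTS = {
    ("sky", "is", "blue"),
    ("sky", "is", "grey"),  # for readers in the UK
}

# Hypothetical, deliberately short list of colours the system knows about.
KNOWN_COLOURS = {"blue", "grey", "red", "green", "yellow", "purple"}


def lookup(subject: str, relation: str) -> set:
    """Return only what has been written down explicitly."""
    return {value for s, r, value in FACTS if s == subject and r == relation}


def infer_is_not(subject: str) -> set:
    """Assume any known colour not stated for the subject is one it is not."""
    return KNOWN_COLOURS - lookup(subject, "is")


print(lookup("sky", "is_not"))       # nobody wrote these facts down
print(sorted(infer_is_not("sky")))   # the answer once the assumption is added
```

The inference only works because someone supplied both the list of colours and the rule for using it, which is exactly the kind of background knowledge humans never have to be given.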
A lot of this common sense is derived from a broad understanding of subjects, but also from humans’ ability to absorb implicit knowledge: knowledge gained incidentally and without awareness. This is a further depth of understanding which isn’t merely superficial and explicit. Kahneman calls this System 1 thinking.
Currently, machines are only partially extracting knowledge from text. In other words, there is a superficiality and explicitness to their understanding of text-based communication: a computer may only be aware that the sky is blue.
Yejin Choi, associate professor at the University of Washington, is building a wide corpus of “common sense” knowledge for machines. Her team is looking to build a model which understands implicit knowledge from text and plugs the gap between representation and knowledge, the difference between explicit and implicit knowledge. An early paper from this work, Verb Physics, is an attempt at inferring physical knowledge of actions and objects along five different dimensions, e.g. “Tyler entered his house” implies that his house is bigger than Tyler.
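As a rough illustration of the idea, and emphatically not the model from the Verb Physics paper, the Python sketch below hard-codes a couple of verb rules by hand. The `VERB_RULES` table, the `Implication` type and the `infer` function are all hypothetical; the point of the research is to learn this kind of knowledge from text rather than have a person write it down.

```python
from typing import NamedTuple, Optional


class Implication(NamedTuple):
    dimension: str  # the physical dimension being compared
    relation: str   # how the subject compares to the object: "<" or ">"


# Hypothetical hand-written rules, for illustration only.
VERB_RULES = {
    "entered": Implication("size", "<"),   # x entered y: x is smaller than y
    "threw": Implication("weight", ">"),   # x threw y: x is heavier than y
}


def infer(subject: str, verb: str, obj: str) -> Optional[str]:
    """Return the physical comparison a verb implies, or None if no rule exists."""
    rule = VERB_RULES.get(verb)
    if rule is None:
        return None
    return f"{subject} {rule.relation} {obj} in {rule.dimension}"


print(infer("Tyler", "entered", "his house"))  # Tyler < his house in size
print(infer("Tyler", "painted", "his house"))  # None: no rule for this verb
```

A lookup table like this breaks the moment it meets a verb nobody anticipated, which is why a learned, broad-coverage resource is the more interesting goal.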
The ultimate goal is to have a broad benchmark dataset which multiple learning systems can pull from in order to plug implicit, or System 1, machine knowledge gaps. Attempts such as this target the quadrant of problems that are simple for humans and hard for computers (“simple/hard”), which as of today partly limits the breadth of AI applications. Whilst the threat of artificial general intelligence might seem imminent, certainly amongst pessimistic techno-fantasists, the reality is that these simple/hard problems still need to be codified, with most experts continually pushing out the timeframe to AGI until such problems are solved.