If Humans Spoke in Vectors…

Would we be as successful as we are now? I’d say no.

What Are Semantic Vectors

Word Relations and Différance

These relations are what many of us will resort to when asked about the structure of language. A dictionary, after all, just refers us to other words when asked for any definition. The implicit conclusion, that without grounding in the real world all words are purely relational, bears a striking resemblance to Derrida’s Différance. The core of Différance, which became fundamental to Deconstruction and the Postmodernism that now dominates the humanities, is the idea that words gain meaning only through their difference from (and deference to) other words. Only in the real world, in speech, outside the realm of paper, would Derrida acknowledge that an uttered symbol can carry real meaning.

Grounding Meaning

But we might be conflating two things here. Though these language models learn the meanings of words through their relationships and co-occurrence with one another, we can also think about definitions on their own. A classic example from Indian philosophy is that of the pot. What does it mean to be a pot? As humans, we’re great at generating Platonic Forms from our real-world experiences. Once we’ve seen a few pots, our brains construct a fairly accurate generalized Form for pots, and when we see another one, we easily recognize it as an instance of that Form. The representation our brain learns is far more information-dense than anything that can be gleaned from language alone: it’s imbued with an innate understanding of visual and physical attributes.
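To make the purely relational view concrete, here is a minimal sketch of how a word vector can be built from nothing but co-occurrence counts. The toy corpus and vocabulary are assumptions chosen for illustration; real models use vastly larger corpora and learned (rather than counted) dimensions.

```python
# Build word vectors from sentence-level co-occurrence counts (toy data,
# assumed for illustration) and compare them with cosine similarity.
import math
from collections import Counter
from itertools import combinations

corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the potter shaped the pot",
    "the pot held water",
]

# Count how often each pair of distinct words appears in the same sentence.
cooc = Counter()
vocab = set()
for sentence in corpus:
    words = set(sentence.split())
    vocab.update(words)
    for a, b in combinations(words, 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

vocab = sorted(vocab)

def vector(word):
    """Represent a word purely by its co-occurrence counts with every other word."""
    return [cooc[(word, other)] for other in vocab]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# "cat" and "dog" share contexts ("chased", "the"), so they come out more
# similar to each other than "cat" is to "pot".
print(cosine(vector("cat"), vector("dog")) > cosine(vector("cat"), vector("pot")))
```

Note that nothing here is grounded: each word is defined only by the other words it appears beside, which is exactly the relational picture above.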

Composing and Comprehending Meaning

While grounding is vital for learning accurate word representations, something even more elementary to me is the compositionality of language. Compositionality is the idea that the meaning of a sentence is a unique, synergistic result of combining its constituent words. It’s closely related to Noam Chomsky’s Recursion, which he asserts is the fundamental element underlying all human language. Recursion is our ability to nest expressions in language without limit, much like this very sentence, where I can keep chaining on clauses, again and again, until I desire to stop, at which point I may place a period in writing, or a pause in speech, and then continue on to present yet another idea.
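That unbounded nesting can be sketched with a single recursive grammar rule, where a noun phrase may contain a clause that itself contains another noun phrase. The rule and vocabulary here are assumptions picked for illustration, not a serious grammar:

```python
# A minimal sketch of linguistic recursion: one rule that embeds a clause
# inside a noun phrase, applicable to arbitrary depth.
def noun_phrase(depth):
    """An NP optionally contains a relative clause, which contains another NP."""
    if depth == 0:
        return "the mouse"
    return f"the cat that chased {noun_phrase(depth - 1)}"

def sentence(depth):
    return f"{noun_phrase(depth)} ran away"

print(sentence(0))  # the mouse ran away
print(sentence(2))  # the cat that chased the cat that chased the mouse ran away
```

Because the rule refers to itself, there is no deepest sentence: for any depth you can always produce one more level of embedding, which is exactly the infinite-nesting property described above.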

Logic or Statistics

Transformers, though, are very clearly not just syntactic models. They have a strongly statistical element that lets them say, for example, that “chased” attends to “cat” with weight 0.7 and to “mouse” with weight 0.9. We often assume that this ability to compute statistically over floating-point values gives the model greater power, but might it actually be a hindrance?
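The statistical element in question is scaled dot-product attention, sketched minimally below. The 2-d “embeddings” are made-up toy numbers, so the resulting weights are illustrative rather than the exact figures quoted above:

```python
# A minimal sketch of one row of scaled dot-product attention: a softmax
# over query-key scores yields fractional attention weights.
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """How much one query token attends to each key token."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

# Toy 2-d embeddings for ["the", "cat", "chased", "the", "mouse"] (assumed values).
keys = [[0.1, 0.0], [1.0, 0.2], [0.3, 1.0], [0.1, 0.0], [0.9, 0.8]]
query = keys[2]  # "chased" attending over the whole sentence

weights = attention_weights(query, keys)
print([round(w, 2) for w in weights])  # fractional weights that sum to 1
```

The point is that every weight is a continuous floating-point value: “chased” relates to “cat” a bit, to “mouse” a bit more, rather than by any discrete rule.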

The Power of Symbolic Structures

We’re often told that it’s better to think statistically, and that may well be true. But some recent findings in Graph Neural Networks suggest that symbolic models, which discard much of the statistical nuance in exchange for simple algebra, generalize far better while also improving explainability by eliminating the black-box neural net. In that work, the authors use symbolic regression to distill simple symbolic formulas (algebra trees) from their neural network (linear-algebra vectors). When applied to astrophysics, the approach recovered accurate physical formulas capable of explaining the phenomena being modeled.
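To show the flavor of symbolic regression without reproducing the paper’s method, here is a deliberately tiny version: brute-force a handful of candidate algebra trees against synthetic data from an inverse-square law and keep the best fit. The candidate set and data are assumptions for illustration; real symbolic regression searches a huge expression space, typically with genetic programming.

```python
# A toy symbolic-regression sketch: score a few candidate formulas against
# data generated from F = m1 * m2 / r**2 and pick the lowest-error one.
candidates = {
    "m1 * m2 / r": lambda m1, m2, r: m1 * m2 / r,
    "m1 * m2 / r**2": lambda m1, m2, r: m1 * m2 / r**2,
    "m1 + m2 - r": lambda m1, m2, r: m1 + m2 - r,
    "(m1 + m2) / r**2": lambda m1, m2, r: (m1 + m2) / r**2,
}

# Synthetic observations of an inverse-square law (constants folded into units).
data = [(m1, m2, r, m1 * m2 / r**2)
        for m1 in (1.0, 2.0, 3.0)
        for m2 in (1.0, 5.0)
        for r in (1.0, 2.0, 4.0)]

def mse(f):
    """Mean squared error of a candidate formula over the observations."""
    return sum((f(m1, m2, r) - target) ** 2 for m1, m2, r, target in data) / len(data)

best = min(candidates, key=lambda name: mse(candidates[name]))
print(best)  # the inverse-square form fits the data exactly
```

The winning expression is a compact, human-readable algebra tree, which is precisely the explainability payoff: the fitted model *is* its own explanation, with no black box left to interpret.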

Undergrad ML researcher writing about Linguistics, Neuroscience, & Philosophy