Great line of thought!
nidhi bharani

The way that our frontal lobe works, each stack handles a set of tasks. A stack would only be ‘called upon’ to handle certain situations. Our brains have a few stacks that specializes in jumping rope, and another cluster of stacks for driving a car. Language requires a lot of stacks, and each stack handles a particular set of circumstances.

For example: in French, they place adjectives after nouns, the opposite of English. A translator only needs to ‘call upon’ that grammar rule when they encounter a need for it. There is no ‘word location’ issue, because it does not behave like a RNN. (The stack that handles adjectives would be ‘called upon’ when the translator encounters an adjective, but not when they encounter a preposition. It’s situational.)

Mixtures of Experts already work that way — each expert is ‘called upon’ for a certain situation, not a word location. The problem with the existing Mixtures of Experts is that they do not have a ‘stack of pancakes’ architecture; they act instead like a ‘pile of jam’. For a Mixture of Experts to behave like Jeff Hawkin’s model, there would need to be six different neural networks, each using a Mixture of Experts.

The lowest of these six networks would not be very good at its task. Yet, each network ‘pancake’ would correct errors from the pancake below. Even if each ‘pancake’ was only correct 50% of the time, the combined six stacks would be more accurate (>98% accuracy) than existing translation software.

So, each stack is really a group of neural networks, and each of those networks has its own Mixture of Experts. Word locations are a very specific, small feature. Each stack would handle millions of those features. Our brain has millions of these stacks because we can learn patterns in millions of different things.

Our brains make better use of our stacks, too. Our language stacks have connections to our 3D modeling stacks, allowing us to imagine the scene described in a sentence. We can compare a translated sentence to its meaning, because our language stack doesn’t operate in isolation. Currently, a translator neural network doesn’t know if their sentence even makes sense! We need to connect language stacks to 3D modeling stacks, like our brains do. :]

Show your support

Clapping shows how much you appreciated Anthony Repetto’s story.