webpage 2 vec?
I happen to be a little obsessed with the word2vec algorithm. I believe it is a good model for how the brain works: it maps concepts to coordinates in a high-dimensional space.
This makes sense when you think about the structure of the brain, where a neuron’s function is partly defined by its connections to hundreds of other neurons. It also helps explain how we can imagine an animal that is a bit like a tiger and a bit like an eagle, even though we have never actually seen one. I believe we don’t store concepts in our brains as ones and zeros, but as multi-dimensional maps.
This also helps explain why you can learn a word just by seeing it in context often enough: a word’s meaning is at least partially derived from its relationships to other words.
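word2vec's skip-gram variant turns this idea into training data: every word is paired with each of its neighbors within a small window, and the model learns vectors that are good at predicting those neighbors. The sketch below only builds those (word, context) pairs; the function name `context_pairs`, the window size and the example sentence are illustrative choices, not part of any library.

```python
def context_pairs(tokens, window=2):
    """Return (center, context) pairs: each token paired with every
    neighbor at most `window` positions away, as in skip-gram."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "the striped tiger hunts at night".split()
for center, context in context_pairs(sentence, window=1):
    print(center, "->", context)
```

Words that keep turning up with the same neighbors end up with similar training pairs, which is what pushes their vectors close together.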
This kind of learning isn’t restricted to words in a sentence. In fact, it can be used with any kind of categorical data that occurs in sequences, whenever you want a map that helps explain its internal relationships.
A great example of this is website sessions. During a session, a visitor can look at a number of product pages in a sequence. If we call each product page a word, and each session a sentence, we can embed our products in a lovely rich space.
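A minimal sketch of the idea, assuming each session is simply an ordered list of product IDs. In practice you would probably reach for a library such as gensim's `Word2Vec` and pass the sessions in as sentences; this from-scratch version is only here to make the mechanics visible. It is a plain skip-gram with negative sampling that simplifies word2vec in a few ways: negatives are drawn uniformly rather than from word2vec's smoothed frequency distribution, frequent items are not subsampled, and the learning rate is fixed. The function name `train_item2vec` and the product IDs are hypothetical.

```python
import math
import random


def train_item2vec(sessions, dim=8, window=2, negatives=3, epochs=50, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling over browsing sessions.

    Each session is an ordered list of product IDs (a "sentence"), and each
    product ID plays the role of a "word". Returns {product_id: vector}.
    """
    rng = random.Random(seed)
    vocab = sorted({p for session in sessions for p in session})
    # "Input" vectors (the embeddings we keep) start small and random;
    # "output" context vectors start at zero, as in word2vec.
    emb = {p: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for p in vocab}
    ctx = {p: [0.0] * dim for p in vocab}

    def sigmoid(x):
        x = max(-30.0, min(30.0, x))  # clip so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-x))

    for _epoch in range(epochs):
        for session in sessions:
            for i, center in enumerate(session):
                lo, hi = max(0, i - window), min(len(session), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    # One observed (positive) context item plus a few random negatives.
                    # Uniform sampling here; word2vec samples by unigram frequency^0.75.
                    targets = [(session[j], 1.0)]
                    for _ in range(negatives):
                        neg = rng.choice(vocab)
                        if neg != session[j]:  # skip accidental hits on the true context
                            targets.append((neg, 0.0))
                    v = emb[center]
                    grad_v = [0.0] * dim
                    for target, label in targets:
                        u = ctx[target]
                        score = sigmoid(sum(a * b for a, b in zip(v, u)))
                        g = lr * (label - score)  # logistic-loss gradient step
                        for k in range(dim):
                            grad_v[k] += g * u[k]
                            u[k] += g * v[k]
                    # Apply the accumulated update to the center vector last.
                    for k in range(dim):
                        v[k] += grad_v[k]
    return emb


# Hypothetical sessions: each inner list is one visitor's page views, in order.
sessions = [
    ["tent", "sleeping_bag", "camp_stove"],
    ["tent", "camp_stove", "lantern"],
    ["running_shoes", "sports_socks", "water_bottle"],
    ["running_shoes", "water_bottle", "sports_socks"],
    ["sleeping_bag", "lantern", "tent"],
]
vectors = train_item2vec(sessions)
print(len(vectors), len(vectors["tent"]))  # 7 8
```

The toy data only shows the shape of the inputs and outputs; with real traffic you would train on far more sessions and tune `dim`, `window` and `epochs`.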
From this space we can create recommendations or discover hidden product relationships. Here’s an example I made using Google’s TensorFlow Embedding Projector, with the dimensionality reduced using t-SNE (labels have been removed to protect IP).
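For recommendations it is generally safer to work with the original high-dimensional vectors than with the t-SNE projection, since t-SNE is designed for visualization and distorts distances. A minimal sketch, assuming the embeddings are a plain dict of product ID to vector (for example, as produced by a word2vec model): for the product a visitor is viewing, suggest the products with the highest cosine similarity. The function names and the 2-D vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recommend(embeddings, product, k=3):
    """Return the k products whose vectors are most similar to `product`."""
    query = embeddings[product]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != product  # never recommend the product itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


# Hypothetical 2-D embeddings, hand-written for illustration only.
embeddings = {
    "tent": [0.9, 0.1],
    "sleeping_bag": [0.8, 0.2],
    "lantern": [0.7, 0.3],
    "running_shoes": [0.1, 0.9],
    "sports_socks": [0.2, 0.8],
}
print(recommend(embeddings, "tent", k=2))  # ['sleeping_bag', 'lantern']
```

A brute-force scan like this is fine for a small catalog; for a large one, an approximate nearest-neighbor index would be the usual choice.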
You can clearly see product clusters, which can be useful for recommendations and catalog structuring.
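To turn visible clusters into actual groups for catalog structuring, one option (a technique not used in the example above) is to run a clustering algorithm such as k-means on the original vectors. A minimal sketch of Lloyd's k-means with made-up 2-D vectors; the function name `kmeans` and the number of clusters `k` are assumptions you would need to adapt to your own catalog.

```python
import math
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm). Returns {item: cluster index}."""
    rng = random.Random(seed)
    items = list(vectors)
    # Start from k distinct items chosen at random as initial centroids.
    centroids = [list(vectors[i]) for i in rng.sample(items, k)]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each item joins its nearest centroid.
        assignment = {
            item: min(range(k), key=lambda c: math.dist(vectors[item], centroids[c]))
            for item in items
        }
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in items if assignment[i] == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical 2-D embeddings, made up for illustration only.
embeddings = {
    "tent": [0.9, 0.1],
    "sleeping_bag": [0.8, 0.2],
    "lantern": [0.7, 0.3],
    "running_shoes": [0.1, 0.9],
    "sports_socks": [0.2, 0.8],
}
print(kmeans(embeddings, k=2))
```

The cluster numbers themselves are arbitrary; what matters is which products share one.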