Hyperdimensional computing and its role in AI

Givi Odikadze · DataSeries · Jun 14, 2020

What is it?

Don’t be intimidated by the name hyperdimensional computing. Even though it sounds like something from Star Trek, it simply means:

Computing using large vectors (say of 10000 values each)

Why would we do that?

As it turns out, such vectors have interesting properties that are appealing to the realm of AI:

  • We can pick any number of random vectors from the hyperspace of 10k dimensions and they will all be approximately orthogonal to each other. This nearly guarantees that we can generate a fresh vector at any time that does not resemble any previous ones.
  • We can add two vectors to obtain a vector similar to both of them.
  • We can multiply two vectors to obtain a vector dissimilar to both of them.
  • We can represent data holistically (meaning the value is distributed among many data points), making our vectors redundant and robust to corruption (we can reconstruct the meaning of a vector as long as we have 60% of its content). See the sketch after this list for a quick illustration of these properties.
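
Here is a minimal sketch of these properties in Python with NumPy (my own illustration; 10k-dimensional bipolar vectors of +1/-1, and the values in the comments are approximate):

```python
import numpy as np

D = 10_000  # dimensionality of the hyperspace
rng = np.random.default_rng(0)

def random_hv():
    # a random bipolar vector of +1/-1 values
    return rng.choice([-1, 1], size=D)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a, b = random_hv(), random_hv()

print(cosine(a, b))      # ~0.0: two random vectors are nearly orthogonal
print(cosine(a, a + b))  # ~0.7: the sum is similar to each of its parts
print(cosine(a, a * b))  # ~0.0: the product is dissimilar to its parts
```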

Let’s explore what we can do with this!

Example 1: Guessing the language of arbitrary text

Problem: we have text (of any size and content) and we want to guess if it’s French or English.

Solution: compute a vector for each language and one for the input text, then compare their angles.

Steps:

  1. compute a single vector for each language: one for French, one for English.
  2. compute a single vector for the input text.
  3. compare the input text vector with both language vectors using the cosine similarity. The closest language vector to our input vector is most likely the input’s language.

Ok… let’s go into each step one by one.

Step 1: compute a 10k vector for a language

Given some sample text in, say, English, the vector generated from it will “represent” the English language (naturally, the larger the sample text, the better it will represent the language).

The encoding process works as follows:

  • Generate a random 10k vector for each letter and store it. We can use +1 and -1 as the available values, which I’ll call “bipolar vectors”, so the vector will look something like this: [ +1 -1 +1 +1 -1 -1 +1 … ]
  • Encode trigrams using rotate and multiply operations. This summarizes short sequences of 3 letters into 10k vectors.
    Example: the input “something” will generate one 10k vector for each 3 sequential characters: “som”, “ome”, “met”, “eth”, “thi”, “hin”, “ing”.
    Each trigram (say “som”) will be calculated as som = rr(s)×r(o)×m
    - We use rotate r(x) to encode the position of a letter within the trigram.
    - We use multiply to produce a new unique vector dissimilar to any of the letter vectors.
    Note that we can replace trigrams with “quadgrams” and it will still work. You can experiment to see what works best (even mix sizes); I found that sequences of 3 yield good results.
  • Take all of the trigrams and add them together to get one 10k vector that is the representation of the language. This vector is going to be similar to all of the trigrams combined. A minimal sketch of this encoding follows this list.
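
Here is a minimal sketch of this encoding in Python (my own illustration, not the repository’s implementation; the alphabet and helper names are assumptions, and rotation is done with np.roll):

```python
import numpy as np

D = 10_000
rng = np.random.default_rng(42)

# one random bipolar vector per letter (hypothetical alphabet: a-z plus space)
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
letter_vectors = {c: rng.choice([-1, 1], size=D) for c in ALPHABET}

def rotate(v, n=1):
    # circular shift: encodes the position of a letter within the trigram
    return np.roll(v, n)

def encode_trigram(a, b, c):
    # trigram = rr(a) × r(b) × c
    return rotate(letter_vectors[a], 2) * rotate(letter_vectors[b], 1) * letter_vectors[c]

def encode_text(text):
    # the sum of all trigram vectors is one 10k profile vector for the text
    text = "".join(ch for ch in text.lower() if ch in ALPHABET)
    profile = np.zeros(D)
    for i in range(len(text) - 2):
        profile += encode_trigram(text[i], text[i + 1], text[i + 2])
    return profile
```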

Step 2: compute a 10k vector for the input text

The process is exactly the same as for encoding a language vector.

Step 3: compare the input and language vectors

Using the cosine similarity we can compare the angles between vectors:

  • cos(A, B) = 1 means both vectors are equal.
  • cos(A, B) = -1 means they are equal, but opposite.
  • cos(A, B) = 0 means they are orthogonal.

The closest cosine to 1 wins!
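
Putting the three steps together, here is a minimal sketch of the full guess, reusing the encode_text helper from the sketch above (the sample file names are placeholders):

```python
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# step 1: one vector per language, built from sample corpora (file names are placeholders)
english_vector = encode_text(open("english_sample.txt").read())
french_vector = encode_text(open("french_sample.txt").read())

def guess_language(text):
    # step 2: encode the input; step 3: compare angles
    v = encode_text(text)
    scores = {"English": cosine(v, english_vector), "French": cosine(v, french_vector)}
    return max(scores, key=scores.get)

print(guess_language("the quick brown fox jumps over the lazy dog"))  # likely "English"
```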

Unlock more power!

You can also query your language vectors for information. Say you want to know which letter is most frequently seen after a given sequence of letters in a particular language. Just ask the vector:

Problem: give me the most commonly seen letters after the sequence “th” in the English language.

  1. Compute the query vector Q = rr(t)×r(h)
    (remember that a trigram is encoded as rr(a)×r(b)×c)
  2. Compute the answer vector A = Q×en
    (where en is the language vector for English)
  3. Compare A against each stored letter vector using the cosine similarity. The closest letter vectors are the most likely answers.

The result would be “e”, followed by “space”, “a”, “i”, “r”, “o”, etc…
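
Here is a minimal sketch of this query, reusing the letter_vectors, rotate and cosine helpers from the sketches above. Because bipolar multiplication is its own inverse, multiplying by Q unbinds exactly the trigrams that start with the prefix:

```python
def query_next_letter(prefix, language_vector):
    # query vector Q = rr(a)×r(b) for a two-letter prefix such as "th"
    a, b = prefix
    q = rotate(letter_vectors[a], 2) * rotate(letter_vectors[b], 1)
    # unbind: A = Q×language. Trigrams starting with the prefix collapse to their third letter.
    answer = q * language_vector
    # rank letters by similarity to the answer vector
    return sorted(ALPHABET, key=lambda c: cosine(answer, letter_vectors[c]), reverse=True)

# query_next_letter("th", english_vector)[:6] should rank "e", space, "a", ... near the top
```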

Find the code example under examples/languageGuessing: https://github.com/gokadin/hyperdimensional-computing

Example 2: Using HD vectors for semantic binding

Problem: given the phrase “I want to run”, encode its semantic structure within a 10k vector.

Solution: pick a random vector for each component (subject, verb, object, I, want, run) and mash them together.

In his book “How to Build a Brain”, Dr. Chris Eliasmith demonstrates how you can generate such a vector with the following formula:

P = subject ⊛ I + verb ⊛ want + object ⊛ run

where ⊛ represents circular convolution.

Then, if you want to query P for “what is the verb?”, you can unbind like so:

want ≈ P ⊛ verb’

where verb’ is the approximate inverse (or involution) of verb.
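
Here is a minimal sketch of this binding and unbinding in NumPy (my own illustration, not Dr. Eliasmith’s code). Circular convolution is computed via the FFT, and the approximate inverse keeps the first element and reverses the rest:

```python
import numpy as np

D = 10_000
rng = np.random.default_rng(7)

def random_hv():
    # unit-norm random vector, as commonly used with circular convolution
    v = rng.normal(size=D)
    return v / np.linalg.norm(v)

def bind(a, b):
    # circular convolution (⊛), computed efficiently via the FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def inverse(a):
    # approximate inverse (involution): keep the first element, reverse the rest
    return np.concatenate(([a[0]], a[1:][::-1]))

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

roles = {name: random_hv() for name in ("subject", "verb", "object")}
fillers = {name: random_hv() for name in ("I", "want", "run")}

# P = subject ⊛ I + verb ⊛ want + object ⊛ run
P = (bind(roles["subject"], fillers["I"])
     + bind(roles["verb"], fillers["want"])
     + bind(roles["object"], fillers["run"]))

# "what is the verb?" -> unbind with the approximate inverse of the role vector
guess = bind(P, inverse(roles["verb"]))
print(max(fillers, key=lambda name: cosine(guess, fillers[name])))  # prints "want"
```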

Of course you can take this as far as you like. Nothing keeps you from building complex syntactic structures and complex queries.

Find the code example under examples/semanticBinding: https://github.com/gokadin/hyperdimensional-computing

Looking forward

The properties of hyperdimensional computing make it a good fit for many functions in AI. A big part of intelligence is about detecting, storing, binding and unbinding noisy patterns. HD vectors are well suited to represent noisy, redundant and robust patterns, and there are many operations to bind and unbind them.

Associative memory is the first application that comes to mind; however, there are many less obvious uses for HD computing, such as learning and fluid intelligence (i.e. solving logical problems using induction). Dr. Chris Eliasmith describes all of these very well in his book “How to Build a Brain”.

HD computing can even prove useful in non-AI domains. For example, I recently participated in a system design exercise with my colleagues where we needed to store hundreds of thousands of documents and compare each student’s essay submission against them to detect plagiarism. The solution involved transforming the documents into HD vectors and using the cosine similarity to detect how close the student’s submission was to each document.

I believe that hyperdimensional computing has a significant part to play in AI research and the future of computing.

The possibilities are limitless!

Additional notes

  • Besides cosine similarity for comparing vectors, you can use the Hamming distance.
  • Why 10k dimensions? Why not 1k or 1M? A space of 10k dimensions contains billions of nearly orthogonal vectors. This nearly guarantees that a vector chosen at random will be orthogonal to all previously chosen ones, so we don’t need higher dimensions for this. Depending on the use case, lower dimensions can also work.
  • You don’t have to use bipolar (+1, -1) vectors. You can use binary (1, 0) vectors or anything else you want, as long as it works for your use case and the operations you use.
  • Why did we use rotate and multiply for binding in example 1, but circular convolution in example 2? I could have demonstrated both examples using either operation. The important thing is that the operations are reversible, so that binding/unbinding works. For binary vectors, the same thing can be achieved with XOR, for example (see the sketch after this list). Which operations and which type of vectors (bipolar, binary, etc.) you use depends on the use case, scalability and computing-efficiency considerations.
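
For completeness, here is a minimal sketch of XOR binding on binary vectors, with the Hamming distance as the comparison metric (again my own illustration):

```python
import numpy as np

D = 10_000
rng = np.random.default_rng(1)

a = rng.integers(0, 2, size=D, dtype=np.uint8)  # random binary vector
b = rng.integers(0, 2, size=D, dtype=np.uint8)

bound = np.bitwise_xor(a, b)           # bind
recovered = np.bitwise_xor(bound, b)   # unbind: XOR is its own inverse
assert np.array_equal(recovered, a)

def hamming(x, y):
    # fraction of differing bits (~0.5 for unrelated vectors, 0.0 for identical ones)
    return np.count_nonzero(x != y) / D

print(hamming(a, b), hamming(a, recovered))  # ~0.5, 0.0
```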
