Hyperdimensional Computing: Taking AI to the Next Level by Emulating the Brain

Carlos Esteban · Published in LatinXinAI · Jun 28, 2023


Artificial Neural Networks (ANNs) have revolutionized various fields, from computer vision to natural language processing. These networks, inspired by the intricate workings of the human brain, aim to mimic its neural connections for information processing and intelligent decision-making. However, despite their power, ANNs have inherent limitations that prevent them from matching the complexity and adaptability of the human brain.

Personally, I find the human brain to be truly extraordinary, and I’m equally amazed at how AI has advanced by attempting to mimic its capabilities. In fact, the thought-provoking article “A New Approach to Computation Reimagines Artificial Intelligence” has motivated me to write this piece.

This article dives into the fascinating intersection of neuroscience and ANNs, shedding light on the challenges and opportunities that arise when bridging these two domains. Additionally, we explore the concept of hyperdimensional computing, which combines principles from neuroscience and computational modeling. Hyperdimensional computing utilizes high-dimensional vectors to represent and process information, opening up new possibilities for computational paradigms.

From Inputs to Predictions: The Mechanics of an Artificial Neural Network

At a high level, an artificial neural network consists of three main steps: forward propagation, backward propagation and prediction computation.

  1. Forward Propagation: In forward propagation, the inputs are multiplied by weights and a bias term is added. The result is passed through an activation function at each neuron of the hidden layers. The activation function introduces non-linearity, which helps the network learn complex patterns and relationships in the data. Traditional neural networks rely on conventional vector representations: each input, weight, and activation value is stored as a numerical entry in a vector. (A minimal code sketch of these three steps follows the list.)
  2. Backward Propagation: During backward propagation, the network updates its weights by computing the derivative of the loss function with respect to the weights. The goal of this step is to minimize the error or loss and improve the quality of predictions made by the network. By iteratively adjusting the weights based on the gradients, the network gradually learns to make more accurate predictions.
  3. Prediction Computation: Once the network’s weights have been updated through backward propagation, the network is ready to make predictions. New inputs are fed into the network, and the forward propagation process is repeated. The final output of the network represents the prediction or classification of the given input.
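
To make these three steps concrete, here is a minimal NumPy sketch of a single forward pass, a gradient update, and a fresh prediction for a toy one-neuron network. The input values, the squared-error loss, the sigmoid activation, and the learning rate are illustrative assumptions rather than details from the article.

import numpy as np

def sigmoid(z):
    # Non-linear activation applied after the weighted sum plus bias
    return 1.0 / (1.0 + np.exp(-z))

# Toy example: 2 input features and a single target value
x = np.array([0.5, -1.0])
y = 1.0

# Randomly initialized weights and bias for one output neuron
rng = np.random.default_rng(0)
w = rng.normal(size=2)
b = 0.0

# 1. Forward propagation: weighted sum plus bias, passed through the activation
y_hat = sigmoid(w @ x + b)

# 2. Backward propagation: gradient of the squared loss with respect to w and b
error = y_hat - y
grad_w = error * y_hat * (1 - y_hat) * x
grad_b = error * y_hat * (1 - y_hat)

learning_rate = 0.1
w -= learning_rate * grad_w
b -= learning_rate * grad_b

# 3. Prediction computation: repeat the forward pass with the updated parameters
print("Updated prediction:", sigmoid(w @ x + b))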

However, despite the effectiveness of neural network architecture, there are certain limitations associated with this approach.


Limitations of Artificial Neural Networks (ANNs) in Handling Increasing Complexity

In his article, Anil Ananthaswamy highlights the long-standing limitations of Artificial Neural Networks (ANNs). For instance, let’s consider the task of distinguishing between circles and squares using an ANN. One approach is to have two neurons in the output layer, where one neuron signifies a circle and the other denotes a square. However, if we also want the ANN to discern the color of the shape, such as blue or red, we would require four output neurons: one for each combination of color and shape (blue circle, blue square, red circle, and red square). It becomes evident that as the number of features increases, the corresponding number of neurons in the ANN also escalates.

One of the benefits of neural networks is their ability to improve performance by increasing the number of hidden layers. However, it’s important to note that increasing the number of neurons in a neural network also increases complexity and computational requirements. This is particularly true when dealing with a large number of features that need to be discerned. Therefore, having sufficient training data becomes crucial in order to avoid overfitting.

To illustrate this computational requirements issue, let’s consider two examples of artificial neural networks with only 2 hidden layers.

In the first example, the first layer consists of 4 neurons, and the second layer has 2 neurons.

In the second example, the first layer is expanded to 8 neurons, while the second layer remains the same with 2 neurons.

It’s important to note that increasing the number of neurons from 4 to 8 not only increases the network’s complexity but also leads to a proportional increase in the computation required. This becomes particularly significant when incorporating additional attributes into the elements we process (shapes, colors, and so on): the computational resources required for these operations grow correspondingly.
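
To put rough numbers on this, the sketch below counts the trainable parameters (weights plus biases) of the two example networks. The article only specifies the hidden-layer sizes, so the 2-dimensional input and the single output neuron are assumptions made purely for illustration.

def count_parameters(layer_sizes):
    # layer_sizes lists the units per layer, including input and output layers
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weights + biases
    return total

# Example 1: 2 inputs -> 4 hidden -> 2 hidden -> 1 output
print(count_parameters([2, 4, 2, 1]))  # 25 parameters
# Example 2: 2 inputs -> 8 hidden -> 2 hidden -> 1 output
print(count_parameters([2, 8, 2, 1]))  # 45 parameters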

Limitations of ANNs Using Traditional Vector Representation

High-dimensional data: ANNs can struggle to effectively handle high-dimensional data. Traditional vector representations require explicit encoding of each feature, which becomes challenging when dealing with a large number of features. This can lead to the curse of dimensionality and increase the complexity of training the network. For example, in image recognition tasks with a large number of pixels as features, ANNs can struggle to capture the relevant patterns and generalize well to new images.

Sparse data: Traditional methods often struggle with sparse data, where most of the feature values are zero. In such cases, ANNs may require additional preprocessing steps, such as feature engineering or dimensionality reduction techniques, to handle the sparsity effectively. For instance, in natural language processing tasks where words are represented as high-dimensional sparse vectors, ANNs may need techniques like word embeddings or tf-idf transformations to effectively capture meaningful relationships between words.

Symbolic information: Traditional methods may struggle to incorporate symbolic information or capture complex relationships between features. ANNs typically rely on numerical values and mathematical operations, which may limit their ability to handle symbolic or structured data. To illustrate this challenge, let’s consider a task involving logical reasoning or symbolic inference. Imagine we want to train an ANN to understand logical rules such as “If it is raining, then take an umbrella.” The symbolic representation of this rule contains explicit symbols (e.g., “raining,” “umbrella”) and their logical relationships (e.g., “if…then”). However, ANNs struggle to directly represent and reason with such symbolic rules.

Overfitting: Overfitting is a common challenge when training ANNs with traditional vector representations. It occurs when the network becomes too specialized to the training data, effectively memorizing it instead of learning general patterns, and consequently fails to generalize to new, unseen examples. This leads to reduced performance and unreliable predictions. To illustrate, imagine a dataset of student ages and math scores: an overfitted model would fit the training data perfectly but make poor predictions for new students. Techniques such as regularization and cross-validation help prevent overfitting.

Exploring the Differences: Human Brain vs. Artificial Neural Networks

While numerous articles draw comparisons between the human brain and artificial neural networks, it is important to acknowledge their fundamental differences.

Neural networks, inspired by the intricacies of the human brain, have become a cornerstone of artificial intelligence systems. However, recent research has shed light on their limitations compared to the human brain. While neural networks excel at tasks like speech recognition and image analysis, a study from MIT warns that caution must be exercised when interpreting these models in the context of neuroscience. The researchers found that neural networks could only reproduce grid-cell-like activity, a key component of the brain’s navigation system, when subjected to specific constraints absent in biological systems. This suggests that neural networks may not generate accurate predictions of brain functionality without pre-defined parameters.

The MIT study (https://news.mit.edu/2022/neural-networks-brain-function-1102) underscores the importance of acknowledging the differences between artificial neural networks and the human brain. Previous studies have suggested that neural networks would naturally exhibit grid-cell-like behavior, but the researchers discovered that this was only possible when certain unrealistic requirements were imposed on the models. These findings highlight the need for a more nuanced approach when using neural networks to understand the brain.

Despite their limitations, neural networks and the human brain share common ground. Both systems rely on connections between processing units to analyze vast amounts of data. Deep learning mimics human learning by gradually improving network performance through exposure to data. However, the abstraction process and the ability of neural networks to handle large datasets set them apart from human brains.

While neural networks have proven invaluable in various applications, understanding their limitations compared to the complexity of the human brain is crucial. The cautionary findings from the MIT study encourage researchers to apply realistic biological constraints when interpreting neural network models in neuroscience. This ongoing exploration will deepen our knowledge of both artificial and biological intelligence, ultimately driving advancements in AI and our understanding of the human brain.

“Bruno Olshausen, a neuroscientist, argues that the notion of our brains having individual neurons for every conceivable combination of features is impractical and implausible. For example, proposing the existence of a neuron solely responsible for recognizing a purple Volkswagen would necessitate an excessively vast number of neurons to cover the infinite array of feature combinations.”

Example:

Let’s consider a scenario with three features: color, shape, and size. The total number of possible combinations can be calculated by multiplying the number of options for each feature. In this case, we have two options for color (red or blue), two options for shape (circle or square), and two options for size (small or large). Therefore, the total number of possible combinations is:

2 (color) * 2 (shape) * 2 (size) = 2³ = 8

As the number of features increases, the total number of possible combinations grows exponentially. In the example above, with three binary features, we have 8 possible combinations. In a traditional neural network, this would imply dedicating one output neuron to each combination, which quickly becomes infeasible as the number of features grows in real-life scenarios.
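
A quick sketch makes the growth explicit: with one output neuron per combination, the number of neurons needed is the product of the option counts, so every additional binary feature doubles it. The feature names below are simply the ones from the example.

from itertools import product

features = {
    "color": ["red", "blue"],
    "shape": ["circle", "square"],
    "size": ["small", "large"],
}

# One output neuron per combination of feature values
combinations = list(product(*features.values()))
print(len(combinations))  # 8 = 2 * 2 * 2

# Each additional binary feature doubles the required neuron count
for n_features in (3, 10, 20):
    print(n_features, "binary features ->", 2 ** n_features, "output neurons")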

So, when comparing this architecture to our brains, a stark contrast in capability emerges. Our brains possess an extraordinary capacity to process an immense number of features and their combinations. Instead of relying on a separate neuron for each combination, our brains utilize distributed representations and sophisticated neural connections to seamlessly handle the complexity of information. This enables us to discern an incredibly vast range of features and their countless combinations, surpassing the limited capacity of a traditional neural network.

According to Olshausen and other researchers, the brain employs a different mechanism for representing information. They propose that the brain relies on the synchronized activity of multiple neurons, rather than assigning a single neuron to encode the perception of a specific object, such as a purple Volkswagen. In this perspective, the representation of such an object is formed through the coordinated patterns of activation across a vast network of thousands of neurons.

This is where hypervectors can be of great benefit.


Beyond Orthogonality: Exploring the Potential of Hyperdimensional Vector Representation

Traditional Vector Representation:

Let’s start with a simple example of a traditional vector representation:

v = [2, -1, 3]

In traditional vector representations, such as representing variables like SHAPE and COLOR, we need separate vectors for each variable and their corresponding values. For instance, if we want to represent shapes like CIRCLE and SQUARE, as well as colors like BLUE and RED, we would require distinct vectors for each.

To ensure distinctness, vectors are kept orthogonal to one another, which means they are at right angles to each other. In three-dimensional space, we can visualize this with three orthogonal vectors: one in the x-direction, another in the y-direction, and a third in the z-direction.

Example:

Let’s say we have a traditional vector representation for shapes and colors. We can assign the following vectors to represent our variables and values:

SHAPE:

  • CIRCLE: [1, 0, 0]
  • SQUARE: [0, 1, 0]

COLOR:

  • BLUE: [0, 0, 1]
  • RED: [0, 0, -1]

In this example, the vectors for shapes and colors are distinct, and each vector encodes one specific aspect, so the orthogonality of the axes lets us tell shapes apart from colors. Notice, however, that a 3-dimensional space offers only three mutually orthogonal directions, which is why RED has to reuse the BLUE axis with the opposite sign; this shortage of distinct directions is exactly the limitation that higher-dimensional representations address.

Imagine you are working with a 3-dimensional space where you can represent objects based on their shape, color, and size. In this space, you have three distinct vectors: v1, v2, and v3. These vectors represent different aspects of the objects.

Let’s say v1 represents shape, v2 represents color, and v3 represents size. Each vector is at right angles to the other two, ensuring their distinctness. So, for example, v1 might point in the direction of circular shapes, v2 in the direction of blue colors, and v3 in the direction of small sizes.

Now, let’s imagine we expand this concept to a higher-dimensional space, such as a 10,000-dimensional space, to accommodate more features and variables. In this space, we would have 10,000 mutually orthogonal vectors, each representing a different characteristic of the objects. For instance, we might have vectors representing various shapes, colors, sizes, textures, and other attributes.

However, if we allow the vectors to be nearly orthogonal instead of perfectly perpendicular, the number of distinct vectors in the high-dimensional space would explode exponentially. In a 10,000-dimensional space, there would be millions of nearly orthogonal vectors. This means that we would have an enormous number of vectors that are very close to being perpendicular to each other, representing different combinations of features and characteristics.
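
This claim is easy to check empirically: randomly chosen high-dimensional vectors are almost always nearly orthogonal. The sketch below uses random bipolar (+1/−1) vectors, a common choice in hyperdimensional computing but an assumption on my part, and compares average cosine similarities in 3 versus 10,000 dimensions.

import numpy as np

rng = np.random.default_rng(42)

def mean_abs_cosine(dim, n_vectors=50):
    # Random bipolar (+1/-1) vectors, normalized to unit length
    vecs = rng.choice([-1, 1], size=(n_vectors, dim)).astype(float)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T
    # Average absolute cosine similarity over distinct pairs
    off_diag = sims[~np.eye(n_vectors, dtype=bool)]
    return np.abs(off_diag).mean()

print("3 dimensions:     ", mean_abs_cosine(3))       # large overlaps between vectors
print("10,000 dimensions:", mean_abs_cosine(10_000))  # close to 0: nearly orthogonal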

Hyperdimensional Vector Representation:

Now let’s consider an example of a hyperdimensional vector representation:

h = [1, 0, -2, 3, 0, 5, -1, 2, 4, -3, …]

In hyperdimensional vector representations, we have a unique way of representing things, such as shapes and colors, using a set of numbers. Instead of using just one number, we create a special type of list called a vector that contains multiple numbers. These numbers in the vector provide information about different aspects or features of the object we want to represent.

For instance, if we want to represent shapes like circles and squares, and colors like blue and red, we can create vectors for each of these elements. The exciting part is that when we construct these vectors, we can make them diverge in various directions without overlapping significantly.

This concept is similar to having arrows pointing in different directions without intersecting. Each arrow represents one of the things we want to represent, such as a shape or a color. Since these arrows don’t overlap much, they enable us to easily distinguish and differentiate between different elements.

Creating these arrows, or vectors, becomes easier when we have a larger set of numbers to work with. It’s like having numerous diverse directions to choose from. We can assign random numbers to each direction and create a unique vector for each element we want to represent. Remarkably, these vectors will exhibit minimal overlap, similar to the non-overlapping arrows mentioned earlier.

This way of representing objects in multiple dimensions is known as hyperdimensional representation. It allows us to keep elements separate and facilitates various calculations and operations using these vectors to work with the represented objects.

Example:

Let’s consider an example of hyperdimensional vector representation for shapes and colors:

SHAPES:

  • CIRCLE: [1, 0, -1, 3, 0, 2, -2, 4, 1, -3, …]
  • SQUARE: [0, 1, 2, -1, 0, 5, -3, 2, 4, -2, …]

COLORS:

  • BLUE: [2, -1, 0, 4, -2, 1, -1, 3, 0, -3, …]
  • RED: [-1, 0, 3, -2, 1, 4, -2, 0, -3, 2, …]

Manipulating Hyperdimensional Vectors

Much of the appeal of hyperdimensional computing lies in how easily its vectors can be manipulated. A small set of algebraic operations gives the approach a remarkable level of transparency and flexibility.

  1. Multiplication:

The first operation in hyperdimensional computation is multiplication. It involves combining two vectors to bind their ideas or concepts together.

For example, let’s consider the vectors “SHAPE” and “CIRCLE.” Multiplying these two vectors creates a new vector that represents the idea “SHAPE is CIRCLE.”
The resulting bound vector is nearly orthogonal to both “SHAPE” and “CIRCLE,” meaning it is distinct and separate from them.
One important feature of multiplication is that the individual components of the bound vector are recoverable. This means that if you have a bound vector representing a Volkswagen, you can extract information from it, such as the vector for its color, like “PURPLE.”
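
Here is a minimal sketch of binding and unbinding with random bipolar (+1/−1) hypervectors, where binding is element-wise multiplication and each vector is its own inverse, so multiplying again recovers the other component. The dimensionality and the specific vectors are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality (illustrative)

SHAPE = rng.choice([-1, 1], size=D)
CIRCLE = rng.choice([-1, 1], size=D)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Binding: element-wise multiplication produces "SHAPE is CIRCLE"
shape_is_circle = SHAPE * CIRCLE

# The bound vector is nearly orthogonal to both of its components
print(cosine(shape_is_circle, SHAPE))   # ~0
print(cosine(shape_is_circle, CIRCLE))  # ~0

# Unbinding: multiplying by SHAPE again recovers CIRCLE exactly
recovered = shape_is_circle * SHAPE
print(cosine(recovered, CIRCLE))        # 1.0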

2. Addition:

The second operation in hyperdimensional computation is addition. It allows the creation of a new vector that represents the superposition of concepts.

Taking two bound vectors, such as “SHAPE is CIRCLE” and “COLOR is RED,” and adding them together creates a vector representing a circular shape that is red in color.
Similar to multiplication, the resulting superposed vector can be decomposed into its constituent vectors, allowing retrieval of the individual concepts.
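
Reusing the same bipolar setup, the sketch below superposes two bound pairs into a single record and then queries it: unbinding with COLOR leaves a vector that is clearly more similar to RED than to an unrelated color. Again, the vectors and dimensionality are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
D = 10_000

def hv():
    # A fresh random bipolar hypervector
    return rng.choice([-1, 1], size=D)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

SHAPE, CIRCLE, COLOR, RED, BLUE = hv(), hv(), hv(), hv(), hv()

# Superpose "SHAPE is CIRCLE" and "COLOR is RED" into one record
record = SHAPE * CIRCLE + COLOR * RED

# Query the record: unbind with COLOR and compare against candidate colors
query = record * COLOR
print("similarity to RED: ", cosine(query, RED))   # clearly positive
print("similarity to BLUE:", cosine(query, BLUE))  # ~0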

3. Permutation:

The third operation is permutation, which involves rearranging the elements of a vector.
Permutation allows you to build structure and deal with sequences or events that occur one after another.

For example, if you have a three-dimensional vector with values labeled x, y, and z, permutation can move the value of x to y, y to z, and z to x.
In the context of hyperdimensional computation, permutation is used to preserve the order of events or sequences.
By combining addition with permutation, you can superpose events represented by hypervectors while retaining information about the order. Reversing the operations allows you to retrieve the events in the correct order.
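
One common way to do this, used here as an illustrative assumption rather than a method spelled out in the article, is to cyclically permute each event's hypervector by its position before superposing, and to reverse the permutation when querying a position.

import numpy as np

rng = np.random.default_rng(2)
D = 10_000

def hv():
    return rng.choice([-1, 1], size=D)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

A, B, C = hv(), hv(), hv()

# Encode the sequence (A, B, C): permute each event by its position, then sum
sequence = np.roll(A, 0) + np.roll(B, 1) + np.roll(C, 2)

# To ask "what happened at position 1?", undo that position's permutation
query = np.roll(sequence, -1)
print("similarity to A:", cosine(query, A))  # ~0
print("similarity to B:", cosine(query, B))  # clearly the best match
print("similarity to C:", cosine(query, C))  # ~0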

Hyperdimensional Computing’s Transparent Advantage

Hyperdimensional computing offers transparency through the algebraic operations applied to hypervectors: binding, superposition, and permutation make it explicit how concepts are combined, manipulated, and reasoned about. This gives insight into the system’s decision-making process, a characteristic that traditional neural networks often lack, and it improves the interpretability and explainability of the resulting AI systems.

A practical example:

# ---------------
# Traditional ANN
# ---------------
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Define input features
input_features = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Define corresponding target labels
target_labels = np.array([0, 1, 1, 0])

# Define the architecture of the ANN
model = Sequential()
model.add(Dense(2, input_dim=2, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))

# Compile and train the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(input_features, target_labels, epochs=1000)

# Predict on new data
new_data = np.array([[0, 1]])
prediction = model.predict(new_data)
print("Traditional ANN Prediction:", prediction)

# ------------------------
# Hyperdimensional Vector
# ------------------------
import random

# Define hyperdimensional vectors
vector1 = [random.uniform(-1, 1) for _ in range(100)]
vector2 = [random.uniform(-1, 1) for _ in range(100)]

# Binding: element-wise multiplication combines two concepts into a single vector
combined_vector = [v1 * v2 for v1, v2 in zip(vector1, vector2)]

# Superposition: element-wise addition represents both concepts at once
result_vector = [v1 + v2 for v1, v2 in zip(vector1, vector2)]

# Inspect a couple of components of the superposed vector (for illustration only)
extracted_vector = [result_vector[0], result_vector[1]]

print("Hyperdimensional Vector Result:", result_vector)
print("Extracted Vector Information:", extracted_vector)

In the example above, we can clearly see the differences in transparency between traditional ANNs and hyperdimensional vectors. In a traditional ANN, we have access to certain observable aspects such as the number of hidden layers, number of units, activation functions, input and output dimensions. However, the actual core operations that take place within the neural network, such as weight and bias updates, remain hidden from our direct observation. This lack of transparency makes it challenging to interpret the results, as it requires delving into the internal parameters responsible for these changes.

On the other hand, in the Hyperdimensional vector example, the operations performed are much more straightforward. Addition and multiplication are the key operations, allowing us to easily manipulate and combine vectors. These operations are transparent and intuitive, making it significantly easier to interpret the results. We can directly observe how concepts are bound together through multiplication and how superposition of concepts is achieved through addition. This simplicity and interpretability of operations in hyperdimensional vectors enhance the transparency of the computations.

Advancing Modern Architectures with Hyperdimensional Computing

In general I believe that hyperdimensional computing can be applied to various areas such as Computer Vision, Speech and Audio Processing, Robotics and Control Systems, Recommendation Systems, Cognitive Modeling, Knowledge Representation and Reasoning, Anomaly Detection, and Natural Language Processing.

In the paper “HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing Enabled Embedding of n-gram Statistics”, published in 2021, the authors explore the use of hyperdimensional computing for distributed representations of n-gram statistics in intent classification tasks. The paper compares several methods, including conventional n-gram statistics, embeddings with subword information, and high-dimensional vector embeddings. Word embeddings commonly used in modern architectures, such as Word2Vec and GloVe, are not included in the baselines due to their computational resource requirements.

The study evaluates different datasets, analyzes the impact of dimensionality on performance and resource utilization, and compares the results with baseline approaches.

The experiments in the study demonstrated that higher-dimensional embeddings can enhance the quality of classification up to a certain threshold, resulting in noticeable improvements even on small datasets. However, increasing the dimensionality beyond this point becomes impractical due to limitations in computational resources. The empirical results indicated that higher-dimensional HD vectors improved the quality of embedding, as measured by the achievable F1 score, without significantly affecting or deteriorating the classification performance. Furthermore, the experiments revealed that even with small datasets, considerable gains were observed, highlighting the potential of HD vectors for text classification. It was found that in certain cases, classifiers achieved good F1 scores while also offering significant speedup and memory reduction. However, the performance varied depending on the specific dataset and classifier utilized.

The study suggests potential applications of hyperdimensional computing in ML libraries and recommends exploring binarized classifiers and random projection techniques for achieving performance/resource trade-offs in NLP scenarios.

In general, this approach could open doors to improving modern architectures, such as transformers, by using hyperdimensional vectors (HD vectors) instead of traditional word embeddings. However, it is difficult to determine whether this would universally improve performance without extensive experimentation and comparison. HD vectors could be used to represent tokens or words in transformers, encoding various attributes and features of the tokens, much as word embeddings capture semantic information. It would be interesting to run an experiment similar to the one in “HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing Enabled Embedding of n-gram Statistics”, but with Word2Vec and GloVe included as baselines. This would let us compare hyperdimensional computing with these techniques and get a clearer picture of their respective strengths and weaknesses.
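
To give a flavour of what embedding n-gram statistics with hypervectors can look like, here is a heavily simplified sketch in the spirit of that line of work: each character gets a random bipolar hypervector, an n-gram is encoded by permuting and binding its characters, and a string is the superposition of its n-grams. The alphabet, dimensionality, and encoding details are my own illustrative assumptions and do not reproduce the exact HyperEmbed method.

import numpy as np

rng = np.random.default_rng(3)
D = 10_000

# One random bipolar hypervector per character (the "item memory")
alphabet = "abcdefghijklmnopqrstuvwxyz "
item_memory = {ch: rng.choice([-1, 1], size=D) for ch in alphabet}

def encode_ngrams(text, n=3):
    # Encode a string as the superposition of its character n-grams,
    # where each n-gram binds its characters permuted by their position.
    text = text.lower()
    encoding = np.zeros(D)
    for start in range(len(text) - n + 1):
        gram = np.ones(D)
        for i, ch in enumerate(text[start:start + n]):
            gram = gram * np.roll(item_memory[ch], i)
        encoding += gram
    return encoding

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Strings that share n-grams end up with more similar encodings
a = encode_ngrams("take an umbrella")
b = encode_ngrams("take my umbrella")
c = encode_ngrams("completely different text")
print(cosine(a, b))  # relatively high
print(cosine(a, c))  # close to 0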

Potential benefits and challenges of HD vectors in transformers

The potential benefits of using HD vectors in transformers could include:

  • Improved semantic similarity: HD vectors have been shown to be effective in capturing semantic relationships and similarities. By utilizing HD vectors, transformers may benefit from enhanced semantic representations.
  • Efficient similarity calculations: HD vectors enable fast similarity computations using operations such as Hamming distance or inner product. This could potentially speed up certain computations within transformers (see the short sketch after this list).
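
As a short sketch of the similarity point above: with binary hypervectors, comparing two representations reduces to counting differing bits. The dimensionality and the 10% noise level are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(4)
D = 10_000

# Binary hypervectors allow cheap bitwise similarity checks
a = rng.integers(0, 2, size=D, dtype=np.uint8)
flip = rng.random(D) < 0.1                       # flip 10% of the bits
b = np.where(flip, 1 - a, a)                     # a noisy copy of a
c = rng.integers(0, 2, size=D, dtype=np.uint8)   # an unrelated vector

def normalized_hamming(x, y):
    # Fraction of positions where the two vectors differ
    return np.count_nonzero(x != y) / len(x)

print(normalized_hamming(a, b))  # ~0.1: recognizably the same item
print(normalized_hamming(a, c))  # ~0.5: unrelated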

However, there are also challenges and considerations:

  • Vector size and dimensionality: HD vectors are typically high-dimensional, which may increase the computational and memory requirements of transformers. Handling and manipulating large HD vectors could pose practical challenges.
  • Training and adaptation: Training transformers with HD vectors would require appropriate modifications to the training process, loss functions, and optimization algorithms. Adapting transformers to work with HD vectors may not be straightforward.
  • Comparative performance: It’s crucial to conduct thorough experiments and comparisons to evaluate whether HD vectors outperform or provide any advantages over traditional word embeddings in transformers across various natural language processing tasks.

Conclusion

In conclusion, Artificial Neural Networks (ANNs) have greatly impacted diverse fields by imitating human brain connections. However, ANNs have limitations compared to the complex and adaptable human brain. While excelling in computer vision and natural language processing, ANNs face challenges with complexity, high-dimensional data, sparse data, and symbolic information.

The comparison between ANNs and the human brain reveals fundamental differences. While ANNs rely on individual neurons for each combination of features, the human brain utilizes distributed representations and synchronized activity across vast networks of neurons to process complex information. Recognizing these disparities is crucial to understanding the limitations of ANNs and further advancing our knowledge of both artificial and biological intelligence.

To overcome some of these limitations, the concept of hyperdimensional computing and vector representation offers new possibilities. Hyperdimensional vectors allow for the representation of multiple features in a single vector, enabling differentiation and computation without significant overlap. Hyperdimensional computing combines principles from neuroscience and computational modeling, opening up avenues for more efficient and powerful computational paradigms.

As we continue to explore the intersection of neuroscience and ANNs, it is essential to consider realistic biological constraints when interpreting neural network models in neuroscience research. Studies like the one conducted at MIT serve as a reminder to be cautious when drawing conclusions about how ANNs reflect the intricacies of the human brain. By recognising the disparities and leveraging the respective strengths of ANNs and the human brain, we can propel advancements in AI and develop a more profound comprehension of our own cognitive abilities.


Thank you for reading

  • Let’s keep in touch on LinkedIn if you have any questions or suggestions.