The brain’s connection to the hand, rendered by Imagine AI

Predicting Hand Movements from EEG Data with an FNN

Building the “brains” behind noninvasive neuroprosthesis!

Tanvi Reddy


I’ve been pondering something recently: what’s the aim of cutting-edge technology beyond positively impacting and enabling the organism that uses it? Use case after use case continues to be born within every field of development, and at the core of it all is a profound fulfillment of need.
In my mind, the same is embodied by neurotechnology. It’s such an embryonic yet diverse field, with abilities spanning anywhere from hands-free art all the way to restoring autonomy and independence to those who have lost critical bodily functions. And, with every new development comes a focus on accessibility and allowing use of the technology by everyone in need of it. Neurotechnology isn’t just a connection between the brain and outside world; it has the critical potential to restore, augment, and heal the very core of what it means to be human: the way we interact with the world.

The inextricability of the brain and motor function is something extraordinary. Moreover, that biology was irreplaceable for those who had lost access to it… until the creation of the neuroprosthetic. With that development, brain-computer interfaces have reached the ability to allow individuals with paralyzed or amputated limbs to control their prosthetics simply by thinking about it. By tapping into the electrical signals of the brain with data-collecting electrodes, neuroprosthetics like robotic arms and legs offer new hope to those with limb disabilities, opening the door to regaining independence and ease of living. Compared to, say, a traditional prosthetic hand, neuroprosthesis enables users to control multi-fingered prosthetic hands in the most intimate way possible, by thinking about it, and thereby provides a more intuitive, dexterous, and natural grasping experience that gets much closer to the act of operating a genuine limb.

And get this: the world of neuroprosthesis has figured out how to do this all noninvasively and directly from the brain, by using EEG data collected by electrodes on the scalp. By entirely avoiding the realm of surgical implantation, this could be the best way available right now to create ease of movement and prosthetic control without compromising the safety of the user. Moreover, using scalp EEG as opposed to myoelectric control (which relies on signals collected from the muscles of one’s disabled limb) truly widens the scope of this technology to not just the paralyzed, but also amputees whose neural activity in those areas isn’t intact and who therefore require genuine mind control.

Check out one of the amazing studies that accomplished just this:

UofH study: amputee grasps water bottle with “bionic” hand

Researchers at the University of Houston achieved the control of a multi-fingered neuroprosthetic hand using a scalp EEG cap, allowing an amputee to grasp objects simply by understanding and decoding his intended motions. This approach completely avoided surgically implanted electrodes or reliance on myoelectric control, which was inaccessible to the patient as he had lost his limb altogether.

Until this kind of development, building fully functional mind-controlled prostheses was thought to be possible only with invasive or semi-invasive BCI. But now, with the use of noninvasively-collected EEG data like motor imagery, P300, or SSVEP paradigms as well as machine learning algorithms to interpret them, the potential for significant progress in the development of assistive, adaptive, and rehabilitative BCIs and neuroprostheses is clear.

As for my project, the dataset I used came from a study in which an EMOTIV 14-channel EEG headset recorded brainwave activity while the participants attempted to control virtual objects with their hands. The goal was to train the machine-learning component of the overall BCI setup to capture, interpret, and recognize the participants’ brainwave patterns associated with different motion control commands.

Essentially, what I had on hand (get it?) to work with was the EEG data of 4 different participants as well as the hand movements that corresponded to those brainwaves. The ML model I built aimed to accurately predict (without “looking” at the given answer) which brainwaves corresponded to which movement.

By interpreting EEG data to learn and model the relationship between brainwaves and motor movements, a fully developed AI model of this structure would be essential to both understanding the connection of the brain to motor capability, and building the software component of a BCI that gives functionality and life to a real neuroprosthetic hand.

So, without further ado, let’s jump in!

Making the Model

A huge thank you to Gabriel Atkin for his awesome tutorial and explanations, and to Fabricio Torquato for his dataset!

Watch a video explanation of my project here 👇

First, a little background info on the model!

The model I replicated was a feedforward neural network (FNN). It has one input layer, two hidden layers (both with 128 neurons and ReLU activation), and one output layer with a softmax activation function for multi-class classification.
In an FNN, information flows in one direction: from the input layer, through the hidden layers, and finally to the output layer. This type of neural network architecture is great for classification tasks where the goal is to map input features to output classes, which is exactly what I aimed to do when mapping certain EEG data to certain corresponding hand movements, and vice versa.
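
To make that one-directional flow concrete, here is a minimal sketch of a forward pass written in plain NumPy rather than TensorFlow. The weights are random and untrained, and the sizes (116 input features, two hidden layers of 128 units, 3 classes) are assumptions borrowed from the numbers that come up later in this walkthrough; the names forward and X_demo are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 116 input features, two hidden layers of 128 units, 3 classes.
n_features, n_hidden, n_classes = 116, 128, 3

# Random, untrained weights; a real model would learn these during training.
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b2 = np.zeros(n_hidden)
W3 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b3 = np.zeros(n_classes)


def forward(X):
    """Push a batch of samples through the network, one layer after another."""
    h1 = np.maximum(0, X @ W1 + b1)      # hidden layer 1 with ReLU
    h2 = np.maximum(0, h1 @ W2 + b2)     # hidden layer 2 with ReLU
    logits = h2 @ W3 + b3                # raw class scores
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)           # softmax probabilities


X_demo = rng.normal(size=(5, n_features))  # 5 fake feature vectors, not real EEG
probs = forward(X_demo)
print(probs.shape)        # one row of class probabilities per sample
print(probs.sum(axis=1))  # each row sums to 1
```

Nothing ever loops back to an earlier layer here: each sample goes in as features and comes out as one probability per class.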

Okay, now let’s get into the process!

1. Importing the Necessary Libraries & Functions.

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

import tensorflow as tf
  1. NumPy and Pandas are two open-source libraries for Python that I needed in order to work with the data. NumPy is essential to handling and manipulating arrays, while Pandas does wonders with data manipulation and preprocessing before implementing a neural network.
  2. To use during preprocessing, I imported the train_test_split function and the StandardScaler class from scikit-learn (sklearn), which is a powerful library for machine learning in Python. It can be used to implement, train, and evaluate various ML models (in this case, a hand movement prediction model!).
  3. And finally, I imported the old but gold tensorflow framework developed by Google, which I needed in order to build the actual model.

2. Reshaping the Data into the Ideal Format

Within the Kaggle dataset I used were 4 different CSV files, each representing the data collected from one user:

Using pd.read_csv() I converted each of the 4 files into a usable data frame, then stored them all in a list of data frames called dfs.

dfs = [pd.read_csv('/kaggle/input/eeg-data-from-hands-movement/Dataset/user_' + user + '.csv') for user in ['a', 'b', 'c', 'd']]

From there, to make it a lot easier to work with the data, I needed to pool these 4 data frames into a single one called data and stack them vertically for easy viewing. But, before doing that, a few modifications were needed:

for i in range(len(dfs)):
    dfs[i]['User'] = pd.Series(i, index=dfs[i].index)

data = pd.concat(dfs, axis=0).sample(frac=1.0, random_state=123).reset_index(drop=True)
  1. Adding a new User column at the very end, with one entry per row (each “piece of data”) taking a value from 0–3 depending on which CSV file (i.e. which user) the data originated from.
  2. Shuffling the rows of data to randomly sample them (with sample(frac=1.0, random_state=123)), rather than leaving them grouped user by user in their original order.
  3. Resetting the indices on the left (with reset_index(drop=True)) to run once through the complete set of 11520 rows, rather than repeating each file’s own indices (0 to 2879) four times.
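
The three modifications above are easier to see on a tiny example. This is a minimal sketch that uses two made-up three-row frames in place of the real CSV files; the column name reading and the variables toy_dfs, stacked, and toy_data are hypothetical.

```python
import pandas as pd

# Two tiny made-up frames standing in for two of the user CSV files.
toy_dfs = [
    pd.DataFrame({'reading': [0.1, 0.2, 0.3]}),
    pd.DataFrame({'reading': [0.4, 0.5, 0.6]}),
]

# Tag every row with the position of the file it came from.
for i, df in enumerate(toy_dfs):
    df['User'] = i

# Stacking alone keeps each frame's own index, so the labels repeat.
stacked = pd.concat(toy_dfs, axis=0)
print(stacked.index.tolist())   # [0, 1, 2, 0, 1, 2]

# Shuffle every row, then rebuild one clean index.
toy_data = stacked.sample(frac=1.0, random_state=123).reset_index(drop=True)
print(toy_data.index.tolist())  # [0, 1, 2, 3, 4, 5]
```

Passing drop=True matters: without it, reset_index() would keep the old repeating index around as an extra column.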

Once finally ready to concatenate the data into, well, data, the resulting visualization looked like this!

If you’re wondering what all the jargon on the very top row means:

  • Each value, like AF4 or AF3, corresponds to which electrode the EEG data reading came from.
  • std and m respectively indicate the standard deviation and mean of the data readings.
  • The words delta, theta, alpha, and beta name the frequency bands that human brainwaves are commonly grouped into. So, each box on the top row describes the source and frequency band of its corresponding EEG data reading!

3. Creating Helper Functions

Here, I needed to one-hot encode whichever label column was not being predicted, which in this case was the User column (since the target being predicted was the hand movement, stored in the Class column).

First, let’s define one-hot encoding: it’s a technique used to represent categorical data as binary vectors, where each category is represented by a binary variable, and only one variable is “hot” (1) at a time, while the others are “cold” (0). In this project, I used it to convert categorical information (like Class or User) into a binary format that a machine-learning algorithm can use without reading a false numeric order into the categories.

So check this out: I could one-hot encode by passing the User column of the data through a function called pd.get_dummies().

pd.get_dummies(data['User'], prefix='User')

And the result is shown here!

Check out how it corresponds: in the first three rows of the consolidated data frame, the User values are 0, 0, and 2, which map to the boxes marked True on the right!

I then created a one-hot encoding function of my own called onehot_encode(), which takes a data column, creates and stores its data in dummies, concatenates the original data frame and the new dummies side by side (on axis 1), and drops the original column from which the dummies were created.

def onehot_encode(df, column):
    df = df.copy()
    dummies = pd.get_dummies(df[column], prefix=column)
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(column, axis=1)
    return df

The one-hot encoding function was now complete and ready to be called upon while preprocessing the data!
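
As a quick sanity check, here is a minimal sketch that runs the same function on a made-up three-row frame. The frame toy and its reading column are hypothetical; the User values 0, 0, 2 just mirror the first three rows mentioned above.

```python
import pandas as pd


def onehot_encode(df, column):
    df = df.copy()                                        # leave the caller's frame alone
    dummies = pd.get_dummies(df[column], prefix=column)   # one column per category
    df = pd.concat([df, dummies], axis=1)                 # place them side by side
    df = df.drop(column, axis=1)                          # drop the original column
    return df


toy = pd.DataFrame({'reading': [0.1, 0.2, 0.3], 'User': [0, 0, 2]})
encoded = onehot_encode(toy, 'User')

print(encoded.columns.tolist())  # ['reading', 'User_0', 'User_2']
print(toy.columns.tolist())      # ['reading', 'User'] (original is unchanged)
```

Notice there is no User_1 column in this toy output: get_dummies only creates columns for categories that actually appear in the data.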

4. Data Preprocessing

Now onto creating a bigger helper function: the preprocessing function! The purpose of preprocessing was essentially to clean up, scale, and consolidate the raw data into a clean dataset before feeding it into the model. My function took in the EEG data frame df as well as the data I wanted to predict, which was the Class.

def preprocess_inputs(df, target='Class'):
    df = df.copy()

    # 1. One-hot encode whichever target column is not being used
    targets = ['Class', 'User']
    targets.remove(target)
    df = onehot_encode(df, column=targets[0])

    # 2. Split df into X and y
    y = df[target].copy()
    X = df.drop(target, axis=1)

    # 3. Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=123)

    # 4. Scale X with a standard scaler
    scaler = StandardScaler()
    scaler.fit(X_train)

    # 5. Transform X_train and X_test
    X_train = pd.DataFrame(scaler.transform(X_train), columns=X.columns)
    X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns)

    return X_train, X_test, y_train, y_test
  1. I first made a copy of the data frame, then one-hot encoded the data that was not the target (in this case, the User data since the target was the Class).
  2. Then, I split df (the consolidated data frame) into X and y, where y was defined as the target column (what I was trying to predict) and X was defined as everything except the target column.
  3. Before I went into scaling, I performed a train-test split using the train_test_split() function from sklearn, the library I imported earlier. This function took a percentage of the EEG data and stored it in a training set, while the rest went into a testing set. The training set would be used to fit the neural network, and the testing set to evaluate it afterwards.
    The train_size was 70%, and the inclusion of random_state ensured that the shuffle and split of the data would happen the same way each time the notebook ran.
  4. Now to scale X, with the StandardScaler class from sklearn! Here, I fit the scaler to the training dataset with scaler.fit, which computes the mean and standard deviation of each column from the training rows only.
    If you’re wondering why I scaled X and not y, it’s because the data in X was going to be used as inputs to the model, while y needed to remain as the Class labels and therefore couldn’t have its values modified in any way.
  5. And lastly, I transformed X_train and X_test using the function scaler.transform(). Transforming both sets with the one fitted scaler ensured that the same scaling was applied to the training and testing datasets, preventing any inconsistency in scale that could impact my model’s performance.
    The function scaler.transform() outputs data as a NumPy array, while I instead wanted to keep the data in the form of a data frame. So, I made a minor modification by adding pd.DataFrame over the transformation to turn the data array back into a data frame after it had been transformed.

Finally, I did a little test to check that my preprocessing function worked as intended. And it did! Once I called the function and displayed the final X_train, I was shown a data frame with its format intact but with two new modifications:

  1. There were now only 8063 rows, which was 70% of the original data! This was just as I intended from my train-test split, in which I defined the size of the training dataset X_train to be 0.7 by using train_size=0.7.
  2. Also, all the data was scaled so that the means of every column were all very close to 0, while the variances of every column were all very close to 1. This indicated that all the data were successfully scaled to take on a common range of values.
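
Both checks can be reproduced on synthetic data. This is a minimal sketch, assuming two made-up columns on very different scales instead of the real EEG features; the names small, large, X_tr, and X_te are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(123)

# Made-up features on very different scales, standing in for EEG columns.
X = pd.DataFrame({
    'small': rng.normal(loc=0.5, scale=0.1, size=1000),
    'large': rng.normal(loc=4000.0, scale=250.0, size=1000),
})
y = pd.Series(rng.integers(0, 3, size=1000))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=123)

scaler = StandardScaler()
scaler.fit(X_tr)                      # statistics come from the training rows only
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)  # the same mean and std are reused here

print(len(X_tr), len(X_te))
print(X_tr_scaled.mean(axis=0).round(3), X_tr_scaled.std(axis=0).round(3))
print(X_te_scaled.mean(axis=0).round(3), X_te_scaled.std(axis=0).round(3))
```

The training columns come out with a mean of 0 and a standard deviation of 1, while the test columns land close to those values but not exactly on them, because the scaler never saw the test rows when it was fit.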

With the data coming out as intended, the preprocessing function was now complete and ready to be put to use in the next steps.

5. Building the Model

The last and most important helper function served to build my standard Keras neural network!

# This function is going to build, compile, and return the model

def build_model(num_classes=3):

    # Input Layer (the feature count is read from the global X_train)
    inputs = tf.keras.Input(shape=(X_train.shape[1],))

    # Two Hidden Layers
    x = tf.keras.layers.Dense(128, activation='relu')(inputs)
    x = tf.keras.layers.Dense(128, activation='relu')(x)

    # Output Layer
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)  # softmax since both tasks we're doing are multi-class problems

    model = tf.keras.Model(inputs=inputs, outputs=outputs)

    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    return model
  1. Input Layer: I defined the input as tf.keras.Input(), its shape being the one-element tuple (X_train.shape[1],). Let’s break that down.

    Explaining Keras
    Keras is an open-source library that provides tools to train and implement artificial neural networks. It’s a deep-learning API tightly integrated with TensorFlow, which made it essential to the creation and development of my FNN.
    Explaining the shape (X_train.shape[1])
    If I displayed X_train.shape, I would get the tuple (8063, 116), meaning the training dataset holds 8063 rows by 116 features. Moreover, if I called just X_train.shape[1], I would get 116, which is just the number of features. So, using X_train.shape[1] in creating the inputs meant that on any given example, I’d have 116 features being inputted.
  2. Two Hidden Layers: both of the two hidden layers I created were dense layers with 128 nodes and ReLU activation. The inputs were passed through the first layer to produce x, and that x was then passed through the second.

    Explaining “dense layers” (tf.keras.layers.Dense())
    A dense layer is a type of layer in which all of its nodes are connected to all the nodes in the previous layer. Dense layers are the most common type of layer in neural networks and are good for things like feature learning and classification.
    Explaining ReLU Activation (activation='relu')
    The ReLU (Rectified Linear Unit) activation function introduces non-linearity, meaning it enables a neural network to learn and model complex patterns in the input data that aren’t linearly separable. It essentially transforms input data by returning 0 for negative inputs and the input value for positive inputs, which is cheap to compute and often lets a network train more effectively than saturating activations like sigmoid.
  3. Output Layer: The output layer is also dense, with the number of units being num_classes (the number of output classes, which defaults to 3 in the function signature) and the activation function being softmax, since both tasks here, predicting hand movements and predicting the user, are multi-class classification problems.

    Explaining softmax (activation='softmax')
    The softmax activation function converts the arbitrary raw scores that the model outputs into probabilities between 0 and 1 that sum to 1, allowing the model to make decisions based on the class with the highest probability. Softmax is the standard choice for multi-class problems like this one, since it provides a normalized probability distribution (0–1) across all the classes.
  4. Creating the Model using model = tf.keras.Model(inputs=inputs, outputs=outputs)
  5. Compiling the Model: The last step in creating the model was to compile it with an Adam optimizer, a sparse categorical cross-entropy loss function for multi-class loss, and accuracy as the metric.

    Explaining ā€œcompiling the modelā€ (model.compile())
    In essence, compiling a model configures it with information on how to measure its performance and make necessary adjustments to its weights while training, allowing the model to learn from itself. This key information is nothing more than a specification of the loss function, optimizer, and metrics to be used in training and evaluating the model once the code runs.
    Explaining the Adam optimizer (optimizer='adam')
    The optimization step serves to efficiently adjust (aka optimize) the model’s weights while it trains, facilitating self-learning and improvement. The Adam optimizer adapts the step size for each weight as training goes on, which makes it super flexible and a solid default for training a neural network like an FNN.
    Explaining the sparse categorical cross-entropy loss function
    During training, a loss function essentially measures the model’s performance by comparing its predicted outputs to the actual targets; the optimizer then adjusts the weights to shrink that loss. The goal is to minimize the difference between the two, thereby improving the model’s accuracy. The sparse categorical cross-entropy loss function in particular is great for multi-class classification problems like this one, where the target values are provided as integers and aren’t one-hot encoded.
    Explaining the accuracy metric
    The accuracy metric is super helpful in evaluating the overall performance of the model during training and testing, by measuring the percentage of correctly classified samples. For example, an accuracy of .75 would mean my model correctly predicted the class of 75% of the data samples.
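
The pieces explained above (ReLU, softmax, sparse categorical cross-entropy, and accuracy) are small enough to write out by hand. This is a minimal NumPy sketch of the math, not the Keras implementation, and the scores and labels are made up for illustration.

```python
import numpy as np


def relu(z):
    # 0 for negative inputs, the input itself for positive inputs
    return np.maximum(0, z)


def softmax(logits):
    # Subtract each row's max first so np.exp cannot overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_categorical_crossentropy(y_true, probs):
    # y_true holds integer labels, so just pick the probability of the true class
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.log(picked).mean())


def accuracy(y_true, probs):
    # Fraction of samples whose most probable class matches the label
    return float((probs.argmax(axis=1) == y_true).mean())


# Made-up raw scores for 4 samples and 3 classes, plus their integer labels.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0],
                   [1.0, 1.5, 0.0],
                   [0.3, 2.0, 0.1]])
labels = np.array([0, 2, 0, 1])

probs = softmax(logits)
loss = sparse_categorical_crossentropy(labels, probs)
acc = accuracy(labels, probs)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
print(probs.sum(axis=1))                 # every row sums to 1
print(acc)                               # 0.75, since 3 of the 4 samples are right
```

The labels go in as plain integers (0, 1, 2) rather than one-hot vectors, which is exactly the situation the "sparse" variant of the loss is meant for.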

Can you guess what came next?

return model

That’s right, simply a return of the compiled model. At this point, the helper functions were all ready and the model had been made. It was now time to put this code to the test!

Using the Model to Predict Hand Movements

I now created a model that predicted Class by learning the relationships between certain EEG data readings and certain hand movements. The following steps don’t require too much explanation!

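# Assumed step (not shown in the original listing): create the splits for the
# Class target, since build_model() reads X_train.shape
X_train, X_test, y_train, y_test = preprocess_inputs(data, target='Class')

class_model = build_model(num_classes=3)

class_history = class_model.fit(
    X_train,
    y_train,
    validation_split=0.2,
    batch_size=32,
    epochs=50,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=3,
            restore_best_weights=True
        )
    ]
)
# Should stop once the validation loss has stopped decreasing for 3 epochs in a row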
  1. Calling the build_model() function with the number of classes set to 3.
  2. Fitting the model on X_train and y_train.
  3. Giving the model a validation split of 20% using validation_split=0.2, meaning 20% of the training data is set aside for validation to measure the performance of the model, while the other 80% is used for training it.
  4. Giving the model a batch size of 32 using batch_size=32, meaning 32 data samples will be used to estimate the error gradient before the model updates its weights.
  5. Training the model for up to 50 epochs using epochs=50, meaning the entire training set can be passed through the machine learning algorithm up to 50 times, with each pass giving another opportunity to update internal parameters and self-improve.
  6. The callback function customizes the model to look at the validation loss (a lower validation loss means better performance) and see if the value is improving by getting closer to 0 with every epoch. If the validation loss stops improving for a given number of epochs (which has been set to 3 using patience=3), the model will stop training early (meaning it’ll stop before a full 50 epochs) and restore the weights from the best iteration. Essentially, this means the model will stop training early when it notices its improvement stagnating, and then restore its parameters to whatever they were during its peak performance!
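
The early-stopping rule in step 6 can be traced by hand. This is a simplified sketch of the patience logic in plain Python, not the actual Keras callback; the function simulate_early_stopping and the validation losses are made up for illustration.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (epochs_run, best_epoch) for a list of per-epoch validation losses.

    Epochs are numbered from 1. Training stops once `patience` epochs in a row
    fail to beat the best validation loss seen so far.
    """
    best_loss = float('inf')
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


made_up_losses = [0.95, 0.80, 0.72, 0.70, 0.71, 0.73, 0.74, 0.69, 0.68]
epochs_run, best_epoch = simulate_early_stopping(made_up_losses, patience=3)
print(epochs_run, best_epoch)  # 7 4
```

In this made-up run, training halts after epoch 7 and the weights from epoch 4 would be the ones restored. The lower losses at epochs 8 and 9 are never reached, which is the trade-off of a small patience value.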

Running the Model and Getting its Accuracy

Check out the model running for 20 epochs, and pay attention to how the validation loss gradually got lower and began to stagnate!

And there you have it! After running the prediction model, it was time to check its accuracy:

class_acc = class_model.evaluate(X_test, y_test, verbose=0)[1]
print('Test Accuracy (Class Model): {:.2f}%'.format(class_acc * 100))

In the particular run shown above, I received a test accuracy of 65.55%, and the figure came out in the range of 63–69% each time I did another run. In its peak epoch, the accuracy reached was around 93%.

While these numbers aren’t perfectly consistent, this can be attributed to the fact that it’s generally difficult to get TensorFlow to behave and perform in a fully deterministic way, even if you’ve set random seeds. Some variation in results usually arises even when the model, data, and parameters (like the random seed) are held constant.

Nevertheless, this FNN proved generally effective in mapping EEG data samples to corresponding hand movements. An RNN (recurrent neural network) also could have been used for this particular task and would have performed with more flexibility, but on the other hand would have taken much, much longer to train the model than my FNN did.

Bonus: Predicting the Participant

To switch it up a bit, I predicted the user (i.e. the doer of the hand movements) instead! This meant using the previous Class data as inputs/features and using the User data as the classes to be predicted. I did this in a few simple steps.

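# Assumed step (not shown in the original listing): redo the split with User
# as the target, so Class becomes a one-hot encoded input feature
X_train, X_test, y_train, y_test = preprocess_inputs(data, target='User')

user_model = build_model(num_classes=4)

user_history = user_model.fit(
    X_train,
    y_train,
    validation_split=0.2,
    batch_size=32,
    epochs=50,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=3,
            restore_best_weights=True
        )
    ]
)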
  1. Grabbing the same code from above (the creation of the Class prediction model) and changing num_classes to 4 instead of 3, as there were 4 users/participants in the study.
  2. Renaming the variables to replace Class with User in every instance, mainly for user_history and user_model.
  3. Running with the same validation split, batch size, number of epochs, and patience for early stopping.
  4. And finally, getting the model’s accuracy!
Test Accuracy (User Model): 99.94%
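
Switching the target also changes which columns the model sees as inputs. This is a minimal sketch of that effect on a made-up six-row frame; split_features is a hypothetical helper that mirrors only the encoding and X/y steps of the preprocessing, without the train-test split or scaling.

```python
import pandas as pd


def onehot_encode(df, column):
    df = df.copy()
    dummies = pd.get_dummies(df[column], prefix=column)
    df = pd.concat([df, dummies], axis=1)
    return df.drop(column, axis=1)


def split_features(df, target):
    """One-hot encode the non-target label column, then separate X from y."""
    other = [c for c in ['Class', 'User'] if c != target][0]
    df = onehot_encode(df, column=other)
    return df.drop(target, axis=1), df[target]


# Made-up data: two feature columns, 3 movement classes, 4 users.
toy = pd.DataFrame({
    'reading_a': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    'reading_b': [1.0, 1.1, 1.2, 1.3, 1.4, 1.5],
    'Class': [0, 1, 2, 0, 1, 2],
    'User': [0, 1, 2, 3, 0, 1],
})

X_class, y_class = split_features(toy, target='Class')
X_user, y_user = split_features(toy, target='User')

print(X_class.columns.tolist())  # readings plus User_0 ... User_3
print(X_user.columns.tolist())   # readings plus Class_0 ... Class_2
```

With three classes and four users, the two tasks end up with different feature counts, so the splits would need to be regenerated for the new target before the model is built and fit.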

Based on that huge increase in accuracy, it was much, much easier for the ML model, given the EEG data, to predict the doer of the hand movements than the hand movements themselves. This was predictable, based on the overall simplicity of this reversed classification task in comparison to the original one of predicting hand movements.

That concludes the creation and implementation of my FNN! An AI model like this, which learns to map brainwaves to motor movements, could help us achieve key functionality when it comes to bringing a neuroprosthetic hand to life. I’m excited to explore the hardware component soon… ☺

Thanks for reading! I can’t wait to see you back here for more project explanations.
