Experimenting with Machine Learning for EEG data

A first journey into DIY Brain Computer Interfaces, part 4

Tim de Boer
Building a bedroom BCI
6 min read · Sep 6, 2021


Fig 1. Designed by rawpixel.com / Freepik

Disclaimer: This series of blog posts is written to make myself aware of the details and concepts in the field of BCIs and neuroscience as I work through my very first BCI project of my own. Right now, I really have no idea how this project will turn out.

Therefore, note that this series of blog posts should not be used as a definitive guide for your own journey. Rather, I hope you will take inspiration from my first journey, and perhaps avoid the mistakes I make along the way ;)

Update: In the summer of 2022, I started a new series of blog posts with updated, more advanced information, and way better results than achieved in this series of blog posts. Click here to go to part 1 of that series.

Welcome to this series where I document my process of building my very first bedroom BCI project! We have arrived at part 4, often referred to as the exciting part: experimenting with machine learning models!

In previous parts, I have explained some background in the field of BCIs (part 1), I have explained how I collected my data for my experiment (part 2), and I have documented my pre-processing steps (part 3).

Now, let’s first jump into the steps I have taken to get my data ready to use as input for the machine learning models we’ll experiment with!

The code corresponding to the project in this blog post can be found at this GitHub repository. Specifically, in this part we discuss step 4, which can be found in the Python file 4_MLmodels.py and the folder step_4_MLModels.

Preparing data for learning

Getting the data ready for learning starts with merging our label columns into a single column, which we use as the target (y) for our machine learning models.
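As an illustration, here is a minimal sketch of that merge with pandas. The column names label_left and label_right are hypothetical placeholders; the actual names in my data may differ:

```python
import pandas as pd

def merge_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-class label columns into a single y column."""
    df = df.copy()
    df["y"] = None
    df.loc[df["label_left"] == 1, "y"] = "left"
    df.loc[df["label_right"] == 1, "y"] = "right"
    # Rows with neither label set are dropped here (more on this later!).
    return df.dropna(subset=["y"]).drop(columns=["label_left", "label_right"])
```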

Next, we combine all our datasets into one big dataframe. As my goal is a model that performs well for unknown users without additional training, we randomly shuffle the data of all users together and use that as input for our model.
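A sketch of combining and shuffling, assuming one preprocessed CSV file per user (the file names below are made up):

```python
import pandas as pd

# Hypothetical file names: one preprocessed dataset per user.
dataset_paths = ["user_1.csv", "user_2.csv", "user_3.csv"]
data = pd.concat([pd.read_csv(p) for p in dataset_paths], ignore_index=True)

# Shuffle all users' data together; a fixed seed keeps runs reproducible.
data = data.sample(frac=1, random_state=42).reset_index(drop=True)
```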

With this, we have also decided that our problem will be non-temporal. In other words, we assume that brain activity for motor imagery is not influenced by prior datapoints. In reality, prior datapoints might influence the brain activity, but we neglect this for this project to make this step a bit easier.

The next step is creating training, validation and test sets. We will randomly divide our data into 60% training data, 20% validation data and 20% test data.
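Continuing from the shuffled dataframe above, one way to get a 60/20/20 division with scikit-learn is to split twice: first take 20% off for testing, then 25% of the remaining 80% (which is 20% overall) for validation:

```python
from sklearn.model_selection import train_test_split

X = data.drop(columns=["y"])
y = data["y"]

# 20% test, then 25% of the remaining 80% (= 20% overall) as validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42)
```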

Lastly, we create different feature subsets to use as input for the machine learning models, for experimental reasons. Sometimes a small feature set is actually beneficial for validation performance, as a model with a large set of features can be too complex and overfit the training data, resulting in worse performance on the validation or test data.

These subsets are:

  • Basic features: our 20 features of brain wave data which we started with
  • Basic with PCA: Adding the PCA features to the basic features.
  • Basic with ICA: Adding the ICA features to the basic features to see if the way I implemented ICA was of any help.
  • All features: all our 584 features.
  • Selected features: from an initial experiment with a decision tree, we collect the top 20 most influential features, as these might be sufficient for good performance with other models as well (see the sketch after this list).
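For the selected features, a minimal sketch of the idea: fit a decision tree and rank the features by its feature_importances_ attribute (variable names follow the split sketch above):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

# Rank features by importance in the fitted tree and keep the top 20.
order = np.argsort(tree.feature_importances_)[::-1]
selected_features = X_train.columns[order[:20]].tolist()
```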

Experimenting with learning

Now, our data is ready for the machine learning models! We will implement seven different machine learning models using scikit-learn. These models can be categorized as deterministic and non-deterministic models: non-deterministic models have a factor of randomness and thus give different performance on each run, while deterministic models always give the same results.

Our deterministic models:

  • K-Nearest Neighbors (kNN)
  • Decision Tree (DT)
  • Naive Bayes (NB)
  • Linear Discriminant Analysis (LDA), which I added later when I read this cool post with advice for bedroom BCI projects :)

Our non-deterministic models:

  • Neural network (NN)
  • Random Forest (RF)
  • Support Vector Machine with RBF kernel (SVM)
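Instantiating all seven models could look like the sketch below. I use scikit-learn defaults here, which may differ from the settings in the repository:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    # Deterministic models
    "kNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(random_state=42),
    "NB": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
    # Non-deterministic models
    "NN": MLPClassifier(),
    "RF": RandomForestClassifier(),
    "SVM": SVC(kernel="rbf"),
}
```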

For the non-deterministic models, we repeat the training process 10 times and then take the average performance as our metric.

Speaking of the metric: we will evaluate our models based on the F1 score. We chose the F1 score because we want good precision as well as good recall, as when controlling movement with a BCI system, performing the wrong movement can potentially be harmful for our subject. This is not the case for our particular project, but we use the F1 score anyway for a good balance.

To take it further: if performing the wrong movement carried a high chance of harm, we might focus mostly on precision alone, since missing one movement command (lower recall) is less harmful than executing the wrong movement (lower precision).
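Putting the repeated runs and the F1 metric together, the evaluation loop could look something like this minimal sketch, using the models dictionary from above. The weighted F1 average is an assumption on my part; it has the advantage of also carrying over to the three-class setting later on:

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import f1_score

def evaluate(model, n_runs=10):
    """Train and score a fresh copy of the model n_runs times."""
    scores = []
    for _ in range(n_runs):
        fitted = clone(model).fit(X_train, y_train)
        scores.append(f1_score(y_val, fitted.predict(X_val),
                               average="weighted"))
    return np.mean(scores)

# For the deterministic models, n_runs=1 would give the same result.
results = {name: evaluate(m) for name, m in models.items()}
```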

For all our models in this evaluation phase, we do not apply any form of parameter optimization yet, as running grid or randomized search 10 times for the non-deterministic models would take too long. Rather, we evaluate performance on the validation data. Then, we choose the best performing model and train it again, now using grid search. Lastly, we evaluate the final performance of this model on the test set.

Evaluation

The results of the evaluation are presented in figure 2 below. The first thing to note is that the RF model with all features has the best F1 score on the validation data. Thus, this is the model we will tune with grid search later! But first, let’s have a closer look at figure 2; other things of interest are:

  • Adding PCA or ICA features to the basic features does not change performance much.
  • SVM performs very badly, while I have often read that SVM is used in BCI research. This might be due to some parameter differences.
  • Using the top 20 selected features is not enough for good performance; for most models, using all features gives the best performance.
Figure 2: The F1 score of all models with different subsets of features. For the non-deterministic models, the presented performance is an average over 10 runs.

Let’s now further investigate the best performing model so far: the random forest. We train this model again, but now with grid search for parameter optimization. Evaluating performance on the test set gives a very good and consistent F1 score of around 0.85!
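A minimal sketch of what that grid search could look like; the parameter grid below is hypothetical, not necessarily the one used in the repository:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV

# Hypothetical parameter grid for illustration.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 2, 5],
}
search = GridSearchCV(RandomForestClassifier(), param_grid,
                      scoring="f1_weighted", cv=5)
search.fit(X_train, y_train)

# Final evaluation on the held-out test set.
test_pred = search.best_estimator_.predict(X_test)
print(f1_score(y_test, test_pred, average="weighted"))
```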

Update: I made a mistake..

While celebrating the good performance of my model, I realized I had dropped all data in which I asked the subject not to think left or right.. in other words, the relaxed state. Thus, I altered my code a bit in order to have a 3rd label in my data: undefined.
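In terms of the earlier label-merge sketch, the change amounts to keeping the cue-free rows instead of dropping them (column names hypothetical, as before):

```python
import pandas as pd

def merge_labels_3class(df: pd.DataFrame) -> pd.DataFrame:
    """Like merge_labels above, but keep cue-free rows as 'undefined'."""
    df = df.copy()
    df["y"] = "undefined"
    df.loc[df["label_left"] == 1, "y"] = "left"
    df.loc[df["label_right"] == 1, "y"] = "right"
    return df.drop(columns=["label_left", "label_right"])
```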

I ran my code again, and the results are below in figure 3:

Figure 3: Evaluation of a 3-label classification, where I added ‘undefined’ as class.

Unfortunately, the results are worse. But I still think the random forest model with all features is not a bad option, so I trained this model again with grid search, and got an F1 score of 0.63. I think that’s reasonable.

Conclusion

After this evaluation of the machine learning models, I am hopeful that the project might have some kind of success :). We have found that the random forest model, with grid search, gives the best performance: an F1 score of around 0.85. Only later did I find out I needed to add a 3rd label, undefined, for when the subject is not thinking about any movement. After adding this, the performance dropped, but it is still reasonable, with an F1 score of 0.63.

We will save this model using Pickle, as we will use it in part 5: real-time predictions!
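Saving and loading with pickle is a one-liner each way; the file name below is just an example:

```python
import pickle

# Save the tuned random forest to disk for use in part 5...
with open("rf_model.pkl", "wb") as f:
    pickle.dump(search.best_estimator_, f)

# ...and load it back when making real-time predictions.
with open("rf_model.pkl", "rb") as f:
    model = pickle.load(f)
```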

Click here to go back to part 1, part 2, or part 3. The last part of this series is available here.

Any feedback for this project and blog post series is welcome!
