iEEG — How to Detect Eye Motion Using Azure ML and EEG Data

Aaryan Harshith · Published in Analytics Vidhya · 6 min read · Nov 9, 2019

Imagine your friend is standing across the block. You see him mouthing words, and you’re trying to make sense of them.

Would it be easier to make out his words in the city centre, where cars are whizzing past and hundreds of people are bustling off to work, or in a quiet, isolated alley?

This comparison captures the exact challenge scientists currently face when trying to process brain signals from electroencephalogram (EEG) data.

For devices as sensitive as EEGs, the slightest twitch can have the same effect as a nuclear detonation, drowning out valuable information. These signal 'detonations' are known as artifacts, and they can come from the most natural bodily reactions, such as the beat of your heart, sweat, and even blinking. For context, a mere blink can cause a spike roughly 10x more pronounced than the surrounding signal.
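To make the scale of these spikes concrete, here is a minimal sketch (my own illustration, not part of the pipeline built later in this article) of how a blink-sized spike dwarfs background activity. The signal, the spike position, and the threshold are all invented for the example:

```python
import numpy as np

# Toy 'EEG' trace: flat background activity with one blink-like spike
# roughly 10x the background amplitude, flagged by a simple threshold.
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 1.0, 500)   # background activity, std ~ 1
signal[250] += 10.0                  # blink artifact, ~10x baseline
spikes = np.flatnonzero(np.abs(signal) > 5.0 * signal.std())
print(spikes)  # the blink sample stands out
```

Real artifact detection is far harder than a fixed threshold, which is exactly why the rest of this article reaches for machine learning.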

In the field of Brain-Computer Interfaces (BCIs), detecting and removing these artifacts may be the final barrier to extracting the value of every signal produced by your brain. In this article, I'm going to cover the development of a machine-learning-based blink-artifact detection system, all built without a single line of code.

Artifacts, Artifacts, Everywhere

Contrary to popular belief, and some high school teachers, artifacts are not always apparent as spikes in data. By definition:

Artifacts are unnecessary signal deviations that hinder data from being processed to its full capabilities.

In an EEG reading, an action such as clenching your teeth, as in the example below, produces a far more apparent deviation than blinking does.

EEG signal charts showing the effects of various artifacts on the readings of electrodes

After observing the picture above, you may see that an artifact isn't everything it's made out to be. Rather than being defined by its size, a signal is considered an artifact because it obstructs more valuable data.

Extracting someone’s thoughts and intentions from an EEG reading becomes almost impossible when noise from artifacts corrupts your readings. Imagine scientists having to sift through thousands of data points to manually remove artifacts; this is the main barrier researchers face today in creating truly intelligent BCIs. As of now, data analysts use complex, hard-coded transformation formulas to normalize EEG data.

But what if we could find a simple, robust, and more accurate way of detecting artifacts — without removing valuable data?

Detecting Artifacts

In a world where automation is king, it’s almost as if artifact removal were a problem meant to be solved with machine learning!

Computers can learn information and recognize patterns in data faster than any other being on Earth, making them the ideal solution for this sort of dilemma.

Before we start the process of building a model, we need data, and we need lots of it. While EEG data is widely available in almost any large-scale university database, this project utilized data from the University of California Irvine’s (UCI) Machine Learning Repository. To learn how to build this model, you can either continue reading, or watch my in-depth video explanation below 👇

An in-depth video tutorial and explanation of the artifact removal project in Azure

The repository contains thousands of high-quality data sets to use in machine-learning experiments, and the one we need is known as the EEG Eye State Data Set. It contains 117 seconds of EEG measurements from the Emotiv 14-channel Neuroheadset, with over 14,000 rows of data to analyze, where 'one' means the eyes are closed and 'zero' means the eyes are open.
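For readers who want a rough programmatic picture of the data set's shape, here is a sketch in pandas. The channel names follow the Emotiv headset's electrode labels, but treat them and the synthetic values as placeholders; in practice you would simply read the downloaded CSV:

```python
import numpy as np
import pandas as pd

# The EEG Eye State layout: 14 electrode channels plus an eye-state
# label (1 = eyes closed, 0 = eyes open).
channels = ["AF3", "F7", "F3", "FC5", "T7", "P7", "O1",
            "O2", "P8", "T8", "FC6", "F4", "F8", "AF4"]

# In practice: df = pd.read_csv("eeg_eye_state.csv", header=None,
#                               names=channels + ["eye_state"])
# A tiny synthetic frame stands in here so the sketch runs as-is.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(4300, 40, size=(100, 14)), columns=channels)
df["eye_state"] = rng.integers(0, 2, size=100)
print(df.shape)  # (100, 15): 14 channels plus the label
```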

After downloading the data set as a .CSV file, assign titles to its columns; although this isn't strictly necessary, naming your variables is a best practice.

An example of creating column names to title your variables. As shown in the green box, I titled the patient number as ID, and left the electrode readings as letters of the alphabet.

Now that our variables are titled, we can start building a machine learning model. For this project, I used Microsoft’s Azure Machine Learning Studio, since the pipeline is drag-and-drop, and requires no code.

After creating a blank experiment and uploading your edited CSV file as a data set, you can now use the data as a block. Throughout the development of this model, we will be adding and connecting a series of blocks in the workspace.

Now that you’ve uploaded your data set to Azure, the hardest part of the process is complete. The next step is to exclude the ID column of the data set and replace values of zero with NaN; although the data set doesn’t contain any zero values, this is another best practice to employ.

In the column selection settings, select all columns, and then exclude the ID column. Attach the ‘Clean Missing Values’ block to it, and leave it at the default settings.

After the removal of the ID column, and the cleaning of zero values, your workspace should look like this.
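In pandas terms, the two steps above look roughly like this. The column names and toy values are placeholders, and mean substitution is just one of the cleaning modes the Azure block offers:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the titled data set: an ID column, two
# electrode channels, and the eye-state label.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "A": [4280.0, 0.0, 4295.0, 4301.0],
    "B": [4120.0, 4133.0, 0.0, 4128.0],
    "eye_state": [0, 1, 1, 0],
})

# Exclude the ID column: it identifies rows but carries no signal.
cleaned = df.drop(columns=["ID"])

# Treat zeros as missing, then clean them by mean substitution.
cols = ["A", "B"]
cleaned[cols] = cleaned[cols].replace(0.0, np.nan)
cleaned[cols] = cleaned[cols].fillna(cleaned[cols].mean())
print(cleaned.isna().sum().sum())  # 0: no missing values remain
```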

Our next course of action is to split the data into training and testing sets, and to select the appropriate machine learning model. Drag a ‘Split Data’ block into your workspace, and set the fraction of rows in the first output to 0.75.
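The split step has a close scikit-learn equivalent; this sketch uses synthetic stand-in data, since only the 0.75 fraction comes from the article:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows of 14-channel readings plus 0/1 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 14))
y = rng.integers(0, 2, size=100)

# train_size=0.75 mirrors the 0.75 row fraction set on 'Split Data'.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=42)
print(len(X_train), len(X_test))  # 75 25
```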

Drag a ‘Train Model’ block into the environment and connect node one of the ‘Split Data’ block to node two of the ‘Train Model’ block. Change the label column setting of this block to the name of the variable that holds the actual values of the data set (the zero and one eye states).

After dozens of test runs to find the model that would provide the best possible results for this data set, the Two-Class Boosted Decision Tree performed best, so that’s the model we will use. In the ‘Two-Class Boosted Decision Tree’ block, set the maximum number of leaves per tree to 100, and connect this block to the ‘Train Model’ block.

A schematic of how your environment should look after choosing a model. The maximum number of leaves is set to 100, and the rest of the values are left at their defaults.
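As a rough code analogue of the training step, scikit-learn's gradient-boosted trees play a similar role to Azure's Two-Class Boosted Decision Tree; only the 100-leaf setting comes from the article, and the training data here is a synthetic stand-in:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the training split (75 rows, 14 channels).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(75, 14))
y_train = rng.integers(0, 2, size=75)

# Gradient-boosted trees with up to 100 leaves per tree, mirroring
# the article's setting; other hyperparameters stay at their defaults.
model = GradientBoostingClassifier(max_leaf_nodes=100, random_state=42)
model.fit(X_train, y_train)
print(model.n_estimators_)  # 100 boosting stages by default
```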

To see how well the model performs, we need a way to score and evaluate it. To do this, simply use the ‘Score Model’ and ‘Evaluate Model’ blocks. Connect the ‘Train Model’ block to node one of the ‘Score Model’ block, and finally, connect the ‘Score Model’ block to the ‘Evaluate Model’ block.
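In code terms, scoring corresponds to predicting on the held-out rows and evaluating corresponds to computing metrics over those predictions. This sketch uses deliberately separable toy data (the label simply follows the sign of the first 'channel') so the evaluation has something to measure:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Separable toy data: the label follows the sign of channel 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=42)

model = GradientBoostingClassifier(max_leaf_nodes=100, random_state=42)
model.fit(X_train, y_train)

# 'Score Model' ~ predict on the test split;
# 'Evaluate Model' ~ metrics such as accuracy over those predictions.
preds = model.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, preds):.2f}")
```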

That’s it! You’ve created your very own machine learning model to detect artifacts in EEG data, without a single line of code. To see how the model performs, run your experiment and click ‘Visualize’ on the ‘Evaluate Model’ block to see your results.

The model performed spectacularly, with an accuracy of 94%, an impressive result for artifact detection

The decision tree model achieved an impressive accuracy of 94.4%, putting it in contention with the hand-coded formulas that have been refined for decades. And you created this model all on your own, for free, without writing any code!

Now that artifacts can be removed in EEG data, we’re one step closer to reaching the full potential of BCIs — a future where everyone can change the world, just with their thoughts.

Thank you for reading!
