Using Resting-State EEG Data of Patients with Epilepsy for Functional Connectivity

Raina Bornstein · Published in Geek Culture · 14 min read · Jun 29, 2021

A programming-based approach to cleaning and analyzing variations in the brain activity of epileptic patients through the lens of functional connectivity.

Table of Contents

  1. Introduction
  2. Basics of Epilepsy and EEG
  3. MATLAB and Software Extensions
  4. The Importance of Cleaning Data and How I Cleaned Mine
  5. Analysis…Or Not?
  6. How to Make Use of This
  7. Next Steps
  8. Key Takeaways

Introduction

Imagine this: you’re doing something routine that you’ve done a thousand times before. You could be anywhere: at the gym, at school or work, the beach, or literally any other place you’ve ever been. One second things are completely normal, but then out of nowhere a surge of uncontrolled electrical activity goes through your brain. When you open your eyes again you’re on the floor, and have no idea what just happened. For a little while, you don’t even know where you are, you feel weak and exhausted, your head hurts, you might throw up, and there’s a terrible anxious feeling you can’t shake.

This is what it’s like to have a seizure. Seizures are terrifying, and they can happen to anyone at any time. But over 65 million people worldwide are faced with a condition that makes them constantly prone to these horrible seizures, along with a variety of other disruptive impairments based in the brain. A condition called epilepsy.

Basics of Epilepsy and EEG

There are four main types of epilepsy, which are classified based on the types of seizures a patient experiences and where those seizures originate in the brain. The first type is generalized epilepsy, where patients experience seizures that start on both sides of the brain or very quickly come to involve both sides. The next type is focal epilepsy, where patients repeatedly experience seizures based in the same region on one side of the brain. The third type is combined generalized and focal epilepsy, where a patient experiences both kinds, and the last is unclassified or unknown epilepsy, for when it's unclear or unestablished which of these types applies.

Intense uncontrolled electrical activity occurs in the brain, causing a seizure.

Based on the constant challenges patients with epilepsy face simply trying to live in a way others do effortlessly and subconsciously, it's evident that their brains are very different from those of people without the condition. But the ways in which they're different are a little harder to figure out at first glance. To do this, scientists use special imaging methods.

Since epilepsy is a condition based in electrical activity levels, the logical imaging method used for this condition is EEG. Short for electroencephalography, EEG imaging detects electrical activity in the brain using small metal disks called electrodes that are placed on the exterior of the scalp.

MATLAB and Software Extensions

I started out with raw data from this dataset of pediatric patients with epilepsy. In order to perform all of the cleaning, sorting, and processing I needed to do, I used a popular science- and engineering-oriented programming platform called MATLAB, along with its EEG-specific toolbox extension, EEGLAB. This was a very organized and much more convenient way to perform all the tasks I needed to do (and there were A LOT). It's a massive time saver even if you already know how to program, and I highly recommend it for any science or engineering based programming project.
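For anyone trying to reproduce the setup, here's a minimal sketch of how a session might start. The file name is a placeholder, and importing .edf files assumes the BIOSIG plugin is installed:

```matlab
% Launch EEGLAB so its functions are on the MATLAB path
eeglab;

% Import one raw recording; .edf import goes through the BIOSIG plugin.
% 'patient01.edf' is a placeholder name, not a file from the dataset.
EEG = pop_biosig('patient01.edf');

% Sanity-check the dataset structure EEGLAB just built
EEG = eeg_checkset(EEG);
```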

The Importance of Cleaning Data and How I Cleaned Mine

Prior to any cleaning, the data I had was completely unfiltered: straight up just the readings taken while patients were wearing the EEG cap. This data wasn't ready to analyze as-is, because it wasn't clean at all. Cleaning data is incredibly important because the raw readings may contain the data you're looking for, but they'll also include noise, eye blinks, and other things that dilute the data. In order to get clean and accurate results, you need to make sure the data you're analyzing is clean, because trash data inputs will only ever result in trash conclusions.

Preprocessing

Prior to any cleaning, the data looked like the plot below. As you can see, eye blinks (circled in red) create massive variation in amplitude and frequency in comparison to the rest of the reading, which gets in the way of properly analyzing the data. Therefore, I needed to run the data through a Finite Impulse Response (FIR) filter, which puts limits on the frequencies allowed through in order to suppress eye blinks and other unwanted outliers. I set the maximum frequency at 30 Hertz (which is fairly standard), along with a standard minimum of 0.05 Hertz. By doing this, I was able to reduce the impact of the eye blinks, since their frequency content was quite different from the rest of the data.

The channel data prior to any filtering. Eye blinks (circled in red) create large variations in frequency across channels.
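In EEGLAB this filtering step boils down to one call. A minimal sketch with the cutoffs described above (pop_eegfiltnew designs a zero-phase windowed-sinc FIR filter):

```matlab
% Band-pass FIR filter: high-pass at 0.05 Hz, low-pass at 30 Hz
EEG = pop_eegfiltnew(EEG, 0.05, 30);
```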

After I put the data through the FIR filter, it was time to re-reference it. An EEG cap looks similar to a net on the patient's head, but the net is made of electrodes that record brain waves. Every dataset has a common reference: an electrode that all other channels use as a basis for comparison. Scientists disagree about which electrode to use, and there is no universal "best" common reference; at most there's a best-suited reference for an individual dataset. By re-referencing the data, you convert the point of reference for the dataset to the average of all its channels (the average reference). This makes for a much better point of comparison, which enables better analysis as well.

EEG caps are nets made of electrodes, which take readings from a subject’s brain.
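Re-referencing to the average is also a one-liner in EEGLAB; passing an empty list tells pop_reref to use the average of all channels:

```matlab
% Re-reference all channels to the average reference
EEG = pop_reref(EEG, []);
```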

Rejecting Artifacts/ICA

Now that we've preprocessed the data, it's time to reject artifacts. I use artifacts as an umbrella term, since you can reject entire channels, specific stretches of the data, or both. The way I chose to reject artifacts was with a technique called Independent Component Analysis (ICA), because it can remove artifacts embedded within the data without forcing you to throw away an entire channel or section of good data. There are a variety of ICA algorithms suited to different types of data; the one I used is called JADE.

The main reason I find ICA preferable to rejecting data by hand or using automated rejection is that it decomposes the recording into independent components, which lets it tease apart many types of noise and distraction at once. Artifacts can be challenging to distinguish by eye, and automated rejection simply involves a lower level of detail and consideration, so I used ICA to maximize precision.
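In EEGLAB, JADE is available through pop_runica under the name 'jader'. Here's a sketch of the decomposition plus the removal step; the component indices are placeholders for whichever components turn out to be artifactual on inspection:

```matlab
% Decompose the data into independent components with the JADE algorithm
EEG = pop_runica(EEG, 'icatype', 'jader');

% After visually inspecting the component maps and activations, remove
% the artifactual ones; [1 2] is a placeholder list of component indices
EEG = pop_subcomp(EEG, [1 2], 0);
```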

Extracting Data Epochs

After running ICA, it was time to extract epochs. The purpose of extracting epochs is to be able to study the event-related details of continuous data. For example, if there are epochs tied to the times of certain stimuli, these should be extracted from the data.

As part of this step, the mean of each data channel (its baseline) is removed. Extracting epochs in EEGLAB is relatively straightforward, so I got through this step fairly easily and without complications. You can opt to extract epochs only for specific channel types, but I did it for all of my channels.
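A sketch of the epoching step in EEGLAB; the event label and time window here are illustrative placeholders, not the values from my dataset:

```matlab
% Cut the continuous data into epochs around events labeled 'stim',
% from 1 second before each event to 2 seconds after ('stim' and the
% window are placeholders)
EEG = pop_epoch(EEG, {'stim'}, [-1 2]);

% Remove the mean of each channel within every epoch; an empty time
% range means the whole epoch is used as the baseline
EEG = pop_rmbase(EEG, []);
```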

Plotting Data

Once I had extracted the epochs, I had completed the steps to cleaning my data! However, before I could advance to analysis, I wanted to ensure I had done everything correctly and that my data was fully clean. To do this, I created a few different types of plots.

The first type of plot is a spectopo plot. This maps the channel spectra, with each line representing a channel. If any line sits far away from the rest of the channels and doesn't touch more than a few other channels at any point, that channel needs to be removed as well. This is what my spectopo plot looked like after the cleaning steps described above:

My spectopo plot after cleaning the data.

As you can see, all of the lines are tightly clustered. While there are two lines that start just slightly above the others, they ultimately wind up connecting with the other channels on the plot, and since we want as much data as possible for prime analysis, I made the executive decision to keep them.
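For reference, a plot like this comes from EEGLAB's spectopo function; a sketch, with the frequency range matching the filter's band:

```matlab
% Plot the power spectrum of every channel (one trace per channel),
% restricted to the 0.05-30 Hz band kept by the filter
figure;
pop_spectopo(EEG, 1, [], 'EEG', 'percent', 100, 'freqrange', [0.05 30]);
```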

The second type of plot is an ERP image. In this plot, each horizontal line represents activity from a single trial. What you're looking for is whether any of the lines differ strongly in the colors they feature or how often those colors occur. The distribution of color and occurrence across the horizontal lines of data here is relatively consistent (it won't ever be 100 percent the same unless you're running the same piece of data over and over again), so I had the green light in this sense as well.

My ERP image plot after filtering the data.
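A plot like this can be produced with pop_erpimage; a rough sketch, where the channel index is a placeholder and 10 is the number of adjacent trials averaged together for smoothing:

```matlab
% ERP image for one channel: each horizontal line is one trial, with a
% moving average of 10 trials for smoothing (channel 10 is a placeholder)
figure;
pop_erpimage(EEG, 1, [10], [], 'Channel 10', 10, 1, {}, [], '', ...
    'erp', 'on', 'cbar', 'on');
```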

The final plot I looked at before proceeding was the channel data, which you saw initially in its raw state. After cleaning, it should have lots of similar-looking lines with similarly sized waves, but no waves so much smaller or larger that they're outliers, and no channels that are exactly the same. This is what the plot looked like:

My channel data after having filtered the data.
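This scrolling view of the channel data comes from pop_eegplot, and takes one line:

```matlab
% Scroll through the cleaned channel data (as opposed to ICA activations)
pop_eegplot(EEG, 1, 1, 1);
```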

It also checks out, so I had successfully completed the cleaning stage and was ready to move on to analysis!! To see how I did this, all you have to do is keep reading 😊

Analysis…Or Not?

Transforming the raw data was the real substance of the project, since it took lots of struggling, problem solving, reiterating, and perseverance. I may have laid it out to seem easy with the fairly simple explanation above, but there was a lot of trial and error involved, especially as someone who had never used the software before. The ability to take messy data and make it useful is the main skill I was looking to train here, and afterward I was ready to perform analysis as a final touch of sorts.

However, the outcome wasn't what I expected. I spent hours going through different potential software packages looking for research methods, extensions, and protocols for a variety of approaches, only to come up empty-handed. Some ideas were short-lived, while others involved lengthy rabbit holes that ended in the same ultimate failure. However, this series of failures was critical to my learning and to understanding the work I was doing. The time I put in still added value to me and to the project as a whole, and it was a critical part of my experience. I'll go through each approach I looked at, why it fell short, and what that showed me. The more methods I went through, the easier it became to spot shortcomings early on, and I got closer to success each time.

CONN Toolbox

CONN toolbox is a MATLAB-based toolbox extension that has been used by many of my colleagues and role models for projects based in functional connectivity, which naturally made it seem more credible and plausible than other alternatives right off the bat. I was able to get it into MATLAB with all of its subfolders and run the extension, but this trial was cut short when I saw that it was designed for fMRI data and lacked the EEG-tailored features that would have enabled me to do a successful analysis.

PLV

After speaking with a mentor of mine, I discovered that the phase-locking value (PLV) is a strong metric for analyzing functional connectivity, and that it could be the key to a successful analysis on my part! PLV is commonly computed in Python, but since I used MATLAB as the mechanism for my project instead, I needed to see if it was available in MATLAB/EEGLAB. I found a few links suggesting there were similar functions in MATLAB, but upon deeper inspection I discovered that these functions weren't the same at all.
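For anyone curious what the metric actually is: PLV measures how consistent the phase difference between two signals stays over time. A minimal MATLAB sketch (not code from my pipeline; the channel indices are placeholders, and hilbert requires the Signal Processing Toolbox):

```matlab
% Two channels of band-pass-filtered data (EEG.data is channels x samples);
% channel indices 1 and 2 are placeholders
x = double(EEG.data(1, :));
y = double(EEG.data(2, :));

% Instantaneous phase of each signal from its analytic signal
phix = angle(hilbert(x));
phiy = angle(hilbert(y));

% PLV = magnitude of the average phase-difference vector on the unit
% circle: 1 means perfectly locked phases, 0 means no consistent relation
plv = abs(mean(exp(1i * (phix - phiy))));
```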

FCLab

FCLab is a MATLAB extension specifically designed for functional connectivity. I read some papers that made it seem like a perfect fit, so I thought it was possible I had found the answer! However, once I read more into the features, I discovered that they were catered toward single-subject data (which mine isn't), and additionally that the extension no longer seems to exist.

Functional Connectivity Toolbox

After I found that the FCLab toolbox no longer existed, I was curious to see whether any other functional connectivity toolboxes for MATLAB still did. I was able to find the Functional Connectivity Toolbox, but unfortunately it was inadequate for what I was looking to achieve. It was not beginner friendly at all and lacked comprehensible instructions, so for someone relatively new to this field it clearly wouldn't work, and I knew I had to move on.

eConnectome

This was the closest I got to success. eConnectome is another MATLAB extension, used for functional connectivity imaging of EEG, MEG, and ECoG data specifically. I was able to successfully download and install the extension in MATLAB, and even began to play around with it! I got a feel for the extension and its features, but then I went to plug in my own data. I really thought I had made it, but then I noticed that I could only import .txt or .mat files, and all of my files were .edf, .set, or .fdt. I found websites with the capacity to convert my data to these formats, but the .txt and .mat files eConnectome accepts have very specific internal layouts, so even conversion apps couldn't produce files it would read. This was very unfortunate, but I still had the opportunity to play around with the extension, which was good learning, and it was the closest to success I got out of any of my attempts.
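For illustration, getting from a .set/.fdt pair to a bare .mat file is simple enough; the problem was that eConnectome expects its own internal structure, which a raw dump like this doesn't match. A sketch with placeholder file names:

```matlab
% Load a cleaned EEGLAB dataset (.set header plus .fdt data file);
% 'cleaned.set' is a placeholder name
EEG = pop_loadset('filename', 'cleaned.set');

% Dump the raw numbers to a .mat file -- but note this layout is not
% the structure eConnectome actually expects on import
data  = EEG.data;    % channels x time points
srate = EEG.srate;   % sampling rate in Hz
save('cleaned.mat', 'data', 'srate');
```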

Although it was disappointing not to find success in any of these analysis methods, each offered new knowledge and value, which helped me not only grow as I advanced to new method trials, but also better understand how these extensions work, what to look for, and how to use them.

How to Make Use of This

A freshly cleaned dataset is useless if you don't know what it's helpful for, even post-analysis. There are many different ways you can use a clean dataset, but since my main goal for the analysis, and the purpose of cleaning the data in the first place, was to use it for functional connectivity, that was also the standpoint I took when thinking about how to make use of the outcome.

Functional connectivity is useful when looking at epilepsy because understanding the connections between brain regions can be incredibly helpful in identifying causes throughout the course of a seizure. A seizure begins with uncontrolled electrical activity. Functional connectivity can be used to determine where this activity begins, and then how it spreads through connections until it ultimately reaches the level of spread or intensity that causes a seizure. As the seizure occurs, functional connectivity helps establish what's happening in specific brain regions, and how the connections between those regions shape the outcome. Post-seizure, functional connectivity will have helped doctors understand not only where the seizure started, but also the strong connections to that area which influenced it. They can use this information to decide which areas of the brain to monitor in order to catch potential future seizures at their starting points and be proactive in shutting them down.

But where does my new, clean data come into play? Well, we just established that the point is to look for patterns and trends (with the use of functional connectivity) that will help us predict and prevent seizures in a proactive epilepsy treatment approach. But how could we possibly identify patterns without clean data from patients with this same condition?

While epilepsy is a heterogeneous condition (meaning there's a lot of variety within it, and it encompasses many subgroups whose members are more similar to one another than to the condition as a whole), this data is still very helpful for two reasons. First, while it's the less precise of the two approaches, establishing general trends for the condition as a whole can still be valuable information to have. Second, the clean data can also help in determining the condition's biogroups (think back to the plots I showed you before!) and in finding the more targeted trends within those groups for more accurate predictive models.

Next Steps

Next steps for this project would be taking the clean data and training a supervised neural network, both for the dataset as a whole and for specific biogroups determined by clustering analysis (which could itself be done with an unsupervised neural network), to form predictive models.

This may just seem like a bunch of data cleaning and some functional connectivity analysis, but it has the capacity to impact millions of lives. Patients living with epilepsy have constant struggles in ways the rest of us can't understand, and by building on what I've done with this project, we can work toward letting them thrive without the constant painful and exhausting interruptions they're forced to live with.

Key Takeaways

  1. Epilepsy is a brain-based condition characterized by recurring surges of uncontrolled electrical activity in the brain. It can cause frequent seizures along with other harsh symptoms, and is greatly impairing to the millions who struggle with it.
  2. Electroencephalography (EEG) is a brain imaging method that specifically measures electrical activity in the brain. Since epilepsy is based in electrical activity levels, EEG is a perfect imaging method for analyzing this condition.
  3. To clean my EEG data, I used MATLAB and the EEGLAB extension. I accomplished this in four main steps: preprocessing, running Independent Component Analysis (ICA), extracting epochs, and plotting the data in several ways to verify success.
  4. Next came analysis. The type of analysis I was looking to perform was functional connectivity analysis, and I tried five different methods for performing this, but I wasn’t able to go all the way through with any of them. However, they each taught me new things about this analysis as well as the software, and it was a great opportunity for learning and growth.
  5. This newly clean data, combined with the use of functional connectivity, can be incredibly helpful for epileptic patients throughout all stages of seizures, and for predicting and preventing them as well.
  6. Epilepsy is heterogeneous, meaning there’s a lot of variety within the condition and it can be better thought of as an umbrella for smaller subgroups of epilepsy as a whole. However, making predictive models for the condition as a whole and for these subgroups (which can be determined using data plots) can both be helpful.
  7. Next steps for the project would involve putting the data through neural networks to form predictive models for the data as a whole as well as its subgroups.

Hi, thank you so much for reading my article! My name is Raina Bornstein; I'm 15 years old, and I have a passion for neuroscience. If you'd be interested in collaborating, taking this project further, or meeting with me, I'd love to set that up! Feel free to connect with me on LinkedIn or reach out to me at rainabornstein@gmail.com . Looking forward to hearing from you!
