Processing EEG data with Python

Vishal_Kumar
Jun 7, 2020


EEG data is time-varying and contains many artifacts. If these are not removed, we end up with a bad dataset, and any machine learning or mathematical model built on it will perform poorly, with high error. In this post we discuss some methods and Python tools to clean, or preprocess, the data.

Image credit: Raphael Vallat (Postdoctoral fellow, Walker Lab, UC Berkeley)

We will use the Python libraries MNE, NumPy, SciPy and pandas to preprocess the data and make it usable for machine learning algorithms and models. The data will arrive either in a raw EEG file format such as .fif or .cnt, or already as a NumPy array (usually stored in a .csv file). Let us first look at what EEG data is:

EEG data is the electrical activity generated by neurons in the brain, recorded over time with non-invasive electrodes placed on the scalp. In most cases the electrodes are placed according to the 10–20 system, but always check which electrode placement system was used in the experiment. Once we have the data in a raw format, we first load it with the MNE library (see the MNE API reference for the available reader functions).

import mne

# Set to 'false' if you are not using CUDA
mne.utils.set_config('MNE_USE_CUDA', 'true')

# Read the raw recording into raw_data. If your file is not a .cnt file, see the
# MNE API reference for the matching reader, e.g. mne.io.read_raw_fif for .fif files.
path = "../.../person_1.cnt"
raw_data = mne.io.read_raw_cnt(path, preload=True)  # preload=True loads the samples into memory

# If we know which channels we are interested in, we can pick them directly
# by passing their names as a list
raw_data.pick_channels(['FC6', 'FT8', 'C5', 'CP3', 'P3', 'T7', 'CP5',
                        'C3', 'CP1', 'C4'])
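When the recording is already a NumPy array stored in a .csv file, MNE's pick_channels is not available, so channels have to be selected by row. The sketch below assumes a hypothetical CSV layout (a header row of channel names, one column per channel, one row per time sample); load_eeg_csv and pick_rows are illustrative helpers, not part of any library. It transposes the table so that, as with MNE's get_data(), rows are channels, and then picks rows in the order of the requested names.

```python
import io

import pandas as pd


def load_eeg_csv(source):
    """Hypothetical loader: header row of channel names, one row per sample.

    Returns (ch_names, data) where data has shape (n_channels, n_samples),
    the same layout as MNE's get_data().
    """
    df = pd.read_csv(source)
    # Transpose so that each row holds one channel
    return list(df.columns), df.to_numpy(dtype=float).T


def pick_rows(data, ch_names, wanted):
    """Hypothetical helper: rows of `data` for the channels in `wanted`, in that order."""
    index = {name: i for i, name in enumerate(ch_names)}
    missing = [name for name in wanted if name not in index]
    if missing:
        raise ValueError(f"channels not found: {missing}")
    return data[[index[name] for name in wanted]]


# A tiny in-memory CSV stands in for a real file: 4 channels, 3 samples
csv_text = "Fp1,C3,C4,P3\n0,1,2,3\n4,5,6,7\n8,9,10,11\n"
ch_names, data = load_eeg_csv(io.StringIO(csv_text))
picked = pick_rows(data, ch_names, ['P3', 'C3'])
print(picked.tolist())  # [[3.0, 7.0, 11.0], [1.0, 5.0, 9.0]]
```

Looking channels up by name rather than by position keeps the selection correct even if the column order differs between recordings.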

The channels above lie over the regions of Broca's and Wernicke's areas, which are involved in speech processing. For preprocessing, we also leave out the noisiest channels in our data, where "noisiest" means a channel with far too many spikes or with no activity at all (a flat channel). EEG activity is conventionally divided into frequency bands: delta (roughly 0.5–4 Hz), theta (4–8 Hz), alpha (8–13 Hz) and beta (13–30 Hz), with gamma above about 30 Hz. We can pick the band or bands we need and filter the raw data accordingly; the 4–35 Hz pass band below keeps theta through beta plus the lowest part of gamma.
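Spotting the noisiest channels can be automated with a simple heuristic. This is a sketch, not MNE's method: find_bad_channels and its thresholds are hypothetical. A channel is flagged as flat if its peak-to-peak amplitude is tiny, and as too spiky if the robust (median/MAD-based) z-score of its standard deviation exceeds 3.5. With MNE, the names found this way can be listed in raw_data.info['bads'] or removed with raw_data.drop_channels().

```python
import numpy as np


def find_bad_channels(data, ch_names, flat_thresh=1e-7, z_thresh=3.5):
    """Hypothetical heuristic for flat or overly noisy channels.

    data: (n_channels, n_samples) array, e.g. in volts as returned by MNE.
    flat_thresh: assumed peak-to-peak amplitude below which a channel is flat.
    z_thresh: assumed robust z-score of the per-channel std above which a
    channel is treated as too noisy.
    """
    ptp = np.ptp(data, axis=1)          # peak-to-peak amplitude per channel
    std = data.std(axis=1)              # spread per channel
    median = np.median(std)
    mad = max(np.median(np.abs(std - median)), 1e-12)  # avoid division by zero
    robust_z = 0.6745 * (std - median) / mad           # modified z-score
    bad = (ptp < flat_thresh) | (robust_z > z_thresh)
    return [name for name, is_bad in zip(ch_names, bad) if is_bad]


# Synthetic example: six channels of ~10 µV noise, one flat and one very noisy
ch_names = ['FC6', 'FT8', 'C5', 'CP3', 'P3', 'T7']
rng = np.random.default_rng(0)
data = rng.normal(0, 10e-6, size=(6, 1000))
data[1] = 0.0    # FT8: flat, no activity
data[4] *= 20    # P3: far too much activity
bad = find_bad_channels(data, ch_names)
print(bad)  # ['FT8', 'P3']
```

The median and MAD are used instead of the mean and standard deviation so that the bad channels themselves do not inflate the reference values.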

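low_freq, high_freq = 4.0, 35.0  # pass band in Hz
# Band-pass filter; filter() works in place and also returns raw_data.
# With MNE_USE_CUDA enabled, n_jobs='cuda' would run the filter on the GPU.
raw_data = raw_data.filter(low_freq, high_freq, n_jobs=4)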
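For intuition about what the 4–35 Hz band-pass does, here is a minimal SciPy sketch. It swaps in a zero-phase Butterworth (IIR) filter, whereas MNE's Raw.filter uses an FIR design by default, so the two will not give identical output; the 500 Hz sampling rate and the test signal are assumptions for illustration. A 10 Hz (alpha) sine passes through almost unchanged, while 50 Hz line noise above the 35 Hz cut-off is strongly attenuated.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(data, sfreq, low_freq=4.0, high_freq=35.0, order=4):
    """Zero-phase Butterworth band-pass along the last (time) axis.

    An IIR stand-in for illustration, not MNE's default FIR filter.
    """
    sos = butter(order, [low_freq, high_freq], btype='bandpass', fs=sfreq, output='sos')
    # sosfiltfilt runs the filter forwards and backwards, so there is no phase shift
    return sosfiltfilt(sos, data, axis=-1)


sfreq = 500.0                              # assumed sampling rate in Hz
t = np.arange(int(4 * sfreq)) / sfreq      # 4 s of samples
alpha = np.sin(2 * np.pi * 10 * t)         # 10 Hz: inside the pass band
mains = np.sin(2 * np.pi * 50 * t)         # 50 Hz line noise: outside it
data = np.vstack([alpha + mains, alpha + mains])  # 2 channels x n_samples
filtered = bandpass(data, sfreq)

# Away from the edges, the output is close to the clean 10 Hz signal
print(np.max(np.abs(filtered[:, 500:-500] - alpha[500:-500])))
```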

Now we export the filtered data to a NumPy array.

data = raw_data.get_data()

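data is a 2D NumPy array of shape (n_channels, n_samples). Each row is one of the channels picked in the previous step (check raw_data.ch_names for their order), and each column is one time sample, so consecutive columns are one millisecond apart only when the sampling rate is 1000 Hz (see raw_data.info['sfreq']). MNE returns the values in volts. We can also change the number of columns by resampling, for example with mne.filter.resample(). Once we have the data as a NumPy array, we subtract the mean of all channels from each channel to remove artifacts that are common to all channels (common average referencing).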
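The sketch below shows how the time axis depends on the sampling rate and how to resample along it with SciPy's resample_poly, an alternative to mne.filter.resample(). The 1000 Hz and 250 Hz rates and the random data are assumptions; on a Raw object, raw_data.resample(new_sfreq) does the same job before get_data().

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

sfreq = 1000.0      # assumed original rate: columns 1 ms apart
new_sfreq = 250.0   # assumed target rate
data = np.random.default_rng(0).normal(size=(10, 5000))  # 10 channels x 5 s

# Express the rate change as a ratio of integers (here 1/4). resample_poly
# applies an anti-aliasing FIR filter before changing the rate.
ratio = Fraction(new_sfreq) / Fraction(sfreq)
resampled = resample_poly(data, up=ratio.numerator, down=ratio.denominator, axis=1)

# Time (in seconds) of each column after resampling
times = np.arange(resampled.shape[1]) / new_sfreq
print(resampled.shape)  # (10, 1250)
```

After resampling to 250 Hz, consecutive columns are 4 ms apart rather than 1 ms.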

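import numpy as np

def Artifact_removal(data):
    # data: 2D array of shape (n_channels, n_samples), e.g. from raw_data.get_data()
    data = np.asarray(data)
    # Mean of all channels at each time sample; keepdims lets it broadcast over the rows
    avg_data = data.mean(axis=0, keepdims=True)
    # Subtract the channel average from every channel
    # (MNE offers the same on the Raw object: raw_data.set_eeg_reference('average'))
    return data - avg_data

clean_data = Artifact_removal(data)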
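Subtracting the mean of all channels is known as common average referencing, and it removes only what is shared by every channel. The synthetic check below, with made-up signals, adds the same artifact to ten independent channels and confirms that re-referencing cancels it exactly. An artifact that appears on just a few electrodes (for example an eye blink over frontal channels) is not removed this way; it is spread across all channels instead.

```python
import numpy as np


def common_average_reference(data):
    """Subtract the across-channel mean at each time sample from every channel."""
    data = np.asarray(data, dtype=float)
    return data - data.mean(axis=0, keepdims=True)


rng = np.random.default_rng(42)
n_channels, n_samples = 10, 1000
brain = rng.normal(size=(n_channels, n_samples))             # independent per channel
common = 5 * np.sin(np.linspace(0, 20 * np.pi, n_samples))  # identical on every channel

recorded = brain + common          # `common` broadcasts onto all 10 rows
cleaned = common_average_reference(recorded)

# The shared component cancels: re-referencing the recording gives the same
# result as re-referencing the artifact-free signals.
print(np.allclose(cleaned, common_average_reference(brain)))  # True
```

A side effect is that the re-referenced channels sum to zero at every sample, so one channel becomes a linear combination of the others.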

The code above gives us the artifact-removed data for the selected channels. Next we extract features from it, both in the time domain and in the frequency domain. Typical time-domain features are the mean, median, standard deviation, variance, maximum, minimum, sum, energy, skewness and kurtosis; spectral entropy is a frequency-domain feature computed from the power spectrum. We store the features as columns and the channels as rows in a pandas DataFrame and save it in .csv or .json format. Once the data is saved, we can use any machine learning model to proceed further, as needed.
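A minimal sketch of that feature table, assuming a 250 Hz sampling rate and random data in place of real EEG. extract_features and spectral_entropy are hypothetical helpers; "energy" is taken here as the sum of squared samples and spectral entropy as the Shannon entropy (in bits) of the normalised Welch power spectrum, which are common choices but not the only definitions in use.

```python
import numpy as np
import pandas as pd
from scipy.signal import welch
from scipy.stats import kurtosis, skew


def spectral_entropy(x, sfreq):
    """Shannon entropy (bits) of the normalised Welch power spectrum of x."""
    _, psd = welch(x, fs=sfreq, nperseg=min(len(x), 256))
    p = psd / psd.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log2(p)))


def extract_features(data, ch_names, sfreq):
    """One row per channel, one column per feature."""
    rows = []
    for x in data:
        rows.append({
            'mean': np.mean(x),
            'median': np.median(x),
            'std': np.std(x),
            'variance': np.var(x),
            'max': np.max(x),
            'min': np.min(x),
            'sum': np.sum(x),
            'energy': np.sum(x ** 2),      # assumed definition: sum of squares
            'skewness': skew(x),
            'kurtosis': kurtosis(x),       # Fisher definition: 0 for a normal distribution
            'spectral_entropy': spectral_entropy(x, sfreq),
        })
    return pd.DataFrame(rows, index=pd.Index(ch_names, name='channel'))


sfreq = 250.0  # assumed sampling rate in Hz
data = np.random.default_rng(0).normal(size=(3, 1000))
features = extract_features(data, ['C3', 'C4', 'P3'], sfreq)

# Without a path, to_csv/to_json return the text instead of writing a file
csv_text = features.to_csv()
json_text = features.to_json()
print(features.shape)  # (3, 11)
```

To write files instead, pass a path, e.g. features.to_csv('features.csv') or features.to_json('features.json').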
