BCI — The Analytics Layer

Fred Simard - CEO @ RE-AK
10 min read · Apr 29, 2023


This article follows up on the description of the hardware layer and presents my take on the challenges and opportunities associated with processing biosignals.

The analytics layer is the step between collecting data and doing something useful with it. Concretely, it consists in transforming raw biosignals into interpretable metrics.

There is a sizeable amount of knowledge available on how to process biosignals. The basics are well within reach of beginners and intermediate developers. This article aims to provide the anchor points from which anyone can build a data processing pipeline and extract valuable metrics, but you should also consult the scientific literature, as there is no need to reinvent the wheel.

Content of the article

  • General Concepts of Biometric Data Processing
  • Feature Extraction
  • Index Production
  • References

General Concepts of Biometric Data Processing

The data processing pipeline

There are multiple good ways to build a data processing pipeline, but I need to establish a common convention, so here is how I organize mine.

There are two main steps: the first is feature extraction, and the second transforms the features into interpretable metrics (which I often refer to as indexes).

Two-stage pipeline: raw data is transformed into features, which are in turn used to compute indexes

Here’s the breakdown:

  • Raw data often comes directly from the sensor. It can be collected as a continuous stream — online processing — or can be read from a saved file — offline processing. The raw data is generally noisy, sampled at a high frequency, and unlikely to be meaningful in this form.
  • Feature extraction is the stage that tries to extract as much useful information as possible from the raw data while applying a generalist approach. It filters the raw signal, detects and rejects anomalies, and organizes the data by applying analytical and statistical transformations. It can also involve some level of pattern recognition. The objective is to prepare the data for the next stage.
  • Index production consists in taking the features and transforming them into interpretable metrics ready to be used by the application layer. Indexes are the end product of the pipeline. They can take the form of predicted labels, such as in the case of motor imagery classification, or continuous values, such as engagement. In the context of psychophysiology, they will measure psychological states such as emotion intensity, cognitive load, arousal level, and valence.

There are, obviously, many exceptions to this framework: some features, or even raw data, can be considered indexes, and some indexes can be used as features. Still, the framework presented here serves as a customizable scaffold for developing a data transformation pipeline adapted to your needs.
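
To make the two-stage structure concrete, here is a minimal Python skeleton of such a pipeline. The function names, the features computed, and the window format are illustrative assumptions, not the exact RE-AK implementation.

```python
import numpy as np

def extract_features(raw_window: np.ndarray) -> dict:
    """Stage 1: turn a window of raw samples into generic features."""
    return {
        "mean": float(np.mean(raw_window)),
        "rms": float(np.sqrt(np.mean(raw_window ** 2))),
    }

def compute_indexes(features: dict) -> dict:
    """Stage 2: turn features into interpretable metrics (indexes)."""
    # Placeholder mapping; a real index would come from a formula or a trained model.
    return {"activity": features["rms"]}

def run_pipeline(raw_window: np.ndarray) -> dict:
    """Raw data -> features -> indexes, usable online or offline."""
    return compute_indexes(extract_features(raw_window))
```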

Online vs. Offline processing

Ideally, moving between online and offline operation should be a software switch. Both should feed into the same pipeline.

One element that will affect how the data is processed is whether the pipeline runs online or offline.

An online pipeline processes the data as it is recorded, while an offline pipeline processes it after it has been collected. Offline processing outperforms online processing because it can use all the data to interpret a segment (to put it more bluntly, it can use the future to predict the past), but online processing is a requirement for real-time applications.

Here are a few things to consider when developing an online or offline pipeline:

  • Offline pipelines are often used to prototype online processes. You can simulate an online pipeline on a pre-recorded file. Running online or from a file should be seamless if offline processing is used for prototyping purposes. Having an offline option makes it easier to develop, maintain, and QA your pipeline.
  • The main advantages of offline processing over online processing are data normalization and artifact rejection. Normalization benefits extensively from having access to the whole history of the recording. Artifacts can take a fair number of samples before being identified as such. Offline processing has access to the big picture.
  • Online processing is built around a sliding window. The longer the sliding window, the better the performance of your index extraction (true only for stationary variables), but the worse your time resolution. The parameterization of your sliding window will be intimately tied to the requirements of your application, the SNR of your signal, and the stationarity of the indexes.
  • Code optimization and profiling become important when designing real-time applications. If your index frequency is 2 Hz (period of 500 ms), each stage of the pipeline needs to run in less than 500 ms; otherwise, it will accumulate lag. It is possible to design an asynchronous pipeline, but this should be considered a worst-case scenario. (A sliding-window sketch follows this list.)
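
Here is a minimal sketch of such an online loop, reusing the run_pipeline function from the earlier skeleton. The sampling rate, window length, and the read_samples stub standing in for the device driver are all assumptions.

```python
import time
from collections import deque
import numpy as np

FS = 256          # sampling rate in Hz (assumed)
WINDOW_S = 4.0    # sliding-window length in seconds
STEP_S = 0.5      # 2 Hz index rate -> 500 ms budget per iteration

def read_samples(n_samples: int) -> np.ndarray:
    """Stand-in for a blocking read from the acquisition device."""
    time.sleep(n_samples / FS)   # simulate real-time pacing
    return np.random.randn(n_samples)

window = deque(maxlen=int(FS * WINDOW_S))

while True:
    t0 = time.monotonic()
    window.extend(read_samples(int(FS * STEP_S)))
    if len(window) == window.maxlen:
        # Both pipeline stages must complete within the 500 ms budget.
        indexes = run_pipeline(np.asarray(window))
    lag = (time.monotonic() - t0) - STEP_S
    if lag > 0:
        print(f"warning: pipeline is lagging by {lag:.3f} s")
```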

A final piece of advice: even if you are developing an offline process, plan for flexibility. It is only a matter of time before your offline application needs to be adapted to a real-time scenario.

Feature extraction

The raw data is sampled as is, but rarely used as such. It needs to be organized and processed in some way to extract meaningful information.

I would love to open the discussion toward information theory and mutual information [1], but this will be for another time.

Basics of signal filtering

Effect of basic filtering on an EEG signal: removing the DC offset, the linear trend, and high-frequency noise. Notice the 4 eye blinks in the sample. They will be easier to detect now that the signal has a steady DC component with an average of 0.

Filtering is used to clean up the signal before advanced processing can take place. It is particularly important if you are implementing time-domain feature extraction. Four types of filters are generally used:

  • High-pass filters are used to remove DC offset and signal drift. When used for that purpose, you want to use a low cutoff frequency such as 0.1 Hz. By doing this, you effectively subtract the average of the signal and remove any slow drift caused, for example, by changes in electrode properties over time.
  • Low-pass filters are used to remove high-frequency noise, which is sampled by the system but lies outside the frequency band of interest. The frequency cutoff will depend on the properties of the signal and the quality of the sampling. For an EEG system with dry electrodes, you don't expect to capture much above 40 Hz. With a research-grade system using wet electrodes, you can set the cutoff at a higher frequency.
  • Band-stop (notch) filters are used to remove the mains hum (50 Hz or 60 Hz [2]), which is invariably sampled due to the presence of electrical equipment in the environment. The stop band is configured as a narrow band around the hum frequency, with cutoffs of +/- 2 Hz.
  • Band-pass filters are used to extract information within a specific frequency range. This filter is used to extract EEG power bands [3], for instance. It removes not only the noise but everything outside the specified range; it is used to isolate the information rather than just remove the noise.

In addition, there are also adaptive filters [4]. They belong to the machine learning family of algorithms. Things get a little more complex at that point, but you should take a look at what they are and keep them in mind in case you need them.
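
Here is a hedged sketch of the four basic filters using SciPy, assuming a 256 Hz single-channel EEG signal and the cutoffs mentioned above (60 Hz mains is a North American assumption). filtfilt is zero-phase and therefore suited to offline use; an online pipeline would use a stateful filter such as scipy.signal.sosfilt instead.

```python
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt

fs = 256.0                            # sampling rate in Hz (assumed)
eeg = np.random.randn(10 * int(fs))   # stand-in for a real recording

# High-pass at 0.1 Hz: removes the DC offset and slow drift
b, a = butter(2, 0.1, btype="highpass", fs=fs)
eeg = filtfilt(b, a, eeg)

# Band-stop (notch) around the 60 Hz mains hum; mostly relevant when the
# low-pass cutoff sits above the mains frequency
b, a = iirnotch(60.0, Q=30.0, fs=fs)
eeg = filtfilt(b, a, eeg)

# Low-pass at 40 Hz: removes high-frequency noise (dry-electrode assumption)
b, a = butter(4, 40.0, btype="lowpass", fs=fs)
eeg = filtfilt(b, a, eeg)

# Band-pass 8-12 Hz: isolates the alpha band
b, a = butter(4, [8.0, 12.0], btype="bandpass", fs=fs)
alpha = filtfilt(b, a, eeg)
```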

Analytical feature extraction

A certain number of features, and even indexes, can be extracted using analytical solutions. This is a reliable way to extract information from a signal, and the resulting features can be handled with parametric modeling (statistical models).

EEG signal decomposed into three power bands (beta, alpha, theta) using band-pass filters.

  • EEG power band-related features. The relationship between EEG power bands and cognitive processes has been extensively documented. Engagement (sometimes called focus), arousal, and valence [5] can be computed from ratios of band powers; the same goes for motivation, computed from frontal asymmetry [6]. The occipital alpha wave (eyes-open, eyes-closed experiment) is the easiest biomarker to capture in EEG [7].
  • EMG RMS. Another example of an easily extractable feature is the RMS of EMG. There are many more valuable analytical features for EMG [8], but RMS is the easiest to get started with.
  • A common statistical tool is the z-score [9] (it counts as a normalization step). It assumes the data is normally distributed. It is a good multipurpose model, but biosignals have well-documented distributions that are better suited. For instance, power band ratios follow a noncentral F-distribution [10]. (A sketch combining band powers, a ratio, and a z-score follows this list.)
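
As a hedged illustration, here is how band powers, a ratio-based engagement feature, and a z-score could be computed with SciPy and NumPy. The band boundaries and the beta/(alpha+theta) formulation are common conventions, not the exact indexes used at RE-AK.

```python
import numpy as np
from scipy.signal import welch

fs = 256.0
eeg = np.random.randn(60 * int(fs))   # stand-in for one channel of cleaned EEG

def band_power(x, fs, lo, hi):
    """Power in the [lo, hi) Hz band, estimated with Welch's method."""
    f, pxx = welch(x, fs=fs, nperseg=int(2 * fs))
    mask = (f >= lo) & (f < hi)
    return np.trapz(pxx[mask], f[mask])

theta = band_power(eeg, fs, 4, 8)
alpha = band_power(eeg, fs, 8, 12)
beta = band_power(eeg, fs, 12, 30)

# One common engagement-style ratio of band powers (assumed formulation)
engagement = beta / (alpha + theta)

# z-score against a baseline history of the same feature (placeholder values)
baseline = np.array([0.9, 1.1, 1.0, 1.2, 0.8])
z = (engagement - baseline.mean()) / baseline.std()

# EMG RMS is even simpler: np.sqrt(np.mean(emg ** 2)) over a window
```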

Pattern detection

Many features are represented in the time domain. You can apply pattern recognition algorithms or heuristics to mark them.

A) SCRs in electrodermal signal. B) Heartbeats in PPG. C) A description of the P300 response (source)

Here are a few examples:

  • (A) Skin conductance responses (SCRs [11]) are electrodermal (EDA) events that register as indications of an aroused physiological state. These discrete events drastically change the conductance of the skin and can be segmented.
  • (B) PPG [12] measures blood volume, but the signal needs to be processed to extract heartbeats. A good reference library is HeartPy [13], although it is published under a GPL3 license, so you won't be allowed to integrate it into a proprietary pipeline.
  • (C) P300 [14] is a well-described EEG response to an event. It can indicate that the subject recognized a pattern (oddball paradigm).

Artifacts can also be detected using some form of pattern recognition. Common artifacts include movement artifacts, muscle artifacts (in EEG), eye blinks, and disconnected electrodes.
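
As a simple illustration of heuristic pattern detection, here is a sketch of heartbeat detection on a PPG trace using scipy.signal.find_peaks. The sampling rate, the synthetic signal, and the distance/prominence thresholds are assumptions to tune against real data.

```python
import numpy as np
from scipy.signal import find_peaks

fs = 64.0                       # PPG sampling rate in Hz (assumed)
t = np.arange(0, 30, 1 / fs)
# Toy ~72 bpm signal standing in for a real PPG trace
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)

peaks, _ = find_peaks(
    ppg,
    distance=int(0.4 * fs),     # refractory period: no two beats closer than ~400 ms
    prominence=0.5,             # reject small fluctuations (threshold to tune)
)

ibi = np.diff(peaks) / fs       # inter-beat intervals in seconds
heart_rate = 60.0 / ibi.mean()  # average heart rate in bpm
```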

Finally…

Extracting features requires a good understanding of signal processing, statistics, and biosignal properties. The good news is that there is a lot of literature on the topic and most feature extraction techniques are well-documented, so conduct a proper review of the scientific literature before wasting time on a solved problem.

To implement this stage properly, I suggest you:

  • Read about biosignal processing and statistical analysis
  • Use validated libraries such as Scipy [15], Sklearn [16], and Pandas [17] (Python)
  • Remember that the simplest solutions are often the best ones. Complex features can deliver a good payoff, but some scientists have a tendency to overcomplicate things for minor gains; try simple things first.
  • Be careful not to over-process your data. If the feature extraction process becomes complex or you don’t know if a feature is useful or not, leave it to machine learning (next step) to sort things out.
  • Generating too many features can impair machine learning capabilities. There is a concept called the curse of dimensionality [18]. Extracting features is about being sufficient, not overwhelming.
  • Even if you are focused on designing feature extraction, keep in mind that your task will be much harder if the data collection process is not rigorous. Sometimes the fix is not algorithmic, but methodological.

Index production

Producing an index means transforming features into interpretable information. Some features can serve as indexes as well. For instance, SCRs are markers that something significant likely took place.

The collection of indexes we’ve developed at RE-AK. Emotional indexes (extracted from facial expression analysis), arousal (extracted from EDA), engagement & cognitive load (extracted from EEG) and heart-rate. Not shown: frontal asymmetry (EEG) and short-term HRV (PPG).

Statistical models

Given the progress made in machine learning over the last two decades, we sometimes forget that statistical analysis is a powerful framework to project data onto a readily interpretable scale.

Implementing a null hypothesis rejection test gives you a measure of how compatible a data sample is with the null hypothesis distribution. The p-value, between 0 and 1, reflects how likely it would be to observe data at least this extreme if the null hypothesis were true. It can also be thresholded, which effectively turns the test into a classifier.
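
A minimal sketch of that idea, assuming a feature baseline recorded at rest and using a Welch t-test from SciPy; the distributions, sample sizes, and the 0.05 threshold are all placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 500)   # feature values recorded during a resting baseline
window = rng.normal(0.8, 1.0, 32)      # feature values from the current window

# How surprising is the current window if it came from the baseline (null) distribution?
t_stat, p_value = stats.ttest_ind(window, baseline, equal_var=False)

# Thresholding the p-value turns the test into a binary index
aroused = p_value < 0.05
```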

Statistical models require a fine mathematical understanding of the situation and can be tedious to develop. The benefit, however, is that statistical models are faster to train, quite robust, and provide strong generalization capabilities.

As far as I'm concerned, developing a statistical model starts with understanding Bayesian statistics, a very enlightening theory [19].

Machine learning models

Nowadays, machine learning offers a straightforward approach to solving many problems, but it will only work when the problem is well-defined. Machine learning can also be preferred over an analytical solution if we need to map the index along a standard scale or account for inter-individual variability.

A great example of a machine learning-based index is cognitive load. Cognitive load is correlated with several power bands, positively and/or negatively, and bringing them together in an analytical solution is hazardous. In contrast, developing a classifier or a regression model on data collected during a control task can give you a subject-invariant metric, scaled between 0 and 1.

Equivalently, machine learning can be used to combine different features when an analytical solution is not available. Facial expression classification using fEMG, for instance, makes use of a large set of features of different natures. The machine learning approach makes it possible to bring them together independently of their respective parametric descriptions.

The main issue with machine learning is that it often turns into a black box. It solves the problem, but how? I suggest you always combine your machine learning development with a data mining approach. As a human in the system, you don't need to understand everything or beat the machine learning results, but take a moment to get the vibe of your data and develop a rough understanding of which features the machine learning algorithm relies on.

Be wary of issues such as overfitting. The best way to get around most issues is to use cross-validation, but even then, be careful. I've seen people running cross-validation processes in which adjacent samples (same subject, adjacent in time) found their way into both the training and the validation set. If you are training for subject invariance, data from a subject in the validation set should not be part of the training set, and vice versa.
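
One way to enforce that constraint is a grouped cross-validation split. Here is a hedged sketch using scikit-learn's GroupKFold with placeholder features, labels, and subject IDs; the model choice is arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))             # e.g. band-power features per window
y = rng.integers(0, 2, size=600)          # e.g. low vs. high cognitive load labels
subjects = np.repeat(np.arange(20), 30)   # 20 subjects, 30 windows each

# GroupKFold keeps all windows from a given subject on the same side of the split
scores = cross_val_score(
    LogisticRegression(max_iter=1000),
    X, y,
    groups=subjects,
    cv=GroupKFold(n_splits=5),
)
print(f"subject-wise CV accuracy: {scores.mean():.2f}")
```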

Overall, the quality of the model depends on:

  • The size of the database;
  • The quality of the control task/data labeling;
  • The quality of the data collected;
  • The set of features used;
  • The machine learning model and how it is trained.

Machine learning can save you a lot of trouble, but shouldn’t be trusted blindly.

References

[1] https://en.wikipedia.org/wiki/Mutual_information

[2] https://en.wikipedia.org/wiki/Mains_hum

[3] https://en.wikipedia.org/wiki/Electroencephalography#Comparison_of_EEG_bands

[4] https://en.wikipedia.org/wiki/Adaptive_filter

[5] Moinnereau, M. A., Oliveira, A. A. D., & Falk, T. H. Instrumenting a Virtual Reality Headset for At-Home Gamer Experience Monitoring and Behavioural Assessment. Frontiers in Virtual Reality, 156.

[6] https://imotions.com/blog/learning/best-practice/frontal-asymmetry-101-get-insights-motivation-emotions-eeg/

[7] Hohaia, W., Saurels, B. W., Johnston, A., Yarrow, K., & Arnold, D. H. (2022). Occipital alpha-band brain waves when the eyes are closed are shaped by ongoing visual processes. Scientific Reports, 12(1), 1194.

[8] Phinyomark, A., Quaine, F., Charbonnier, S., Serviere, C., Tarpin-Bernard, F., & Laurillau, Y. (2013). EMG feature evaluation for improving myoelectric pattern recognition robustness. Expert Systems with applications, 40(12), 4832–4840.

[9] https://en.wikipedia.org/wiki/Standard_score

[10] https://en.wikipedia.org/wiki/Noncentral_F-distribution

[11] https://imotions.com/blog/learning/best-practice/skin-conductance-response/

[12] https://en.wikipedia.org/wiki/Photoplethysmogram

[13] https://python-heart-rate-analysis-toolkit.readthedocs.io/en/latest/

[14] https://en.wikipedia.org/wiki/P300_(neuroscience)

[15] https://scipy.org/

[16] https://scikit-learn.org/stable/

[17] https://pandas.pydata.org/

[18] https://en.wikipedia.org/wiki/Curse_of_dimensionality

[19] https://en.wikipedia.org/wiki/Bayesian_statistics
