Reviewing the American Epilepsy Society Seizure Prediction Challenge

Kaggle Team
Kaggle Blog · Jan 26, 2015

In 2014, Kaggle completed two seizure prediction challenges: one co-organized by UPenn and the Mayo Clinic, and one by the American Epilepsy Society.

Accurate and fast seizure forecasting systems have the potential to help patients with epilepsy lead more normal lives:

  • Seizures that are quickly detected can be aborted earlier by using a responsive neurostimulation device.
  • Larger amounts of EEG data can be analyzed by doctors.
  • Patients can better plan activities when they are notified of an impending seizure.

In this blog post, we talk with the top three teams from the American Epilepsy Society Seizure Prediction Challenge.

“[The winning team’s results] blew the top off previous efforts. Accurate seizure detection and prediction are key to building effective devices to treat epilepsy.” — Brian Litt, Professor of Neurology and Bioengineering at the University of Pennsylvania, in ‘A Crowd Of Scientists Finds A Better Way To Predict Seizures’

“Seizure detection and seizure prediction are two fundamental problems in the field that are poised to take significant advantage of large data computation algorithms and benefit from the concept of sharing data and generating reproducible results.” — Dr. Walter J. Koroshetz, Director of the NINDS, in ‘Predicting epileptic seizures with 82 percent accuracy’

“Working in different countries, we exchanged ideas via e-mail, and agreed on how to best use our submissions during the final days of the contest.” — 1st place team, QMSDP

“My observation is that with the open source tools and learning experience in Kaggle competition, a person can tackle most of the machine learning problems.” — 2nd place, Jialun He

“All of us hold a PhD in our respective areas, and we are forming a new multidisciplinary research group in data science, that is a field where we have shared interests.” — 3rd place team, ESAI CEU-UCH

1st place team, QMSDP

Biographies

Dr. Quang Tieng is a Senior Research Officer at the Centre for Advanced Imaging (CAI) at the University of Queensland. One of his research projects is super-resolution in MRI.

Dr. Min Chen is a Postdoctoral Research Fellow at the CAI at the University of Queensland. Her research projects focus on temporal lobe epilepsy.

Dr. Simone Bosshard is a Postdoctoral Research Fellow at the CAI at the University of Queensland. One of her research projects involves studying the structural network responsible for generating epileptic discharges.

Drew Abbot is a software engineer at AiLive in California. The company has worked closely with Nintendo to create software for the Wii video game console.

Phillip Adkins is a mathematician and works at AiLive in California. AiLive uses machine learning to facilitate the development of motion recognition packages.

Background prior to entering challenge

Quang, Min, and Simone all work at The University of Queensland in Australia.

Phillip and Drew work together at AiLive in CA, USA.

Solution Summary (as told by the team)

To begin, note that our team merged after working on the contest independently, combining different approaches and ideas to achieve the final result.

Our winning submission was a weighted average of three separate models: a Generalized Linear Model regression with Lasso or elastic net regularization (via MATLAB’s lassoglm function), a Random Forest (via MATLAB’s TreeBagger implementation), and a bagged set of linear Support Vector Machines (via Python’s scikit-learn toolkit).
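To make the structure concrete, here is a minimal sketch of such a weighted-average ensemble using scikit-learn analogues; the team used MATLAB’s lassoglm and TreeBagger for the first two models, and the weights shown here are hypothetical placeholders, not the team’s tuned values.

```python
# A minimal sketch of the three-model weighted-average ensemble using
# scikit-learn stand-ins; weights are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

def ensemble_predict(X_train, y_train, X_test, weights=(0.4, 0.3, 0.3)):
    # L1-penalized logistic regression stands in for the Lasso GLM.
    glm = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    # Bagged linear SVMs; calibration maps SVM margins to probabilities.
    svm = BaggingClassifier(CalibratedClassifierCV(LinearSVC(dual=False)),
                            n_estimators=25, random_state=0)
    probs = []
    for model in (glm, rf, svm):
        model.fit(X_train, y_train)
        probs.append(model.predict_proba(X_test)[:, 1])
    # Weighted average of the three models' preictal probabilities.
    return np.average(probs, axis=0, weights=weights)
```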

Feature selection

For the Lasso GLM model, the features were as follows (a sketch of the band-power and entropy computation appears after the list):

  1. Spectrum and Shannon’s entropy at six frequency bands: delta (0.1–4Hz), theta (4–8Hz), alpha (8–12Hz), beta (12–30Hz), low-gamma (30–70Hz) and high gamma (70–180Hz).
  2. Spectral edge power of 50% power up to 40Hz.
  3. Shannon’s entropy at dyadic frequency bands.
  4. Spectrum correlation across channels at dyadic frequency bands.
  5. Time-series correlation matrix and its eigenvalues.
  6. Fractal dimensions.
  7. Hjorth parameters: activity, mobility and complexity.
  8. Statistical moments: skewness and kurtosis.
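
As an illustration of the first feature group, here is a minimal sketch of the band power and Shannon’s entropy computation for one channel window; the FFT-based estimator and the 400 Hz sampling rate (the dogs’ recording rate) are assumptions, not the team’s exact implementation.

```python
# Sketch of feature group 1: log band power plus Shannon's entropy of
# the normalized band-power distribution. Band edges follow the post;
# the 400 Hz sampling rate is an assumption (dog recordings).
import numpy as np

BANDS = [(0.1, 4), (4, 8), (8, 12), (12, 30), (30, 70), (70, 180)]

def band_power_entropy(x, fs=400.0):
    psd = np.abs(np.fft.rfft(x)) ** 2            # power spectrum of one window
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.array([psd[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in BANDS])
    p = power / power.sum()                      # normalized band distribution
    entropy = -np.sum(p * np.log2(p + 1e-12))    # Shannon's entropy over bands
    return np.log(power + 1e-12), entropy
```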

For the bagged SVM model, the features were a kernel PCA decomposition of the features listed below.

The features for the Random Forest model were also a combination of time- and frequency-domain information, and were chosen as follows (see the correlation-feature sketch after the list):

  1. Sums of FFT power over hand-picked bands spanning frequencies: f0 (fundamental frequency of FFT), 1Hz, 4Hz, 8Hz, 16Hz, 32Hz, 64Hz, 128Hz and Nyquist. DC was also included, yielding 9 bands per channel.
  2. Time-series correlation matrix.
  3. Time-series variance.
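
A minimal sketch of the correlation-based features (items 2 and 3 here, and item 5 of the Lasso GLM list), assuming a window is an (n_channels, n_samples) array:

```python
# Sketch of the correlation-based features: pairwise channel
# correlations, their eigenvalues, and per-channel variance.
import numpy as np

def correlation_features(window):
    c = np.corrcoef(window)                  # channel-by-channel correlations
    iu = np.triu_indices_from(c, k=1)        # upper triangle, no diagonal
    eig = np.sort(np.linalg.eigvalsh(c))     # eigenvalues of the corr matrix
    var = window.var(axis=1)                 # time-series variance per channel
    return np.concatenate([c[iu], eig, var])
```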

Tools used

MATLAB and Python with Scikit-learn.

Continued advancement

Once the contest was over, we realized that using 10 windows (i.e., plain 1-minute windows) for all subjects actually yielded a better private LB score than the 12- and 150-window choices for dogs and humans, respectively.

We decided that interpolating the signal by a factor of K before taking the final p-norm was worth trying, and indeed, marginal public LB improvements were achieved after doing so (using cubic spline interpolation). In the end, we decided to use Random Forest models trained on 31/32 overlapped preictal and interictal features to classify 63/64 overlapped test features (yielding 4732 and 4737 samples for each 10-minute segment), and interpolate and p-norm those scores for our final Random Forest model.

Interestingly, as overlap and interpolation increased, the optimal p used in the p-norm seemed to increase as well, and our final choices for K and p ended up being 8 and 23, respectively.
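
Below is a minimal sketch of this aggregation step under those final choices; beyond the K=8 interpolation factor and p=23 stated above, the details (such as clipping to keep the power mean well defined) are assumptions.

```python
# Sketch of the final aggregation: per-window Random Forest scores are
# cubic-spline interpolated by a factor K, then collapsed to a single
# segment score with a p-norm. K=8 and p=23 follow the post.
import numpy as np
from scipy.interpolate import CubicSpline

def aggregate_scores(scores, K=8, p=23):
    n = len(scores)
    spline = CubicSpline(np.arange(n), scores)
    dense = spline(np.linspace(0, n - 1, K * n))  # K-fold denser score curve
    dense = np.clip(dense, 0.0, None)             # spline may overshoot below 0
    return np.mean(dense ** p) ** (1.0 / p)       # p-norm emphasizes peaks
```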

Further reading on the winning solution

To see a detailed description of the solution, together with code, look at this Github Repo.

2nd place team, Jialun He

Biography

Jialun He received a Ph.D. from MIT in 2003 and is currently a senior algorithm engineer at Hemedex, Inc.

What was your background prior to entering this challenge?

I have been a senior algorithm engineer at Hemedex, Inc. for the past ten years, where I work on the development of a monitoring device used to measure tissue blood flow in real time. The device is used in neurosurgery and neurointensive care, as well as in organ transplant, reconstructive surgery and oncology. I have an interdisciplinary background in mechanical and biomedical engineering; I received my Ph.D. from MIT in 2003. In recent years my focus has been on extracting useful information from the patient data recorded by our device. My interests are in big data applications, especially in healthcare and wearable devices.

What made you decide to enter?

Several years ago I worked on a project for seizure detection using the cerebral blood flow (CBF) rate recorded by our devices. Seizures are generally detected with EEG data recorded at frequencies ranging from 100 to 1000Hz; our monitor records CBF at 1Hz. When patient data labelled with seizures came in, I found that the patients also experienced high fluctuations in CBF. A seizure CBF chart is very similar to a seizure EEG chart, at a different time scale, and the seizure CBF waveform can also be decomposed into various frequency bands similar to EEG’s wave bands. The seizure detection project was successful, and it has been implemented in an automatic system for analyzing incoming patient data. So when I found out that Kaggle was hosting a seizure detection competition, I wanted to see what I could do with EEG data.

What preprocessing and supervised learning methods did you use?

This competition was all about feature engineering. The core features are the power spectral bands. Other candidate features are the signal correlation between EEG channels and the eigenvalues of the correlation matrix, computed both in the frequency domain and the time domain.

I tested several common classifiers from the scikit-learn package, such as random forests, gradient tree boosting and support vector machines. Most of them had really good CV scores for individual subjects, but did not score well on the leaderboard (LB); the gaps between CV and LB scores were very big. One reason is that the LB score is computed across all subjects; another possible reason is overfitting. My best submissions according to the LB were based on a support vector machine with an RBF kernel, which produced better results because it offers more control in balancing bias and variance.
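
As a rough illustration, a per-subject RBF-kernel SVM along these lines can be set up in scikit-learn as follows; the hyperparameters are placeholders, not the values behind the actual submissions.

```python
# Minimal sketch of a per-subject RBF-kernel SVM; C and gamma are
# placeholders, not the competition's tuned values.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fit_subject_svm(X, y):
    model = make_pipeline(
        StandardScaler(),                  # RBF kernels need scaled features
        SVC(kernel="rbf", C=1.0, gamma="scale", probability=True))
    model.fit(X, y)                        # one model per subject
    return model                           # use model.predict_proba for AUC
```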

What was your most important insight into the data?

Due to the very limited number of training cases (for example, each patient’s data only has 3 independent seizure occurrences), it is very important to keep a delicate balance between bias and variance. With this in mind, I added additional signal processing procedures to the feature extraction. I resampled the signal from 400Hz in dogs and 5000Hz in patients down to 100Hz, split the data into longer windows of 50 seconds, and also resampled the frequency bands of the power spectrum. These signal processing procedures all helped reduce overfitting.

Another challenge of the competition is that the evaluation metric is AUC across all subjects. I tried a cross-subject classifier, but its score was not good compared to classifiers built on individual subjects. My best submission is based on individual classifiers; additional calibration is needed to align predictions across subjects.
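
A minimal sketch of that resampling step, assuming scipy’s polyphase resampler stands in for whatever anti-aliasing filter was actually used:

```python
# Sketch of the downsampling step: dog (400 Hz) and patient (5000 Hz)
# signals are brought down to a common 100 Hz before feature extraction.
from math import gcd
from scipy.signal import resample_poly

def downsample(x, fs_in, fs_out=100):
    g = gcd(int(fs_in), int(fs_out))
    # resample_poly applies an anti-aliasing FIR filter internally.
    return resample_poly(x, int(fs_out) // g, int(fs_in) // g, axis=-1)
```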

Were you surprised by any of your insights?

I was surprised by the final shake-up of the leaderboard when the final scores were revealed. Many competitors’ final scores dropped dramatically due to overfitting. It was not a surprise to me that I was among the group of competitors with the least amount of overfitting, but I have to admit that luck was also a factor in determining who would be the final prize winner.

Which tools did you use?

I used Python with the standard numpy, scipy, scikit-learn and matplotlib packages. I also have a homemade neural network system.

What have you taken away from this competition?

Start early: that is my advice for anyone who wants to enter a Kaggle competition. I started a month before the end of the competition and was in a rush every day. At the end of the competition I still had some ideas that had not been implemented. My guess is that I would need at least two months to explore all the possible ideas and have a chance at a good ensemble.

Good old-school signal processing techniques helped me a lot in this competition, and domain knowledge in patient monitoring also helped me understand the nature of the problem. However, what impressed me most is that people with a limited amount of domain knowledge also did pretty well in the competition. My observation is that, with open source tools and the learning experience of a Kaggle competition, a person can tackle most machine learning problems.

3rd place team, ESAI-CEU-UCH

Biographies

Javier Muñoz-Almaraz: PhD in Mathematics with a dissertation on numerical continuation of periodic orbits. He is now interested in optimization problems related to data analysis, dynamical systems in mechanics, and neuronal dynamics.

Francisco Zamora-Martínez: PhD in Computational Linguistics, on the application of artificial neural networks to language modeling for handwriting recognition, spoken language understanding and machine translation. He is interested in machine learning, energy efficiency, pattern recognition and data science problems.

Juan Pardo: PhD in Computer Science Engineering. He has worked on several European research projects in different fields and is director of the Department of Physics, Mathematics and Computing at the university. Interested in data science; a volunteer at the ISACA and PMI organizations.

Paloma Botella-Rocamora: PhD in Mathematics, specializing in statistics. She has worked on health research projects for a long time, and was a visiting researcher last year at the Biostatistics Department at the University of Minnesota. Interested in Bayesian statistics in data science.

What was your background prior to entering this challenge?

We are a multidisciplinary research group (ESAI) composed of lecturers at Universidad CEU Cardenal Herrera, in Valencia (Spain).

Paloma Botella-Rocamora and Javi Muñoz-Almaraz are mathematicians, Paloma focused more on statistics and Javi on optimization and dynamical systems. Juan Pardo and Francisco Zamora-Martínez are computer scientists, Juan focused more on computer engineering and Francisco on computer science.

All of us hold a PhD in our respective areas, and we are forming a new multidisciplinary research group in data science, that is a field where we have shared interests.

We are interested in the application of Bayesian methods, deep learning and optimization methods to solve challenging problems like the one proposed in this competition.

What made you decide to enter?

From a technical point of view, we wanted to test the team’s skills and show that it is possible to produce a competitive system combining ideas from different (but related) research areas. Additionally, we tried to apply deep learning techniques to the challenge, whose benefit to this task remains unclear after analyzing the competition results.

On the other hand, the competition was important in letting us work on a common problem and find a way to speak the same language (members work in different areas). And of course the money, as our budget for research is very, very limited due to the economic crisis in Spain.

What preprocessing and supervised learning methods did you use?

We tried different preprocessing techniques. First, we started with a Fast Fourier Transform (FFT) of the data over 50% overlapped sliding windows of 60 seconds’ length.

This transformation produced a very large number of features, so a filter bank with 6 filters was applied to avoid dimensionality problems. This preprocessing was insufficient to achieve the high AUC results of the top-10 teams.
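
As an illustration, a windowed FFT plus filter bank along these lines might look as follows; the log-spaced band edges are an assumption, since the post only specifies a 6-filter bank over 60-second, 50%-overlapped windows.

```python
# Sketch of the windowed FFT preprocessing: 60 s windows with 50%
# overlap, each reduced to a few band energies by a filter bank. The
# log-spaced band edges are an assumption.
import numpy as np

def windowed_fft_features(x, fs, win_s=60, n_filters=6):
    win = int(win_s * fs)
    step = win // 2                               # 50% overlap
    edges = np.logspace(np.log10(0.5), np.log10(fs / 2.0), n_filters + 1)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    feats = []
    for start in range(0, len(x) - win + 1, step):
        psd = np.abs(np.fft.rfft(x[start:start + win])) ** 2
        feats.append([psd[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in zip(edges[:-1], edges[1:])])
    return np.log(np.asarray(feats) + 1e-12)      # log band energies per window
```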

We played with the eigenvalues of correlation matrices computed over the same sliding windows as the FFT. Combining both kinds of features in the same model was also important to improve the system’s results, but not enough to be competitive.

The high correlation between windows and filters suggested that our models could be improved by removing these correlations from the data, so we decided to apply Principal Component Analysis (PCA) and Independent Component Analysis (ICA) to the FFT output. Both transformations showed similar performance, and the system reached the top 20 of the public test leaderboard.
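
A minimal sketch of this decorrelation step with scikit-learn; the number of components is an assumption.

```python
# Sketch of the decorrelation step: PCA or FastICA fit on the training
# features and reused on the test features. n_components is assumed.
from sklearn.decomposition import PCA, FastICA

def fit_decorrelator(train_feats, method="pca", n_components=50):
    model = (PCA(n_components=n_components) if method == "pca"
             else FastICA(n_components=n_components, max_iter=1000))
    model.fit(train_feats)       # fit on training features only
    return model                 # call model.transform on train and test
```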

To improve the results a little further, we decided to compute a set of statistics over the whole input data, without windowing, and finally combined different models and preprocessing techniques in an ensemble.

Regarding supervised learning methods, we started by trying logistic regression models, expecting that linear models wouldn’t overfit and could be used as a nice baseline. To our surprise, these simple logistic regression models achieved high AUC scores in cross-validation (0.93), but on the public test data the AUC dropped to very low values (approx. 0.60). We were very confused by this result and discussed throughout the competition why it happened, but we couldn’t find any clear explanation.

Following logistic regression, we tried K-nearest-neighbors (KNN), computing class probabilities instead of distances. The drop between our cross-validation AUC and the public test AUC was reduced by using KNN, but not by much. Finally, we trained Artificial Neural Networks (ANNs) with different numbers of layers, using dropout to avoid overfitting and ReLU activation functions. After hard manual optimization of these ANN models, we obtained our best single-model result.
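
Since APRIL-ANN is less widely known, here is a rough Keras analogue of the described ANNs; the layer sizes and dropout rate are assumptions, not the team’s tuned architecture.

```python
# Rough Keras analogue of the described ANNs (the team used their own
# APRIL-ANN toolkit); layer sizes and dropout rate are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

def build_ann(n_features, hidden=(256, 128), drop=0.5):
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_features,)))
    for units in hidden:
        model.add(layers.Dense(units, activation="relu"))  # ReLU activations
        model.add(layers.Dropout(drop))                    # dropout vs. overfit
    model.add(layers.Dense(1, activation="sigmoid"))       # preictal probability
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC()])
    return model
```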

Beyond the exploration described above, an ensemble of KNNs and ANNs using FFT, PCA, correlations and other statistics, optimized following Bayesian Model Combination (BMC), was our ticket to the top 15 on the public leaderboard; we finished 4th on the private leaderboard and received the 3rd prize after the original winner rejected its first prize.

We found that an ensemble of different knowledge sources was a nice way to ensure good stability between public and private AUC. (See the code in the Github Repo.)

What was your most important insight into the data?

We found that all the channels of the EEG were highly correlated, and this correlation could harm supervised statistical learning. The use of PCA or ICA to reduce this correlation is one way to ensure better performance; however, other ways to exploit channel similarity, and to reduce dimensionality, could be explored.

As was discussed in the forum, a global model able to learn from all the available subjects would be a very important step forward, but this exploration remains future work for this task, at least for us.

Were you surprised by any of your insights?

We were surprised by the behavior of logistic regression on our features; the large drop between cross-validation and public test AUC was very disturbing, and it complicated the internal comparison between our different approaches.

Which tools did you use?

We used two main tools: R for statistical preprocessing and APRIL-ANN for the FFT and supervised learning. The latter is a brand-new toolkit developed with the involvement of members of our research team.

What have you taken away from this competition?

We realized that it is very important to stabilize the system’s results by using ensembles, and that ensembles of different preprocessing pipelines can be even better. Following this methodology, it is easy to share knowledge and skills in multidisciplinary teams, and it proved to be a way to improve the system’s results enough to reach the top 10.


Originally published at blog.kaggle.com on January 26, 2015.
