Approximately 50 million people worldwide have epilepsy, making it one of the most common neurological diseases globally. In one of our recent projects for the Ghent University Hospital, we had to tackle the problem of detecting epileptic seizures in EEG signal recordings of rats and mice.
In almost all existing machine learning projects and papers on EEG seizure detection, the aim is to classify short segments of EEG signals (e.g. this Kaggle competition). In our project, we go one step further: every sample of the EEG signal needs to be classified, resulting in the exact start and end times of seizures. These annotations provide valuable insights and can be used to measure the effectiveness of a medical treatment.
The WaveNet Architecture
WaveNet is a deep learning architecture introduced by Google DeepMind in 2016. It has since become the go-to architecture for text-to-speech synthesis: speech generated by such a deep learning model sounds far more natural than speech produced by a parametric TTS model. I recommend having a look at (and a listen to) the examples. Importantly, the proposed techniques make it possible to learn from raw signal data in a fast manner. As the authors put it:
"The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio."
The magic ingredient that makes this possible is the use of dilated convolutions. By increasing the dilation factor with each convolutional layer, the receptive field grows exponentially with depth. Blocks of such layers are then stacked, and their outputs combined, to form the final output. This way, it is possible to construct speech signals one sample at a time.
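To see how quickly the receptive field grows, it can be computed directly: with kernel size 2 and dilations 1, 2, 4, …, a stack of n layers covers 2**n input samples (a small illustrative sketch, not code from the project):

```python
def receptive_field(n_layers, kernel_size=2):
    """Receptive field of a stack of dilated convolutions where
    layer i uses dilation 2**i (as in WaveNet)."""
    rf = 1
    for i in range(n_layers):
        dilation = 2 ** i
        rf += (kernel_size - 1) * dilation
    return rf

for n in (1, 5, 10):
    print(n, receptive_field(n))
# 10 layers already cover 2**10 = 1024 samples
```

This exponential growth is why a modest number of layers can see enough context to work on raw high-sample-rate signals.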
Instead of using the WaveNet structure in a causal way, we use an adapted architecture that outputs one prediction for every input sample in a single run. This output is scaled to a probability between 0 and 1 using the softmax function (inter-seizure vs. in-seizure samples). Our final model uses 3 or 4 EEG input channels and consists of 70 convolutional layers. This is the architecture we used:
Postprocessing using an adapted Hidden Markov Model
The output of the WaveNet model can be interpreted as the probability that the animal is having a seizure at time t. Converting this output to actual seizure start and end times can be done in numerous ways, e.g. by fixing a simple threshold. However, such a simple approach generates lots of small false-positive peaks, which result in incorrect seizure timings.
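The naive thresholding approach can be sketched as follows; note how a single noisy above-threshold sample already produces a spurious one-sample "seizure" (illustrative code, not the project's implementation):

```python
def threshold_to_intervals(p, thr=0.5):
    """Turn a per-sample probability sequence into (start, end) sample
    intervals by reading off contiguous runs above the threshold."""
    intervals, start = [], None
    for t, v in enumerate(p):
        if v >= thr and start is None:
            start = t                      # a run begins
        elif v < thr and start is not None:
            intervals.append((start, t))   # a run ends
            start = None
    if start is not None:
        intervals.append((start, len(p)))
    return intervals

# One noisy peak at t=5 yields a false one-sample seizure:
print(threshold_to_intervals([0.1, 0.2, 0.9, 0.8, 0.3, 0.7, 0.1]))
# → [(2, 4), (5, 6)]
```

It is exactly these short spurious intervals that the HMM postprocessing below is designed to suppress.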
Our proposed solution uses an adaptation of the Hidden Markov Model as a probabilistic postprocessing engine. The model consists of only two states: the ictal and the interictal phase (i.e. in-seizure and between-seizure). The emissions of the HMM are the output samples of the WaveNet model. All transition and emission probabilities can be estimated, after training the model, on an annotated test set. Intuitively, this means that for an animal with a high seizure rate, the transition probability from interictal to ictal will be high.
Given the emission samples of an HMM, the most probable state sequence can be computed using the Viterbi algorithm. After estimating all transition and emission probabilities on test data, this algorithm makes it possible to derive the start and end times of seizures from the output of the trained WaveNet.
The need for consistent training data
Reviewing the output of the model, it became clear that what exactly constitutes an epileptic seizure needs to be standardized and agreed upon between the observers, including a definition of exact start and end times and a minimum seizure length. By reducing this inter-observer variance, the model can learn from consistently annotated data. Using the HMM as a postprocessing step also makes it possible to incorporate extra parameters, such as a minimum seizure length, by conditioning the transition probabilities.
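As a simpler stand-in for that transition conditioning, the same constraint can also be illustrated as a post-hoc filter on the decoded intervals (this is not the HMM-based mechanism described above, just the effect it achieves):

```python
def enforce_min_length(intervals, min_len):
    """Drop detected seizures shorter than min_len samples."""
    return [(s, e) for s, e in intervals if e - s >= min_len]

# The spurious one-sample detection is removed:
print(enforce_min_length([(2, 4), (5, 6)], min_len=2))
# → [(2, 4)]
```

Encoding the constraint in the transition probabilities instead keeps the decision probabilistic, so borderline detections are weighed against the evidence rather than cut off by a hard rule.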