AI Asks: Does what Happen in Vegas Really Stay in Vegas?
Deep Learning examines the brain waves of alcoholics to find out
There are diverse psychological, familial and environmental factors that contribute to alcoholism’s many forms. Some are often wrapped up in a chicken-and-egg problem. Genes and brain wave patterns that are more prevalent in alcoholics are also present in normally functioning family members. Alcoholics also function at a wide range of levels in society, so it is not always clear who has the disease.
But with that being said, there are many studies that show clear differences in brain wave patterns of alcoholics. Can these give away an alcoholic, even when a hallmark of the disease is denial?
When millions of neurons fire, they create electrical impulses that can be measured by electrodes placed on the head. The general types of brain waves can be broken down into several frequency bands. If you are interested, more information on the types of brain waves can be found here.
Deep Learning: An Overview of the Data Set
To seek answers to this question, I used the EEG-Alcohol data set found on Kaggle. It consists of 885 MB of raw EEG signal collected in roughly 800 one-second trials from 16 subjects (8 alcoholic, 8 control). Each trial has signal information from 64 channels sampled at 256 Hz.
Each trial was labeled as being from an alcoholic or control subject. Also included were three more experiment conditions related to how subjects respond to matching and non-matching visual stimuli. These sub-labels were grouped into Alcoholic and Control classes, both for simplicity and because I lack specific domain knowledge. If you are in the field of EEGs and know of a better way, I invite your feedback as someone looking to improve and learn.
The key consideration of the problem was to extract the correct features from the raw signal. Compared to a simple sine wave, the signal below is complex because it originates in the human mind.
Plotting the Raw Signal (Time/Power)
This is the raw signal that comes in the dataset. Although visually appealing, we cannot use it right away. In this graph, the lower frequencies dominate the higher frequencies that contain the useful information. The high frequencies that we need appear as noise in this signal.
Fourier Transform (Frequency/Power)
The most basic method to extract the frequency information is the Fourier transform. In the resulting graph, the x axis shows the frequencies from 0 to 50 Hertz, and the y axis shows the power of each frequency.
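As a minimal sketch of this step (not necessarily the code used in the notebook), the power spectrum of a single one-second channel can be computed with NumPy's real FFT. The function name `power_spectrum`, the 50 Hz cut-off argument and the synthetic test signal are illustrative assumptions; only the 256 Hz sampling rate comes from the data set description.

```python
import numpy as np

FS = 256  # sampling rate in Hz, as stated for the data set


def power_spectrum(signal, fs=FS, max_hz=50):
    """Return (freqs, power) for one channel of one trial, limited to 0..max_hz."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()               # remove the constant (DC) offset
    spectrum = np.fft.rfft(signal)                # FFT of a real-valued signal
    power = np.abs(spectrum) ** 2 / signal.size   # power per frequency bin
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    keep = freqs <= max_hz
    return freqs[keep], power[keep]


# Synthetic stand-in for one trial: a strong 10 Hz wave plus a weak 25 Hz wave.
t = np.arange(FS) / FS
demo = 5.0 * np.sin(2 * np.pi * 10 * t) + 1.0 * np.sin(2 * np.pi * 25 * t)
freqs, power = power_spectrum(demo)
peak_hz = freqs[np.argmax(power)]  # frequency carrying the most power
```

With one second of data at 256 Hz, the bins are exactly 1 Hz apart, so the weak 25 Hz component shows up as its own small peak even though it is hard to see in the raw trace.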
Using this frequency information, we can model the brain waves by measuring the power of each of the frequencies present. This will show clear differences between the brain waves, especially in the beta band. In these graphs, which have the 64 electrode readings mapped out onto a circle, yellow corresponds to more activity in the selected frequency band, while blue represents less activity. The images were created by taking the mean of a stacked tensor.
These frequencies show definite differences between the groups, in ways that agree with the scientific literature on the subject. The alcoholic group has more activity in the beta 3 (high frequency) band than the control group, which has more in the beta 1 and beta 2 bands.
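The per-channel band power behind such plots might be computed along these lines. This is a sketch under assumptions: the band edges in `BANDS` are one common convention rather than values taken from this project, the array layout (trials, channels, samples) is assumed, and the random data is a stand-in for real trials.

```python
import numpy as np

FS = 256  # sampling rate in Hz
# Assumed band edges in Hz; definitions of the beta sub-bands vary between sources.
BANDS = {"beta1": (13, 16), "beta2": (16, 20), "beta3": (20, 30)}


def band_power(trials, band, fs=FS):
    """trials: (n_trials, n_channels, n_samples) -> mean band power per channel."""
    trials = np.asarray(trials, dtype=float)
    n = trials.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = np.abs(np.fft.rfft(trials, axis=-1)) ** 2 / n
    lo, hi = band
    mask = (freqs >= lo) & (freqs < hi)
    per_trial = power[..., mask].mean(axis=-1)  # (n_trials, n_channels)
    return per_trial.mean(axis=0)               # mean over the stacked trials


# Stand-in data: 4 noisy trials of 64 channels, with a 25 Hz wave added to channel 0.
rng = np.random.default_rng(0)
trials = rng.normal(size=(4, 64, FS))
t = np.arange(FS) / FS
trials[:, 0, :] += 3.0 * np.sin(2 * np.pi * 25 * t)

beta3 = band_power(trials, BANDS["beta3"])
strongest_channel = int(np.argmax(beta3))
```

Each of the 64 values would then be colored and placed at its electrode position to produce one of the circular plots.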
However promising, when these images were used in FastAI with the xresnet18 architecture as the neural network, the accuracy came out at around 65–70%. Below are the results.
Here, the accuracy could not be improved beyond 70%, likely because there wasn’t enough information being fed to the model. There were only 800 images of 32x32 RGB pixels, which is a tiny distillation of the information that was originally buried in the many megabytes of raw signal.
This confusion matrix shows where the model was incorrect or made a low confidence guess. It shows how often the model confused the control group with alcoholics, which is not a good sign. It also had plenty of correct guesses that it wasn’t very confident about either. What is needed is a better model.
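For readers who have not met one before, a confusion matrix simply counts how often each true class was predicted as each class. The sketch below uses a handful of made-up labels purely for illustration; they are not results from this project, and the encoding (0 = control, 1 = alcoholic) is an assumption.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are the true class, columns are the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for true_label, predicted_label in zip(y_true, y_pred):
        cm[true_label, predicted_label] += 1
    return cm


# Made-up toy labels (0 = control, 1 = alcoholic), not project results.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
error_rate = 1 - np.trace(cm) / cm.sum()  # share of off-diagonal entries
```

The off-diagonal cells are the confusions between the two groups, and the error rate is their share of all predictions.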
Wavelet Transform (Time/Frequency/Power)
Although the previous method tells us which frequencies are present in the signal, we lose all the detailed information about time that we had in the first graph. To obtain better accuracy, we need this information because it tells us at what time specific frequencies are present in the signal.
This is critical information because much research shows that alcoholics have lower quality and less stable brain waves than the normal population. If we only know which frequencies are present, we only see part of the picture. Wavelets will help us in this respect. They can model brain waves better than the Fourier transform used in the previous attempt. Here is a representation of the raw signal used above, plotted out as a wavelet.
In this image, the high frequencies can be seen oscillating at the top of the graphs. At the bottom of the graph, the low frequencies can be seen with a slower time period since they are not oscillating as quickly. All of this information was lost in just one single pixel of the circular plots above.
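To make the time/frequency idea concrete, here is a minimal NumPy-only sketch of a continuous wavelet transform. It assumes a complex Morlet wavelet with five cycles, which is a common choice for EEG but is not confirmed as the wavelet used in this project; `morlet_scalogram` and the test signal are illustrative.

```python
import numpy as np

FS = 256  # sampling rate in Hz


def morlet_scalogram(signal, freqs, fs=FS, n_cycles=5.0):
    """Return wavelet power with shape (len(freqs), len(signal))."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty((len(freqs), signal.size))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)    # envelope width in seconds
        half = int(np.ceil(3 * sigma_t * fs))   # keep +/- 3 sigma of the envelope
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.abs(wavelet).sum()        # make rows comparable across frequencies
        full = np.convolve(signal, wavelet, mode="full")
        out[i] = np.abs(full[half:half + signal.size]) ** 2  # centered crop
    return out


# Test signal: 10 Hz during the first half second, 40 Hz during the second.
t = np.arange(FS) / FS
demo = np.where(t < 0.5, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 40 * t))
freqs = np.arange(5, 51)
scalogram = morlet_scalogram(demo, freqs)
row_10hz = scalogram[5]   # freqs[5] == 10
row_40hz = scalogram[35]  # freqs[35] == 40
```

A Fourier spectrum of this test signal would show both 10 Hz and 40 Hz without saying when each occurred. In the scalogram, the 10 Hz row is bright only in the first half and the 40 Hz row only in the second.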
Hierarchical Clustering: How to Choose Channels
Having gone from too little data, I now had the problem of having too much. Worse yet, much of this data is highly correlated because the 64 sensors are placed very closely on the subject’s head. This problem can be dealt with in several ways.
The easiest to code would be to randomly select nine channels. Another would be to correlate the channels of the alcoholic and control group to see which are most dissimilar and focus on those electrodes. However, this would compromise the integrity of the data and would not necessarily be repeatable on data it hasn’t seen before.
The two options I ended up pursuing were to use hierarchical clustering and to select specific electrodes that scientific studies show hold this information. The two attempts led to similar results, which will be described in detail later.
Hierarchical clustering involves evaluating the similarity between data, and organizing them into clusters based on how close they are. This would be a useful tool for me to cut down on the amount of highly correlated data that I now had.
The method used here involved measuring the distance between the waves in each sample, summing the distances between each point along the time series, and then grouping the 64 channels according to how far apart they are from each other. With this method, I could choose samples of each cluster which would create a broader picture of what was going on. This showed promise, as shown below. Each of the clusters had similar graphs that could be used.
Seeing this, I was content enough to use this as a broader picture of what was going on in the person’s brain, although it had the downside of losing key information on narrow areas while over-representing dissimilar channels. The former bias was accounted for by the focused approach I attempted later, while the latter is something I would be interested in improving upon.
With this, nine channels would be put into a collage which would become one entry into what would become a new data set of extracted features represented not as 1D raw signal values, but as an image that could be used in a deep learning image classifier.
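The clustering step described above can be sketched roughly as follows. This is a hand-rolled, NumPy-only illustration: the average-linkage rule, the target of nine clusters and the choice of the first member as representative are assumptions, not details confirmed by the project. In practice, `scipy.cluster.hierarchy` provides the same kind of clustering ready-made.

```python
import numpy as np


def cluster_channels(trial, n_clusters=9):
    """trial: (n_channels, n_samples) -> integer cluster label per channel."""
    trial = np.asarray(trial, dtype=float)
    # Distance between two channels: sum of point-by-point absolute differences.
    dist = np.abs(trial[:, None, :] - trial[None, :, :]).sum(axis=-1)
    clusters = [[i] for i in range(trial.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance between members of the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    labels = np.empty(trial.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Stand-in trial: 64 channels built from 9 distinct base waves plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(256) / 256
trial = np.stack([np.sin(2 * np.pi * (5 + 3 * (ch % 9)) * t) for ch in range(64)])
trial += rng.normal(scale=0.05, size=trial.shape)

labels = cluster_channels(trial)
# One representative channel per cluster (here simply the first member).
representatives = [int(np.flatnonzero(labels == k)[0]) for k in range(9)]
```

Picking one channel from each cluster gives nine channels that are as unlike each other as the data allows, which is the "broader picture" trade-off discussed above.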
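One way to build such a collage is to tile nine equally sized 2D arrays (for example, one wavelet image per chosen channel) into a 3x3 grid. The sketch below is an assumption about the layout, not the project's actual plotting code; `make_collage` and the shared 0 to 255 scaling are illustrative choices.

```python
import numpy as np


def make_collage(panels, grid=(3, 3)):
    """panels: (9, height, width) -> one (3*height, 3*width) uint8 image."""
    panels = np.asarray(panels, dtype=float)
    rows, cols = grid
    n, h, w = panels.shape
    if n != rows * cols:
        raise ValueError(f"expected {rows * cols} panels, got {n}")
    lo, hi = panels.min(), panels.max()
    # Scale all panels together so relative power between channels is preserved.
    scaled = (panels - lo) / (hi - lo) if hi > lo else np.zeros_like(panels)
    # (rows, cols, h, w) -> (rows, h, cols, w) -> (rows*h, cols*w)
    tiled = scaled.reshape(rows, cols, h, w).transpose(0, 2, 1, 3)
    tiled = tiled.reshape(rows * h, cols * w)
    return (tiled * 255).round().astype(np.uint8)


# Stand-in panels: nine flat 4x5 tiles with values 0..8, so each tile is easy to spot.
panels = np.stack([np.full((4, 5), float(k)) for k in range(9)])
collage = make_collage(panels)
```

Keeping each channel in the same grid cell for every image matters if the classifier is meant to learn from electrode position, which is the point of the second attempt described later.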
Training the Model:
For the final step of training the model, I now had a new data set that consisted of:
388 Images (collages of 9 channels = 3,492 Channels)
384 Images (collages of 9 channels = 3,456 Channels)
These were put into a dataloader, and resized down to 225x225 tensors for ease of GPU processing. I experimented with the resnet18, resnet34, and resnet50 models to evaluate the performance of each.
Immediately, it was shown that this new data set had much more prominent features than the previous attempts.
Interpreting the results
With this method, the error rate went down to roughly 25%, an improvement on the previous attempt. It shows that including more detailed information about the signal as features helped improve the model.
Here, the confusion matrix shows better results than the previous attempt, but there is still room for improvement. While the hierarchical clustering served to try to provide a wider picture of a subject’s brain waves, the method used did sacrifice consistency on the location of the sensors.
Yet, as visualized above, it still confuses alcoholics and the control group at a similar rate. There are also still plenty of low confidence correct guesses even though the model got a higher accuracy than before.
Attempt 2: (Focusing on Frontal, Occipital, and Temporal lobes)
In an attempt to retain the location information, and keep it in a consistent location in each of the new collage images, the same nine sensors were chosen from each sample. These were chosen to focus more on the areas that had differences show up in scientific studies. Here, the data was more correlated because the chosen sensors are placed closer together, but the hope was that it would provide greater accuracy. Below are the results of the classifier.
This attempt led to a similar result, when measured by error rate, to the previous attempt. The model’s error rate can be seen with the 5th epoch in the “error_rate” column.
Although the error rate was similar, this model was able to make predictions with a higher level of certainty than the previous attempt.
By comparing the two graphs, one can see that the number of low confidence correct guesses decreased by roughly 30%. The model was also much better at not confusing the control group with the alcoholic group, with a 60 percent improvement from the previous attempt.
This exercise shows that location is a critical feature for the model to account for because it was able to make more confident predictions than without it.
This exercise shows how deep learning is improved greatly by feeding it relevant features. Raw signal will produce poor results because the useful information is buried within it and needs to be extracted.
This processing of the raw data can continue on, and I am sure that the model’s accuracy could be improved even further by adding more steps in the signal processing portion that go beyond my current level of competence.
If you are interested in the code used in this project, I invite you to take a look at my Kaggle notebook: Does what Happen in Vegas, Really Stay in Vegas?
A low p-value doesn’t equal perfect accuracy
As said earlier, alcoholism is a complex disease. Although it is possible to show with statistics that the two groups differ, I was unable to find a feature that a deep learning algorithm could use as the silver bullet to make this determination.