Brain 2 Image: Reverse Engineering Visual Perception from Brain Signals

Adolfo Ramirez-Aristizabal
Published in Labs Notebook
Dec 6, 2023 · 9 min read
1 second of brain signal from an Emotiv EEG headset while a participant looked at a Goldfish

Advances in Generative AI algorithms and Augmented Reality hardware have sparked interest in the technological possibilities of precision Bio-Sensing. If you use a smartwatch, then you have already started to experience how current versions of this technology can shape your everyday life. These products collect bio-signals from users, which helps customize the services and applications you interact with. As the sensing capabilities of these technologies advance, the view into understanding users will become clearer and more complete.

Recent medical research studies have challenged the art-of-the-possible and shown that we can reverse engineer what people see by processing their brain signals. Such results provide confidence in a future where technology will even leverage our neural signals to predict our intentions and interpret how we perceive the world. Medical researchers also posit that this could benefit patients with limited motor capabilities. So far, such results have been limited to medical and experimental lab scenarios, leaving open the question of feasibility with consumer-grade neurotech.

Here we present exploratory work conducted at Accenture Labs, showing evidence that consumer-grade neurotech can be leveraged alongside a simple GenAI pipeline.

Supervised learning method for building a custom neural decoding model. The first 2 seconds of a person watching an image are used to train a custom CNN model, and the last second is used to validate model predictions. R&D results by Accenture Labs.

R&D efforts at Accenture Labs work towards bringing the latest technological research to applied use cases for client solutions, offering foresight on technological developments 5–10 years ahead of current market relevance. Related use cases include improving the training & upskilling of workers who perform physical tasks, augmenting the product design process, and innovating assistive systems.
Check out the previous communications around Audio AR and Neuroscience-as-a-Service.

Brain to Images

Exciting news headlines have been highlighting the neural decoding work done by medical researchers, provoking our imagination of a future where precision bio-sensing tech can read our minds and translate our thoughts. Much of this new excitement comes from studies showing that GenAI can be used to decode brain signals, with the continuous growth of GenAI capabilities turning the spotlight onto adjacent domains such as Neuroscience.
But this work is not completely new, and the first instance of using generative models to reconstruct images from brain signals can be traced back to 2017. Researchers from the University of Catania and the University of Central Florida demonstrated that it was possible to develop AI that can process brain signals and predict what images someone was looking at.

Kavasidis et al. (2017), the first paper using generative deep learning models to decode visual perception

This became a central proof-of-concept for how AI could begin to decode our visual perception from our neural signals. Specifically, it showed how generative models could learn to map brain signals into image categories: given a brain signal as input, the model could generate a noisy image reflecting the category of the image that person saw. For example, if you looked at a specific panda during data collection, the model would predict that you saw a panda and generate a panda image resembling variations of all the pandas it was trained on, plus some noise from the neural input.

Takagi & Nishimoto (2023), results of using Stable Diffusion to decode visual perception.

Fast forward to 2022–2023, during an AI boom driven by novel GenAI models, and researchers raised the bar on the state of the art in neural decoding. An exemplary study by researchers from Osaka University showed how they could process fMRI signals and use the generative power of Stable Diffusion to produce images semantically related to the original images participants were shown.
Their method mapped features of the brain signals to both the encoding of the original image and the encoding of the image captions via linear models. These linear models output abstract latent features that were injected into the middle layers of the Stable Diffusion autoencoder, which then generated the final image.
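As a rough illustration of that kind of linear mapping (our own sketch, not the authors' code), a ridge regression can project flattened brain features into the latent space a diffusion model consumes. The array shapes and feature names below are placeholder assumptions:

```python
# Minimal sketch: linearly map brain features into a generative model's latent space.
# Shapes and data are illustrative placeholders, not the published pipeline.
import numpy as np
from sklearn.linear_model import Ridge

n_trials, n_voxels, latent_dim = 1000, 5000, 768

brain_features = np.random.randn(n_trials, n_voxels)    # e.g., flattened fMRI features per trial
image_latents = np.random.randn(n_trials, latent_dim)   # e.g., encodings of the images that were seen

# One linear model per target space (image encoding, caption encoding, ...)
mapper = Ridge(alpha=1000.0)
mapper.fit(brain_features[:800], image_latents[:800])

# Predicted latents for held-out trials would then be injected into the diffusion model
predicted_latents = mapper.predict(brain_features[800:])
```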

Benchetrit, Banville, & King (2023), results from models in a novel generative pipeline.

Several other studies have followed since then, including one by researchers from Paris Sciences et Lettres University and Meta’s Fundamental AI Research (FAIR) group, showing a novel GenAI pipeline decoding MEG brain signals. They frame this as an image retrieval process, where pretrained GenAI models hold a deep memory of images from the internet; brain signals then act as a query that indexes into that rich latent space to retrieve an image as similar as possible to the one a person saw. Lastly, they frame their results as bringing insights into what factors would need to be considered for a real-time implementation of this technology, making the future directions of their work relevant to industries investing in bringing the tech to life.
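Conceptually, that retrieval step boils down to a nearest-neighbor search over a bank of image embeddings. The sketch below is a simplified stand-in with made-up shapes, not the models used in the paper:

```python
# Minimal sketch of embedding retrieval: a decoded brain embedding queries a bank of
# image embeddings and the closest match is returned. Shapes and data are illustrative.
import numpy as np

def retrieve(brain_embedding: np.ndarray, image_bank: np.ndarray) -> int:
    """Return the index of the candidate image whose embedding best matches the brain embedding."""
    bank = image_bank / np.linalg.norm(image_bank, axis=1, keepdims=True)
    query = brain_embedding / np.linalg.norm(brain_embedding)
    similarities = bank @ query                 # cosine similarity against every candidate image
    return int(np.argmax(similarities))

image_bank = np.random.randn(10_000, 512)       # e.g., embeddings of candidate images from the web
decoded = np.random.randn(512)                  # embedding predicted from an MEG segment
best_match = retrieve(decoded, image_bank)
```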

Limitations

Towards Real-Time
It is important to note that the research on decoding people’s visual perception currently serves as a proof-of-concept for the art-of-the-possible. So far, there haven’t been implementations showing that real-time brain signals can be actively decoded to reconstruct our visual perception. Rather, these studies focus on collecting a dataset of brain signals, and then use that data to train and develop AI that shows impressive results on those specific datasets. The above-mentioned research by Meta stands out by thinking about what factors would be needed for decoding to occur in real time, outlining the importance of neural signals with higher temporal resolution and methods of optimizing inference for better outputs.

Example of magnetoencephalography (MEG) data collection.

What happens in the lab stays in the lab?
All the studies that reached high-profile media coverage depended on medical and research-grade laboratory conditions, constraining the use cases for these implementations to specialized medical contexts, where the collection of MEG & fMRI necessitates expensive, furniture-sized magnets. Furthermore, data collection conditions are not very naturalistic and often require carefully instructing patients to keep their bodies very still while the magnets scan them.

Like any Machine Learning application, brain sensing faces issues with data scarcity, which often slows the scaling up to real-time use and new populations. Such limitations can be alleviated through the use of synthetic data, an approach Accenture has applied across various domains to help clients solve similar data issues.

Results

Towards Consumer Products
Hardware measuring electrical activity with EEG headsets and blood flow through IR sensors such as fNIRS has made big strides towards in-the-wild user data collection. Accenture Labs has partnered with exemplary vendors such as Mendi and Wearable Sensing: the neurotech from Wearable Sensing focuses on providing research-grade precision while remaining robust to natural user movement, while Mendi has focused on building a platform that end users can easily wear and engage with through a gamified phone application.

Publicly available dataset provided by David Vivancos.

Data
For our analysis, we leverage a publicly available dataset, MindBigData — “IMAGENET” of the Brain, where data collection used a commercial-grade Emotiv EEG headset, which has a more approachable price point (~$500) compared to research-grade neurotech. The dataset consists of 3-second brain signals recorded while participants viewed an image, for a total of 14,012 images.
For our purposes, we use a subset and focus on brain signals corresponding to 4 image classes, with 2 images per class [Rooster, Ostrich, Goldfish, Stingray].
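To make the setup concrete, the sketch below shows one way to organize the 4-class subset and the 2-second/1-second split across time described above. The channel count, sampling rate, and array shapes are assumptions for illustration, not the dataset's exact specification:

```python
# Sketch of organizing the 4-class EEG subset and splitting each 3-second trial across time:
# the first 2 seconds go to training and the final second is held out. Shapes are assumptions.
import numpy as np

SAMPLE_RATE = 128                                # assumed sampling rate in Hz
CLASSES = ["Rooster", "Ostrich", "Goldfish", "Stingray"]

# eeg_trials: (n_trials, n_channels, n_samples) for the selected subset; random stand-in here
eeg_trials = np.random.randn(64, 5, 3 * SAMPLE_RATE)
labels = np.random.randint(0, len(CLASSES), size=64)

split = 2 * SAMPLE_RATE
train_segments = eeg_trials[:, :, :split]        # first 2 seconds of every trial
test_segments = eeg_trials[:, :, split:]         # held-out final second
train_labels = test_labels = labels              # each segment keeps its trial's image class
```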

Methods
Our goal here is to lean into simplicity, so we stick to a supervised learning method for training a small Convolutional Neural Network (CNN) to process the brain signals. The outputs of the CNN are then used as prompts to a GenAI architecture such as a Stable Diffusion model.

Proposed pipeline, where the neural decoding model processes a brain signal to generate an image or text as a prompt to the GenAI model.

Previous implementations have relied on sophisticated and complicated signal processing steps, and often work by opening up or fine-tuning the GenAI models themselves. Here we avoid doing this to preserve the rich memory of the pretrained model, keep the processing steps simple, and reduce the overhead of fine-tuning.
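For reference, a small CNN decoder along these lines can be sketched as follows. This is an illustrative stand-in with an assumed channel count and our 4 image classes, not our exact architecture:

```python
# Illustrative small 1D CNN for classifying EEG segments into image categories.
# The channel count (5) is an assumption; the 4 classes match our image subset.
import torch
import torch.nn as nn

class EEGClassifier(nn.Module):
    def __init__(self, n_channels: int = 5, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),             # collapse time so any segment length works
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time_samples)
        return self.classifier(self.features(x).squeeze(-1))

model = EEGClassifier()
print(sum(p.numel() for p in model.parameters()))    # on the order of 15K parameters
```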

Image category classification results of CNN decoder.

CNN Results
We performed two analyses: the first trained a CNN decoder to predict the image class from a brain signal as input, and the second trained a CNN to reconstruct the specific image someone saw. For all training we used a 66% Training — 33% Test split across time, where the first 2 seconds of each recording were used for training and the last, unseen second was used for evaluation.
Our CNN classifier showed 75% accuracy with up to 87.50% precision, against a baseline of a 25% chance of randomly being correct. The CNN model was computationally efficient, at only about 15K parameters.
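Those numbers come from standard classification metrics on the held-out final second of each recording; a sketch with placeholder labels:

```python
# Evaluate the classifier on the held-out last second of each recording.
# y_true / y_pred are random placeholders standing in for real validation labels.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score

y_true = np.random.randint(0, 4, size=200)       # true image classes for held-out segments
y_pred = np.random.randint(0, 4, size=200)       # classes predicted by the CNN decoder

accuracy = accuracy_score(y_true, y_pred)                         # chance level is 0.25 with 4 classes
per_class_precision = precision_score(y_true, y_pred, average=None)
print(accuracy, per_class_precision.max())       # our run above reported 75% accuracy, up to 87.5% precision
```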

Reconstructions from the trained CNN model. Results show that the model learned to map patterns of brain activity to possible pixel combinations of all seen examples.

Our CNN image reconstruction results are presented above and were evaluated on a 3-point subjective scale (Good, Ok, Bad), with the strongest examples coming from both roosters, one stingray, and one ostrich image. Simply put, reconstruction quality roughly follows our classification accuracy, with some reconstructions showing confusion as they overlay other images. Such a concept is not too dissimilar to what happens when LLMs hallucinate. The CNN model was only 18 million parameters big.
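For the reconstruction analysis, one way to sketch an EEG-to-image decoder (again an illustrative stand-in, not our exact 18M-parameter model) is a 1D convolutional encoder over the brain signal whose output is reshaped and upsampled into an image with transposed convolutions; all sizes below are assumptions:

```python
# Illustrative EEG-to-image reconstruction model: 1D conv encoder over the brain signal,
# transposed 2D convolutions to decode into a small RGB image. Sizes are assumptions.
import torch
import torch.nn as nn

class EEGToImage(nn.Module):
    def __init__(self, n_channels: int = 5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, 256 * 8 * 8), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),  # 16x16
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),   # 32x32
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),  # 64x64 RGB
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x).view(-1, 256, 8, 8)
        return self.decoder(z)

model = EEGToImage()
reconstruction = model(torch.randn(1, 5, 384))   # one 3-second trial at an assumed 128 Hz
```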

Output Labels as Prompts
Part of our exploratory pipeline is to use CNN outputs as prompts, which has been done in other studies, including a brain-to-music paper where the neural decoder’s output becomes the input to an Image-to-Music model like Riffusion. Here we use the predicted labels from our classifier to perform Text-to-Image with Stable Diffusion.
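A minimal version of that text-to-image step with the Hugging Face diffusers library might look like the following; the checkpoint name and prompt template are illustrative assumptions:

```python
# Sketch: turn a predicted class label into a text prompt for Stable Diffusion.
# Checkpoint name and prompt wording are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

class_names = ["rooster", "ostrich", "goldfish", "stingray"]
predicted_class = 2                              # e.g., output of the CNN classifier

prompt = f"a photo of a {class_names[predicted_class]}"
image = pipe(prompt).images[0]
image.save("decoded_goldfish.png")
```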

CNN predicted labels as prompts for text-to-image with Stable Diffusion

Output Images as Prompts
Using the CNN image reconstructor, we took the validation outputs and passed them to Stable Diffusion for an Image-to-Image process. Models like Stable Diffusion allow for customizing generation with hyperparameters and specific pipelines, which people use to make specific edits or to constrain what the model can output. Here we decided to simplify the hyperparameter optimization process and stick to an Image-to-Image (Variation) pipeline, which takes images as inputs and generates variations of them.
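With diffusers, that image-to-image step can be sketched as below. We use the standard img2img pipeline with a deliberately generic prompt as a stand-in for the variation pipeline described above; the checkpoint, prompt, and strength value are assumptions:

```python
# Sketch: pass the CNN's reconstructed image through Stable Diffusion img2img to
# generate a cleaner variation. Checkpoint, prompt, and strength are assumptions.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

cnn_reconstruction = Image.open("cnn_reconstruction.png").convert("RGB").resize((512, 512))

variation = pipe(
    prompt="a photograph of an animal",   # deliberately generic; the input image carries the signal
    image=cnn_reconstruction,
    strength=0.6,                         # how far the model may drift from the input image
).images[0]
variation.save("sd_variation.png")
```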

CNN reconstruction outputs as prompts for Image-to-Image (Variation) with Stable Diffusion

What We Learned

We showed that we can do brain-to-image neural decoding using data from commercially accessible hardware.

We showed that we don’t always need LLMs. Our custom models gave good classification and some impressive reconstructions. In reconstructions that were harder to interpret, the Stable Diffusion outputs helped with interpretability by blending the model’s confusion into one image.

We know that LLMs come with big overhead, and investments for building GenAI solutions try to mitigate development cost trade-offs. Therefore, our proposed pipeline simplifies the use of LLMs by having a small, custom CNN focus on the specificity of brain signals and GenAI focus on its strength of generalization.

Looking forward, the promise of this research is in line with medical use cases for helping interpret the intentions and communications of patients with limited motor capabilities.

Outside of specific lab settings, such capabilities can be built into smart wearables, opening up more precision in the consumer health & wellness market.
Industry leaders such as Apple have signaled this focus with their “Health data shouldn’t be public” marketing. In the near future, foundation models for bio-sensing will also help scale up to more users and to everyday scenarios.
We can imagine a future where our smartwatches and smart earphones can sense not just the outside world, or just the users, but also how users perceive the outside world.
Therefore, precision bio-sensing will be instrumental in grounding the power of GenAI solutions to personalize user interactions with technology and services.

Adolfo Ramirez-Aristizabal

Associate Principal Researcher at Accenture Labs — Digital Experiences