The health data crisis…

Access to huge amounts of data, and the computational power to analyse it, has revolutionised many industries. But health-related innovation lags behind because of restrictions on using the vast wealth of existing data. If we could tap into this data tomorrow, we could vastly accelerate research to cure diseases like Alzheimer’s, cancer and diabetes.

A wealth of data, untapped

Let’s estimate the amount of data created, from a set of rather conservative assumptions.

There are over 36,000 MRI scanners in the world, with about a quarter of scans imaging the brain. Let’s assume each scanner works 5 days per week (about 250 days per year) for 8 hours per day, and that each patient takes 1 hour to scan, yielding 1 useful image.

scans = 36,000 machines * 250 days/year * 8 scans/day

This gives us some 72 million scans per year, 18 million of which are brain scans, or roughly 50,000 brain scans per calendar day. And this is a lower bound: in reality, the total number of scans is likely to be considerably greater.
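For readers who want to check the arithmetic, here is a minimal sketch of the estimate in Python. The numbers are the assumptions stated above; the function and variable names are my own, chosen purely for illustration.

```python
# Back-of-the-envelope estimate of MRI scans per year.
# All inputs are the assumptions stated in the text, not measured figures.

MACHINES = 36_000             # MRI scanners worldwide
WORKING_DAYS_PER_YEAR = 250   # 5 days per week, about 50 weeks
SCANS_PER_DAY = 8             # 8 hours per day, 1 hour per scan
BRAIN_FRACTION = 0.25         # about a quarter of scans image the brain


def estimate_scans(machines, working_days, scans_per_day, brain_fraction):
    """Return (total scans/year, brain scans/year, brain scans/calendar day)."""
    total_per_year = machines * working_days * scans_per_day
    brain_per_year = total_per_year * brain_fraction
    brain_per_calendar_day = brain_per_year / 365
    return total_per_year, brain_per_year, brain_per_calendar_day


total, brain, brain_daily = estimate_scans(
    MACHINES, WORKING_DAYS_PER_YEAR, SCANS_PER_DAY, BRAIN_FRACTION
)

print(f"Total scans per year:          {total:,}")
print(f"Brain scans per year:          {brain:,.0f}")
print(f"Brain scans per calendar day:  {brain_daily:,.0f}")
```

Changing any one assumption (say, two scans per hour instead of one) scales the result proportionally, which is why the figure is best read as an order-of-magnitude estimate.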

Most of these scans are not shared with the scientific community to inform research and medical practice.

A synapse in the right direction

Today, there are several amazing efforts to provide scientists with large brain imaging datasets.

The largest target Alzheimer’s (ADNI), autism (ABIDE), genomics (GSP), brain connectivity (1000FCP), Parkinson’s (PPMI) and middle-aged individuals (UK Biobank).

Beyond this, there are smaller datasets like OASIS, MIRIAD, IXI, Human Connectome Project, ICBM, AIBL, etc.

P.S. If I have accidentally omitted an important brain MRI dataset, please mention it in the comments.

Today, these datasets include the brains of fewer than 50,000 individuals in total, which is about the number of brain scans our estimate suggests are performed every single day. And, while the research datasets have much richer, higher-quality data than a typical clinical dataset, they still represent a small proportion of the MRI brain scans performed.

…roughly 50,000 brain scans per calendar day

Don’t get me wrong, these initiatives are commendable and will provide great benefit. But in this era of Big Data, the Data Scientist in me is saddened that there are troves of data that merely sit on the hard disks of hospitals and clinics, when they could be used to refine our understanding of the human brain and the diseases that affect it.

Health & Safety

One of the critical issues is the safety of the medical records. While it is important that the privacy of the individual is kept secure, the problem comes when fear stifles innovation.

Yes, we must take all reasonable precautions to ensure privacy. We certainly can and should:

  • anonymise the scan’s “headers”, removing the name and date of birth of the individual, and the day, time and location where they were scanned
  • remove the face from the MRI scan
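As a rough illustration of the first precaution, here is a minimal Python sketch that strips identifying fields from a scan header. It assumes the header has already been loaded into a plain dictionary; the field names follow standard DICOM attribute keywords, but the function, the field list and the example values are hypothetical, and a real pipeline would use a dedicated DICOM library and a far more complete list of fields. Removing the face from the image itself is a separate image-processing step not shown here.

```python
# Illustrative only: the header is a plain dict, not a real DICOM file.
# The list of fields is a small assumed subset, not a complete
# de-identification profile.

IDENTIFYING_FIELDS = (
    "PatientName",
    "PatientBirthDate",
    "StudyDate",        # day of the scan
    "StudyTime",        # time of the scan
    "InstitutionName",  # where the scan was performed
)


def anonymise_header(header):
    """Return a copy of the header without the identifying fields."""
    return {
        key: value
        for key, value in header.items()
        if key not in IDENTIFYING_FIELDS
    }


# Made-up example header
example_header = {
    "PatientName": "DOE^JANE",
    "PatientBirthDate": "19500101",
    "StudyDate": "20160301",
    "StudyTime": "093000",
    "InstitutionName": "Example Hospital",
    "Modality": "MR",
    "MagneticFieldStrength": 3.0,
}

cleaned_header = anonymise_header(example_header)
print(cleaned_header)
```

The function returns a new dictionary rather than modifying the original, so the hospital’s own copy of the record stays intact while only the cleaned version is shared.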

And yes, there will of course always be a small risk that someone could be identified from a collection of anonymised records. But I argue that after taking reasonable precautions, this risk is far outweighed by the medical benefits that sharing our records can bring.

Diving deep

If you’ve read this far, you may ask, “why should these extra scans make any difference?”

I’m glad you asked. Today we are in the very midst of a machine learning revolution. Until 2012, computers were unable to match us in the very simple task of understanding images from photographs, while today they can even surpass us.

This came about with the advent of effective ways of doing Deep Learning (i.e. neural networks with many layers), which started to see success thanks to some theoretical breakthroughs but, perhaps more vitally, thanks to the availability of enough computing power and of huge datasets, numbering in the hundreds of thousands to millions of examples. This has allowed machines to learn to do many useful things, from recognising items in images, through decent speech recognition (Hey Siri!) to much better machine translation. And we are only just getting started.

We are now at a turning point for applying Deep Learning effectively to medical imaging. We have the compute, but we are not yet leveraging the wealth of data we have acquired.

We believe that we can make much better use of medical data, which will allow us to beat currently intractable diseases like Alzheimer’s much faster.

If you agree with us, please share this article, as we seek people in the medical field who would also like to better utilise data. And if you disagree, please feel free to comment. We’d love to discuss :-)

Disclosure: Sasha is Co-founder of Avalon AI [Entrepreneur First ’15, Techstars ’16], a startup aiming to defeat diseases like Alzheimer’s Disease through earlier diagnosis from medical imaging data.