How collaboration is revolutionizing medicine

A quick introduction to the new technologies that are enabling doctors, researchers and data scientists to work together on groundbreaking discoveries.

6 min read · Jan 23, 2023

The rise of Artificial Intelligence has already had a major impact on our lives. From self-driving cars to image recognition, we often don’t realize how intertwined our lives have become with AI. The most promising use of this tech, however, lies in medicine, where it has the potential to find solutions for diseases and illnesses previously considered incurable. DeepMind’s AlphaFold is the most famous example of the potential that deep learning holds for applications in biology.

Machine learning in healthcare is made possible today through the analysis of large amounts of data. This data is gathered in many different formats, including electronic health records, clinical trial data, and in emerging modalities such as genomic and RNA data. Compared to healthcare professionals, AI models are particularly effective at analyzing these huge data sets and identifying patterns across different modes that may not be apparent to the human eye.

The identification of these new patterns allows for advancements in a growing field of healthcare known as personalized medicine. This is a domain particularly well suited to AI, as it involves creating a unique treatment plan for each patient based on all of their available medical data. In order to be effective, however, these models need to be trained not only on different types of data but also on datasets coming from different sources. This last point is essential because hospitals rarely use exactly the same tools and techniques. To build AI services that generalize, it’s important to account for this variability.

Access to data from different sources, however, can be the biggest challenge, as datasets exist in protected silos and experts in data science are often separated from experts in medicine. This is where collaboration becomes critical.

The Data Problem for AI in Healthcare

As we continue to innovate on the medical equipment used in hospitals, the quality of data we can bring to these AI models will improve simultaneously with our understanding of AI and computing. By putting the right datasets in the right hands, we can expect to see countless groundbreaking discoveries in the decades to come.

To make these collaborative machine learning projects come to life, there needs to be a large chain of communication between doctors, data scientists, biomedical researchers and even IT teams as these complex problems require tons of specific domain expertise. Setting up projects to make research possible is an inherently difficult task that often becomes even more difficult due to data privacy regulations such as GDPR and HIPAA. These are important regulations that limit the movement and use of data as their goal is to ensure that patient data remains protected.

So the question is, how can we overcome the challenge of making data access possible while continuing to protect patient privacy?

Privacy Enhancing Technologies (PETs)

The way forward lies in the use of PETs. A number of solutions have been proposed as researchers recognize the potential of deep learning in healthcare if this obstacle is overcome. I’ll touch on a few of the main technologies that are making collaborative data sharing possible today:

Secure Enclaves: These are hardware-based features that provide an isolated environment to store and process data. Although they’re excellent for safely storing electronic data, the hardware aspect can often be limiting for doing collaborative research across different centers.
Differential Privacy: This is a simpler approach where a calibrated amount of random noise is added to datasets or to the results of queries on them. The data is not encrypted; instead, the noise masks the contribution of any single individual. This allows users to draw patterns and knowledge from datasets while still limiting what can be learned about any one patient, which is why it has already been used in many environments. Notably, Apple uses Differential Privacy to gain insights on how its users interact with its software in aggregate, without identifying individuals. Check out this Python framework if you’d like to learn more about how to do this on your own.
Multi-Party Computation: Not as fun as it sounds, this methodology enables partners to jointly analyze data through the use of cryptography. Users can conduct analysis together but don’t have to reveal exact data points to each other.
Federated Learning: This allows Machine Learning models to be sent to servers where they can train and test on data without having to ever move the data from its original location. See here how this facilitates collaboration much more easily as the data is never moved nor accessed by a person outside of a host organization.
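To make the Differential Privacy idea above a bit more concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is an illustration only, not the framework linked above: the function names, the toy list of ages and the choice of epsilon are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that match `predicate`."""
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: patient ages held by one hospital.
ages = [34, 71, 58, 66, 45, 80, 62]

# Smaller epsilon means more noise and stronger privacy.
noisy = private_count(ages, lambda age: age >= 60, epsilon=0.5)
print(f"Noisy count of patients aged 60+: {noisy:.2f}")
```

Any single answer is off by a random amount, but the noise averages out over many queries or large populations, which is the trade-off between privacy and accuracy that epsilon controls.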

The most successful of these methods has so far been Federated Learning, as there is a growing community of researchers and scientists who have embraced this technology and even put it into practice.
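For readers who want to see the mechanics, the sketch below simulates one common Federated Learning strategy, federated averaging, on a toy linear model. Everything in it is an assumption made for illustration: the two "sites", their data points, the learning rate and the function names are hypothetical, and real frameworks handle the networking, security and orchestration that this simulation leaves out.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient descent steps on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error for the model y = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    # Only the updated weights and the sample count leave the site.
    return (w, b), n


def federated_average(updates):
    """Combine site updates, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    w = sum(weights[0] * n for weights, n in updates) / total
    b = sum(weights[1] * n for weights, n in updates) / total
    return (w, b)


# Hypothetical private datasets, both generated from y = 2x + 1.
site_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
site_b = [(0.5, 2.0), (1.5, 4.0)]
sites = [site_a, site_b]

global_weights = (0.0, 0.0)
for _ in range(200):
    # Each round: send the global model out, train locally, average the results.
    updates = [local_update(global_weights, data) for data in sites]
    global_weights = federated_average(updates)

print(global_weights)
```

The raw data points never appear in the aggregation step; the central loop only ever sees model weights and sample counts.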

Federated Learning Applications in Healthcare

An important distinction to make is that between Federated Analytics (FA) and Federated Learning (FL). While both follow the same principle of data being hidden and retained on the host server in order to comply with privacy regulations, FA is more oriented towards traditional statistics, where a data scientist attempts to make analytical conclusions about data such as averages and distributions. FL on the other hand deals specifically with the training and testing of Machine Learning algorithms that can then be used to make predictions. This can be extremely helpful to predict the outcome a specific treatment would have on a patient.
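As a rough illustration of the Federated Analytics side, the sketch below computes a pooled mean and variance across two sites that share only summary aggregates. The hospital names, the measurements and the function names are hypothetical and chosen purely for this example.

```python
def local_summary(values):
    """Computed inside each site: only these three aggregates are shared."""
    return {
        "n": len(values),
        "total": sum(values),
        "total_sq": sum(v * v for v in values),
    }


def pooled_mean_and_variance(summaries):
    """Combine site summaries into the mean and population variance."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    total_sq = sum(s["total_sq"] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean ** 2
    return mean, variance


# Hypothetical measurements that never leave their hospital.
hospital_a = [12.0, 15.0, 9.0]
hospital_b = [20.0, 14.0]

summaries = [local_summary(hospital_a), local_summary(hospital_b)]
mean, variance = pooled_mean_and_variance(summaries)
print(mean, variance)
```

The result matches what you would get by pooling all five values in one place. Note that aggregates alone are not a privacy guarantee for very small sites, which is one reason these approaches are often combined with the other techniques listed earlier.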

Data can remain private while shared across the cloud

There have already been multiple studies that show how Federated Learning enables collaboration in healthcare on a global scale.

Scientists in the Netherlands partnered with counterparts in Taiwan to deliver one of the earliest examples of a successful project using Federated Learning. They analyzed oral cavity cancer survival and found a remarkable difference between the two sets of data.

The MELLODDY project is a great example of a concrete business case being addressed through FL. In this unique project, pharma companies across Europe that were actually in competition were able to collaborate and leverage each other’s data in order to improve their respective models and speed up research in drug discovery while keeping their data completely private.

In another recently released study, 71 different sites were connected in order to conduct cancer research on glioblastoma. As this is a relatively rare disease, this type of research is only made possible through Federated Learning, as there is a need to draw on many different datasets in order to build a model that generalizes well enough across patients. Centralizing all of these datasets in one place would likely be a legal and logistical nightmare.

Such studies already show how privacy preserving technologies are opening a new horizon for AI in healthcare, but what we see here is only a glimpse of the potential they have for enabling collaborative research in the near future.

Looking Ahead

You might have already heard catchy new phrases such as “Data is the new oil”, but there is definitely some truth to this idea as the most valuable products and services of the years to come will be heavily reliant on this scarce resource. In fact Andrew Ng, one of the biggest names in AI, recently alluded to the possibility that we may actually begin to run out of data. This could bring about an unexpected stagnation in the rapid rise of AI. This means that tools such as Federated Learning, which allow us to get more value from data, will only become more relevant in the years to come.

If this sounds interesting to you, then you’d be happy to know that there are already a few simple ways to get started with Federated Learning. Most of these resources exist in Python, which is now an essential language for data science. Open-source frameworks such as Substra, which was used in the MELLODDY project mentioned above, are relatively easy ways to get started doing Federated Learning on your own.

Even as a beginner on Substra, you can run FL simulations to start playing around with the tool. If you’d actually like to do real-world FL, you can even deploy it into a cloud environment and start doing machine learning on federated data.

Whatever you decide to do, do keep in mind that Privacy Preserving AI is an emerging field with lots of room for growth. Processes and best practices are yet to be defined, as the number of real-world projects remains limited. This, however, is where the opportunity lies for people to innovate and be at the forefront of AI in healthcare!