Data Privacy Poses Ethical Challenges For Medical AI. Can Federated Learning Overcome Them?

Luke Smith
Published in Nanotrends
3 min read · Jul 10, 2020

Healthcare AI is a big focus for investors (see the market map below from CB Insights), with investment totalling $4bn across 367 deals in 2019, and it has huge potential given the enormous (and growing) spend on healthcare. However, the impact of AI in the clinical setting hasn’t matched the hype, and one of the key reasons is the difficulty of gathering enough training data.

Source: CB Insights

The challenge of getting training data (the initial set of data used to train the algorithm) is common to lots of ML applications, but in healthcare the difficulty is greater due to the need to protect patient privacy. One technology that has the potential to address this challenge is federated learning, where an algorithm is trained across multiple decentralised data samples rather than by collecting the training data on one server.

Source: NVIDIA
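
To make the idea of training across decentralised data concrete, here is a minimal sketch of federated averaging in plain Python. It is an illustration under stated assumptions rather than how Clara or any other framework implements it: the three "sites", their synthetic records, the simple linear model and every function name (local_update, federated_average, make_site, train) are hypothetical. Each site improves the shared model on its own records and sends back only the updated model parameters, which a central server then averages; the raw records are never pooled.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few epochs of gradient descent on one site's private records."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error for the model y = w * x + b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Combine the site models, weighting each by its number of records."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def make_site(n, rng, x_lo, x_hi):
    """Synthetic records following y = 2x + 1 plus noise (illustrative only)."""
    data = []
    for _ in range(n):
        x = rng.uniform(x_lo, x_hi)
        data.append((x, 2 * x + 1 + rng.gauss(0, 0.1)))
    return data

def train(sites, rounds=300):
    """Server loop: broadcast the model, collect site updates, average them."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        # Only model parameters leave each site; the records stay local.
        updates = [local_update(weights, data) for data in sites]
        weights = federated_average(updates, [len(d) for d in sites])
    return weights

rng = random.Random(0)
# Three hypothetical hospitals with different sizes and data ranges
sites = [make_site(50, rng, 0, 1), make_site(80, rng, 1, 2), make_site(30, rng, 0, 2)]
w, b = train(sites)
print(round(w, 2), round(b, 2))  # should land close to the true slope 2 and intercept 1
```

Weighting each site by its record count is one common design choice, so that a small site does not pull the shared model as hard as a large one. A real deployment would also need secure communication and, often, extra protections on the updates themselves, which this sketch leaves out.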

The Opportunity

The rapid analysis of the factors that influence Covid mortality is an example of the power of bringing the algorithm to the data. The study covered 17m patient records, including more than 5,000 Covid-linked deaths, and was able to go from idea to publication in seven weeks because the data was analysed where the electronic health records were stored rather than being transferred for analysis.

Analysing data in situ simplifies the challenge of complying with data privacy regulations and should speed up the painful process of getting data from clinical organisations. It could also avoid the kind of backlash seen against Google DeepMind’s partnership with the NHS, by allowing patient data to stay with the NHS and so reducing the risk of misuse or breaches.

By making it easier to train algorithms on data from multiple organisations, federated learning can also avoid the problem where algorithms trained on data from one site fail when used outside the training environment, like the pneumonia diagnosis algorithm in New York that failed when presented with data from hospitals it hadn’t been trained on. Hopefully, in the future, predictive algorithms will be trained across multiple sites as standard.

Unsurprisingly, federated learning has started to attract attention within the healthcare space, with Nvidia releasing its Clara framework for federated learning in 2019 to allow developers to build AI-powered imaging and genomics applications on distributed data. In addition, Owkin, a New York-based startup that has raised $56m in funding from investors including GV, F-Prime Capital and Bpifrance, provides ‘Loop’, which offers distributed patient data sourced from a network of academic medical centres as well as a federated learning platform to train algorithms on that data.

With the emergence of frameworks like Clara, as well as more general frameworks like TensorFlow Federated, the barriers to building applications on federated learning are falling and, with concerns around data privacy only likely to increase, I expect federated learning to become an important approach to delivering ML within healthcare. I’d be particularly interested in the genomics space, since genetic data is inherently personal and raises specific concerns around data sharing. In addition, the sheer size of genetic data makes moving it inefficient.

Another area of interest is consumer digital health apps, and I’ve written before about the potential for ML-powered chronic disease management apps. Federated learning could improve these apps by allowing models to be trained on data held on the device instead of sending sensitive personal data to the cloud, which would reduce the risk of data breaches and address consumer concerns around data privacy. Google already uses federated learning in this way to improve Google Keyboard.

Obviously, federated learning isn’t a panacea and the challenges of fitting into clinical workflows, making accurate predictions from imperfect data and dealing with bias will remain. However, by reducing the privacy concerns caused by healthcare ML applications, federated learning has the potential to accelerate the uptake of medical ML and ultimately improve healthcare outcomes.



Luke is an investor at Forward Partners with a focus on applied AI, ecommerce and marketplaces