The Future Is Federated

Nicole
Dec 21, 2020 · 6 min read

Balancing the power of machine learning and privacy

The potential of machine learning for earlier disease detection was one of the first things that drew my interest to machine learning more broadly. When I was in high school, my dad found out he had kidney cancer, but only after a few years of seemingly random organ shutdowns and visits to many, many doctors who had no clue what was happening. The doctor who finally made the diagnosis had access to a searchable repository that tied a few prior cases of renal cell carcinoma (RCC) to these seemingly random shutdowns, which left me wondering why he hadn't been able to access that information sooner.

This drew me into the privacy-preserving machine learning space. When I was 19, I applied to INOVA hospitals' accelerator to research potential applications of ML to biomarkers like miRNA concentration for better RCC diagnosis. As I came to understand HIPAA and the broader complex regulatory space medicine is stuck in, I began to research federated learning as well.

In 2017 Google published research detailing how it uses federated learning for Gboard (Apple shortly followed suit) to train models locally on search queries without sending users' raw personal data back to its servers, which pulled me deeper into the space.

A phone personalizing the model locally, based on your usage (A). Many users' updates are aggregated (B) to form a consensus change (C) to the shared model, after which the procedure is repeated.

In short: federated learning is one of many types of privacy-preserving machine learning. This approach specifically enables users' personal data to stay on their device (or, in other use cases, enables data to stay on servers) while the model is trained locally; only the model update is then sent to the cloud. Federated learning offers the potential of machine learning with the benefits of distributing power (and data ownership) among users. For a broader overview of the variety of types, check out this explainer by OpenMined.
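To make that loop concrete, here is a minimal sketch of one round of federated averaging (in the spirit of the FedAvg approach from the Google research above), using plain NumPy and a toy linear-regression objective. All function names, clients, and numbers here are illustrative, not taken from any production system.

```python
# Toy sketch of federated averaging: each client trains locally and
# only the updated weights (never the raw data) leave the client.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Plain gradient descent on a local linear-regression objective."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_weights, clients):
    """Aggregate client updates, weighted by local dataset size."""
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    updates = [local_update(global_weights, X, y) for X, y in clients]
    return np.average(updates, axis=0, weights=sizes)

# Simulate three clients whose data share the same underlying model.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, clients)
print(np.round(w, 2))  # converges toward [2., -1.]
```

The key property is visible in `federated_round`: the server only ever sees each client's weight vector, never `X` or `y`.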


How Then, Shall We Live?

After the 2017 Google paper piqued our interest, we spent the next few years meeting with operators and researchers across data privacy and privacy-preserving machine learning. As we built our thesis on both the application and the timing of the space, we started to see the first wave of federated learning companies pop up and seek funding. What we saw, however, was that many would align with our thesis on the diverse potential customer base for this technology, but would eventually end up in the narrow scope of fraud detection. Not a bad thing, but a sign for us that it's perhaps still too early for the horizontal opportunity we thought was here.

(For broader notes on the future of compute architecture, see @mhdempsey's "What kills Cloud Computing: A history of time shared computers and one device to rule them all.")

I like to daydream about the romantic ideals of the information structures of the future by examining the past. This is the ancient library of Alexandria, one of the largest libraries in the world.

Where the Future Lies

Government, pharma, and banking are typified by highly complex internal teams, so we think sending forward-deployed engineers out for the first year or two to build internal understanding of those teams and accelerate product-market fit holds a lot of potential (not dissimilar to the way Palantir approached working with the government). The burden of dedicating an engineer to architecting an application of federated learning means that the industries best suited for FL carry a sufficiently high regulatory or other privacy-related burden that they're both economically and structurally motivated to spend the time implementing it. We see finance, pharma, and government as the likely first movers in this space, with a long tail of possibilities across healthcare broadly and other industries.

Currently Doc.AI and Owkin are working toward cross-device FL for medical research, while Intel has focused on FL for medical imaging specifically. This piece lays out a simple framework for federated learning on vessel segmentation, if you want to try it out for yourself! This EU-funded paper and research details the potential of FL for drug discovery virtualization.

Musketeer is pushing forward use cases in smart manufacturing and medicine. Nvidia Clara is a reference application for distributed AI training, designed to run on Nvidia's recently announced EGX intelligent edge-computing platform. FedAI, Devron, Decentriq, and DataFleets are all focused on building general enterprise federated learning platforms and frameworks.

Constraints, Challenges, and Open Questions

  • What unique challenges do the constraints of the devices the model is trained on present? With cross-device federated learning, the devices gathering the data must also be able to train a model. There are unique challenges around the varying fidelity of data that different devices collect, and around the speed at which they each train so that, when necessary, they can deploy their updates to the cloud simultaneously.**
  • What new, and likely under-researched, security risks do FL systems present? One of the open questions in this space is whether details of secure personal data can be reverse-engineered from the model update that is sent to the cloud. A sybil attack, for example, represents some risk for FL. We'll continue to follow along as the security research progresses in this space.
  • What level of parallel computing is possible? Current algorithms only work with device counts in the hundreds; hopefully this number will grow as algorithms progress.
  • How do we deal with non-IID data? The statistical assumptions behind many ML models (i.e., that the data is independently and identically distributed) don't always hold in federated learning, so how we account for this in the ideal use cases is something we're still thinking about. (This piece by DataFleets (a privacy-preserving data engine) gives a great illustrative example of non-IID data if you're not familiar.) Edgify, for example, has proposed federated curvature, which adds a penalty term to the loss function, compelling all local models to converge toward a shared optimum.
  • What more is possible for federated computing, outside of just machine learning?
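The penalty-term idea behind approaches like federated curvature can be illustrated with a toy sketch. Note the simplification: Edgify's federated curvature weights the penalty using curvature estimates, while this sketch uses a single scalar `mu`, in the style of a plain proximal (FedProx-like) term; all data and names here are made up for illustration.

```python
# Toy sketch: a proximal penalty keeps local models near the global
# model when clients hold non-IID data with very different local optima.
import numpy as np

def local_update_prox(global_w, X, y, mu=1.0, lr=0.1, epochs=5):
    """Gradient descent on local MSE plus (mu/2) * ||w - global_w||^2."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y) + mu * (w - global_w)
        w -= lr * grad
    return w

# Two clients with deliberately different (non-IID) local optima:
rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=(40, 2)), rng.normal(size=(40, 2))
y1, y2 = X1 @ np.array([3.0, 0.0]), X2 @ np.array([0.0, 3.0])

g = np.zeros(2)
for _ in range(30):
    g = (local_update_prox(g, X1, y1) + local_update_prox(g, X2, y2)) / 2
print(np.round(g, 2))
```

Without the penalty, each client would pull hard toward its own optimum; the `mu * (w - global_w)` term tethers the local updates so the averaged model settles near a shared compromise between the two clients.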

It’s clear that privacy preserving ML, and federated learning especially, are a core part of the future we believe in — and we’re excited to play a part in it.

As always, feel free to message me questions, thoughts, disagreements, or pitches on Twitter or at nicolewilliams@compound.vc

Appendix

** There's a movement to better understand the tradeoffs imposed by communication costs, since end-user internet connections typically operate at lower rates. (Yuchen Zhang, John Duchi, Michael I. Jordan, and Martin J. Wainwright. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In Advances in Neural Information Processing Systems, pages 2328–2336, 2013.)

The Startup

Medium's largest active publication, followed by +771K people. Follow to join our community.

Written by Nicole

Design + Machine Learning + Philosophy + ☧. Investor @ Compound. linktr.ee/nwilliams030
