Secret Machine Learning: How to Train at the Edge

Ryan McCoy
Automated Inspections
4 min read · Jan 15, 2024

We are in the age of IoT, where data is collected worldwide, yet bandwidth and power constraints limit the complex calculations we can run on it.

Traditional machine-learning approaches often require centralizing massive datasets, raising concerns about data security and ownership. Federated learning (FL) emerged as a powerful solution, enabling collaborative model training across multiple devices without ever sharing raw data. This paradigm pairs naturally with the rise of edge computing, where decentralized processing on devices like smartphones and IoT sensors unlocks personalized AI experiences while preserving individual privacy.

Enter PySyft and TensorFlow Federated (TFF), two leading frameworks facilitating federated learning on the edge.

Building Models Without Sharing Secrets: Architectural Foundations

PySyft, built on Python and PyTorch, adopts a secure multi-party computation (MPC) approach (in other words, splitting sensitive values into shares held by multiple computers, so no single machine ever sees the whole secret and there is no single point of failure).

The key strength of PySyft is its ability to perform computations on encrypted data that stays local. If you research on your own, terms like homomorphic encryption and secure aggregation will come up; unless cryptography is your interest, though, we can focus solely on the mechanics. The upshot: raw data never leaves the device, offering strong privacy safeguards.
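To make that concrete, here is a minimal sketch of additive secret sharing in the style of PySyft's older (0.2-era) tutorials. Treat the names (`TorchHook`, `VirtualWorker`, `share`, `fix_precision`) as version-dependent; newer PySyft releases have reorganized this API considerably, and the workers here are simulated rather than real devices.

```python
# Sketch of secret-shared arithmetic, PySyft 0.2-era tutorial style.
# API names are version-dependent; newer releases restructured them.
import torch
import syft as sy

hook = sy.TorchHook(torch)  # extend torch tensors with PySyft methods

# Simulated parties: two workers hold the shares; a third supplies the
# cryptographic primitives needed for operations like multiplication.
alice = sy.VirtualWorker(hook, id="alice")
bob = sy.VirtualWorker(hook, id="bob")
crypto = sy.VirtualWorker(hook, id="crypto_provider")

# Encode floats as fixed-precision integers, then split each value into
# additive shares spread across alice and bob. Neither worker can
# reconstruct the original tensor on its own.
x = torch.tensor([0.1, 0.2, 0.3]).fix_precision().share(alice, bob, crypto_provider=crypto)
y = torch.tensor([0.4, 0.5, 0.6]).fix_precision().share(alice, bob, crypto_provider=crypto)

# Arithmetic runs on the shares; the plaintext never materializes on
# any single machine until we explicitly reconstruct it with .get().
z = x + y
print(z.get().float_precision())  # tensor([0.5000, 0.7000, 0.9000])
```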

TensorFlow Federated (TFF), on the other hand, uses a federated averaging approach.

What’s that? Good question: models are iteratively distributed to devices, and each device updates the model's parameters locally on its own data. These updates are then aggregated (averaged, weighted by how much data each device has) at a central server and used to improve the global model. While this method doesn’t provide the same cryptographic guarantees as MPC, it can be significantly faster and more efficient, especially for large-scale deployments.
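To see the mechanics without any framework at all, here is a toy sketch of federated averaging in plain NumPy. Everything in it (the linear model, the fake client data, the helper names `local_train` and `fedavg_round`) is made up for illustration; real systems layer client sampling, compression, and often secure aggregation on top of this core loop.

```python
# Toy federated averaging (FedAvg) on a linear-regression problem.
# All names and data here are illustrative, not from any framework.
import numpy as np

def local_train(global_weights, X, y, lr=0.1, epochs=5):
    """A few gradient steps on one client's private (X, y) data."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

def fedavg_round(global_weights, clients):
    """Average local updates, weighting each client by sample count."""
    total = sum(len(y) for _, y in clients)
    new_w = np.zeros_like(global_weights)
    for X, y in clients:
        new_w += (len(y) / total) * local_train(global_weights, X, y)
    return new_w

# Three simulated clients, each holding data that never leaves the
# "device"; only trained weights travel back to the server.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (20, 50, 30):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(25):
    w = fedavg_round(w, clients)
print(w)  # drifts toward [2.0, -1.0] without the server seeing any data
```

Notice that the server-side step only ever touches weight vectors; the raw (X, y) pairs stay on their clients, which is the whole point.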

Performance Showdown: Speed & Accuracy in the Ring

TFF, with its centralized aggregation and model distribution, often outperforms PySyft in terms of training speed and convergence.

Its optimized communication protocols and scalability make it ideal for large-scale federated learning tasks with numerous devices. However, this efficiency comes at a cost: the model updates sent to the server are not cryptographically protected by default, which can leak information about the underlying data when sensitive records are involved.

PySyft, with its focus on encryption and MPC, ensures robust data privacy but can have slower training times due to the computational overhead of cryptographic operations. Nevertheless, its commitment to privacy makes it the preferred choice for scenarios where data sensitivity is paramount, such as healthcare or financial applications.

Key Takeaway #1: Best not to use TFF when dealing with sensitive information, like this creepy IoT Furby (source 4). Use PySyft instead.

Resource Requirements: A Battle of Efficiency

TFF’s centralized architecture makes it less resource-intensive on individual devices.

Model updates and communication overhead are offloaded to the central server, minimizing the burden on edge devices. This is advantageous for devices with limited processing power or battery life.

PySyft, with its focus on on-device computation, demands more resources from individual devices. However, this also makes it resilient to server failures or disruptions, as each device holds a piece of the model and training can continue independently. This distributed nature offers improved fault tolerance and robustness in scenarios with unreliable connectivity.

Key Takeaway #2: Skynet probably used TFF because it’s more efficient to communicate with the remote terminators.

Customization and Ease of Use: Finding the Right Fit

PySyft, with its underlying PyTorch foundation, offers extensive flexibility and customization options. Users have fine-grained control over the training process, cryptographic protocols, and model architectures. This level of control allows researchers and developers to tailor the framework for specific needs and privacy requirements.

TFF, with its focus on ease of use and accessibility, provides a more streamlined experience. Pre-built components and well-documented APIs simplify the development process, making it readily accessible to users with less technical expertise. This user-friendliness makes it a good choice for prototyping and quick deployment of federated learning models.
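As a taste of that streamlined experience, here is a minimal FedAvg setup built from TFF's pre-built components, in the style of the official tutorials. The learning API has moved between releases (newer versions live under `tff.learning.algorithms`), so take these names as tied to the classic API, and note the client data is randomly generated purely to exercise the pipeline.

```python
# Minimal federated averaging with TFF's pre-built components,
# classic-API style; newer releases renamed parts of tff.learning.
import tensorflow as tf
import tensorflow_federated as tff

# Each simulated client holds its own tf.data.Dataset. The random
# "digit" data here is a stand-in for real on-device datasets.
def make_client_dataset(seed, n=32):
    rng = tf.random.Generator.from_seed(seed)
    x = rng.uniform((n, 784))
    y = rng.uniform((n, 1), maxval=10, dtype=tf.int32)
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(8)

federated_train_data = [make_client_dataset(s) for s in range(3)]

def model_fn():
    # TFF requires a freshly constructed model on every invocation.
    keras_model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, input_shape=(784,)),
    ])
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=federated_train_data[0].element_spec,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
    )

# One call assembles the whole FedAvg loop: broadcast the model,
# train locally on each client, aggregate the updates at the server.
trainer = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
)

state = trainer.initialize()
for round_num in range(5):
    state, metrics = trainer.next(state, federated_train_data)
    print(f"round {round_num}: {metrics}")
```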

Key Takeaway #3: Use PySyft for the complete build-out. For testing and PoC, TFF is the way to go.

The landscape of federated learning on the edge is rapidly evolving, and PySyft and TensorFlow Federated stand as frontrunners in this exciting domain.

Hopefully, understanding the strengths, weaknesses, and ideal use cases helps you to choose the right tool for your needs. Beyond individual frameworks, the future lies in collaborative approaches and hybrid architectures that harness the best of both worlds. By prioritizing privacy, efficiency, and responsible AI development, federated learning on the edge promises to unlock a future of secure and personalized AI experiences, reshaping the way we interact with data.

Sources:
1) https://developer.nvidia.com/blog/federated-learning-with-homomorphic-encryption/

2) https://www.tensorflow.org/federated

3) https://github.com/OpenMined/PySyft

4) https://www.theverge.com/circuitbreaker/2016/6/30/12067462/furby-connect-bluetooth-app-content-be-afraid
