Distributed Learning for Democracy

Liberty in Danger

Yuval Noah Harari's inspiring book, 21 Lessons for the 21st Century, discusses the fundamental challenges humanity faces in the era of biotechnology and data technologies [1].

In Chapter 3 of the book, titled Liberty, the author warns that AI might be more potent in the hands of totalitarians. The following scenario clearly illustrates what Harari has in mind.

If an authoritarian government orders all its citizens to have their DNA scanned and to share all their medical data with some central authority, it would gain an immense advantage in genetics and medical research over societies in which medical data is strictly private.

But how is this possible, and how can the AI and data science ecosystem change the game in favor of democracy? These are the questions I aim to address in this article.

Smile! You're on CCTV all over the world! (source: Flickr)

The Rise of AI-powered Totalitarianism

What Harari envisions about the impact of data science on democracy can be broadly illustrated in the following image.

Image by author, inspired by [1]

Indeed, with the data-processing technology of the late twentieth century, democracies enjoyed a golden age:

Given twentieth-century technology, it was inefficient to concentrate too much information and power in one place. Nobody [including dictatorships] had the ability to process all the information fast enough and make the right decisions.

However, the story might not have a happy ending:

AI might make centralised systems far more efficient than diffused systems, because machine learning works better the more information it can analyse.

Practitioners of AI and data science are familiar with this idea: the more data collected, the better the output. Consequently, authoritarian governments and Big Tech companies can gain a competitive edge thanks to well-structured bureaucracy and advanced surveillance systems.

Level Playing Field

Democratic governments are bound by regulations (e.g., the well-known GDPR in Europe) that prevent them from accessing massive amounts of citizens' data. Likewise, small and medium-sized enterprises (SMEs) lack the massive datasets that Big Tech companies possess. Who has access to Facebook's Graph dataset, or even to an anonymized version of it?

So what can democratic governments and SMEs do? One possible technical (and possibly not the most promising) solution was already spoiled by the title of this article: Distributed Learning.

I would like to share two personal opinions before continuing:

  • Distributed learning can be beneficial in undemocratic countries as well. In fact, countries cannot be neatly classified as either democratic or totalitarian; each country lies somewhere on a spectrum of democracy;
  • Technological solutions are not a panacea for humanitarian issues. These challenges should be ultimately resolved with humanitarian solutions.

Learning, but Distributed!

Any Machine Learning (ML) model learns to produce better results by extracting patterns in the data.

Consider a CCTV camera connected to an ML model. The model tries to find anomalies in the scene (e.g., shoplifting in a store). The common approach to developing such an ML model is as follows:

  • Gather and store a sufficient amount of data (e.g., normal and abnormal scenes in the above example);
  • Train a single model on a fraction of the available data;
  • Test the model on the rest of the data to see if it works well.
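The three steps above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the "scenes" are synthetic one-number motion scores rather than real video, and the "model" is a simple threshold. The names make_scenes, train_threshold and accuracy are hypothetical helpers invented for this sketch.

```python
import random

def make_scenes(n, seed=0):
    """Synthetic stand-in for CCTV data: (motion_score, is_abnormal) pairs."""
    rng = random.Random(seed)
    scenes = []
    for _ in range(n):
        abnormal = rng.random() < 0.3
        # Assumption: abnormal scenes tend to have a higher motion score.
        score = rng.gauss(3.0 if abnormal else 1.0, 0.5)
        scenes.append((score, abnormal))
    return scenes

def train_threshold(train):
    """Learn a cut-off halfway between the mean scores of the two classes."""
    normal = [s for s, y in train if not y]
    abnormal = [s for s, y in train if y]
    return (sum(normal) / len(normal) + sum(abnormal) / len(abnormal)) / 2

def accuracy(threshold, data):
    """Fraction of scenes where 'score above threshold' matches the label."""
    correct = sum((s > threshold) == y for s, y in data)
    return correct / len(data)

# Step 1: gather and store all the data in one place.
scenes = make_scenes(1000)

# Step 2: train a single model on a fraction of the data.
split = int(0.8 * len(scenes))
train, test = scenes[:split], scenes[split:]
threshold = train_threshold(train)

# Step 3: test the model on the rest of the data.
test_accuracy = accuracy(threshold, test)
print(f"threshold={threshold:.2f}, test accuracy={test_accuracy:.2%}")
```

Note that everything, from data collection to evaluation, happens in a single place. This is exactly the centralization that distributed learning tries to relax.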

The distributed learning mindset turns this monolithic system into a modular one, as illustrated in the following image adapted from a survey on the topic [2].

In the case of data parallelism, one could train separate models on data chunks from different sources (for example, CCTV cameras of different stores).

In the case of model parallelism, data itself doesn't change, but separate models (usually on different machines) collaboratively extract patterns in the data.

Both data and models can be parallelized [2]
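To make the two flavors more concrete, here is a minimal sketch on a toy linear model. It is an illustration under my own assumptions, not the scheme of any specific framework: the numbers are made up, the "workers" are just list slices, and averaging per-chunk gradients is only one common way to combine data-parallel work.

```python
import math

# Toy linear model: prediction = sum(weight * feature).
weights = [0.5, -1.0, 2.0, 0.25]
features = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 1.0, 0.0, 1.0],
    [2.0, 2.0, 2.0, 2.0],
    [4.0, 0.0, 1.0, 0.0],
]
targets = [5.0, 0.0, 3.0, 4.0]

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def gradient(w, xs, ys):
    """Gradient of the mean squared error of the linear model on (xs, ys)."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = predict(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += 2 * err * xj / len(xs)
    return grad

# Data parallelism: each worker holds the full model but only half of the rows.
chunks = [(features[:2], targets[:2]), (features[2:], targets[2:])]
worker_grads = [gradient(weights, xs, ys) for xs, ys in chunks]
# With equally sized chunks, averaging recovers the full-data gradient.
data_parallel_grad = [sum(g) / len(worker_grads) for g in zip(*worker_grads)]

# Model parallelism: each worker sees every row but holds only half of the weights.
def model_parallel_predict(w, x, n_workers=2):
    size = len(w) // n_workers
    partial_sums = [
        predict(w[i * size:(i + 1) * size], x[i * size:(i + 1) * size])
        for i in range(n_workers)
    ]
    return sum(partial_sums)  # workers combine their partial results

full_grad = gradient(weights, features, targets)
print(all(math.isclose(a, b, abs_tol=1e-9)
          for a, b in zip(full_grad, data_parallel_grad)))
print(math.isclose(predict(weights, features[0]),
                   model_parallel_predict(weights, features[0])))
```

In both cases the combined result matches what a single machine would compute; only the place where the work happens differs.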

Now, let's focus on data parallelism, which is more relevant to our topic. This approach helps democratic governments and SMEs leverage the knowledge of various data sources without breaking privacy regulations.

For example, different countries may agree on a standard format for financial transaction data (if it doesn't exist already). Then, each country can train its own model to detect fraudulent transactions in the data. Without the need to share sensitive financial data, they can share their models (i.e., the knowledge extracted from their data) in order to make a powerful global model.
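The idea of sharing models instead of data can be sketched as follows. This is a toy example in the spirit of federated averaging, under loud assumptions: the "countries" are three synthetic datasets, the "fraud model" is replaced by a one-variable linear regression, and make_client_data, local_update and federated_average are hypothetical names made up for this sketch.

```python
import random

def make_client_data(n, seed):
    """Private data of one participant (synthetic: y = 3x + 1 plus noise)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0, 0.1)))
    return data

def local_update(model, data, lr=0.1, epochs=5):
    """A few gradient-descent steps on local data, starting from the global model."""
    w, b = model
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = w * x + b - y
            gw += 2 * err * x / len(data)
            gb += 2 * err / len(data)
        w -= lr * gw
        b -= lr * gb
    return w, b

def federated_average(models, sizes):
    """Combine local models, weighting each by the size of its dataset."""
    total = sum(sizes)
    w = sum(m[0] * n for m, n in zip(models, sizes)) / total
    b = sum(m[1] * n for m, n in zip(models, sizes)) / total
    return w, b

# Three participants with datasets of different sizes; the raw data never moves.
clients = [make_client_data(n, seed) for seed, n in enumerate([50, 100, 200])]
sizes = [len(data) for data in clients]

global_model = (0.0, 0.0)
for _ in range(50):
    # Only model parameters (two numbers here) leave each participant.
    local_models = [local_update(global_model, data) for data in clients]
    global_model = federated_average(local_models, sizes)

print(f"global model: w={global_model[0]:.2f}, b={global_model[1]:.2f}")
```

The coordinator only ever sees two numbers per participant per round, never the underlying records. Real deployments would still need to consider what the shared parameters themselves might reveal.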

The above scenario belongs to a class of distributed learning called Federated Learning. Besides finance, it can be applied in other domains, including medicine, cybersecurity, transportation, and networking.

One last word: as with the state of democracy in different countries, distributed learning schemes also lie on a spectrum of data concentration levels. The following image demonstrates different scenarios [2]. The most futuristic scenario is peer-to-peer distributed learning (item d in the image), which is similar to sharing wisdom in human societies.

Degrees of distribution vary in distributed learning schemes [2]
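To give a feeling for the fully peer-to-peer end of the spectrum, here is a minimal sketch of gossip-style averaging, where peers repeatedly meet in pairs and settle on the average of their values. It is my own simplification, not a scheme taken from [2]: each peer holds a single number standing in for a model parameter, and any two peers are assumed to be able to talk to each other.

```python
import random

def gossip_average(values, rounds=200, seed=0):
    """Repeatedly pick two random peers and let them agree on their mean."""
    rng = random.Random(seed)
    values = list(values)  # work on a copy; no central server is involved
    for _ in range(rounds):
        i, j = rng.sample(range(len(values)), 2)
        mean = (values[i] + values[j]) / 2
        values[i] = values[j] = mean
    return values

# Each peer starts with its own locally learned parameter.
local_params = [0.9, 1.4, 2.1, 0.6, 1.0]
consensus = gossip_average(local_params)
print([round(v, 4) for v in consensus])
```

Every pairwise exchange preserves the overall sum, so the peers drift toward the global average of the starting values without anyone ever collecting them all in one place.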

If you found this article interesting, you can follow me here or on LinkedIn to see similar discussions.

References

[1] Harari, Yuval Noah. 21 Lessons for the 21st Century. Random House, 2018.

[2] Verbraeken, Joost, et al. "A survey on distributed machine learning." ACM Computing Surveys (CSUR) 53.2 (2020): 1–33.
