Exploring PETs: What Does the Price of Sugar Beets Have to Do With Preventing Fraud?

The Center for Effective Global Action, published in CEGA, Apr 19, 2023

Privacy Enhancing Technologies (PETs) are at the forefront of efforts to protect privacy and enable responsible data use. But what are they, and how are they used? Dan Cassara, Project Manager for the Digital Credit Observatory (DCO), gives a brief overview of common privacy-preserving approaches and presents the first case study in our new series, ‘How PETs Make Data Work for All.’ Read part two.


Introduction to PETs

Data dominates today’s world. Information about our location, internet searches, purchasing patterns, and social networks is collected and mined for insights by large technology companies and data brokers, reshaping society. At the same time, open data sets offer enormous opportunities to catalyze positive change. Because data can be reused again and again without being used up, the more open it is, the more valuable it can become.

But risks exist. Data collection can be exploitative, and sensitive data can be stolen, shared dangerously, or misused. No single tool protects against all of these risks, but collectively, a suite of tools called Privacy Enhancing Technologies (PETs) may be able to help.

A recent Future of Financial Intelligence Sharing (FFIS) report offers a helpful analogy for understanding PETs: imagine data as a physical asset stored in a safe. It’s valuable when shared or analyzed, but taking it out of the safe exposes it and compromises its security. PETs enable researchers to analyze sensitive data without accessing the data itself, that is, while the data remains “in the safe.” PETs can even enable joint analysis of information from multiple “safes” without sharing the underlying data.

To understand how PETs can enable cooperation and information sharing while preserving data privacy, we’ve developed a new series, How PETs Make Data Work for All, that will cover how PETs work in practice, their respective strengths, and how they can be combined given different objectives and contexts. To start, we’ll define a few prominent PETs. We also share our first case study in the series, highlighting the potential for PETs to contribute to global development and digital financial services.

Defining Common Privacy Enhancing Technologies

  • Differential privacy: Adds carefully calibrated random noise to statistics so that the published results reveal very little about any single individual’s data. For a more detailed explanation, check out this blog or video.
  • Federated analysis: Runs analyses or machine learning algorithms on many local datasets, rather than on one centralized dataset of many users’ data, so raw data never has to leave a device. For a more detailed explanation, check out this blog or video.
  • Homomorphic encryption: Methods that enable data to be analyzed while it is still encrypted (i.e., while it stays “in the safe”). For a more detailed explanation, check out this blog or video.
  • Secure Multiparty Computation: Allows multiple parties to jointly analyze data without any of the underlying data being revealed. For a more detailed explanation, check out this blog or video.
  • Zero-knowledge Proofs: Let a user prove they know a piece of information (e.g., a password) without revealing the information itself. For a more detailed explanation, check out this blog or video.
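To make the first definition more concrete, here is a minimal Python sketch of the Laplace mechanism, one standard way to achieve differential privacy for a simple count. The function names, the toy balances, and the choice of ε (epsilon) are illustrative assumptions, not part of any particular library or of the DCO’s own work.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale  # expovariate takes a rate; its mean is 1 / rate = scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Adding or removing one record changes a count by at most 1 (sensitivity = 1),
    so the noise scale is 1 / epsilon: smaller epsilon means more noise and more privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical data: account balances for a toy population.
    balances = [120, 5, 0, 430, 75, 0, 18, 990]
    rng = random.Random(42)
    noisy = dp_count(balances, lambda b: b > 0, epsilon=0.5, rng=rng)
    print(f"noisy count of funded accounts: {noisy:.2f}")
```

The analyst only ever sees the noisy count; how close it stays to the true value depends on ε, which is a policy choice rather than a technical constant.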

A Case Study in Private Set Intersection for Multi-Party Reporting

When you’d like to share the results of an analysis or data query, but not the data itself, Secure Multiparty Computation (MPC) can be a useful tool. MPC allows multiple parties to jointly analyze data without any of the underlying data being revealed. Its first large-scale application came in 2008, when it was used to determine the price for sugar beets in Denmark. Since then, use cases have multiplied.
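One building block behind many MPC protocols is additive secret sharing. The Python sketch below is a simplified illustration, not the protocol used in the Danish auction: it shows how several parties could learn the sum of their private values. Each party splits its value into random-looking shares, and only sums of shares are ever published. The modulus, function names, and input values are hypothetical, and all parties are simulated in one process and assumed to follow the protocol honestly.

```python
import secrets

# Public modulus (hypothetical choice); all shares are integers mod this value.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs: list[int]) -> int:
    """Simulate n parties jointly computing the sum of their private inputs."""
    n = len(private_inputs)
    # Each party splits its own input and would send share k to party k.
    all_shares = [share(x, n) for x in private_inputs]
    # Party k adds up the shares it received; any single share looks uniformly random.
    partial_sums = [sum(all_shares[p][k] for p in range(n)) % MODULUS for k in range(n)]
    # Publishing only the partial sums reveals the total and nothing else.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    # Hypothetical account counts reported by three providers.
    print(secure_sum([1200, 830, 455]))  # prints 2485
```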

Instead of sugar beet prices, suppose a central bank wants to estimate the number of mobile money accounts and the volume of mobile money transactions in its country. Most governments receive estimates from financial service providers as part of standard compliance reporting, but simply adding up these supply-side estimates is typically inaccurate: many accounts are open but inactive, and many people hold accounts with more than one provider, so they are counted more than once. Better estimates of the number of accounts paint a more detailed picture that helps regulators both understand consumer behavior, such as repayment trends or overall debt levels, and combat illicit activities like fraud or money laundering. This data is the bedrock of many good regulatory policies that enable a cheaper, safer financial system for consumers.

Know Your Customer (KYC) regulations typically require banks to collect an individual’s national ID during the account opening process. This ID is linked to the account, but each bank maintains a separate database of accounts and account holders. Governments could better estimate national account openings, usage, and volumes with access to each bank’s data, but this is currently infeasible because it would create privacy and competitive risks for the banks.
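For reference, here is the calculation a regulator would like to perform, written in plain Python as if it could pool the banks’ ID lists directly, which is exactly what privacy and competition concerns rule out. The IDs are made up, and the sketch addresses only double counting across providers, not inactive accounts.

```python
# Hypothetical national-ID lists held by two providers (made-up values).
national_bank = {"10321", "20455", "31877", "48210"}
state_bank = {"20455", "48210", "59934"}

naive_total = len(national_bank) + len(state_bank)  # sum of supply-side reports
unique_holders = len(national_bank | state_bank)    # distinct people with an account
multi_provider = len(national_bank & state_bank)    # people counted twice

print(naive_total, unique_holders, multi_provider)  # prints: 7 5 2
```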

MPC could address this challenge by enabling banks to share relevant information with the government while safeguarding their customer lists and preserving customer privacy. In particular, a type of MPC called Private Set Intersection (PSI) can support this improved regulatory infrastructure. PSI enables parties to identify the elements their datasets have in common (or simply to count them) without revealing any other data. For example, governments could learn how many individuals hold accounts with multiple financial institutions without gaining access to any other information.

How it Works

Let’s assume the only two financial providers are National Bank and State Bank, with N and S customers, respectively. Each customer is identified by a unique ID, a string of digits such as a passport number. To securely count the customers who appear on both banks’ lists, PSI proceeds as follows:

  1. National Bank and State Bank randomly select secret numbers, i and j, respectively.
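  2. National Bank takes its list of N unique IDs and raises each to the i power, while State Bank raises each of its S unique IDs to the j power. (In practice, each ID is first hashed and the exponentiation is carried out modulo a large prime, so no one can recover an ID from the result.)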
  3. For example, unique ID 10321 raised to the secret number 3 would produce 10321^3 = 1,099,424,306,161. It’s a big number, but thankfully, computers do all the work.
  4. The banks swap lists and repeat this exercise. For example, National Bank now holds State Bank’s list of S values, each an ID already raised to the j power, and raises every element in this list to the i power. State Bank does the same with National Bank’s list, using its secret j.
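  5. National Bank, State Bank, and the regulator can now compare the two doubly transformed lists for common elements. Because raising to the i power and then the j power gives the same result as raising to j and then i, an ID held by both banks produces the same value in both lists. If the parties find C common elements, then N + S - C is the number of unique account holders across the two banks.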
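The five steps above can be simulated end to end in Python. This is a minimal sketch of the Diffie–Hellman-style PSI the steps describe, under assumptions not in the original text: IDs are hashed with SHA-256, every exponentiation is done modulo the prime 2^127 − 1 (a toy size chosen for readability; real systems use much larger standardized groups or elliptic curves), and both banks are simulated in a single process. All function names are hypothetical.

```python
import hashlib
import math
import secrets

# Toy public parameter: the Mersenne prime 2**127 - 1 (illustration only, not secure).
P = 2**127 - 1


def hash_to_group(customer_id: str) -> int:
    """Map an ID to a nonzero integer mod P via SHA-256."""
    digest = int.from_bytes(hashlib.sha256(customer_id.encode()).digest(), "big")
    return 1 + digest % (P - 1)


def random_secret() -> int:
    """Step 1: pick a secret exponent coprime to P - 1, so x -> x**k mod P is one-to-one."""
    while True:
        k = secrets.randbelow(P - 2) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, secret: int) -> list[int]:
    """Raise every value to the secret power mod P (steps 2 and 4)."""
    return [pow(v, secret, P) for v in values]


def psi_unique_holders(national_ids, state_ids) -> tuple[int, int]:
    """Return (C, N + S - C) for two ID lists, simulating both banks."""
    i, j = random_secret(), random_secret()  # each bank keeps its own secret
    # Step 2: each bank blinds its own hashed IDs.
    national_once = blind((hash_to_group(x) for x in national_ids), i)
    state_once = blind((hash_to_group(x) for x in state_ids), j)
    # Step 4: swap lists; each bank blinds the other's list with its own secret.
    state_twice = blind(state_once, i)        # computed by National Bank
    national_twice = blind(national_once, j)  # computed by State Bank
    # Step 5: (x**i)**j == (x**j)**i mod P, so shared IDs yield equal values.
    common = len(set(national_twice) & set(state_twice))
    n, s = len(set(national_ids)), len(set(state_ids))
    return common, n + s - common


if __name__ == "__main__":
    national = ["10321", "20455", "31877", "48210"]  # made-up IDs
    state = ["20455", "48210", "59934"]
    common, unique = psi_unique_holders(national, state)
    print(f"C = {common}, unique account holders = {unique}")  # C = 2, unique account holders = 5
```

A real deployment would also have each bank shuffle the lists it sends, so that the position of a value cannot be linked back to a particular customer.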

Using MPC, we’ve now accounted for individuals with multiple accounts without either bank revealing its customer list. This highlights the power of PETs to create public goods: with this data, regulators can provide better oversight and more effectively monitor for fraud or money laundering, without compromising security or privacy.

There are many similar examples of how PETs can unlock data sharing to reduce costs or enable innovation. Our next case study in the How PETs Make Data Work for All series will cover how differential privacy can enhance public health monitoring.

CEGA is a hub for research on global development, innovating for positive social change.