What is Privacy? (Part II)
In our previous article we showed why a static database that provides exact answers to queries is unable to maintain privacy over its rows. We now give a formal definition of blatantly non-private mechanisms and the minimum perturbation required to maintain privacy. The model is from the paper “Revealing Information While Preserving Privacy” by Dinur and Nissim.
Database and Query Model
Suppose we have a database D whose rows form a bitstring d = (d_1,…,d_n) in {0,1}^n, so each row d_i is a single bit. A query q over any subset of the database rows is answered by A(q), and the result A(q) is allowed to be perturbed.
Perturbation may be applied to each individual row (e.g., randomized response) or to the aggregate answer (e.g., Laplace noise). Both perturbation mechanisms are captured by this model.
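As a concrete illustration (my own sketch, not from the paper), the following Python snippet shows both styles of perturbation for a single counting query; the flip probability and the Laplace scale are arbitrary values chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.integers(0, 2, size=100)     # the database: a bitstring of n = 100 rows
q = np.arange(0, 100, 2)             # one query: a subset of row indices

# Row-level perturbation (randomized response): flip each queried bit
# with probability 0.25, then count the noisy ones.
flips = rng.random(len(q)) < 0.25
answer_row_perturbed = np.where(flips, 1 - d[q], d[q]).sum()

# Aggregate perturbation: compute the exact count, then add Laplace noise to the total.
exact = d[q].sum()
answer_aggregate_perturbed = exact + rng.laplace(scale=2.0)

print(exact, answer_row_perturbed, answer_aggregate_perturbed)
```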
What will be shown below is that a minimum amount of perturbation must be added in order to guarantee privacy.
Statistical Database and Query
A statistical query is taken over a subset of the rows of the database, and its result is the number of queried rows whose value is 1. That is, a statistical query can be thought of as a counting query.
A statistical database provides access to its rows only through such statistical queries, via a query-response mechanism.
The exact answer to a query q will be denoted by a_q; that is, the answer with no perturbation applied.
The perturbed query result A(q) is within epsilon perturbation if |a_q - A(q)| ≤ epsilon for every query q over a subset of the n rows.
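A minimal sketch of this definition, assuming a toy mechanism (my own example) that adds noise bounded by epsilon to the exact count; for a small n, every query (every subset of row indices) can be checked exhaustively:

```python
import itertools
import random

def exact_answer(d, q):
    """a_q: the number of 1-rows among the queried subset q."""
    return sum(d[i] for i in q)

def perturbed_answer(d, q, eps, rng):
    """A(q): a toy mechanism whose noise never exceeds eps."""
    return exact_answer(d, q) + rng.uniform(-eps, eps)

def within_perturbation(d, mechanism, eps):
    """Check |a_q - A(q)| <= eps over all queries q (all subsets of row indices)."""
    n = len(d)
    for r in range(n + 1):
        for q in itertools.combinations(range(n), r):
            if abs(exact_answer(d, q) - mechanism(d, q)) > eps:
                return False
    return True

rng = random.Random(0)
d = [rng.randint(0, 1) for _ in range(8)]   # small n so all 2^n queries are checkable
print(within_perturbation(d, lambda db, q: perturbed_answer(db, q, 3.0, rng), eps=3.0))
```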
Non-Privacy Definition
First, some notation.
neg(n) - a negligible function, i.e., a function that is asymptotically smaller than any inverse polynomial
dist(c,d) - the Hamming distance between two bitstrings c, d in {0,1}^n
A database D(d,A) (composed of the database bitstring d and the perturbation algorithm A) is non-private if for every constant epsilon > 0 there exists a probabilistic Turing machine M such that
Pr[M^A(1^n) outputs a candidate bitstring c such that dist(c,d) < epsilon x n] ≥ 1 - neg(n)
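The adversary's success condition is mechanical to check; a small sketch, with an illustrative epsilon:

```python
def hamming(c, d):
    """dist(c, d): the number of positions where the bitstrings differ."""
    return sum(ci != di for ci, di in zip(c, d))

def adversary_succeeds(candidate, database, eps):
    """Non-privacy success condition: dist(c, d) < eps * n."""
    return hamming(candidate, database) < eps * len(database)

print(adversary_succeeds([1, 0, 1, 1], [1, 0, 0, 1], eps=0.5))  # dist = 1 < 0.5 * 4, so True
```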
Non-Privacy with Linear Noise and Exponential Adversary
An adversary that can brute-force the database by issuing all possible queries is able to violate privacy by outputting a close candidate bitstring. The attack succeeds even if nearly linear, i.e., o(n), perturbation is added.
The attack works as follows.
- The adversary issues every possible query, one for each subset of database rows, and records the perturbed responses.
- The adversary generates all possible candidate bitstrings in {0,1}^n.
- For each candidate bitstring, the adversary checks every query: if the candidate's exact answer differs from the perturbed response by more than the perturbation bound on some query, the candidate is discarded. At the end at least one candidate remains, because the original database is never weeded out: its exact answers are, by assumption, always within the perturbation bound of the responses.
This attack works because only nearly linear perturbation is added. Any candidate that survives the pruning must agree with the original database on all but a small fraction of rows: comparing the responses on the queries where a surviving candidate and the original database disagree bounds their Hamming distance by a constant multiple of the perturbation, which is o(n) and therefore below epsilon x n for large n. Hence an exponential adversary that can brute-force all queries and all candidates outputs a candidate bitstring with dist(c,d) < epsilon x n, exactly as the non-privacy definition requires.
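The whole attack can be simulated for a tiny database; the following is a sketch under illustrative choices (n = 6 rows, perturbation bound E = 1), not code from the paper. The true database always survives the pruning, and every surviving candidate is within a small Hamming distance of it:

```python
import itertools
import random

rng = random.Random(1)
n = 6
d = tuple(rng.randint(0, 1) for _ in range(n))   # the hidden database bitstring
E = 1.0                                          # perturbation bound, small relative to n

# Every possible query: every subset of row indices.
queries = [q for r in range(n + 1) for q in itertools.combinations(range(n), r)]

def exact(bits, q):
    return sum(bits[i] for i in q)

# The curator's perturbed responses: exact count plus noise bounded by E.
responses = {q: exact(d, q) + rng.uniform(-E, E) for q in queries}

# Brute-force adversary: keep only candidates consistent with every response.
survivors = [
    c for c in itertools.product((0, 1), repeat=n)
    if all(abs(exact(c, q) - responses[q]) <= E for q in queries)
]

hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
print("true database survives:", d in survivors)
print("max distance of any survivor from d:", max(hamming(c, d) for c in survivors))
```

Note that both the number of candidates and the number of queries grow as 2^n, which is why this adversary needs exponential time; the polynomially bounded case is a different matter.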
In our next article we will examine a polynomially bounded adversary.