Updates to Uber’s Open Source Project for Differential Privacy
In July 2017, we gave a first look at our work on differential privacy in collaboration with security researchers at the University of California, Berkeley. The paper, Towards Practical Differential Privacy for SQL Queries by Noah Johnson, Joseph P. Near, and Dawn Song, introduced Elastic Sensitivity, a technique for efficiently calculating query sensitivity without requiring changes to the database management system (DBMS). As part of an end-to-end system like the one described in the paper, the technique enforces differential privacy for real-world SQL queries.
We also shared and open sourced a tool to calculate Elastic Sensitivity for SQL queries, designed to integrate easily with existing DBMSs. Our hope is that the tool will help other members of the privacy community get started with differential privacy without needing to invest in systemic changes.
Today, we’re excited to share another update to our open source differential privacy project.
Existing differential privacy approaches typically adopt one of two architectures: deep integration or post-processing.
Deeply integrated systems support many differential privacy mechanisms, but implementation either necessitates complex changes to the underlying DBMS or a purpose-built DBMS for each mechanism.
Post-processing, including Elastic Sensitivity, is DBMS-agnostic, but supports only a limited number of mechanisms.
Compared to modern and optimized database engines, both approaches are also inflexible, costly, and difficult to scale in real-world data environments.
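To make the post-processing architecture concrete, here is a minimal Python sketch of the general pattern: the query runs on the database unchanged, and Laplace noise scaled to sensitivity / epsilon is added to the result afterward. The function names and the use of the Laplace mechanism are illustrative assumptions, not code from our tool, and the floating-point sampling shown here is not hardened for production use.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    # expovariate takes a rate, so rate = 1 / scale gives mean = scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def post_process(true_result: float, sensitivity: float, epsilon: float,
                 rng: random.Random) -> float:
    """Add noise to a result that the database has already computed.

    `sensitivity` is assumed to come from a separate analysis step
    (hypothetical here), and `epsilon` is the privacy budget for the query.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return true_result + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2018)
    # A hypothetical count of 100 with sensitivity 1 and epsilon 0.1.
    print(post_process(100.0, sensitivity=1.0, epsilon=0.1, rng=rng))
```

The noise step lives outside the database, which is why this style works with any DBMS but needs an extra stage in the data pipeline.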
New and Improved
The extensions to our open source differential privacy project were driven by our long-held desire to deploy practical differential privacy at scale. We wanted DBMS independence, extensibility, and low overhead. The updates we’re releasing today have brought us closer to that goal.
The key advancement in this release is to embed the differential privacy mechanism into the SQL query itself, before execution, so the query enforces differential privacy on its own output.
This means queries are rewritten before they are submitted to an unmodified DBMS, providing a host of meaningful benefits.
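As a rough illustration of the idea, the Python sketch below rewrites a single-value counting query so that the noise is generated inside the SQL itself. It is not the rewriter from our release. It assumes the Laplace mechanism, a sensitivity supplied by the caller, and a target DBMS that provides LN() and a RANDOM() returning values in [0, 1) (PostgreSQL does, for example). The table name trips is hypothetical.

```python
def rewrite_with_laplace(query: str, sensitivity: float, epsilon: float) -> str:
    """Wrap a scalar SQL query so its output already includes Laplace noise.

    The difference of two exponential samples, each computed as
    -scale * LN(1 - RANDOM()), follows a Laplace(0, scale) distribution.
    Assumes `query` returns exactly one numeric value.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    inner = query.strip().rstrip(";")  # embed the original as a scalar subquery
    noise = f"{scale!r} * (LN(1.0 - RANDOM()) - LN(1.0 - RANDOM()))"
    return f"SELECT ({inner}) + {noise} AS noisy_result"


if __name__ == "__main__":
    # Hypothetical table; sensitivity 1 is assumed for a join-free count.
    print(rewrite_with_laplace("SELECT COUNT(*) FROM trips;", 1.0, 0.5))
```

The rewritten string can be submitted to the database like any other query, so the engine does the noise generation and no separate post-processing step is needed.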
First, we can leverage all of the properties of modern database engines where we submit these rewritten queries. That is, we do not lose the benefits of a given database or have to invest in custom database runtimes to enable differential privacy.
Second, we can easily adopt any number of different privacy-preserving mechanisms simultaneously. This not only gives us the agility to adapt to new techniques, but also lets us support a higher percentage of queries than any single mechanism could.
Finally, this eliminates the need for post-processing, allowing easier integration in existing data pipelines and reducing the overhead of operating the expanded pipeline.
Joseph will be presenting Differential Privacy at Scale at the 2018 ENIGMA conference, where he will debut these enhancements. There is also a paper in late-stage development from Noah Johnson, Joseph P. Near, Dawn Song, and Joe Hellerstein at the University of California, Berkeley that describes in depth the project’s background, new system and novel architecture, performance and experiments, and key takeaways. We will share that paper here upon publication.
In our continued commitment to protect user privacy throughout our business, including our internal data analytics, we plan to detail how we’re using this new system in the coming months.
We hope you find today’s release an exciting and useful piece of technology for your privacy protection needs!