Updates to Uber’s Open Source Project for Differential Privacy

Menotti Minutillo
Jan 16, 2018

In July 2017, we gave a first look at our work on differential privacy in collaboration with security researchers at the University of California, Berkeley. The paper Towards Practical Differential Privacy for SQL Queries, by Noah Johnson, Joseph P. Near, and Dawn Song, introduced Elastic Sensitivity, a technique for efficiently calculating the sensitivity of a SQL query without requiring changes to the database management system (DBMS). As part of an end-to-end system like the one described in the paper, the technique enforces differential privacy for real-world SQL queries.
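
To give a feel for what "query sensitivity" means here, below is a minimal sketch based on our simplified reading of the paper. It covers only a COUNT(*) over a single non-self equijoin of two base tables. The inputs `mf_left` and `mf_right` are the maximum frequencies of the join key in each table. The sketch computes a per-distance bound, smooths it, and derives a Laplace noise scale. The function names and the `max_k` cap on the distance are assumptions made for illustration. This is not the open source tool's API.

```python
import math


def smoothing_beta(epsilon: float, delta: float) -> float:
    # Smoothing parameter beta = epsilon / (2 * ln(2 / delta)).
    return epsilon / (2.0 * math.log(2.0 / delta))


def join_count_stability(mf_left: int, mf_right: int, k: int) -> int:
    # Bound at distance k for COUNT(*) over a non-self equijoin of two base tables.
    # Each base table contributes a factor of 1, so the bound is the larger
    # join-key max frequency, inflated by k to allow for up to k changed rows.
    return max(mf_left + k, mf_right + k)


def smoothed_elastic_sensitivity(mf_left: int, mf_right: int,
                                 epsilon: float, delta: float, max_k: int) -> float:
    # S = max over k of exp(-beta * k) * bound(k).
    # Capping k at max_k is an assumption of this sketch.
    beta = smoothing_beta(epsilon, delta)
    return max(math.exp(-beta * k) * join_count_stability(mf_left, mf_right, k)
               for k in range(max_k + 1))


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    # Scale of the Laplace noise added to the true count: 2 * S / epsilon.
    return 2.0 * sensitivity / epsilon


# Example: the join key appears at most 100 times on the left and 40 times on the right.
s = smoothed_elastic_sensitivity(100, 40, epsilon=1.0, delta=1e-6, max_k=1000)
print(s, laplace_scale(s, 1.0))
```

The sensitivity depends on statistics of the data, here join-key frequencies, rather than on the database engine. That dependence is why this style of analysis can be layered on top of an existing DBMS.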

We also shared and open-sourced a tool that calculates Elastic Sensitivity for SQL queries and is designed to integrate easily with existing DBMSs. Our hope is that the tool will help other members of the privacy community get started with differential privacy without needing to invest in systemic changes.

Today, we’re excited to share another update to our open source differential privacy project.

Current Challenges

Deeply integrated systems support many differential privacy mechanisms, but implementing them requires either complex changes to the underlying DBMS or a purpose-built DBMS for each mechanism.

Post-processing approaches, including Elastic Sensitivity, are DBMS-agnostic, but they support only a limited set of mechanisms.

A post-processing architecture.

Compared with modern, optimized database engines, both approaches are also inflexible, costly, and difficult to scale in real-world data environments.
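
The post-processing pattern can be sketched in a few lines. In this minimal sketch, Python's built-in `sqlite3` stands in for an unmodified DBMS. The database computes the exact answer, and Laplace noise is added afterwards, outside the database. The sketch assumes a plain COUNT(*) with no joins, so the sensitivity is fixed at 1. The table and function names are hypothetical.

```python
import random
import sqlite3
from typing import Optional


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    # Illustrative only: naive floating-point sampling is not a hardened DP sampler.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def post_processed_count(conn: sqlite3.Connection, sql: str, epsilon: float,
                         sensitivity: float = 1.0,
                         rng: Optional[random.Random] = None) -> float:
    if rng is None:
        rng = random.Random()
    # Step 1: the unmodified DBMS computes and returns the exact answer.
    (true_count,) = conn.execute(sql).fetchone()
    # Step 2: a separate component adds noise after the query has already run.
    return true_count + laplace(sensitivity / epsilon, rng)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER)")
conn.executemany("INSERT INTO trips VALUES (?)", [(i,) for i in range(1000)])
noisy = post_processed_count(conn, "SELECT COUNT(*) FROM trips", epsilon=1.0,
                             rng=random.Random(7))
print(noisy)
```

The exact result leaves the database before any noise is applied. Every mechanism therefore needs its own post-processing stage in the pipeline.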

New and Improved

The key advancement in this release is to embed the differential privacy mechanism into the SQL query itself, before execution, so the query enforces differential privacy on its own output.

This means queries are rewritten before they are submitted to an unmodified DBMS, which brings several meaningful benefits.

The new approach rewrites queries before they’re submitted to the database.

First, we keep all of the properties of the modern database engine that executes the rewritten queries. We neither lose the benefits of a given database nor need to invest in custom database runtimes to enable differential privacy.

Second, we can easily adopt many different privacy-preserving mechanisms simultaneously. This not only makes the system more agile and adaptable to new techniques but also lets it support a higher percentage of queries than any single mechanism could.

Finally, this eliminates the need for post-processing, which makes integration into existing data pipelines easier and reduces the overhead of operating the expanded pipeline.
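
The rewriting idea can be illustrated with a minimal sketch. The mechanism, here Laplace noise for a single count, is embedded directly in the SQL text, so the database returns only the noisy result. `rewrite_count_query` and `UNIFORM_SQL` are hypothetical names, and SQLite stands in for an unmodified DBMS. The sketch assumes a scalar COUNT query with sensitivity 1 (no joins). It is not the project's actual rewriter.

```python
import math
import sqlite3

# Uniform sample in (0, 1] built from SQLite's 64-bit random(). The max() guards
# against floating-point rounding producing exactly 0 before ln() is applied.
UNIFORM_SQL = "max(0.5 - random() / 18446744073709551616.0, 1e-300)"


def rewrite_count_query(sql: str, epsilon: float, sensitivity: float = 1.0) -> str:
    # Wrap a scalar COUNT query so the database itself adds Laplace(sensitivity / epsilon)
    # noise. For independent uniforms U1, U2, ln(U1) - ln(U2) is Laplace(0, 1).
    scale = sensitivity / epsilon
    noise = f"({scale!r} * (ln({UNIFORM_SQL}) - ln({UNIFORM_SQL})))"
    return f"SELECT ({sql}) + {noise} AS noisy_count"


conn = sqlite3.connect(":memory:")
# Register ln() in case this SQLite build lacks the optional math functions.
# Engines with a built-in LN need no client-side setup.
conn.create_function("ln", 1, math.log)
conn.execute("CREATE TABLE trips (id INTEGER)")
conn.executemany("INSERT INTO trips VALUES (?)", [(i,) for i in range(1000)])

rewritten = rewrite_count_query("SELECT COUNT(*) FROM trips", epsilon=1.0)
print(rewritten)
(noisy,) = conn.execute(rewritten).fetchone()  # only the noisy count leaves the DBMS
print(noisy)
```

The rewritten query is ordinary SQL, so the engine plans and executes it like any other query, and no post-processing step is needed. Naive floating-point noise generation like this is known to be attackable, so a production rewriter would need a hardened sampling strategy. Queries with joins would also need a sensitivity bound such as Elastic Sensitivity instead of the fixed value assumed here.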

What’s Next

As part of our continued commitment to protecting user privacy throughout our business, including in our internal data analytics, we plan to detail how we’re using this new system in the coming months.

We hope you find today’s release both an exciting and useful piece of technology for your privacy protection needs!

Uber Security + Privacy

Insights and updates from Uber’s security and privacy teams
