Unlocking Valuable Data with Constrained SQL

How a platform for managing data access and constraining SQL queries across your organization can unlock data while reducing risk.

The Oasis Labs Team
Oasis Labs
Apr 30, 2020

--

Imagine you work in security and compliance at a well-known luxury automaker. You’re responsible for safeguarding a sophisticated data pipeline that streams data from automobiles and stores it in a data lake. The data includes personally identifiable information about each driver, along with location, feature usage patterns, mileage, battery state-of-charge, and more. The product and engineering teams want to use this data to improve various components that go into the automobile. You realize that the data is invaluable for these purposes, and perhaps even for select external sharing, but you are cognizant of the custodial risk of sharing and the regulatory risk of unfettered access. What would you do?

The Friction Between Privacy and Innovation

The scenario above is one of many instances in which engineering, compliance, and innovation can find themselves at odds. An HBR article by Edd Wilder-James explains that in order to unlock data, one must first solve any structural issues. System designs rarely include internal data sharing as a requirement. Data can be poorly defined, stale, and sometimes just inaccurate. Using different systems to capture and analyze data across an organization can exacerbate these issues. To complicate the picture further, regulations such as GDPR and CCPA now come into play. CCPA alone is expected to cost companies tens of billions of dollars. With increasing scrutiny from policy makers on how companies use individual information, companies need to constantly trade off the cost of compliance against the benefit of unlocking data.

Current Solutions Have Gaps

To date, there exist many solutions for structural data issues. Data warehouses can help companies consolidate and standardize data across their entire company. They can help define a single source of truth for information, and can ensure data freshness. What existing solutions lack is an ability to solve the trust and regulatory hurdles to unlocking data. Some might ask, “Why not just anonymize my data so that I can provide access without the consequences of leaking personal information?” Besides being hard to achieve, anonymization does not always protect privacy. A case in point is the Netflix dataset that was made available to support the Netflix Prize. The sequel to the competition was cancelled after a group of researchers at the University of Texas at Austin broke anonymity by correlating the anonymized Netflix dataset with the open IMDb dataset. Our data now resides in so many places and in so many forms that such correlations can be carried out with impunity, leading to a break in privacy.

Managing Data Access with Oasis Constrained SQL

Thus, unlocking data requires first addressing any structural issues by ensuring that data is well defined and stored in warehouses and databases that support SQL. This way, there is a common language for understanding what is stored and how to access it. If you then have a mechanism that controls how data is consumed, with enough granularity to comply with regulation and mitigate the risk of leaks and breaches while still giving constrained access to the data via SQL, you have a solution that unlocks your data.

Oasis offers a simple data management solution for controlling access to data. You can connect to any existing database or warehouse that supports SQL through our API and create views that specify how the data can be queried.
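
To make the idea of constraining how data can be queried more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the Oasis API: the telemetry table, the BLOCKED_COLUMNS set, and the policy function are hypothetical names chosen for illustration. The sketch relies on SQLite's authorizer hook, which is consulted while each statement is compiled, to allow read-only SELECT queries while denying writes and any read of a blocked column.

```python
import sqlite3

# Hypothetical policy: (table, column) pairs that may never be read.
BLOCKED_COLUMNS = {("telemetry", "driver_name"), ("telemetry", "last_location")}


def policy(action, arg1, arg2, db_name, source):
    """SQLite authorizer callback: read-only access to permitted columns."""
    if action in (sqlite3.SQLITE_SELECT, sqlite3.SQLITE_FUNCTION):
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ:
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        if (arg1, arg2) in BLOCKED_COLUMNS:
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_OK
    # Everything else (INSERT, UPDATE, DELETE, DROP, PRAGMA, ...) is refused.
    return sqlite3.SQLITE_DENY


def open_constrained_connection():
    """Build a small in-memory database, then lock it down with the policy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE telemetry ("
        "driver_name TEXT, last_location TEXT, mileage REAL, state_of_charge REAL)"
    )
    conn.executemany(
        "INSERT INTO telemetry VALUES (?, ?, ?, ?)",
        [
            ("Alice", "37.77,-122.42", 12000.0, 0.8),  # made-up sample rows
            ("Bob", "34.05,-118.24", 18000.0, 0.6),
        ],
    )
    conn.commit()
    # Install the policy only after the data has been loaded.
    conn.set_authorizer(policy)
    return conn


def run_query(conn, sql):
    """Run a query under the policy; rejected statements raise sqlite3.Error."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = open_constrained_connection()
    print(run_query(conn, "SELECT AVG(mileage) FROM telemetry"))
    for sql in ("SELECT driver_name FROM telemetry", "DELETE FROM telemetry"):
        try:
            run_query(conn, sql)
        except sqlite3.Error as exc:
            print("rejected:", exc)
```

Under this policy the average-mileage query succeeds, while reading driver_name or deleting rows is rejected before the statement ever runs. A real deployment would more likely enforce such rules in a service that sits between users and the warehouse, rather than inside the client process.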

Here are just a few examples of the types of constraints that can be added to what we call a view of the data:

  • Aggregates: Pre-define aggregates to remove PII
  • SQL restrictions: Restrict the types of SQL commands that can be executed
  • Column exclusion: Block access to specific columns
  • Custom functions: Create custom restrictions to do anything from de-identifying data to controlling sample size
  • Differential privacy: Protect query outputs from indirectly exposing PII by introducing noise in query results with bounds on accuracy and how many queries can be run
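
The differential privacy item above can be illustrated with the classic Laplace mechanism. The sketch below is an assumption about how such a constraint might work, not a description of the Oasis implementation: the NoisyMean class, its parameters, and the sample mileages are all hypothetical. It clips each value to a known range, adds Laplace noise scaled to the query's sensitivity divided by the privacy parameter epsilon, and refuses to answer once a fixed query budget is spent.

```python
import random


class NoisyMean:
    """Answers a limited number of noisy average queries (Laplace mechanism)."""

    def __init__(self, values, lower, upper, epsilon, max_queries, seed=None):
        if not values:
            raise ValueError("values must not be empty")
        if epsilon <= 0 or upper <= lower:
            raise ValueError("need epsilon > 0 and upper > lower")
        # Clip every value so a single record has a bounded effect on the mean.
        self._values = [min(max(v, lower), upper) for v in values]
        # Replacing one record changes the clipped mean by at most this much.
        self._sensitivity = (upper - lower) / len(self._values)
        self._epsilon = epsilon
        self._queries_left = max_queries
        self._rng = random.Random(seed)

    def run(self):
        """Return the mean plus Laplace noise, or raise if the budget is spent."""
        if self._queries_left <= 0:
            raise RuntimeError("query budget exhausted")
        self._queries_left -= 1
        true_mean = sum(self._values) / len(self._values)
        scale = self._sensitivity / self._epsilon
        # The difference of two exponential draws with mean `scale`
        # follows a Laplace distribution with scale `scale`.
        noise = self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)
        return true_mean + noise


mileages = [12000.0, 18000.0, 15500.0, 9000.0]  # made-up sample data
query = NoisyMean(mileages, lower=0.0, upper=200000.0, epsilon=1.0, max_queries=2)
print(query.run())  # a noisy estimate; the value varies from run to run
```

Two trade-offs are visible here. A smaller epsilon or a smaller dataset means more noise, which is the bound on accuracy, and under basic composition the total privacy cost grows with the number of answered queries, which is why the budget exists. Python's random module is used only for readability; a production mechanism would need a cryptographically secure noise source and protection against floating-point attacks.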

Individuals or teams can then be granted access to specific views. They are free to explore, query, and analyze the data subject to the view’s policies. For example, a view could block queries for specific vehicle information or driver behavior, but allow statistical queries that return results in the aggregate. Now, your leasing department can estimate the value of a given fleet of cars by extracting average mileage, average state-of-charge, average time for charge depletion, etc. Query results can be viewed in dashboards and reports for quick communication of key metrics. You can also provide APIs for programmatic access to the data for your engineering teams. Every query that’s run is tracked in a tamper-proof record that is stored on a blockchain. This allows for easy, transparent communication of data use for audits and reporting.
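
The tamper-proof query record described above can be approximated by a hash chain, the basic building block that blockchains also rely on. The sketch below is only an illustration under that assumption: the AuditLog class and its fields are hypothetical, and a local hash chain is tamper-evident rather than a substitute for a distributed ledger. Each entry stores the SHA-256 hash of the previous entry, so editing any past query makes verification fail.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """Append-only query log in which each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(user, query, prev_hash):
        # Serialize deterministically so the same entry always hashes the same.
        payload = json.dumps(
            {"user": user, "query": query, "prev_hash": prev_hash}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, user, query):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        entry = {
            "user": user,
            "query": query,
            "prev_hash": prev_hash,
            "hash": self._digest(user, query, prev_hash),
        }
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute every hash and check that the chain links are intact."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            expected = self._digest(entry["user"], entry["query"], prev_hash)
            if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("leasing-analyst", "SELECT AVG(mileage) FROM telemetry")
log.append("leasing-analyst", "SELECT AVG(state_of_charge) FROM telemetry")
print(log.verify())  # True: the chain is intact

log.entries[0]["query"] = "SELECT driver_name FROM telemetry"  # simulate tampering
print(log.verify())  # False: the stored hash no longer matches the entry
```

Anchoring the latest hash somewhere outside the organization's control, such as a blockchain, is what keeps an insider from quietly rebuilding the whole chain after an edit.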

Oasis’ detailed controls allow for easy management of data access across an entire organization, helping companies remain compliant with regulation and reduce the risk that comes with consuming data. Giving more teams access to controlled data views allows companies to quickly unlock data previously too sensitive to use, driving product innovation and business growth.

Looking Forward

At Oasis, we’re building privacy technology that allows businesses to get the most out of their data while reducing custody risk and complying with regulation. With proper guardrails in place, we hope to see a rapid increase in data use and collaboration. Not only can teams share insights internally, but companies can open their data to a broader community of partners, researchers, and scientists, driving new discoveries and business opportunities.

If you’d like to learn more about Oasis products and how they can help your business, email us at bd@oasislabs.com.
