Schrems II: How to Safeguard Your Cross-Border Data Analytics

DataFleets
Jul 21, 2020 · 5 min read
Europe map at night. Provided by NASA, visibleearth.nasa.gov/view.php?id=79765.

“A ruling by the EU’s top court invalidates the key mechanism for transferring personal data from the EU to the US and imposes additional conditions for use of the standard contractual clauses.”

Latham and Watkins, regarding Data Protection Commissioner v. Facebook Ireland Limited, Maximillian Schrems (Case C-311/18) (Schrems II)

On Thursday, July 16th, the European Court of Justice invalidated the EU-US Privacy Shield, one of the key mechanisms for lawfully transferring personal data between the two jurisdictions. Data controllers must now conduct detailed examinations of the circumstances of each transfer, the adequacy of protection in the recipient country, and the parties involved (1). More than 5,300 companies operated under the EU-US Privacy Shield, two-thirds of which were SMEs (2). Authorities like the Berlin data commissioner have called data localization the only credible solution (3).

The core issue for enterprise AI / ML initiatives is that they typically depend on data being “pooled” in one place. The inability to aggregate data from Europe may cause AI / ML models to degrade, including the models behind risk assessments for financial transaction monitoring / anti-money laundering (AML) and the recommendation engines for what to watch, where to travel, and what to buy. All of this is compounded by two existing challenges: GDPR and COVID-19.

“Our data in Europe is essentially frozen in an iceberg by GDPR. No one in the U.S. can touch it for analytics, and our ML models are poor because of it.” — Market-leading technology and travel company

“Coronavirus broke our credit underwriting models. All the patterns changed.” — Market-leading financial services institution

We suggest that data teams use this opportunity to future-proof their analytics against the changing regulatory landscape in three ways. Let’s take them in turn.

1. Data Sovereignty

Future-proof assumption: when establishing data pipelines and architecture, data should remain resident where it was created.

Schrems II is the latest confirmation that data sovereignty is here to stay. The data economy is getting chopped up into Westphalian bits, blocking aggregation for analytics.

Definitions:

  • Data residency means that data “resides” or is stored in a location for regulatory purposes, such as tax regimes.

Our CEO David Gilmore was asked by Bloomberg about similar trends in the US (such as California’s CCPA) and China:

… laws that require data reaped inside the country to stay there, with China being perhaps the most stringent example…More than 100 countries have some sort of data sovereignty laws in place, according to David Gilmore, chief executive officer of DataFleets Ltd., an enterprise software firm. In the U.S., state policies, such as California’s new consumer privacy law, provide further restrictions on how cloud companies handle data.

According to Bart Willemsen, Vice President Analyst at Gartner:

… by 2023, 65% of the world’s population will have its personal information covered under modern privacy regulations, up from 10% today

… by 2023, more than 80% of companies worldwide will be facing at least one privacy-focused data protection regulation
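The future-proof assumption at the top of this section can be built into pipeline code. The snippet below is a minimal sketch, not a DataFleets feature: the names `Dataset`, `ResidencyError`, `check_residency`, and `plan_jobs` are hypothetical, and it assumes each dataset is tagged with the region where it was created.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Dataset:
    name: str
    origin_region: str  # jurisdiction where the data was created (assumed tag)


class ResidencyError(Exception):
    """Raised when a job would process data outside its region of origin."""


def check_residency(dataset: Dataset, compute_region: str) -> None:
    # Refuse to schedule work that would pull data out of its home region.
    if dataset.origin_region != compute_region:
        raise ResidencyError(
            f"{dataset.name} originates in {dataset.origin_region} "
            f"and may not be processed in {compute_region}"
        )


def plan_jobs(datasets: List[Dataset]) -> Dict[str, List[str]]:
    # Group datasets by origin so each job runs where its data already lives.
    plan: Dict[str, List[str]] = {}
    for ds in datasets:
        plan.setdefault(ds.origin_region, []).append(ds.name)
    return plan
```

A scheduler could call `check_residency` before every job and fail fast, so that a residency violation surfaces as a pipeline error rather than as a compliance finding later.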

2. Cloud Migration and Multi-Cloud

Future-proof assumption: my cloud provider must have a local data center in each of my countries of operation, and a multi-cloud approach may be required.

Just as COVID-19 accelerated cloud adoption, Schrems II may catalyze the build-out of local data centers by cloud providers. We researched which cloud provider is best positioned to take advantage of this shift. Here’s how many jurisdictions can currently be served by each provider (as of July 2020):

While currently all three have data centers in a similar number of geographies, Azure’s experience with a more distributed footprint could help them capitalize on this trend.

Disclaimer: DataFleets is cloud-agnostic, and we currently use cloud services from all three of the above providers.

We also observe regional fragmentation leading to multi-cloud implementations. For example, a leading financial services institution working with DataFleets uses Alibaba Cloud to support Asia Pacific while using one of the three above providers in the US and Europe. With cloud data becoming increasingly politicized, we expect this Balkanization to continue.

3. Privacy-by-design (PBD) and privacy-enhancing technologies (PETs)

Future-proof assumption: data ops and analytics should include best practices to mathematically limit privacy risk.

With this rapid increase in privacy regulation, investing now in best practices such as data minimization, reducing data copies, and risk-based anonymization is not only ethical. It also makes business sense, because it preserves operating continuity and gives you a marketing edge as a privacy-first brand. An example is Microsoft’s decision to uphold CCPA standards across the entire U.S., not just in California.

Privacy-enhancing technologies are rapidly maturing and gaining recognition from regulators. The UK’s Information Commissioner’s Office listed Federated Learning as a tool that can meaningfully contribute to data minimization efforts. There are three best-of-breed open source projects we recommend evaluating:

Federated Learning is especially applicable to the EU-US divide because its core insight is shipping models to data rather than aggregating data centrally. It combines:

  1. Privacy: removes the need for traditional privacy approaches like data masking and tokenization
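To make "shipping models to data" concrete, here is a minimal sketch of one common Federated Learning scheme, federated averaging, on a toy linear model. It is an illustration under stated assumptions, not DataFleets' implementation: the function names, the synthetic data, and the learning rate are all hypothetical, and a production system would add secure aggregation and other safeguards.

```python
import random


def local_update(weights, rows, lr=0.1, epochs=5):
    """Run a few gradient steps of linear regression on one site's rows."""
    w, b = weights
    n = len(rows)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in rows) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in rows) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_round(global_weights, sites):
    """One round of federated averaging: only weights leave each site."""
    total = sum(len(rows) for rows in sites)
    new_w = new_b = 0.0
    for rows in sites:
        w, b = local_update(global_weights, rows)
        share = len(rows) / total  # weight by the site's share of all rows
        new_w += share * w
        new_b += share * b
    return new_w, new_b


def make_site(rng, n, true_w=2.0, true_b=-1.0):
    """Synthetic rows standing in for one region's private data."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        rows.append((x, true_w * x + true_b + rng.gauss(0, 0.01)))
    return rows


rng = random.Random(0)
sites = [make_site(rng, 200), make_site(rng, 300)]  # e.g. an EU and a US site
weights = (0.0, 0.0)
for _ in range(20):
    weights = federated_round(weights, sites)
```

Each site trains on its own rows and returns only the two model parameters. The coordinator never sees an individual record, yet the averaged model is fitted to the data of both sites.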

In the future, we predict Federated Learning and differentially-private federated SQL will be the prevailing paradigm for unified multi-jurisdictional analytics. This form of “arm’s-length data science” comes with the benefits of potentially greater and faster data access, improved developer productivity, and best-in-class privacy and security.
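As a rough illustration of what a differentially-private federated query could look like, the sketch below answers a single count (roughly a `SELECT COUNT(*) ... WHERE ...`) across regions using the Laplace mechanism. The function names and data layout are hypothetical. It also assumes that each person's rows live in exactly one region, so that the per-region privacy budget `epsilon` applies to the combined answer.

```python
import random


def laplace_noise(rng, scale):
    # The difference of two exponential draws is Laplace-distributed.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def local_noisy_count(rows, predicate, epsilon, rng):
    """Count matching rows in-region and add Laplace noise before release."""
    true_count = sum(1 for row in rows if predicate(row))
    # A count changes by at most 1 when one row is added or removed,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(rng, scale=1.0 / epsilon)


def federated_count(regions, predicate, epsilon, rng):
    """Sum the noisy per-region counts; raw rows never leave a region."""
    return sum(
        local_noisy_count(rows, predicate, epsilon, rng)
        for rows in regions.values()
    )


rng = random.Random(42)
regions = {
    "eu": [{"amount": a} for a in range(0, 1000)],
    "us": [{"amount": a} for a in range(500, 2000)],
}
result = federated_count(regions, lambda r: r["amount"] > 900, 1.0, rng)
```

A smaller `epsilon` means more noise and stronger privacy. The analyst gets an approximate global count without any region exporting its underlying rows.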

DataFleets’ mission is to end the tradeoff between agile data access for AI and Machine Learning and improved data security and privacy. We built our secure enterprise analytics layer on three key principles:

  1. No data ever moves from its original and secure location

Conclusion

It’s worth remembering there are trillions of dollars of economic growth at stake. A study from James Manyika and the McKinsey Global Institute in 2016 showed that cross-border data flows significantly contribute to economic growth, with upwards of $2.8 trillion of net positive economic activity. A separate 2018 study found that AI can contribute 40 percent of the overall $9.5 trillion to $15.4 trillion annual impact of analytics.

Continue the conversation on @DataFleets or reach out to Try a Demo.
