What on Earth are Data Clean Rooms and why should you open the entrance door

and bravely enter the world where you can finally share, exchange and monetize the data you already own, without worrying about protecting the safety and privacy of the sensitive information it inevitably contains.

Most companies and organizations have been generating huge amounts of data for years now and, truthfully, utilizing less than 3% of this enormous goldmine of information. Combining and analyzing data from different sources has always been important, but also cumbersome and challenging because of security issues. In the end, your data not only has marketing value; it also presents a great opportunity to generate a new source of income.

One of the main concerns around data sharing is the protection of personal and confidential information. For example, in healthcare, although there are huge potential benefits for both providers and patients, there are equally huge and valid concerns about sharing personal health information of the patients. Therefore, privacy technology must enable data sharing without compromising data privacy and effectively address all confidentiality concerns.

Snowflake Data Clean Room is the only technology in the marketplace that offers a secure environment that enables two, three or more parties to collaborate and share sensitive and confidential data with iron-clad privacy controls in place.

Snowflake technology offers a way to share extremely valuable, sensitive data and at the same time ensure confidentiality by eliminating the risk of exposing walled garden data to any other sharers.

Data Clean Room

As you can see in the figure above, Party A’s data stays secret from Party B and vice versa. The environment is secured so that both parties can ask and answer each other’s questions and receive data insights without moving or copying the data. All privacy parameters are agreed upon upfront by all the parties involved, before entering the clean room. Each party can be a data provider as well as a data consumer.

Measurable Values of the Data Clean Room Environment

By providing data transparency, Data Clean Room allows companies to monetize their existing data while permitting bi-directional use which not only enables them to increase productivity, but allows them to unearth extremely valuable and useful insights. Safe and secure data sharing creates endless revenue possibilities, and this should be music to everyone’s ears.

Organizations can perform joint data analysis and machine learning (ML) with a 100% guarantee that confidential information will stay protected from their sharing partners.

Whether you are a clinical researcher wanting to share your insights and results or a bank looking at risk management in different parts of the world, Data Clean Room will help you collaborate and securely share sensitive or regulated data across teams, organizations, and partners.

If AI is the engine of the 4th Industrial Revolution, data is its fuel.

Compelling Use Cases for Data Clean Rooms

Shared data will create new opportunities and a plethora of new business models as adoption advances. It will make it easier for more companies and organizations to engage in data collaboration.

Furthermore, it will help organizations streamline data management processes and therefore significantly lower their data management costs.

Adopters of Data Clean Room

So, what type of companies and sectors should be the first to take advantage of Data Clean Room technology?

Every industry owns data, but the level of regulatory oversight and maturity differs from industry to industry. Privacy regulations, like GDPR and CCPA, apply to everyone, regardless of the sector. However, the sensitivity of data varies and some sectors have more sensitive data than others, and therefore are subject to additional regulations.

The early adopters of Data Clean Rooms will likely come from highly regulated sectors, like financial services and healthcare, or highly guarded sectors, like government and retail.

Fines, legal fees, and the loss of business are all potential consequences of failing to meet regulatory and privacy requirements. That’s why highly regulated sectors should use Data Clean Rooms without exception. This is the only way to enable collaboration and compliance and gain trust from collaborators and consumers alike.

Up until now, the early adopters of Data Clean Rooms were mostly parties involved in media and advertising, but the Snowflake platform allows parties to conduct even the most sensitive data sharing projects in any industry, including healthcare, financial services, retail, manufacturing, science or even government.

Let’s have a look at data sharing potential within all of these sectors:

Advertising, Media & Entertainment

  • With third-party cookies soon going away due to the introduction of new regulations, advertising and media companies are scrambling to find alternative ways to understand, segment, measure and activate their audiences. By performing data analysis and data exchange in Data Clean Rooms, they will be able to better help advertisers address and target audiences. It will allow them to better analyze consumer preferences, product consumption and overlap, in order to better measure campaign success and ROI on advertising spend. Clean rooms provide a persuasive back-end solution: advertisers, platforms, and retail channels can better match consumers across their data sets, all the while protecting their privacy.

Retail and CPG

  • Automotive companies will close gaps in the customer journey by leveraging a Data Clean Room to safely tap into digital signals from classified ads that reflect the most recent behaviors, intent signals and additional attributes that complete consumer profiles. By combining comprehensive consumer data with interests and behaviors while keeping identifiable information private, the Clean Room enables advertising partners to deliver better experiences by targeting consumers’ specific interests and preferences, making campaigns more effective.
  • Global beauty and wellness brands will increase efficiency and effectiveness across their entire portfolio of brands by leveraging the user and impression-level campaign data that is uniquely available in another party’s clean room to expand the scope of their campaign insights and effectiveness.
  • CPG retailers who rely on marketing mix modeling to optimize their marketing and promotional activities as well as product decisions (building new product categories) and supply chain decisions (how much to distribute to which distribution centers) will uncover new ways to merge this information. These models require marketing data from the CPG companies and POS data from retailers, both of which can be combined securely in a Data Clean Room. Retailers & CPGs can analyze purchasing behavior of their customers from billions of credit card transactions for a better defined and more curated customer experience.

Supply Chains & Manufacturers

  • Visibility and integrity are core aspects of strong supply chain management (see the Supply Chain Management figure in the Appendix). A Data Clean Room allows for a full ledger of supply chain activities that provides compute and cost efficiency and removes bottlenecks to allow seamless sharing of information. It provides a big picture of the supply chain from end to end. Each participant in the clean room can simultaneously contribute their information to the supply ledger and verify the actions taken before and after their involvement in the process. This ensures integrity and offers a stronger platform, enhanced inventory oversight, and better quality and safety in transportation management. Controlled access to the data will also be available via blockchain, a shared ledger similar to the one that powers bitcoin and other cryptocurrencies.
Supply Chain Optimization Data Clean Room

The Next Wave of Innovation in Freight

  • Car manufacturers and auto dealerships will be able to perform an overlap analysis between the two sets of data, which they will then share to learn more about their ideal end consumers, all while complying with privacy regulations.
  • Food suppliers will be able to unlock the mystery of supply and demand, by unlocking anonymized sensitive sales and delivery data. Manufacturers and retailers will be able to purchase consumer data from third-party data brokers and combine the data within a value chain from suppliers to manufacturers to marketers to create a more refined picture of the end consumer profiles and product demand.

Financial Services

  • Data Clean Rooms will enable fraud detection modeling with a signal from across an ecosystem of financial institutions while maintaining consumer privacy, and will assist financial institutions to comply with Anti-Money Laundering (AML) laws.
Inter-Bank Fraud Detection Data Clean Room
  • Mergers & Acquisitions processes will be accelerated by quick customer and revenue overlap analysis between parent companies and potential acquisition targets.
  • Investment managers and financial service companies will capture and analyze data from their back, middle, and front offices in real time. As a result, the time required to begin sharing investment data with clients will decrease from “months to minutes” or even seconds.
  • Banks in developing regions will pool anonymized credit data to build an inter-bank credit risk scoring system.
  • Financial institutions along with government agencies will participate and share information on entities or individuals under criminal investigation, as well as share intelligence about new financial crime typologies. This will help improve timely financial crime detection and enhance monitoring, identifying and freezing of illegal assets and bringing offenders to justice in a more timely manner.

Healthcare & Life Sciences

  • Clinical research is rapidly decentralizing and opening up opportunities to involve a much broader segment of the patient population in clinical trials, ultimately speeding up research and market placement of the products. With this change comes the need to collaborate and share data among Sponsors, CROs (Contract Research Organizations), Providers and Retail Clinics, Diagnostic Organizations and other entities involved in clinical trials, population selection, result validation, and drug discovery use cases across vast amounts of multiparty data.
  • Care models are rapidly evolving toward more digital / remote settings, and with that comes an opportunity to reduce the cost of care and at the same time provide a better, smoother patient experience. The introduction of Data Clean Rooms to health care providers will leverage remote patient monitoring and wearables data to provide a more personalized, interactive health experience for members across health care settings while maintaining patients’ data privacy.
  • Hospitals and insurance companies will securely share patient / subscriber data to analyze therapies at a lower cost. Genomics data will be combined with EMR (Electronic Medical Records) data and payer/provider data to build ML models to drive better care at lower cost across the industry.
  • One of the biggest opportunities of all is with pharmaceutical researchers and doctors operating within a secured Data Clean Room ecosystem. It will allow them to pool data and better understand how to quickly bring life-saving innovations and treatments to the market.

Public & Government

  • Local or federal governments will be able to share and draw crucial data insights and integrate valuable information from social platforms or other unstructured data and perform analysis to improve communication channels, surveillance programs and enrich watch-lists to refine monitoring.
  • Public based traffic and navigation apps will share their data with cities around the world to provide information to improve traffic patterns, quality of roads and shorten commute times. It has the potential to make public transportation more seamless or to improve infrastructures and ultimately help build smarter cities.

Technology

  • One of the most prominent applications of Data Clean Rooms is cyber threat reduction. For years, the U.S. government and standardization groups have searched for ways to enable data sharing to enhance cyber threat mitigation. Sharing security data between corporations will help prevent future online attacks, as cyber threat mitigation mechanisms work better with more comprehensive information on attackers. As an example, two companies can query each other for common sources of attacks. If the companies have many attackers in common (i.e. high similarity), they can exchange that information and enrich their predictive models for mitigation of future attacks. Technology alone cannot solve all problems, but it can yield effective results by mitigating tradeoffs between transparency and security. Snowflake has demonstrated that it is possible to design Data Clean Rooms that address all security and privacy concerns.
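
To make the "attackers in common" idea concrete, here is a minimal sketch in plain Python. It is not Snowflake code: the salt, the sample IP addresses and the function names are all made up for illustration, and it assumes both companies have agreed on a shared salt upfront. Each side fingerprints its attacker IPs, and the similarity is the Jaccard ratio of the two fingerprint sets.

```python
import hashlib


def fingerprint(ip, shared_salt):
    """Hash an attacker IP with a salt both companies agreed on (hypothetical setup)."""
    return hashlib.sha256((shared_salt + ip).encode("utf-8")).hexdigest()


def attacker_similarity(ours, theirs):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = ours | theirs
    if not union:
        return 0.0
    return len(ours & theirs) / len(union)


SALT = "example-shared-salt"  # assumption: exchanged out of band, not a real secret

# Made-up attacker IPs from documentation address ranges.
company_a_ips = ["203.0.113.7", "198.51.100.23", "192.0.2.44"]
company_b_ips = ["198.51.100.23", "192.0.2.44", "203.0.113.99"]

a_prints = {fingerprint(ip, SALT) for ip in company_a_ips}
b_prints = {fingerprint(ip, SALT) for ip in company_b_ips}

similarity = attacker_similarity(a_prints, b_prints)
common_count = len(a_prints & b_prints)
print(f"common attackers: {common_count}, similarity: {similarity:.2f}")
```

Salted hashing alone is a weak protection for something as guessable as an IP address, so in practice the comparison would run inside the clean room's controls rather than by swapping hashes directly.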

Education & Science

  • Data Clean Rooms will enable a network of scientists to securely share research papers or results and collaborate across educational facilities and staff members. Call it “science of the future”, as data sharing empowers the science of tomorrow. This technology will enable and encourage scientists to share and pool their research results with other scientists in their field without revealing sensitive information. This data sharing practice will soon become the norm and will open previously unimaginable potential for research. This is not true solely for large-scale data sharing initiatives; even relatively small datasets can contribute to and fuel future scientific discoveries in unexpected and meaningful ways. Given that we cannot predict how valuable any small set of data can become, there is a strong argument that unshared data is an obstacle to the advancement of science.

These are just a few use cases, but the need to share and combine sensitive data exists across all industries and the benefits are almost immeasurable.

How does Snowflake Enable Data Clean Rooms?

Snowflake Data Clean Room is a framework that enables organizations to share data that is live and ready to query. There is no need to copy or move the data, since it already exists on the Snowflake platform. Clean Rooms leverage Row Access Policies to perform analysis on PII data without exposing any confidential information or details. Streams and tasks monitor the requests and responses in real time, so there is no need for a central location or middle layer to perform multi-party data sharing collaboration.
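
The request-and-response idea can be modeled in a few lines of plain Python. This is only a conceptual sketch, not Snowflake's API or the way Row Access Policies are written: the table, the template name and the minimum group size are all hypothetical. The point it illustrates is that the provider answers only pre-approved aggregate requests and never hands back row-level data.

```python
from collections import Counter

# Hypothetical provider-side table; in a real clean room it never leaves the provider's account.
PROVIDER_ROWS = [
    {"email": "ann@example.com", "segment": "sports"},
    {"email": "bob@example.com", "segment": "sports"},
    {"email": "cat@example.com", "segment": "sports"},
    {"email": "dan@example.com", "segment": "travel"},
]

APPROVED_TEMPLATES = {"overlap_count_by_segment"}  # assumption: agreed upfront by both parties
MIN_GROUP_SIZE = 2  # assumption: groups smaller than this are suppressed


def run_request(template, consumer_emails):
    """Answer only approved aggregate requests; return counts, never rows."""
    if template not in APPROVED_TEMPLATES:
        raise PermissionError(f"template {template!r} is not approved")
    counts = Counter(
        row["segment"] for row in PROVIDER_ROWS if row["email"] in consumer_emails
    )
    # Drop small groups so a count cannot single out an individual.
    return {segment: n for segment, n in counts.items() if n >= MIN_GROUP_SIZE}


consumer_emails = {
    "ann@example.com",
    "bob@example.com",
    "dan@example.com",
    "eve@example.com",
}
result = run_request("overlap_count_by_segment", consumer_emails)
print(result)
```

Here the consumer learns how many of its customers overlap with each provider segment, while the "travel" group, which matches only one person, is withheld.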

Unlike other companies, Snowflake provides the Data Clean Room where each party controls its own data, allowing governed, controlled analytics with other parties.

Snowflake’s iron-clad security policy guarantees 100% confidentiality and integrity of data at rest, in use or in transit.

Here are just a few reasons to build Data Clean Room on Snowflake:

  • Clean Rooms on Snowflake enable rapid analysis of multi-party data that keeps your raw data hidden from any other parties, including Snowflake.
  • Data is never moved outside of the Snowflake environment.
  • Snowflake has the ability to run a virtually unlimited number of concurrent workloads with flexible computing power against the same, single copy of data.
  • Partnership with Snowflake can virtually eliminate the costs and risks associated with traditional ETL processes, movement of data and integration with direct access to ready-to-query data.
  • With Snowflake you can create differential privacy capabilities using Secure User Defined Functions, native access controls like Row Access Policies and Database Roles to control the content of shared data, obfuscation with Dynamic Data Masking, pseudonymization with built-in hashing functions, data anonymization or even integration with third-party ID resolution vendors.
  • Snowflake provides cross-cloud and cross-region data sharing and Data Clean Room capabilities across the major cloud vendors.
  • Snowflake additionally provides an Auto-Fulfillment feature where the platform automatically replicates your data to consumer regions as needed.
  • Snowflake natively supports SQL, Scala, Java, JavaScript in GA and Python in Public Preview.
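
Two of the techniques in the list above, pseudonymization by hashing and obfuscation by masking, are easy to picture with a small Python sketch. It uses only the standard library and does not call any Snowflake function; the key and the helper names are hypothetical, and it assumes the collaborating parties share the same secret key so that their tokens line up.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (HMAC) after normalizing it."""
    normalized = value.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email):
    """Hide the local part of an email but keep the domain for coarse analysis."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain


KEY = b"hypothetical-shared-key"  # assumption: agreed between the parties, stored securely

token_a = pseudonymize("Ann@Example.com ", KEY)  # messy input from one party
token_b = pseudonymize("ann@example.com", KEY)   # clean input from the other
masked = mask_email("ann@example.com")

print(token_a == token_b)  # the same person yields the same token on both sides
print(masked)
```

Because both parties normalize and key the hash the same way, records can be joined on the token without either side seeing the other's raw email addresses.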

Future of Data Clean Rooms

Data continuously gains value when shared. Yet data privacy policies and competitive secrecy demands have historically limited its ability to realize its true value. Today, a new class of computational approaches collectively known as privacy-enhancing technologies is poised to free data from privacy’s restrictions. Approaches such as secure multiparty computation, differential privacy, and functional encryption make it possible for organizations to receive the benefits of data sharing without sacrificing privacy. Here are just a few of the privacy-enhancing technologies that we can possibly include in our Data Clean Rooms today:

  • Homomorphic Encryption: Encryption schemes that let parties perform computations on encrypted data without first decrypting it.
  • Differential Privacy: Systematically adds random noise to query results so that patterns can still be computed while hiding information about who is in the dataset. This makes it very hard to reverse engineer any individual’s original inputs.
  • Federated Learning: Parties share insights from their analysis without sharing the data itself.
  • Zero Knowledge Proof: Parties can prove their knowledge of a value without revealing the value itself.
  • Functional Encryption: Parties hold a key that allows them to learn only specific parts of the provider’s encrypted data.
  • Secure Multi-Party Computation: Data analysis is spread across multiple parties such that no single party can see the complete set of inputs.
Future of Data Clean Room
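
Of the techniques listed above, differential privacy is the simplest to sketch. The snippet below is an illustration in plain Python under stated assumptions, not a production mechanism and not Snowflake functionality: the data, the epsilon value and the function names are invented for the example. It adds Laplace noise to a counting query, which has sensitivity 1 because adding or removing one person changes the count by at most 1.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Return a count plus Laplace noise with scale 1/epsilon (sensitivity 1)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)  # seeded only to make the example repeatable
ages = [23, 35, 41, 52, 67, 29, 48]  # made-up data

noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
print(f"noisy count of people aged 40+: {noisy:.2f}")
```

A smaller epsilon means more noise and stronger privacy; a larger one gives more accurate answers. Averaged over many queries the noise cancels out, which is why real systems also track a privacy budget.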

Conclusion

Data Clean Rooms are no longer just a thing of the future. They offer a real-time solution for unlocking diverse collaboration opportunities that were not imaginable in the past.

Remember, data should be used wisely. Data Clean Rooms should be THE PRODUCT that will launch new ways to collaborate in the world of PII data. They are the future of data collaboration, analysis and precise consumer targeting.

Appendix:

Supply Chain Management


Marcin Kulakowski
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Don't solve a problem, offer a better solution and show the art of the possible. Currently @ Snowflake.