Introduction to Delta Sharing for Secure Data Sharing
Enterprises today rely on data to derive business insights and grow their business. Data is often called the new oil, and many enterprises are focusing on collecting data from different sources to work on data-driven projects. Once the data is collected, organizations need a governed and secure approach to sharing it.
Databricks Delta Sharing for Secure Data Sharing
Delta Sharing is an open source standard for secure data sharing. It makes it simple for data-driven organizations to share data easily and efficiently.
The features of Delta Sharing are as follows:
· Live Data Sharing
Delta Sharing lets data-driven projects share existing data as well as live data in Delta Lake without physically copying it to another system.
· Support for multiple data consumers
Data consumers can read delta shares directly with pandas, Apache Spark, and other systems without first deploying the data to another cloud or on-prem platform. This gives end users the flexibility to consume data faster.
· Increased governance and security
With Delta Sharing, enterprises can govern, track, and audit the data shared between different teams.
With Delta Sharing, teams can share big data efficiently through cloud storage providers such as Azure Data Lake Storage Gen2, AWS S3, and Google Cloud Storage.
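As a rough illustration of the consumer side described above, here is a minimal sketch that uses the open source delta-sharing Python connector to read a shared table into pandas. It assumes the connector is installed and that the provider has supplied a profile file. The helper names table_url and load_shared_table, and all share, schema, and table names, are hypothetical.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address the connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Read a shared table into a pandas DataFrame.

    Needs network access and valid credentials in the profile file.
    """
    # Imported lazily so table_url() works even without the connector installed.
    import delta_sharing  # third-party: pip install delta-sharing

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))
```

For example, load_shared_table("config.share", "sales_share", "retail", "orders") would read the hypothetical table retail.orders from the share sales_share. No copy of the table has to be set up on the consumer's own platform first.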
Clients Using Delta Sharing for Delta Lake
Many enterprises and tools have already started using Delta Lake for data sharing. The tools and vendors highlighted in the image are already using Delta Sharing to share data.
Working with Delta Sharing and Delta Lake
Delta Sharing with Delta Lake is based on a simple REST protocol for securely sharing and accessing data from cloud data sources.
The two main entities involved in Delta Sharing with Delta Lake are as follows:
1. Data Providers
Data providers can share an existing table or partitioned table in the Delta Lake format. A Delta Lake table is stored as a collection of Parquet files, so existing Parquet tables are easy to bring into Delta Lake.
2. Data Recipients
Recipients can consume the data using open source connectors for pandas, Apache Spark, Python, and so on.
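In the open source protocol, a recipient typically gets a small JSON credential file (a profile) from the provider, which the connectors then read. The sketch below parses such a file with the standard library only. The field names shareCredentialsVersion, endpoint, and bearerToken follow the published profile format as I understand it. The ShareProfile class and parse_profile function are hypothetical helpers, not part of any connector.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ShareProfile:
    endpoint: str
    bearer_token: str
    version: int


def parse_profile(text: str) -> ShareProfile:
    """Parse the JSON text of a profile file and check the required fields."""
    raw = json.loads(text)
    required = ("shareCredentialsVersion", "endpoint", "bearerToken")
    missing = [key for key in required if key not in raw]
    if missing:
        raise ValueError(f"profile is missing fields: {', '.join(missing)}")
    return ShareProfile(
        endpoint=raw["endpoint"].rstrip("/"),  # normalize a trailing slash
        bearer_token=raw["bearerToken"],
        version=int(raw["shareCredentialsVersion"]),
    )
```

Failing early on a missing field gives the recipient a clear error before any request is sent to the sharing server.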
Data sharing with Delta Lake and Delta Sharing follows the protocol below:
· The client authenticates with a bearer token and submits a query against the table.
· When the request reaches the server, the server verifies it and locates the requested data in cloud or on-prem storage.
· The server generates pre-signed URLs that allow the client to read the Parquet files directly from cloud storage, so the data transfers at cloud storage bandwidth.
Delta Sharing with Delta Lake supports many of the tools available in the market, which reduces the complexity of the overall architecture and ecosystem.
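The steps above can be sketched with the Python standard library. This is an illustration under assumptions, not a reference client. It assumes the query endpoint path and the newline-delimited JSON response shape from the open Delta Sharing protocol. The functions build_query_request and extract_file_urls are hypothetical helpers, and the request is only built, never sent.

```python
import json
import urllib.request


def build_query_request(endpoint, bearer_token, share, schema, table, limit_hint=None):
    """Build (but do not send) a table query that authenticates with a bearer token."""
    url = f"{endpoint.rstrip('/')}/shares/{share}/schemas/{schema}/tables/{table}/query"
    body = {} if limit_hint is None else {"limitHint": limit_hint}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_file_urls(response_text):
    """Collect the pre-signed file URLs from a newline-delimited JSON response."""
    urls = []
    for line in response_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        if "file" in entry:  # other lines carry protocol and metadata information
            urls.append(entry["file"]["url"])
    return urls
```

A client would send the request, pass the response body to extract_file_urls, and then download each Parquet file directly from storage using the returned URLs. That direct download is what keeps the transfer off the sharing server.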