Democratizing Big Data. An Introduction to Ocean Protocol and Decentralized Data Sharing

Aidan Pak
5 min read · Jul 18, 2022


Artificial Intelligence is a transformative technology set to disrupt the majority of industries. AI is a field of Computer Science centered on the design of machines capable of solving problems that typically require human intelligence. AI-driven technologies automate laborious tasks, enhance human decision making, minimize errors, and identify key trends in complex data. In 2021, AI augmentation created over $2.7 trillion of business value, and AI is expected to displace 85 million jobs by 2025.

However, the future of AI is constrained by a misalignment between innovators and those with the resources needed to engage in big data initiatives. The Artificial Intelligence industry is heavily privatized and dominated by cloud service providers and social media giants. The need for Big Data and scalable compute infrastructure creates large barriers to entry that prevent the majority of innovators from engaging in bootstrapped development. While cloud services have supplied the masses with scalable compute resources, AI innovation is still hampered by a lack of access to large, relevant, potentially sensitive datasets. What is currently missing is an economic framework in which data can become openly accessible to innovators and be exchanged in a secure, ethical, and liquid environment.

According to a Digital Universe study, less than 1% of the world’s data is ever analyzed. Legal barriers, moral predicaments, and a lack of proper monetization channels discourage data giants from sharing or selling their data. As a result, today’s AI market leaders are companies such as AWS, IBM, and Google, which hold the data needed to develop groundbreaking deep learning models. While advances in compute and algorithms have diversified the applications of Artificial Intelligence, the industry falls short of its potential because access to Big Data remains constrained.

Ocean Protocol

Enter Ocean Protocol, a token-based decentralized network that aims to spread the benefits of Artificial Intelligence by democratizing Big Data. The goal of Ocean is to stimulate data sharing by offering proper incentive, control, and security structures on data services. At its core, Ocean Protocol is a blockchain-based data economy that connects providers (those with data) with consumers (AI innovators). Through tokenizing data services, the network opens monetization opportunities for Data Providers while preserving full control and privacy mechanisms. As a result, Ocean’s data economy offers AI innovators access to distributed, potentially sensitive datasets that would otherwise never be made available. Ocean leverages crypto tokens, decentralized service agreements, and access control infrastructure to assetize data services and curate a functional data economy.

Ocean Protocol’s core innovation is its Compute-to-Data feature, which allows AI innovators to run compute jobs without ever needing access to or visibility of the raw data. More specifically, Compute-to-Data, a complementary technology to federated learning, allows innovators to train their models on private data without the raw information ever leaving the on-prem facilities of the data owner. The idea behind federated learning is to enable edge devices, such as smartphones, IoT devices, and servers, to work collaboratively on a shared machine learning model. Each device downloads the current state of the model, improves it by training on its own local data, and sends back only the model update, so the raw data never leaves the owner’s premises. This form of decentralized model training bypasses the need to gather sensitive data in one centralized source. Instead of leaving data owners to navigate legal and privacy headaches, Compute-to-Data offers the ownership and privacy structures needed to stimulate mass data sharing. This unprecedented ability to compute on private data opens the possibility for AI innovators to ethically construct models from highly sensitive information such as genetic data, personal financial information, medical reports, and much more.
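To make the federated idea concrete, here is a minimal sketch of a federated-averaging loop in plain Python. It is a toy illustration, not Ocean Protocol code: the one-parameter model, the function names (local_update, federated_round, train), and the datasets in SITES are all assumptions invented for this example. The point it shows is that each data owner trains locally and only model weights are shared.

```python
# Illustrative sketch only: a toy federated-averaging loop, not Ocean Protocol code.
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave their owner


def local_update(weight: float, data: Dataset, lr: float = 0.05, epochs: int = 5) -> float:
    """Run a few gradient-descent steps for y = weight * x on one owner's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the single weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_round(global_weight: float, sites: List[Dataset]) -> float:
    """Each site trains locally; only the resulting weights are shared and averaged."""
    local_weights = [local_update(global_weight, data) for data in sites]
    total = sum(len(data) for data in sites)
    # Weight each site's contribution by how many samples it trained on.
    return sum(w * len(d) for w, d in zip(local_weights, sites)) / total


def train(sites: List[Dataset], rounds: int = 20) -> float:
    """Repeat federated rounds, starting from an untrained weight of 0."""
    weight = 0.0
    for _ in range(rounds):
        weight = federated_round(weight, sites)
    return weight


# Hypothetical private datasets held by two owners, both generated from y = 3x.
SITES = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

if __name__ == "__main__":
    print(train(SITES))  # converges toward 3.0
```

Note that the coordinating code in federated_round only ever sees the weights returned by local_update, never the (x, y) pairs themselves, which is the property the paragraph above describes.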

Ocean leverages the programmability and tokenization features of the Ethereum blockchain to construct the backend of the data economy. Ownership of each dataset is represented by a non-fungible ‘Data NFT’, which serves as an immutable record of copyright over the data asset. Providers then mint ‘Datatokens’, which represent the “right to access” a data service. Datatokens are fungible and are programmed to grant consumers perpetual, one-time, or Compute-to-Data access to a dataset. Since Ocean’s Datatokens are built on Ethereum’s fungible token standard, ERC-20, innovators can purchase Datatokens with cryptocurrencies and store them in ordinary Web-3 wallets. The minting, discovery, and sale of Datatokens are facilitated by a decentralized application (dApp) called Ocean Market, which allows anyone with a Web-3 wallet to engage in the data economy.
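The relationship between the two token types can be sketched in a few lines of Python. This is a simplified in-memory model, not Ocean’s smart contracts: the class names, fields, and the rule that only the NFT owner may mint are assumptions made for illustration, and amounts are whole tokens rather than the fractional units real ERC-20 tokens use.

```python
# Illustrative sketch only: an in-memory model of Data NFTs and Datatokens.
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class DataNFT:
    """Non-fungible record of who holds the rights to one data asset."""
    token_id: int
    owner: str
    metadata_url: str  # describes the data; the dataset itself is not stored here


@dataclass
class Datatoken:
    """Fungible access token tied to one Data NFT (an ERC-20-style balance ledger)."""
    nft: DataNFT
    balances: Dict[str, int] = field(default_factory=dict)

    def mint(self, caller: str, to: str, amount: int) -> None:
        # Assumption for this sketch: only the Data NFT owner may mint.
        if caller != self.nft.owner:
            raise PermissionError("only the Data NFT owner may mint in this sketch")
        self.balances[to] = self.balances.get(to, 0) + amount

    def transfer(self, sender: str, to: str, amount: int) -> None:
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("invalid amount or insufficient balance")
        self.balances[sender] -= amount
        self.balances[to] = self.balances.get(to, 0) + amount

    def balance_of(self, account: str) -> int:
        return self.balances.get(account, 0)


if __name__ == "__main__":
    # Hypothetical accounts and metadata URL.
    nft = DataNFT(token_id=1, owner="genetics_lab", metadata_url="https://example.org/meta.json")
    token = Datatoken(nft)
    token.mint("genetics_lab", "genetics_lab", 10)
    token.transfer("genetics_lab", "research_team", 1)
    print(token.balance_of("research_team"), token.balance_of("genetics_lab"))
```

Because every Datatoken of a given dataset is interchangeable, a balance ledger is enough to track access rights, while the single Data NFT records who controls the asset.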

The Power of Compute-to-Data — Genetics Data Model Training

As an example of how Ocean Protocol could change AI development, imagine a medical research team has just developed a deep learning model specifically designed for genetics data. The traditional approach of gathering raw data in a single, centralized source would be impossible. Legal barriers around data privacy and transfer, as well as the obvious economic barriers to purchasing highly sensitive data, would prevent the research team from accessing the desired information. With Ocean Protocol, this form of data sharing is made possible through proper economic incentives and security features.

The process would start with genetics labs or other Data Providers registering their datasets on-chain by minting Data NFTs. These NFTs do not include the dataset itself; they store metadata and are accompanied by a DID (a unique decentralized identifier). The Data Providers would then mint Datatokens (right-to-access tokens), and the research team would browse Ocean Market and purchase these Datatokens using a Web-3 wallet.

Once the purchase is complete, the research team would send the Datatoken to the Provider’s wallet along with a request to use the given compute service. The Data Provider then performs a series of authentication steps on the research team’s Web-3 wallet and Datatoken. These steps could include confirmation of payment to a smart contract address, signing of a service agreement, and verification of other credentials such as oracle-supplied identity information. Once approved, the research team would publish their algorithm as a DID and the consumption process would begin.

The Data Provider would run the compute job entirely on-prem using the research team’s algorithm. The Provider would then upload the model output and execution logs to an AWS S3 bucket. The URL of the S3 bucket would be shared with the research team, and the smart contract would release payment to the Data Provider. The research team could then repeat this process with other Data Providers to fully train their model on the necessary sensitive genetics data.
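The walkthrough above can be condensed into a small simulation. This is a sketch under stated assumptions, not Ocean’s implementation: Provider, Escrow, and compute_to_data are hypothetical names, the escrow object merely stands in for the smart contract, authentication is reduced to checking that a Datatoken was received, and results are returned directly rather than uploaded to a storage bucket.

```python
# Illustrative sketch only: a toy simulation of the Compute-to-Data flow.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Algorithm = Callable[[List[float]], float]


@dataclass
class Provider:
    name: str
    private_data: List[float]  # stays on the provider's premises
    received_tokens: Dict[str, int] = field(default_factory=dict)
    earnings: int = 0

    def receive_datatoken(self, consumer: str) -> None:
        self.received_tokens[consumer] = self.received_tokens.get(consumer, 0) + 1

    def run_job(self, consumer: str, algorithm: Algorithm) -> Dict[str, object]:
        # Authentication step: refuse to compute unless a Datatoken was received.
        if self.received_tokens.get(consumer, 0) < 1:
            raise PermissionError(f"{consumer} has not sent a Datatoken")
        self.received_tokens[consumer] -= 1  # one token buys one job in this sketch
        output = algorithm(self.private_data)  # executed next to the data
        # Only the output and a log line are returned, never the raw data.
        return {"output": output, "log": f"job for {consumer} ran on {self.name}"}


@dataclass
class Escrow:
    """Stand-in for the smart contract that holds payment until the job completes."""
    amount: int
    released: bool = False

    def release_to(self, provider: Provider) -> None:
        if self.released:
            raise RuntimeError("payment already released")
        provider.earnings += self.amount
        self.released = True


def compute_to_data(consumer: str, provider: Provider,
                    algorithm: Algorithm, price: int) -> Dict[str, object]:
    escrow = Escrow(amount=price)          # consumer locks the payment
    provider.receive_datatoken(consumer)   # consumer sends the Datatoken
    result = provider.run_job(consumer, algorithm)
    escrow.release_to(provider)            # payment is released after the job ran
    return result


def mean(values: List[float]) -> float:
    """A stand-in 'algorithm' that the consumer submits to the provider."""
    return sum(values) / len(values)


if __name__ == "__main__":
    lab = Provider("lab_a", [1.0, 2.0, 3.0])  # hypothetical provider and data
    print(compute_to_data("research_team", lab, mean, price=5))
```

Repeating the call with other Provider objects mirrors the last step of the walkthrough, where the research team works through several Data Providers in turn.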

At its core, Ocean’s decentralized token model is built to grow AI by incentivizing data sharing. Bruce Pon, founder of Ocean Protocol, writes: “data has immense value, but no one shares it because everyone’s scared to lose control. By creating a decentralized network where anyone can share safely, while keeping control and privacy, a new Data Economy can emerge.”

For the past decade, Artificial Intelligence has been changing the landscape of computing, both in terms of capability and in terms of infrastructure. AI systems have become increasingly powerful, and the demand for highly scalable compute resources, Big Data, and powerful algorithms has never been greater. Effective data sharing technologies could remove one of the main constraints on that growth. With Ocean Protocol, data could become a liquid asset that benefits both providers and consumers. For businesses, data centers can transform from a major expense item into a revenue stream, while innovators can purchase data as a service. In the years to come, decentralized data sharing can contribute to new AI technologies that reduce emissions, identify genetic predispositions to cancer, and produce effective autonomous vehicles, among many other exciting innovations.

