Ocean Protocol — democratizing data markets

Blockchain vNext Series (Part 2)

Stefan Grasmann
The Startup
9 min read · Dec 21, 2017

In the first part of this series about next generation Blockchain technologies we had a look into decentralized data storage with FileCoin and IPFS. But storing your own data in a safe place and limiting access to yourself and your friends or family (let’s say “trusted, well-known entities”) is just one small part of the overall story.

Picture by Joseph82, found on Pixabay

Trent McConaghy from Ocean Protocol likes to put it this way:

“Data wants to be free” and “Data wants to be expensive”.

What does he mean by that? Most of us give away data every day. We might do that to gain likes on Facebook or YouTube, or claps on Medium. Or we do it to gain insight from our own data, for example when we use wearables to track our sports activities.

But many people and companies might be interested in our data. They would pay money, perhaps only small amounts, for access to it. Up to now there has been no trusted marketplace where these microtransactions can take place. To be very clear: data markets do exist. We just don’t notice them, because we take no active part in them.

Our data is either sold in order to show us ads that we usually don’t want to see, or it is handled in the dark, without our knowledge, maybe even for purposes we are not OK with. So currently our data is de facto both “free” and “expensive”, without the ultimate data provider (us) getting any return for it.

So some smart people asked themselves:

  • Could we use Blockchain building blocks and tokens to create a democratic, diverse data market?
  • How could we balance this data market between the people who make data available, those who rate the value of data sets, those who clean and curate the data, and those who want to use it?
  • How could we create a system that is attractive for free data providers (data commons) and small data providers like NGOs, but that big players can also join later?
  • How could we use the functioning aspects of the Blockchain ecosystem to address these issues?
  • And how could we leverage the good parts of the Bitcoin ecosystem design — the incentive system for different players — to boost a thriving data ecosystem?

The Ocean Protocol (Ocean) wants to be the answer to these questions. Ocean wants to be the substrate of an ecosystem that is intended to work as follows. I’ll try to give you my high-level understanding of the market design goals from the perspective of a data provider:

  1. Let’s imagine you want to provide a data set to the ecosystem. You can do that but you have to put a certain initial “bet” on your data. You use Ocean tokens for that bet. So you pay kind of a base fee to be part of this market. This is important: the data originator has some stake in the game. This prevents people from flooding the system with rubbish data.
  2. If someone else now wants to use your data, they also use Ocean tokens to pay for the usage.
  3. But you as the original data creator don’t get all of these tokens. There are other players in that market: Those who run nodes to find your data and make your data accessible. These guys also get some percentage of the token bill. You might play this role by yourself or you might want to delegate this task to a third party. The system is set up to incentivize these third parties to arise. We might call these third parties “miners”.
  4. But there is another problem in our ecosystem: On the one hand we want our market to be big, to reach as many potential consumers of our data as possible. On the other hand, the market might get quite hard to understand as it grows. Regardless of the size of the market: If we provide great data we want to be rewarded for it. We want our data to shine, and we want it to be easily accessible. We expect our ecosystem to incentivize good data. We expect it to be self-healing. We want it to be curated somehow. But not curated by a single player; rather curated by the masses, or by a fourth player in the market: specialized curators who are incentivized to judge the value of data.
    Yet this aspect is not as simple as it sounds. An ecosystem lives from its balance: It will only be relevant if the market does not over-emphasize three or four important data sets. Otherwise Google or Facebook would enter that market and dominate it. We would lose balance. The ecosystem needs diversity. It needs to be attractive for new data sets that enrich the value of the ecosystem. It also needs to let niche data providers and newcomers shine and get their share of the market. Ocean Protocol tries to solve these questions with math: It uses logarithmic rather than linear functions when paying tokens to data originators or curators. This respects the fact that it is more work for all participants to find the first ten “fans” of a certain data set than the next ten. This sounds trivial but plays a crucial role in achieving a balanced market.
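To make the logarithmic idea a bit more tangible, here is a minimal sketch in Python. It is my own illustration, not Ocean’s actual formula: the function names, the choice of ln(1 + fans) as the weight and the two example data sets are all assumptions. It only shows how a logarithmic weighting hands a niche data set a much larger share of a payout than a linear weighting would.

```python
import math


def linear_weight(fans: int) -> float:
    """Weight grows in direct proportion to the number of fans."""
    return float(fans)


def log_weight(fans: int) -> float:
    """Assumed logarithmic weight: ln(1 + fans), so zero fans means zero weight."""
    return math.log1p(fans)


def payout_shares(weights: dict[str, float]) -> dict[str, float]:
    """Turn raw weights into fractions of the payout that sum to 1."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical popularity figures for two data sets.
fans = {"blockbuster": 10_000, "niche": 10}

linear = payout_shares({name: linear_weight(n) for name, n in fans.items()})
logarithmic = payout_shares({name: log_weight(n) for name, n in fans.items()})

print(f"niche share, linear weighting:      {linear['niche']:.4f}")
print(f"niche share, logarithmic weighting: {logarithmic['niche']:.4f}")
```

With these made-up numbers the niche data set gets roughly a fifth of the payout under logarithmic weighting, but only about a tenth of a percent under linear weighting. The same curve also rewards the first ten fans more than the next ten.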

To me this sounds like a decent plan.

I hope I got your full attention by now…

Ocean goes one step further. McConaghy is deeply involved in Artificial Intelligence (AI) and machine learning, as you can see in his impressive bio. He wants the Ocean ecosystem to solve an important problem in the AI space: You usually need a lot of data to train your algorithms and make them better. Some experts in the field say that you get better results by feeding a twenty-year-old algorithm with loads of data than by training a modern, better algorithm with less data. Nowadays only very few players have access to data in these quantities, and exactly these players invest heavily in AI.

Small AI startups simply have no chance to compete because they lack data. Ocean wants to use the protocol and ecosystem described above to open up that market, democratize AI and boost AI research in smaller companies. It also wants organizations such as NGOs to be able to deliver their data and get paid for it via the Ocean ecosystem.

I’m impressed how far these thoughts go.

Technology

I scanned the available Ocean Protocol whitepapers about the marketplace and the technical primer in order to understand how Ocean Protocol wants to realize the ideas explained above.

One of the key ideas is to avoid collecting data in a centralized or decentralized cloud, and instead to keep the data decentralized, on-premises behind firewalls, and bring algorithms to the data, not the other way around. Data is “sticky”. It is usually quite expensive to move lots of data around. It is also hard to secure the migration of bigger data sets and the access to them. So it is generally a good idea to keep data near the location of its origin.
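A tiny Python sketch may help to picture “bringing algorithms to the data”. It is purely illustrative and rests on my own assumptions: the class name `OnPremiseDataHolder`, its `run` method and the sample readings are hypothetical and not taken from the Ocean papers. The point is only that the raw records stay inside the holder and just the result of the algorithm leaves it.

```python
from statistics import mean
from typing import Callable, Sequence


class OnPremiseDataHolder:
    """Hypothetical data holder: callers never receive the raw records."""

    def __init__(self, records: Sequence[float]) -> None:
        # The records stay "behind the firewall" inside this object.
        self._records = tuple(records)

    def run(self, algorithm: Callable[[Sequence[float]], float]) -> float:
        # The algorithm travels to the data; only its result travels back.
        return algorithm(self._records)


# Made-up sensor readings that never leave the holder.
holder = OnPremiseDataHolder([120.0, 135.0, 128.0])

# A data consumer submits an aggregate function and receives one number.
average = holder.run(mean)
print(f"average reading: {average:.2f}")
```

Note that this toy version trusts whatever function it is handed. A real system would additionally have to make sure that a submitted algorithm cannot simply return the raw records.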

Let’s start with the technical market design which is strongly aligned with Ocean’s governance goals:

The market is designed to let quality data shine. So Ocean uses Proof of Stake (PoS) mechanisms where stake in this case means the “measure of the belief of the future popularity of the data”.

As you can see in the following picture, Ocean divides its ecosystem architecture into two main layers:

  1. Different Data Marketplaces
  2. Keeper nodes

Source: Ocean Tech Primer, page 10

Let’s dive into these Keeper nodes first: They have four different main tasks:

  • User Registry manages the stakes of its users including white listing
  • Data Curation lists a curated set of available data sets
  • Data Pricing defines pricing for data access (including free data)
  • Verifying makes sure that a node actually gives access to data as promised

These nodes may run on-premises or in the cloud. They guarantee that algorithms are brought to the correct files, making sure they work on the right data set. Nodes also guarantee the immutability of that data and that algorithms play by the rules, e.g. by preventing data from being stolen.
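The papers I read don’t spell out how a node would check that an algorithm really works on the right, unchanged data set. One common building block for this kind of check is a content hash, so here is a minimal sketch under that assumption. The helper names `fingerprint` and `is_unchanged` and the sample bytes are mine; only `hashlib.sha256` comes from the Python standard library.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, expected: str) -> bool:
    """Check the data against the fingerprint recorded at registration time."""
    return fingerprint(data) == expected


# Fingerprint recorded when the (made-up) data set was first registered.
registered = fingerprint(b"temperature,21.5\n")

print(is_unchanged(b"temperature,21.5\n", registered))  # same bytes
print(is_unchanged(b"temperature,99.9\n", registered))  # tampered bytes
```

Anyone who knows the registered fingerprint can detect a modified file without having seen the original data beforehand.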

An interesting point is that Ocean doesn’t store data in the keeper nodes themselves. Instead, these nodes control and manage access to your (encrypted) data underneath. IPDB, BigchainDB and CoalaIP seem to play a crucial role in the implementation of the data access layer in between.

It’s not entirely clear to me how all this works in detail at runtime. Some kind of runtime picture would really help to see which code runs where and when to make the big picture work.

It seems that Ocean wants to verify the algorithms that run on your data in order to prove that they do exactly what they are intended to do. The technical whitepaper that explains how this is achieved is still in the making.

Ocean claims that the high-level marketplace is designed around existing and upcoming data protection laws like GDPR. The whole market and all its players are incentivized by, and control themselves via, utility tokens. A player can easily lose these tokens by not acting according to the rules built into the protocol. Ocean uses token-curated registries to manage the balance between all stakeholders and establish trust.
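For readers who haven’t met token-curated registries before, here is a heavily simplified Python sketch of the general idea. It is not Ocean’s design: the class `TokenCuratedRegistry`, its methods, the account names and all numbers are assumptions for illustration. Real registries also reward the voters, which I leave out. The sketch shows the core mechanic: listing costs a deposit, and whoever loses a challenge loses tokens.

```python
from dataclasses import dataclass, field


@dataclass
class TokenCuratedRegistry:
    """Toy registry: listing and challenging both put tokens at risk."""

    min_deposit: int
    balances: dict[str, int]
    # item name -> (owner, deposit locked in the registry)
    listings: dict[str, tuple[str, int]] = field(default_factory=dict)

    def apply(self, owner: str, item: str, deposit: int) -> None:
        """List an item by locking a deposit taken from the owner's balance."""
        if deposit < self.min_deposit:
            raise ValueError("deposit below the registry minimum")
        if self.balances.get(owner, 0) < deposit:
            raise ValueError("owner cannot afford the deposit")
        self.balances[owner] -= deposit
        self.listings[item] = (owner, deposit)

    def resolve_challenge(self, item: str, challenger: str,
                          votes_keep: int, votes_remove: int) -> bool:
        """Settle a challenge; return True if the item stays listed."""
        owner, deposit = self.listings[item]
        if self.balances.get(challenger, 0) < deposit:
            raise ValueError("challenger must be able to match the deposit")
        if votes_remove > votes_keep:
            # Listing is rejected: the locked deposit goes to the challenger.
            del self.listings[item]
            self.balances[challenger] += deposit
            return False
        # Listing survives: the challenger pays the matching stake to the owner.
        self.balances[challenger] -= deposit
        self.balances[owner] = self.balances.get(owner, 0) + deposit
        return True


registry = TokenCuratedRegistry(min_deposit=10, balances={"alice": 100, "bob": 100})
registry.apply("alice", "rubbish-data", deposit=10)
still_listed = registry.resolve_challenge(
    "rubbish-data", "bob", votes_keep=3, votes_remove=7
)
print(still_listed, registry.balances)
```

In this run the voters side with the challenger, so the listing disappears and the owner’s deposit ends up with the challenger. Had the vote gone the other way, the challenger would have paid the owner instead.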

Ocean currently identifies seven stakeholders:

  • Data Providers
  • Data Consumers
  • Data Marketplaces
  • Data Mashers
  • Data Referrers
  • Network Keepers
  • Regulators

Data Mashers are an interesting species: their role is to bring value-added services into the market, e.g. by cleansing, transforming or augmenting different sets of data and creating new data sets from them, without losing the IP tracking of the original data provider.
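How such IP tracking could look is easiest to see in a small example. The following Python sketch is my own guess at a minimal data model, not something specified by Ocean: the `DataSet` class, its fields and the sample providers are hypothetical. Each derived data set simply remembers its sources, so the original providers can always be recovered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """Hypothetical data set record that remembers where it came from."""

    name: str
    provider: str
    sources: tuple["DataSet", ...] = ()

    def original_providers(self) -> set[str]:
        """Walk the source chain back to the providers of the raw data."""
        if not self.sources:
            return {self.provider}
        providers: set[str] = set()
        for source in self.sources:
            providers |= source.original_providers()
        return providers


# Two made-up raw data sets and one mashed-up data set derived from both.
tire_pressure = DataSet("tire-pressure", provider="sensor-maker")
weather = DataSet("weather", provider="weather-ngo")
road_risk = DataSet("road-risk", provider="data-masher",
                    sources=(tire_pressure, weather))

print(sorted(road_risk.original_providers()))
```

With a record like this, a payment for the mashed data set could be routed on to everyone whose raw data went into it.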

The marketplace paper goes into some depth regarding market design. These topics seem to be really thought through.

Funding

Ocean Protocol is based in Berlin and still in an early phase. I didn’t find much information about an ICO or similar aspects. But there are early investors in the technology, like Outlier Ventures, and partners like the data marketplace DEX from Singapore.

CTO and co-founder Trent McConaghy previously founded several other tech startups, such as Ascribe, BigchainDB and IPDB. Most of these also form the foundation for the activities around Ocean Protocol. He found a number of well-established partners for these companies, as you can see here.

Impact on society

Ocean surely plans to impact society and even humanity. These aspects are a core focus of their activities — as you can read in many of Trent’s publications on Medium or Slideshare.

Winning back the ownership for our very own data sounds like a decent plan.

But the possibilities go further: If you think of areas like digital health or IoT, Ocean’s basic design also sounds very promising as a foundation for smart solutions.

Think of storing a patient’s medical data. If you design such systems you really want to make sure that this data won’t get lost or be accessed without the necessary control. At the same time it makes sense to work with this data in an anonymized fashion, e.g. for research. These problems are really hard to solve in a common centralized cloud scenario. The more data you put into one central data lake, the bigger the incentive for attackers to get access to it.

I also saw similar problems in IoT scenarios. In many cases you have more than one player involved. Let’s take a sensor that is measuring the pressure of a tire of your car. Who is allowed to use the data this sensor is collecting? The manufacturer of the tire? The manufacturer of the car? The driver of the car? Or the owner of the car? Or maybe the insurance company? The police after a crash? All these parties for sure have valid reasons to have access to certain parts of that data in one way or the other.

Wouldn’t it be helpful if we had standard mechanisms in place that were capable of solving these multi-stakeholder data scenarios? I think: Yes!

Conclusion

Ocean is a hot candidate to keep an eye on. It is still very early. But the potential is there… I hope I could give you some idea where this cool tech is heading.

If you want to dive deeper into Ocean I can recommend this interview between Rhys Lindmark and Trent McConaghy.

Disclaimer: This article is not intended to be investment advice of any sort. Do your own research and seek professional advice if you intend to invest in one of the projects mentioned in this article.

This story is published in The Startup, Medium’s largest entrepreneurship publication followed by 277,994+ people.

Stefan Grasmann

Blockchain enthusiast. Driving Thought Leadership @zuehlke_group to the next level. Innovator | Strategic Advisor | Networker | Speaker.