Ocean Protocol: How Blockchains can contribute to AI by enabling Decentralized Data Exchanges

Raphael Hannaert
Bitcoin Center Korea
14 min read · Feb 7, 2018
Credits: Kelly Belter

Modern digital society has brought an increasing awareness of the importance of data, sometimes labelled the “new oil” or “the next natural resource” (Virginia Rometty, IBM CEO). Many firms consider data-driven decision making a competitive advantage and therefore allocate significant resources to collecting and processing data, as illustrated by the opening of new Business Intelligence departments and an explosion in demand for data scientists. Some of the most widely used digital platforms, such as Facebook and Google, are free to use but in exchange collect data about their users (who agree to this data-sharing policy when accepting the terms & conditions). These companies rank among the most valuable in the world, confirming the benefits of owning data.

These companies perform so well thanks to the contribution of data to machine learning. For many years, scientists developed increasingly complex models to improve accuracy but faced diminishing returns. Feeding models with much larger datasets, however, has produced performance surges. It is now commonly accepted that most machine learning algorithms (excluding those that learn continuously via trial and error, such as Reinforcement Learning) improve greatly when fed with thousands, millions, or even billions of data points.

The problem is that, apart from the few big players that both hold data and are able to convert it into value, accessing data is a major challenge for AI practitioners. This lack of availability prevents startups from training their algorithms. Some data collections can be found online (including some for free) but represent only a small fraction of the world's data.

To address this challenge, startups like DEX (www.dex.sg), BDEX (Big Data Exchange), and ExchangeNetwork have created data exchange platforms where companies as well as individuals can buy and sell data. On these platforms, data providers upload their precious resources to the exchange's database, which then grants access to data users. Data can be retrieved through various types of access (e.g. API, download, and web interface) and formats (e.g. JSON, CSV, and reports). This solution therefore enables the transfer of data between entities and can be seen as a major step towards data availability.

DEX centralized exchange interface

However, centralized exchanges present drawbacks that impede scalability and mass adoption. These drawbacks stem from centralization itself and appear in other centralized systems, such as banks, whose shortcomings led to the development of distributed ledger technologies (e.g. blockchains, illustrated by Bitcoin's intermediary-free payments).

The major problem of a centralized data exchange is that data are hosted by a third party (usually the platform itself or cloud service providers); firms are uncomfortable with this, as a leak of their most valuable data can have disastrous consequences for their business. Yahoo's database, for example, was hacked in 2014 and precious information about its users fell into the hands of malicious parties (incidentally, we still do not know how the database was breached). As a consequence, firms and institutions are reluctant to use such services because the risk outweighs the expected gains.

Centralized systems are also inefficient: they cause delays, incur costs, and lack transparency and auditability. The latter is an important feature for a proper data marketplace, as providers need to ensure compliance (more on this below). There are numerous articles and books about the need for decentralization, as well as about when a centralized system remains the relevant structure; I would advise readers to look at these to better understand the disruption currently happening with blockchain technology. In the case of data marketplaces, privacy- and security-related risks are, in my opinion, by far the biggest problem of centralized data exchanges. It is also important to note that people have lost control of their data. Decentralized exchanges, by enabling peer-to-peer transactions, empower individuals to regain ownership and control of their data, and even to monetize it.

This is why the Ocean Protocol foundation came up with the idea of a Decentralized Data Exchange protocol that would be the substrate for building decentralized data marketplaces.

Ocean Protocol is the result of BigchainDB and DEX joining forces in March 2017.

BigchainDB is a Berlin-based startup, founded in 2015, that uses blockchain technology to build scalable decentralized databases with interoperability as one of its core design guidelines: it can be accessed by various blockchain protocols as well as by other distributed ledger implementations such as IOTA's Tangle.

DEX is a Singapore-based centralized data exchange operating since 2015 with more than 250 data providers. Through frequent contact with these providers, its team felt the lack of trust in centralized databases and thus the need for a decentralized system, as explained in the interview below with Chirdeep Chhabra, DEX's CEO and Founder, as well as Board member of the Ocean Protocol foundation.

As mentioned on their website, the Ocean Protocol Foundation aims at creating a Decentralized Data Exchange protocol to unlock data for AI. Marketplaces will not be governed by the Ocean Protocol Foundation nor will the data be hosted by them. In fact, data will be distributed, while being under the control of the data providers or data custodians, and in some cases encrypted so that there is no single point of failure. In addition, participants will not only be able to transact directly with each other but will also have the possibility of creating new data marketplaces using the protocol. In fact, the Ocean Protocol Foundation does not see the product as being one huge data marketplace but rather many marketplaces developed by users (e.g. a marketplace specific to the healthcare industry), the protocol acting thus as a kind of “network of networks”.

For further information about Ocean Protocol, you can refer to their website and several papers. The business whitepaper is already available (a nice 69-page read), providing an explanation of the project, an exhaustive description of the team and their background (including the BigchainDB and DEX work), a roadmap, and extensive information about the token distribution. It also mentions the already numerous partnerships (see pictures below). I recommend reading it and staying tuned, as the technical whitepaper will be released soon. The whitelist for the token distribution will open on February 15.

Participating Agencies and Authorities of Singapore Government
Service & Technology partners

During my visit in Singapore, I had the opportunity to meet Chirdeep Chhabra, CEO of DEX, and Founder and Board member of Ocean Protocol foundation, to discuss the current state of the data exchanges ecosystem and in particular the contribution of Ocean Protocol foundation. Here is the transcript of our interview.

Could you tell me more about your background and how you came up with the Ocean Protocol idea?

Following my master's in distributed systems at Ecole polytechnique fédérale de Lausanne (EPFL), I worked at IBM research labs and later at ETH Zurich, in what people now call the Internet of Things. Later, I studied at the London Business School and worked in multiple ventures in London, most often in the data field. By that point, there was no doubt in my environment about the potential of data anymore; it had become commonly accepted that it was greatly valuable for businesses. The questions had moved to how to create value from data and how to unlock its potential. Finally, I joined DEX and moved to Singapore, which has the ambition to become the first smart city. DEX has now been working with the government and several enterprises here for four years to build a centralized marketplace.

One of the main problems in AI is access to data: many AI companies came to me asking to be connected with people and organisations holding datasets. In fact, only a few companies have both datasets and machine learning algorithms (e.g. Facebook, Google). This is why we need marketplaces to provide access to data and enable transactions to happen. When I joined, I quickly realized that a centralized model could not scale: entities would not give us their most valuable data because they cannot see what happens to it and feel they lose control of it.

As a consequence, I started to look at alternatives, and especially at how blockchain technology and tokens could contribute. I have known Trent McConaghy (founder of BigchainDB, co-founder of the Ocean Protocol foundation) for a while, so I contacted him in Berlin. I told him about the idea of data being converted into assets that are traded within a tokenized ecosystem. Trent had been writing articles about this, and we shared the same view, so we ended up creating Ocean Protocol together with other members.

I understand that Ocean Protocol is made up of the DEX and BigchainDB teams. What are the roles and responsibilities of each within the foundation?

We have a clear understanding of our strengths and so of who is doing what. There are two elements: the protocol and the marketplace. Essentially, we act as a single team, but within that team most of BigchainDB's resources are focused on the protocol and ours on the marketplace. We are building this marketplace in order to help users create additional marketplaces with the open-source template.

How was the decentralization proposition accepted by your peers in the team? And by your clients and partners?

Within the team we are all very optimistic about it and believe that this complete change in direction is necessary. This new philosophy ensures that Ocean Protocol is built in the right way, with a network of marketplaces upon it. This is a design that is important for the development of safe and sustainable AI.

Concerning the second part of the question, we have been discussing this with many of our clients. Last year, in fact, we held a large workshop on data management and sharing with a number of C-level executives and Data and Privacy Officers. They understand the value of data, but problems appear when it comes to understanding the mechanisms of data access, regulations and compliance. They must be able to provide regulators, on request, with a list of who has accessed the data. Transparency and immutability are important factors that complement the need for privacy and security of the data. Not having these characteristics fully operational was previously one of the biggest barriers for DEX, but there was much enthusiasm when we elaborated on decentralization, trust frameworks and the Ocean Protocol proposition. Convincing companies that already work with data to join has, logically, been relatively straightforward.

We also have meetings with other corporates: not traditional data companies, but ones that produce data on a daily basis without using it. We try to convince them of the need to allocate more resources to AI and data analysis/business intelligence. For example, firms need to predict both supply and demand (e.g. whether certain crops will grow in the coming years, or the consumption of end-products). Even as they produce more and more data, this is not enough to make accurate forecasts and stay competitive; they need external data for richer insights and forecasts. That is why we need a marketplace where they can buy and sell data (which can also create new revenue streams) to complement the data they produce. My conclusion is that companies that do not participate in data markets will be excluded from the future data economy and may even risk shutting down.

Is it a service that you will offer mostly to companies?

No, we don’t want to provide all the services ourselves. We are working hard on embedding inclusivity as a core value in the design of the protocol. A marketplace based on a public blockchain can go a long way towards democratizing data access. We really want to benefit not only the big AI companies but also small ones, NGOs, governments and even individuals. If Ocean Protocol or other projects with similar goals fail, AI will essentially be in the hands of a few people, and that is, in my opinion, not good for humanity.

You have recently announced a partnership with SingularityNET. How does it fit with your project?

SingularityNET is building a marketplace for AI applications. AI models and data have a very strong relationship: AI needs data, and, reciprocally, data are most valuable when fed into AI algorithms. In addition, SingularityNET shares our vision: one where, like the internet, Artificial Intelligence does not belong to a few individuals.

You are working on a public blockchain. How do you see that in terms of scalability?

BigchainDB has built a scalable blockchain database; we have a history around that. Nevertheless, we understand that there are technical challenges, so we need to partner with other projects and scientists, and, as soon as possible, with the community using the open-source protocol.

In terms of growth, how do you see yourselves penetrating the market?

We want to create a global community, with meetups all around the world. If there is strong interest in an area, we will obviously respond by organizing specific events there.

One of the big advantages of Singapore is that we have already engaged with many companies and government agencies in the past. We hope to continue that and to engage even more with other stakeholders here, including AI companies, data scientists, SMEs and corporates. We believe that Singapore can reach its goal of becoming the first smart nation in the world through good management of data. Engagement with government, companies and communities is helped by the fact that people here have been very forward-thinking in this field and are happy to support what we are doing. Singapore is a very good test case for our project.

To expand to other countries and achieve more decentralization, we are partnering with PwC at the marketplace level to make sure we follow compliance/regulatory structures that vary across jurisdictions. For example, the EU will soon enforce a new data regulation called the General Data Protection Regulation (see note below), which we need to comply with. It should not matter whether a marketplace runs in Germany or in Japan, so we need to be mindful of different local regulations and take them into account in our design. I would even add that Ocean Protocol is in line with this new regulation, as one of our objectives is exactly that: enabling people to have full control of their data.

In terms of product development, we aim to deliver a first Minimum Viable Product by Q3 2018 and a network launch by Q1 2019.

“The General Data Protection Regulation was designed to harmonize data privacy laws across Europe, to protect and empower all EU citizens data privacy and to reshape the way organizations across the region approach data privacy. Approved on 14th April 2016, it will be enforced on 25th May 2018 at which time organizations in non-compliance will face heavy fines”. Source: https://www.eugdpr.org

Does that mean that I could also sell my data?

Nothing would prevent you from doing that. However, at the beginning you would have no credibility on the network, so you would need to be referred or to put up a stake. (Note: putting money at stake means buying and locking up tokens such that if one's data turns out to be false, or not actually one's own, that person loses their stake and could even be blacklisted; this is quite similar to how proof-of-stake achieves consensus in some public blockchains.) This is why we are starting with those that have larger, valuable datasets. Nevertheless, we are building the token economy with the explicit goal of preventing any kind of centralization, so of course it will be possible.
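The stake-and-slash idea described above can be sketched in a few lines; the class name, token amounts and fields below are hypothetical illustrations of the mechanism, not Ocean Protocol's actual token logic.

```python
# Hypothetical sketch of a stake-and-slash scheme for new data providers.
# Names and amounts are illustrative only, not Ocean Protocol's real design.

class Provider:
    def __init__(self, name, stake):
        self.name = name
        self.stake = stake          # tokens locked as collateral
        self.blacklisted = False

    def slash(self):
        """Data was proven false or not the provider's own: lose the stake."""
        lost = self.stake
        self.stake = 0
        self.blacklisted = True
        return lost

alice = Provider("alice", stake=100)
lost = alice.slash()                # data challenged and proven fraudulent
print(lost, alice.blacklisted)      # the 100-token stake is gone
```

A provider with nothing at stake has nothing to lose by publishing bad data, which is why a referral or a locked deposit substitutes for reputation at the start.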

What would prevent me, as a big player, from creating a monopoly?

The reward one gets from one's data being very popular grows logarithmically. Therefore, you cannot take over control, as there are incentives for people to work with new data (because of the logarithmic curve). This mechanism ensures that people work, curate and bring in new data. Price will probably have little to do with the popularity of the data; in any case, it is not our job to set it. Data providers have the right to decide which price to set, and rules for the data marketplaces are defined by keepers.
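The effect of a logarithmic reward curve can be illustrated with a minimal sketch; the `reward` function below is an assumption for illustration, not the actual curve in the protocol.

```python
import math

def reward(popularity):
    """Hypothetical reward growing logarithmically with a dataset's
    popularity, so each extra use pays less than the previous one."""
    return math.log1p(popularity)   # log(1 + n); zero reward at zero use

# A tenfold jump in popularity adds roughly a constant amount rather than
# multiplying the payout, which blunts winner-take-all dynamics for any
# single popular dataset and keeps new data worth contributing.
print(reward(10))    # ≈ 2.40
print(reward(100))   # ≈ 4.62
print(reward(1000))  # ≈ 6.91
```

Because the curve is concave, the marginal reward of one more use keeps shrinking as a dataset gets more popular, so hoarding one hit dataset pays less than curating fresh ones.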

It is also important to understand that policies can change depending on the marketplace as they can be subject to different regulations and purposes. There may be some marketplaces specific to some fields like healthcare and energy. As stated previously, we do not think that there will be only one global marketplace.

What is for you an interesting use-case/industry for a decentralized data marketplace?

In my opinion, the most impactful one is healthcare. For example, in the context of Parkinson's disease, some companies are working on AI applications to define the right scale of accuracy for tremor measurements. This input is then used to estimate the right dosage and duration, and how often patients need to take their medicine. If the condition is not managed properly, patients may need a brain implant, which costs about €50,000. This is a very expensive operation that more accurate machine learning predictions could help avoid. However, to reach a low error rate, we would need data from 10,000 patients. Clearly, no single hospital can provide that amount of data, but a decentralized data marketplace can. Thanks to distributed ledger technologies, patient data can be shared while still remaining with the patient or within the hospital. An algorithm developed in Singapore could be sent to a hospital in Munich (after making sure the data are formatted accordingly) for training and return to Singapore without bringing data back. Moving algorithms is cheaper than moving data. We just need to prove that no data is pulled, which we believe is not difficult to achieve. In that case compliance and regulation are satisfied, the AI is trained, and the impact happens.
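The "move the algorithm, not the data" pattern can be sketched as follows. This is a toy illustration under stated assumptions: a trivial one-parameter model, made-up records, and no cryptographic attestation that no data leaks, which a real deployment would require.

```python
# Hypothetical sketch: the model travels to the hospital's data, trains
# there, and only the updated weight comes back; the records never leave.

def train_locally(weight, records, lr=0.1):
    """One pass of a toy one-parameter linear model, run on-site."""
    for x, y in records:            # records stay inside the hospital
        error = weight * x - y
        weight -= lr * error * x    # gradient step for squared error
    return weight                   # only the weight leaves the premises

# Hospital in Munich holds the data; the algorithm is sent to it.
munich_records = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # (x, y), y ~ 2x
weight = 0.0                        # model initialised in Singapore
weight = train_locally(weight, munich_records)
print(round(weight, 2))             # updated model returns, data does not
```

Only the scalar `weight` crosses the network boundary; in a real system the returned update would also need to be audited so that no raw records can be reconstructed from it.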

More information about Ocean Protocol can be found on their website. They also have two Telegram channels (chat & news) to stay updated. Finally, several related articles can be found on Medium, and if you prefer videos, here is their YouTube channel.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

This article is brought to you by the Bitcoin Center Korea. If you want to learn more about our activities and stay updated with Fintech-related news, please visit our website or our other social media:



Digital enthusiast, I write mainly about Blockchains and AI