There Has Never Been More At Stake In Venture Capital

Fabric Ventures’ commitment to operate within the Ocean Protocol Network.

With deep thanks to my team at Fabric Ventures: Anastasiya Belyaeva, Julien Thévenard, Max Mersch and Richard Muirhead.

The role of Venture Capital in decentralised networks

One of the core narratives that drove the excitement around token sales in 2017 was the idea that token sales transcended the role of other fundraising mechanisms. In addition to raising capital, they were supposed to align the interests of the users, investors and the network, solving the chicken-and-egg problem that many networks face by economically incentivising early adopters to use the network. One might think of them as a digital adaptation of the co-ops or building societies of the past, perhaps a stepping stone to a ‘modern mutual’. Unfortunately, the public fervour to buy tokens was driven by the speculative desire to re-sell them on exchanges rather than by any intention to participate in their respective networks. As it turned out, trading activity and prices grew, but network user numbers remained insignificant.

Blending our founder, operational and investor backgrounds, we strive to become partners, rather than mere investors, to the projects we back. Given the necessity of bringing more than capital to the table, Fabric has begun the work of becoming an active participant in the networks we invest in: staking, validating, voting, curating and generally running nodes. Not only should this fulfil our fiduciary duty of maximising the financial return on capital for our LPs, but it also seeds the networks in which we are investing. We believe that, going forward, most founders will initially turn towards patient institutional partners that inject both capital and work into their networks. Only once the networks are live with a minimum viable number of nodes will they start attracting specific user groups through targeted placements or air-drops of tokens.

Introduction to Ocean

We are excited to share what we’ve been working on over the past months, and what we have been coding since Ocean Protocol’s Testnet Version 0.1 release. Ocean Protocol is a network facilitating the peer-to-peer exchange of data and services (storage, compute and algorithms for consumption), aiming to open up the silos of unused data we operate with today. Data is generated at an ever-increasing rate across the world but, according to McKinsey, only 1% of IoT data is actually processed and analysed. Data producers (e.g. IoT devices) record and store data in proprietary silos, whilst entrepreneurs and researchers are left hungry for more data. A trustless worldwide marketplace to connect these parties will become a necessity in many sectors, ranging from healthcare (medical research with encrypted private data) to autonomous driving development and energy grid optimisation. The distributed nature of the Ocean Protocol network will provide censorship resistance, guarantee constant availability and prevent centralised security breaches.

Open decentralised marketplaces will show a much higher potential for collaboration than closed ones, unleashing exponentially greater value from data and models as they are continuously used and improved. A proprietary machine learning framework will not create a competitive advantage on its own; instead, the network effects derived from the services and people building on top of that framework will. Ocean incentivises other teams and projects to release in-house models and stake towards them in order to gain a higher position in the community and its revenue distribution. By making Ocean a freely available framework of libraries upon which any developer can build a model, it has the best chance to become usable, flexible and scalable for its specific developer segment. As the go-to framework for such markets, Ocean will speed up the adoption of new models and ideas, further enhancing their application to business challenges. A recent parallel is Google open-sourcing its TensorFlow framework to the data science community, which greatly increased the number of projects built on TensorFlow and created an open platform for permissionless innovation.

Going forward, the Ocean framework will also enable the freedom to switch between different providers of models and data. As more models are released on the Ocean framework, incentives to engineer computing units specific to it will arise: where today we have TPUs for TensorFlow, we might have ‘DPUs’ optimised for low-latency decentralised models.

A pilot for decentralised data publishing 101

A month ago, testnet v0.1 was released. It can be run locally or deployed on Ethereum’s Kovan testnet using an Infura node. Based on the alpha code published by the team, Fabric Ventures has launched a local Ocean Protocol test network. The current capabilities allow publishing new data, browsing available datasets and downloading or consuming the data, using either Ocean’s front-end website (Pleuston) or a Python script, which could be part of a regular ETL (Extract, Transform, Load) library or Jupyter notebook.

The first iteration of the testnet deploys the Ocean Protocol Ethereum smart contracts responsible for the marketplace and transactions, and enables dataset access authorisation and curation of assets (i.e. staking). In Python they are nicely wrapped as follows:

Acl, market and token are the smart contracts responsible for access requests, market transactions and ocean tokens respectively.

Python wrappers for Ocean Protocol smart contracts.

The first step after purchase initiation is verifying access to the asset: an access request is initiated and is set to be confirmed within a certain threshold period, as below:

Asset access request initiation and listening to the network events for its status.
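The waiting logic can be illustrated without a blockchain at all. The following is a minimal sketch, not Ocean's actual wrapper: `wait_for_access_event`, `make_fake_filter` and the `AccessConfirmed` event name are hypothetical, and the fake filter merely stands in for a contract event listener.

```python
import time
from typing import Callable, Optional


def wait_for_access_event(
    poll_event: Callable[[], Optional[dict]],
    timeout_s: float,
    interval_s: float = 0.1,
) -> Optional[dict]:
    """Poll `poll_event` until it returns an event or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        event = poll_event()
        if event is not None:
            return event
        time.sleep(interval_s)
    return None  # the request was not confirmed within the threshold period


def make_fake_filter(confirm_on_poll: int, request_id: str) -> Callable[[], Optional[dict]]:
    """Hypothetical stand-in for an event filter: confirms on the n-th poll."""
    calls = {"n": 0}

    def poll() -> Optional[dict]:
        calls["n"] += 1
        if calls["n"] >= confirm_on_poll:
            return {"event": "AccessConfirmed", "id": request_id}
        return None

    return poll


event = wait_for_access_event(make_fake_filter(3, "req-1"), timeout_s=2.0, interval_s=0.01)
print("confirmed" if event else "timed out")
```

In a real deployment the polling callable would presumably wrap the ACL contract's event stream; the timeout plays the role of the threshold period mentioned above.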

Once access is confirmed, the payment for the asset is released from the consumer address (buyer) to the provider address (seller):

Access approval and payment for the asset.
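As a rough illustration of this release step, the sketch below models the payment as an in-memory escrow. It is a toy under stated assumptions: `ToyEscrow`, its method names and the addresses are hypothetical, and the refund branch for unconfirmed access is our own assumption rather than documented Ocean behaviour.

```python
class ToyEscrow:
    """In-memory stand-in for the token and market contracts (hypothetical)."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.locked = {}  # request_id -> (buyer, seller, amount)

    def lock_payment(self, request_id, buyer, seller, amount):
        """Move the buyer's tokens into escrow when the purchase is initiated."""
        if self.balances.get(buyer, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[buyer] -= amount
        self.locked[request_id] = (buyer, seller, amount)

    def release_payment(self, request_id, access_confirmed):
        """Pay the seller once access is confirmed; otherwise refund (assumption)."""
        buyer, seller, amount = self.locked.pop(request_id)
        recipient = seller if access_confirmed else buyer
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
        return recipient


escrow = ToyEscrow({"0xBuyer": 10})
escrow.lock_payment("req-1", "0xBuyer", "0xSeller", 4)
escrow.release_payment("req-1", access_confirmed=True)
print(escrow.balances)
```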

The voting and curation examples are not available in this release; however, you can explore and experiment with Ocean’s TCR (Token Curated Registry) smart contract, which is available in the project’s repository.

The backend of the network is managed by a RESTful API built with Python and Flask, which controls an asset database stored in BigchainDB (itself powered by MongoDB and Tendermint). The API is the interface for discovering available assets in the network and interacting with Ocean’s smart contracts programmatically.

The data scientist can query available datasets by sending requests to the API:

Example API request to receive list of assets from Ocean network.

This returns the full list of published assets (if you run the testnet, you will first need to publish some of your own or run the example scripts from Ocean’s Nautlina repository).
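For readers without the testnet at hand, here is a hedged sketch of such a request using only the Python standard library. The host, port, path and response shapes are placeholders we assume for illustration; they are not Ocean's documented endpoint, so check the API of your own deployment before using them.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed values: placeholders, not Ocean's documented API.
BASE_URL = "http://localhost:5000/"
ASSETS_PATH = "api/assets"


def assets_url(base_url: str = BASE_URL, path: str = ASSETS_PATH) -> str:
    """Build the URL used to list assets."""
    return urljoin(base_url, path)


def parse_assets(body: str) -> list:
    """Accept either a bare JSON list or an object wrapping it (assumed shapes)."""
    payload = json.loads(body)
    return payload if isinstance(payload, list) else payload.get("assets", [])


def fetch_assets(url: str) -> list:
    """Send the GET request; requires a running API at `url`."""
    with urlopen(url, timeout=10) as response:
        return parse_assets(response.read().decode("utf-8"))


print(assets_url())
```

Calling `fetch_assets(assets_url())` would perform the actual request against a local node; the parsing is kept separate so it can be checked without one.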

This is what Fabric’s example dataset JSON looks like:

Example dataset metadata.

“Pleuston”, a datasets portal, offers a more user-friendly way to discover Ocean’s assets. It is written in JavaScript and can interact with both the API and the Ocean smart contracts directly. In future releases, the portal would read your Ethereum wallet address from, for example, MetaMask and personalise the page depending on your published or purchased assets, perhaps even recommending useful or interesting new ones!

The Ocean Keeper

Based on these available features, we built a simple test case to explore what’s possible and to provide the community with an introductory example of what the network can do. Introducing the Ocean Keeper: a set of bots gathering GitHub activity data and enriching it within the Ocean Protocol network.

To illustrate how data scientists and data engineers could use Ocean Protocol, we built a simple data supply chain. Bots extract raw data from example sources (Data Providers). ETL pipelines purchase those datasets (Data Consumers), process and aggregate them into analysis-ready tables, and publish them back into Ocean for data scientists and companies to use for business purposes (final Data Consumers), as illustrated in the figure below:

Example dataset lifecycle within Ocean Protocol.

For this illustration, we’ve created two bots acting as separate actors within the network:

  1. The Data Provider captures raw data (in our case, by tracking GitHub activity)
  2. This raw data is then stored on Azure Storage
  3. The Data Provider registers the data’s existence, storage location, price and payment address within the Ocean network
  4. The Data Consumer searches for available data sets on the Ocean network
  5. The Data Consumer selects the previously mentioned data set, and initiates a transaction to pay the Data Provider on the registered Ethereum address
  6. This transaction is validated by the Ocean network
  7. The Ocean network releases the hyperlink to the data set from Azure to the Data Consumer
  8. The Data Consumer downloads the data set from Azure
  9. The Data Consumer (in this case an ETL Provider) can enrich the data set, and publish it back to the Ocean network, effectively taking the role of another data provider in the network.
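The nine steps above can be walked through with a small in-memory model. This is a sketch under stated assumptions, not the code we ran: `Listing`, `ToyOceanNetwork`, the addresses and the `example.invalid` URLs are hypothetical, and payment validation is reduced to a price check.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    asset_id: str
    name: str
    url: str              # storage location, only revealed after payment
    price: int
    payment_address: str


class ToyOceanNetwork:
    """Hypothetical in-memory stand-in for the Ocean network."""

    def __init__(self):
        self.listings = {}
        self.paid = set()  # (asset_id, consumer_address) pairs

    def register(self, listing):                 # step 3 (and step 9)
        self.listings[listing.asset_id] = listing

    def search(self, keyword):                   # step 4: the URL is not exposed here
        return [
            {"asset_id": item.asset_id, "name": item.name, "price": item.price}
            for item in self.listings.values()
            if keyword.lower() in item.name.lower()
        ]

    def pay(self, asset_id, consumer, amount):   # steps 5 and 6
        listing = self.listings[asset_id]
        if amount < listing.price:
            raise ValueError("payment below listed price")
        self.paid.add((asset_id, consumer))
        return listing.payment_address

    def release_url(self, asset_id, consumer):   # step 7
        if (asset_id, consumer) not in self.paid:
            raise PermissionError("no validated payment for this asset")
        return self.listings[asset_id].url


network = ToyOceanNetwork()
network.register(Listing("raw-1", "GitHub activity (raw)",
                         "https://example.invalid/raw-1.csv", 5, "0xProvider"))
hit = network.search("github")[0]
network.pay(hit["asset_id"], "0xEtl", hit["price"])
raw_url = network.release_url(hit["asset_id"], "0xEtl")  # step 8 would download this
# Step 9: the ETL bot republishes its enriched output as a provider.
network.register(Listing("agg-1", "GitHub activity (weekly aggregate)",
                         "https://example.invalid/agg-1.csv", 8, "0xEtl"))
print(raw_url, len(network.listings))
```

The point of the design is that the storage link is only released after the payment has been recorded, which mirrors steps 6 and 7.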

Practically, to create this simple ecosystem, we reused most of the code already published by Ocean Protocol, adding functions that let the bots interact with our own Azure storage and that implement their process loops and data handling. We developed simple objects representing the market behaviour presented above: the marketplace, the data provider and the data consumer. The data consumer inherits the characteristics of the data provider so that it can re-publish its work back into the network.

Example handlers representing market agents.
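The inheritance relationship between these objects can be sketched as follows. Only the three class names come from our implementation; the method names, the dictionary fields and the in-memory marketplace are hypothetical simplifications for illustration.

```python
class OceanMarketplace:
    """Stand-in for the API and contract wrapper (in-memory, hypothetical)."""

    def __init__(self):
        self.assets = []

    def publish(self, asset):
        self.assets.append(asset)

    def list_assets(self):
        return list(self.assets)


class DataProvider:
    def __init__(self, address, marketplace):
        self.address = address
        self.marketplace = marketplace

    def publish(self, name, url, price):
        asset = {"name": name, "url": url, "price": price, "publisher": self.address}
        self.marketplace.publish(asset)
        return asset


class DataConsumer(DataProvider):
    """A consumer can also publish because it inherits from DataProvider."""

    def find_by_publisher(self, publisher):
        return [a for a in self.marketplace.list_assets() if a["publisher"] == publisher]

    def enrich_and_republish(self, asset, suffix="(weekly aggregate)"):
        # Real enrichment would download and transform the data;
        # in this sketch only the metadata changes.
        return self.publish(f'{asset["name"]} {suffix}', asset["url"] + ".agg", asset["price"])


market = OceanMarketplace()
provider = DataProvider("0xProvider", market)
consumer = DataConsumer("0xConsumer", market)
raw = provider.publish("GitHub activity", "https://example.invalid/raw.csv", 5)
enriched = consumer.enrich_and_republish(consumer.find_by_publisher("0xProvider")[0])
print(enriched["name"], enriched["publisher"])
```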

OceanMarketplace provides an interface to the API for querying current assets and executing transactions on the Ethereum network on behalf of DataProvider and DataConsumer. DataProvider’s role is to publish fresh GitHub data to Azure storage every hour and let the Ocean network know that it is published and available to purchase. Data output:

Data Provider product published in the network.

DataConsumer listens to OceanMarketplace to see whether new data has been published by DataProvider. It sends a request to OceanMarketplace to get all available datasets and checks whether the one published by a specific provider is more recent than the last one it acquired. It identifies the provider by its Ethereum address, which is attached to the dataset. Once DataProvider makes new data available, the consumer sends the purchase request and acquires the data. The new data is then enriched by aggregating weekly statistics for each author, and the result is republished as a new asset on the Ocean network.

Data Consumer aggregated output.
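The two decisions the consumer makes, whether a newer dataset exists and how to aggregate it, could look roughly like this. It is a minimal sketch: the field names (`publisher`, `published_at`, `timestamp`, `author`) are assumptions for illustration, and a simple event count stands in for the weekly statistics.

```python
from collections import Counter
from datetime import datetime


def latest_from_provider(assets, provider_address, last_seen):
    """Return the provider's newest asset if it is newer than `last_seen`, else None."""
    candidates = [
        a for a in assets
        if a["publisher"] == provider_address and a["published_at"] > last_seen
    ]
    return max(candidates, key=lambda a: a["published_at"], default=None)


def weekly_events_per_author(events):
    """Count events per (ISO year, ISO week, author)."""
    counts = Counter()
    for event in events:
        iso = datetime.fromisoformat(event["timestamp"]).isocalendar()
        counts[(iso[0], iso[1], event["author"])] += 1
    return dict(counts)


events = [
    {"author": "alice", "timestamp": "2018-07-02T10:00:00"},
    {"author": "alice", "timestamp": "2018-07-08T23:00:00"},
    {"author": "bob", "timestamp": "2018-07-03T09:00:00"},
    {"author": "alice", "timestamp": "2018-07-09T08:00:00"},
]
print(weekly_events_per_author(events))
```

Grouping by ISO week keeps Monday to Sunday together, so the two July events from `alice` in the same week are counted as one row.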

Both assets (raw and processed GitHub data) are also easily discoverable by other users through the Ocean web portal:

Ocean Protocol dataset discovery portal — Pleuston

Our example implementation represents only a very small part of what can be achieved on Ocean Protocol. We anticipate that, once more features and more assets become available to integrate, more complex data solutions can be built on this marketplace. The framework will also provide further efficiencies once staking and voting for the best datasets, models and compute services become available: for example, bots could systematically select the best sources and the best data science solutions based on on-chain usage statistics!

The coming year promises to be one of continued building & adoption as opposed to passive speculation when it comes to decentralised data networks and their native crypto assets. At Fabric we commit to becoming an active participant within the networks we invest in, and look forward to continuing on this steep learning curve.

With special thanks to Trent and Aitor from the Ocean Protocol Team for openly releasing this version, and for their continued conversations and guidance on our implementation.