Open Data on the Blockchain

By Oliver Rydzi on ALTCOIN MAGAZINE

Published Dec 6, 2018

I am certainly not the first one to talk about open data and blockchain, but hear me out. Usually, when people talk about open data on the blockchain, they focus on just one aspect of the issue: open data in government (public sector data). However, we should look at open data in a much broader sense. Open data should be any data that are common or public knowledge, e.g. business-related data (location, contact details, reviews…), the composition of a product (food, medicines…), or even data that users publish on the internet (articles, recipes…).

Problem?

Who should own data that are publicly available to anyone, whether we are talking about business addresses, food compositions or recipes that are written all over the internet? Am I wrong to think this information should be freely available to the community?

Let’s say you founded a startup that recommends the right food products to users according to their health needs. You will need a lot of data about the composition of these products to give your users the best answer. Currently, this leaves you with two options:

  1. Crawl the internet for the data you need and/or hire workers to fill in the information in your database (both are probably overkill, especially for a startup)
  2. Find a data provider with the information you need

Let’s dive into option 2. First of all, there is going to be more than one data provider, and each of them will have slightly different data and a slightly different API. Data provider A has something you really need but is missing something that only data provider B gives you. Second, the data is not going to be free, which will force you into a compromise between quality and price. You will build a prototype and be happy for some time, until you get traction and a lot of users. You might then find that something is missing from the data (API) you are using, but you cannot modify the database (API). On top of that, you might find that you are paying a ridiculous amount of money for an API that you don’t even like that much. This will make you reconsider and fall back to option 1. And quite possibly, once you have built the necessary dataset and API, you will consider becoming a data provider yourself.

Because of this, there is a lot of data redundancy between different data providers. They put a lot of effort and money into gathering very similar data and then sell it as a service with a slightly different API.

Solution?

Smart contract as a data provider

Why don’t we use a smart contract as a universal, community-managed data store, complemented by open source packages that expose the smart contract’s endpoints to developers in an easy-to-use way? We could establish something like a “common knowledge API”, contributed to and built by the community.

This way we could achieve a lot of things:

  1. Kickstart the development of applications that need this kind of data
  2. Build one universal, extensible dataset to which anyone can contribute. Because the smart contract won’t be owned by one company, anyone could contribute to and extend the dataset. If a company needs some special information from the smart contract, it can extend the existing dataset with this information without having to build the API itself.

Figure 1: The smart contract’s responsibilities and how the parties interact with the smart contract. Sorry about the poor drawing quality :D

Now — if you read carefully, you should have some concerns. And maybe be a little excited.

Problems?

“Well Oliver, this is all well and good, but if anyone can add and edit data, how can we be sure that all data are relevant?”

This is definitely not an easy problem to solve. First of all, every entry stored in the smart contract will have to follow the format of a predefined schema. The smart contract would have one schema for each area of data that it supports. A schema would either be entered manually whenever there is a new area of data that the contract should support, or generated automatically from the data entered into the smart contract.

Figure 2: Example schema for product composition. In English, this means that each entry in this schema contains the product name, who manufactured it, and an array of all the ingredients with their amounts and units (e.g. an entry in this array could be 2 mg of sugar).
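To make the schema idea a bit more concrete, here is a minimal off-chain sketch in Python of what "following the schema" could mean. It is only an illustration: apart from ingredientName, all field names (productName, manufacturer, amount, unit) and the matches_schema helper are my assumptions, not part of any real contract.

```python
from numbers import Real

# Hypothetical schema mirroring the Figure 2 description.
# Only "ingredientName" comes from the article; other names are assumed.
PRODUCT_COMPOSITION_SCHEMA = {
    "productName": str,
    "manufacturer": str,
    "ingredients": [
        {"ingredientName": str, "amount": Real, "unit": str}
    ],
}


def matches_schema(entry, schema):
    """Return True if `entry` has exactly the shape described by `schema`."""
    if isinstance(schema, dict):
        return (
            isinstance(entry, dict)
            and entry.keys() == schema.keys()
            and all(matches_schema(entry[k], schema[k]) for k in schema)
        )
    if isinstance(schema, list):
        # A one-element list means "array of items shaped like schema[0]".
        return isinstance(entry, list) and all(
            matches_schema(item, schema[0]) for item in entry
        )
    # bool is a subclass of int in Python, so reject it explicitly.
    return isinstance(entry, schema) and not isinstance(entry, bool)


entry = {
    "productName": "Chocolate bar",
    "manufacturer": "Acme",
    "ingredients": [{"ingredientName": "sugar", "amount": 2, "unit": "mg"}],
}
print(matches_schema(entry, PRODUCT_COMPOSITION_SCHEMA))
```

Note that a shape check like this only looks at structure and types. An entry with a nonsense ingredient name would pass just as well, so it cannot replace review by people.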

This raises further pain points. Who is going to verify that the schemas entered into the smart contract are valid? And who is going to check that the data make sense, even if they follow the schema? Just consider the example in Figure 2: how can anyone know that someone will not put “SADADASD” as an ingredientName?

Because of this, a schema will first enter a “proposal stage” before it is saved to the contract. It stays in the “proposal stage” until it is validated by a number of trusted users. If it is not validated, the schema and its data will be declined and won’t be stored in the smart contract. Otherwise, the schema will be considered valid and data can be entered into it. These data would also need to be validated by a few trusted users before entering the blockchain (even if they match the schema).
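The proposal stage could be modelled roughly like this. This is a Python sketch of the idea, not contract code, and it rests on assumptions: the Proposal class, the required_approvals number, and the rule that the same number of rejections declines a proposal are all mine, since I have not fixed how a proposal actually gets declined.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """A schema or data entry waiting in the "proposal stage"."""
    payload: dict
    required_approvals: int  # assumed threshold, not defined in the article
    approvals: set = field(default_factory=set)
    rejections: set = field(default_factory=set)

    @property
    def status(self):
        if len(self.approvals) >= self.required_approvals:
            return "accepted"
        # Assumption: the same number of rejections declines the proposal.
        if len(self.rejections) >= self.required_approvals:
            return "declined"
        return "proposed"

    def vote(self, voter, approve, trusted_users):
        if voter not in trusted_users:
            raise PermissionError(f"{voter} is not a trusted user")
        if self.status != "proposed":
            raise ValueError("voting is closed")
        # A voter may change their mind but holds only one vote at a time.
        self.approvals.discard(voter)
        self.rejections.discard(voter)
        (self.approvals if approve else self.rejections).add(voter)


trusted = {"alice", "bob", "carol"}
proposal = Proposal(payload={"schema": "productComposition"}, required_approvals=2)
proposal.vote("alice", True, trusted)
proposal.vote("bob", True, trusted)
print(proposal.status)
```

Only once the status is "accepted" would the payload be written to the contract. Anything still "proposed" or "declined" never reaches storage.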

“But Oliver, trusted users? Really? Who are they going to be? This sounds like it won’t be decentralised at all!”

Well, this was a major pain point. Basically, each user (address) on the blockchain should have a ranking (according to various metrics), and if their ranking is higher than some minimum, they would be classified as “trusted users” for the purposes of this smart contract. Because of this, trusted users won’t be a group of 5 users that do whatever they want and purposefully reject good data proposals when it suits them. This would be a community that makes choices around the smart contract, where some choices (validating schemas) would need more votes than others (entering data into a schema). Side note: I was quite happy to learn that Nebulas (a blockchain platform similar to Ethereum) already runs its own ranking algorithm, as explained in their yellow paper, which could potentially be used for this purpose.
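As a rough sketch of how ranking and vote thresholds could fit together, consider the Python below. The numbers (a minimum rank of 0.8, five votes for a schema, two for an entry), the rank scale, and the function names are made up for illustration. A real ranking would come from the platform, and this is not the Nebulas algorithm.

```python
# Assumed values for illustration only; the article does not specify them.
MIN_TRUSTED_RANK = 0.8
REQUIRED_VOTES = {"validate_schema": 5, "validate_entry": 2}


def trusted_users(ranks, min_rank=MIN_TRUSTED_RANK):
    """Return the addresses whose rank is higher than the minimum."""
    return {address for address, rank in ranks.items() if rank > min_rank}


def has_enough_votes(action, voters, ranks):
    """Count only votes from trusted addresses against the action's threshold."""
    valid_votes = set(voters) & trusted_users(ranks)
    return len(valid_votes) >= REQUIRED_VOTES[action]


ranks = {"0xA": 0.95, "0xB": 0.9, "0xC": 0.5, "0xD": 0.85}
print(has_enough_votes("validate_entry", ["0xA", "0xB", "0xC"], ranks))
print(has_enough_votes("validate_schema", ["0xA", "0xB", "0xD"], ranks))
```

Because membership depends only on rank, the set of trusted users grows and shrinks with the community instead of being a fixed list of addresses.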

“Oliver, it sounds great… But people will be too lazy to fill in this public data. They’d rather use option 1 or 2 that you mentioned above. Dream on.”

That’s why we need to incentivise people to work on this. I am not talking about “blockchain crazy” rewards, a.k.a. get rich quickly rewards. Let’s do it smartly — set up a system that will make sense long term. In my head, the solution could work like this:

  • The smart contract would have a “bounty system”. Companies can pay bounties to the community for loading relevant data into a specific schema in the smart contract.
  • The company will always specify how high the bounty is (e.g. $100, paid in tokens to the smart contract), for how many entries (e.g. 100), and in what format the data should be (according to which schema, for example the one defined in Figure 2). They could also provide any further description of what the data should look like (e.g. just compositions of chocolate bars).
  • After, let’s say, 10 users fill in the required 100 entries, which would be validated both by the business and by a certain number of trusted users, these 10 users are paid accordingly (if each of the 10 users filled in 10 entries, then each user gets $10).
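The payout arithmetic from the last bullet can be sketched in a few lines of Python. I am assuming a simple proportional split by number of accepted entries. The split_bounty name and the use of exact fractions (to avoid rounding drift) are illustrative choices, not a settled design.

```python
from fractions import Fraction


def split_bounty(bounty, accepted_entries):
    """Split `bounty` among contributors in proportion to accepted entries."""
    total = sum(accepted_entries.values())
    if total == 0:
        return {}
    per_entry = Fraction(bounty) / total
    return {user: per_entry * count for user, count in accepted_entries.items()}


# The example from the text: a bounty of 100 for 100 entries, 10 users, 10 entries each.
entries = {f"user{i}": 10 for i in range(10)}
payouts = split_bounty(100, entries)
print(payouts["user0"])  # 10
```

With an uneven split, say 30 and 70 accepted entries, the same function pays out 30 and 70 respectively, so contributors are rewarded per validated entry rather than per head.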

This bounty system could be sponsored by the community in the beginning, maybe by blockchains that want to push real-world use of the smart contracts. I am strongly against tokenising this smart contract because I don’t believe it is necessary.

Conclusion?

I am aware that there is a lot to figure out, and that it might seem rough around the edges. I am working on a prototype of this smart contract in my free time, which might better illustrate all the use cases. If you think it is an idea worth pursuing, follow me on Twitter or Medium and you can look forward to the updates! Feel free to leave a comment as well :)

Yeah, and feel free to clap it up.

Shameless plug:

Twitter: https://twitter.com/oliverrydzi

I am part of a company called Webscope.io, where we prototype start-ups for a living. If you would like a consultation on your ideas or existing applications, feel free to drop us a line at hello@webscope.io.

Oliver Rydzi

Software Engineer at vidIQ | I develop complex applications for a living | Blockchain curious & I wrote a few smart contracts