Tutorial: Could communities reduce bias in AI by selling data on Ocean and get remunerated for it?
--
Problem
Today, most data and AI development is held by a homogeneous group of people and controlled by either large technology companies or Western Ivy League universities. Without diversity, and without incentives beyond maximizing profits, the field lacks empathy and imagination for the needs of society at large [How AI Fails Us: Divya Siddarth, Glen Weyl et al. 2021].
In addition, AI companies rely heavily on datasets from data-rich countries, which are inherently unrepresentative of the world's population. It is not surprising that algorithms are biased and have led to discrimination in facial recognition [Gender shades: Joy Buolamwini, Timnit Gebru 2018], crime forecasting and mortgage evaluation applications.
Solution
Researchers and developers call for data equality and the decentralization of AI development. They argue that AI development should be interdisciplinary and that its vision should be rooted in human-AI augmentation, cooperation, decentralization and integrated ethics [How AI Fails Us: Divya Siddarth, Glen Weyl et al. 2021].
How? Perhaps as a start, people could break down data centralization and control by large technology companies and start selling data privately to machine learning researchers through a data marketplace.
Here are three types of data that people might sell:
- Proprietary data, e.g. browsing, location and audio data (Alexa, Google Home) or exported health or ancestry data (Fitbit, 23andMe).
- Others’ proprietary data (with rights acquired).
- Open data to which people have added value, such as labeling, annotations and especially bias checks.
In addition, machine learning scientists struggle to access and purchase affordable datasets outside of large tech companies.
Ocean Protocol provides an open-source, decentralized data marketplace where people can sell their data (Ocean Protocol is governed by a Singapore non-profit foundation).
Data is published as interoperable ERC721 data NFTs with ERC20 datatokens. Data buyers can buy a variety of datasets and train their AI models on them.
Marketplace data transparency can provide easier routes to data bias checking. People could specialize in enriching data through bias checking and then selling it. Calls for niche datasets could be used to further mitigate biased datasets.
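To make "bias checking" a little more concrete, here is a minimal sketch of one very simple check: measuring how each group is represented in a dataset and flagging groups that fall below a chosen share. The function name, the `skin_type` attribute, the toy records and the threshold are all illustrative assumptions, and a real bias audit would go well beyond counting.

```python
from collections import Counter


def representation_report(records, attribute, min_share=0.10):
    """Return each group's share of the records and the groups below min_share."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    shares = {group: count / total for group, count in counts.items()}
    underrepresented = sorted(g for g, s in shares.items() if s < min_share)
    return shares, underrepresented


# Hypothetical toy sample; a real check would load the actual dataset.
sample = [{"skin_type": "lighter"}] * 8 + [{"skin_type": "darker"}] * 2

shares, flagged = representation_report(sample, "skin_type", min_share=0.25)
print(shares)   # share of each group in the sample
print(flagged)  # groups whose share is below the threshold
```

A seller could include a summary like this in the dataset description so that buyers can see which groups are covered and where gaps remain.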
It can be argued that machine learning scientists are buying more than just data. The marketplace could create better communication with a diverse community of people who are invested in the development of datasets and AI models and who get remunerated for it in return.
Perhaps in the future, groups of people could create data trusts or data unions that represent and manage their members’ data within a legally defined data privacy framework. Members could eventually co-own AI models and get remunerated for the usage of those models, too.
The data economy is still in its infancy, but we can be hopeful that data transparency through marketplaces could mitigate bias and decentralize control.
Below is a quick tutorial on how to publish your data on the Ocean marketplace in 4 minutes. Go ahead and watch this first. We will break it down later on.
Tutorial 1: How to publish your first data NFT
Please note that you first have to prepare 1) a MetaMask wallet, 2) test tokens, e.g. Goerli Ether and OCEAN, and 3) the URLs of your dataset and sample dataset.
Step-by-step guide
Go to market.oceanprotocol.com and select a test network, e.g. Rinkeby or Goerli. Log in to your MetaMask wallet and ensure you have the corresponding test tokens, e.g. Ether and OCEAN tokens.
Click publish. It’s best to prepare your data name, description, sample set and tags beforehand. Be specific in the description so that buyers understand the value of the data, e.g. that you checked it for bias.
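One way to prepare is to collect the listing details in one place and check that nothing is missing before opening the publish form. The sketch below assumes hypothetical field names (`name`, `description`, `sample_url`, `tags`) chosen for illustration; they are not the marketplace's own schema.

```python
# Hypothetical field names for a local checklist, not Ocean's metadata format.
REQUIRED_FIELDS = ("name", "description", "sample_url", "tags")


def missing_fields(listing):
    """Return the required fields that are absent or empty in the draft listing."""
    return [name for name in REQUIRED_FIELDS if not listing.get(name)]


draft = {
    "name": "Street noise recordings",
    "description": "Audio clips labeled by location; checked for regional bias.",
    "sample_url": "",  # still to be filled in
    "tags": ["audio", "bias-checked"],
}

print(missing_fields(draft))  # lists what still needs to be prepared
```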
Now add the links to your data and the sample data (the data is not stored on the blockchain, just the links) and verify each link.
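Before pasting links into the form, it can help to run a quick local sanity check on them. The sketch below only tests that each link is a well-formed http(s) URL; it does not contact the server and is not the verification the marketplace itself performs. The URLs are placeholders.

```python
from urllib.parse import urlparse


def looks_like_valid_link(url):
    """Offline check: the URL uses http(s) and includes a host name."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Placeholder URLs for illustration only.
links = {
    "dataset": "https://example.com/data/full.csv",
    "sample": "example.com/data/sample.csv",  # scheme is missing
}

results = {name: looks_like_valid_link(url) for name, url in links.items()}
print(results)
```

A link that fails this check will not work as a download link either, so it is worth fixing before publishing.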
Define your pricing by selecting either fixed price or free.
For the timeout, select “1 day”. This means the buyer can access the data for 1 day after purchase. If you often update your dataset, this might be a good option.
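The effect of the timeout can be illustrated with a small sketch. It assumes that "1 day" means 24 hours counted from the moment of purchase; the function name and the dates are made up for illustration, and the actual enforcement happens inside Ocean, not in code like this.

```python
from datetime import datetime, timedelta, timezone

ONE_DAY_SECONDS = 24 * 60 * 60  # assumption: "1 day" means 24 hours


def access_is_valid(purchased_at, now, timeout_seconds=ONE_DAY_SECONDS):
    """Return True while the access window after purchase is still open."""
    return now - purchased_at < timedelta(seconds=timeout_seconds)


purchase = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
print(access_is_valid(purchase, purchase + timedelta(hours=23)))  # within the window
print(access_is_valid(purchase, purchase + timedelta(hours=25)))  # window has closed
```

With a short timeout, a buyer who wants a later version of a frequently updated dataset has to purchase access again.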
Preview all the information and confirm it’s all correct. Then press continue.
MetaMask will prompt you to confirm further transactions to create the token.
Data NFTs are ERC721 tokens that represent the unique asset, and datatokens are ERC20 tokens used to access data services. Each data service gets its own data NFT and one or more types of datatokens.
Congratulations, you published your first data NFT for purchase.
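The relationship between the two token types can be pictured as a one-to-many structure: one data NFT for the asset, with one or more datatokens attached to it. The classes below are a conceptual sketch only. Their names, fields and example values are assumptions for illustration and do not mirror Ocean's smart contracts or any Ocean library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Datatoken:
    """Stands in for an ERC20 datatoken that grants access to one service."""
    symbol: str
    service: str


@dataclass
class DataNFT:
    """Stands in for the ERC721 data NFT that represents the unique asset."""
    name: str
    owner: str
    datatokens: List[Datatoken] = field(default_factory=list)

    def add_datatoken(self, symbol, service):
        token = Datatoken(symbol=symbol, service=service)
        self.datatokens.append(token)
        return token


# Hypothetical asset with two kinds of access.
nft = DataNFT(name="Bias-checked audio dataset", owner="0xPublisher")
nft.add_datatoken("DT-DOWNLOAD", "download")
nft.add_datatoken("DT-COMPUTE", "compute")

print(nft.name, [t.symbol for t in nft.datatokens])
```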
Resources
Here are some helpful links and videos covering the prerequisites.
- Tutorial on how to prepare a sample dataset on Google Drive
- List of tutorials on how to create a MetaMask wallet
- Overview of Ocean Academy and free courses about Ocean101, DeFi and Compute to Data
- Tutorial on how to privately sell sensitive data
Sources:
- Gender shades: Joy Buolamwini, Timnit Gebru 2018
- How AI Fails Us: Divya Siddarth, Glen Weyl et al. 2021
- Ocean, Compute to Data: data privacy
- How to sell data in Ocean









