ToF: Decentralization of AI for Web3



One of the primary concepts of web3 is decentralization. In this context, decentralization has a technical meaning: the bulk (perhaps all) of the computation in the backend of web3 applications is carried out by a decentralized network of nodes (as in p2p computing). Perhaps more importantly, it also carries less technical, more human connotations. Specifically, decentralization in the web3 context also refers to a wider distribution of ownership (e.g., of one's data), decisions, influence, wealth, and benefits. In this ToF, we share our thoughts on how AI might fit (or be made to fit) the decentralized mold in both of these senses.

1 The ability to build AI/ML algorithms shouldn't be affected by decentralization. ML algorithms can be viewed as contracts (like functions): so long as the data is collected and structured in the permissible ways, they fulfill their contract of building models (i.e., finding the parameters that best fit the data according to some metric). From that perspective, the applicability of ML algorithms (not their performance) is invariant to the data collection mechanism. Even if data collection differs in a decentralized setting (versus a centralized one), which shouldn't affect the structure of the data, ML algorithms can still be applied (i.e., still fulfill their contracts). What would likely vary is the performance of these ML algorithms under the different settings. One could reasonably suppose that probabilistic priors may change under a web3 setting, wherein people may truly consent to the usage of their data and, furthermore, be incentivized to supply high-quality observations for the purpose of building ML models.
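To make the "contract" framing concrete, here is a minimal sketch using a least-squares linear model. The data, noise levels, and sample sizes are illustrative assumptions (not from any real setting): the same algorithm fulfills its contract on data from either collection mechanism, but its performance differs.

```python
import numpy as np

def fit_linear_model(X, y):
    """Fulfill the 'contract': given features X and targets y in the
    expected shapes, return the least-squares parameters."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])

# Hypothetical centralized collection: many, relatively clean observations.
X_central = rng.normal(size=(1000, 2))
y_central = X_central @ true_theta + rng.normal(scale=0.1, size=1000)

# Hypothetical decentralized collection: fewer, noisier observations.
X_decent = rng.normal(size=(100, 2))
y_decent = X_decent @ true_theta + rng.normal(scale=1.0, size=100)

# The same algorithm applies in both settings (contract fulfilled),
# because the *structure* of the data is unchanged ...
theta_c = fit_linear_model(X_central, y_central)
theta_d = fit_linear_model(X_decent, y_decent)

# ... but the performance (parameter error) varies with the setting.
err_c = np.linalg.norm(theta_c - true_theta)
err_d = np.linalg.norm(theta_d - true_theta)
```

Of course, as the point above suggests, the decentralized setting could just as well improve data quality (via consent and incentives); the direction of the gap here is an assumption of the sketch, not a claim.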

2 Distributed (or multi-party) ML has a large role to play in a decentralized setting. Frameworks for building machine learning algorithms in distributed-compute settings exist and, in fact, have garnered a lot of attention from academics and practitioners alike in recent years. Most notable is the Federated Learning framework, which permits the training of machine learning models when commonly structured data exists as chunks stored on separate compute nodes. In these cases, data cannot be brought (or is difficult or costly to bring) to a central node to facilitate traditional ML model building. Many IoT applications are structured in this way and often represent a good use case for Federated Learning. Now, in converting existing web2 applications into their web3 variants, there are situations that call for the data to be dispersed in this manner rather than merely replicated across the compute nodes of the network, as in blockchain technology (note the common misconception that web3 is synonymous with blockchain). Ceramic is a web3 data ledger that works in these situations and provides the environment for Federated Learning algorithms to fulfill their contracts (see 1).
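As a rough illustration of how Federated Learning trains over data chunks that never leave their nodes, here is a minimal Federated Averaging (FedAvg) sketch for linear regression. The shard sizes, learning rate, and round counts are illustrative assumptions; real deployments would use a framework rather than this toy loop.

```python
import numpy as np

def local_update(theta, X, y, lr=0.1, steps=20):
    """One client's gradient steps on its private shard (linear
    regression, squared loss); the raw data never leaves the node."""
    theta = theta.copy()
    n = len(y)
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / n
        theta -= lr * grad
    return theta

def federated_averaging(shards, dim, rounds=30):
    """Minimal FedAvg: broadcast the global model, collect locally
    updated models, then average them weighted by shard size."""
    theta = np.zeros(dim)
    total = sum(len(y) for _, y in shards)
    for _ in range(rounds):
        updates = [local_update(theta, X, y) for X, y in shards]
        theta = sum(len(y) / total * u
                    for u, (_, y) in zip(updates, shards))
    return theta

# Three hypothetical nodes holding commonly structured data chunks.
rng = np.random.default_rng(1)
true_theta = np.array([1.5, -0.5, 2.0])
shards = []
for n in (40, 60, 80):
    X = rng.normal(size=(n, 3))
    shards.append((X, X @ true_theta + rng.normal(scale=0.05, size=n)))

theta_fed = federated_averaging(shards, dim=3)
```

Only model parameters cross the network here, which is exactly what makes the approach attractive when data is dispersed (rather than replicated) across a web3 network's nodes.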

3 Decentralization and web3 will likely be met with the emergence of new AI/ML algorithms. The previous two points were about how existing ML algorithms can be applied (quite naturally) in the decentralized, web3 setting. Of course, any new technology and its resulting applications carry high potential for new interactions, which, if logged in the appropriate datasets (and data models), can represent new opportunities for ML/AI. After all, ML is a field that exists through data problems, where we aim to build pattern-capturing algorithms that are accurate as well as efficient under the infrastructural constraints of their environment. New environments and constraints in web3 will birth new AI/ML algorithms, as in these cases the existing ML algorithms are blocked from fulfilling their contracts.

The previous points were geared towards the ML algorithms themselves and how they might fit, or eventually be modified to service, a decentralized environment from a technical standpoint. The next points, however, are about how we can think of using AI to decentralize ownership, wealth, and benefits; in other words, to spread them more widely across the community.

4 Data ownership might mean the ability to decide which ML models the individual (the rightful data owner) would want their data to be used for. One of the most important ideals of web3 and decentralization has to do with the ownership of one's digital data, which may represent things like a personal profile as well as one's interactions on the web. The argument goes that on the current internet (i.e., web2), Big Tech collects and sells your data for massive gains to advertising firms, who then use your data to build intelligence, such as by training ML algorithms. Additionally, an individual's digital data representing interactions with a service can also be used to enhance that service through things like better recommendations (e.g., the user data that Netflix, Spotify, and Amazon use to make better recommendations). In the web3 setting, more ownership of one's data should mean that the individual is able to decide where they want to supply their data. Perhaps in web3, this idea manifests itself as the ability to "stake" one's digital data in certain data pools, or to select from a list of options which ML models/applications they would want their data used for (done, of course, in a privacy-preserving way).
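One way such data "staking" could look in code: a purely hypothetical sketch, where `ConsentRegistry`, the owner names, and the model ids are all invented for illustration. The idea is simply that trainers may only read records whose owners opted into that specific model.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRegistry:
    """Hypothetical sketch: each data owner opts their data into the
    specific ML models/pools they approve, and can revoke at will."""
    grants: dict = field(default_factory=dict)  # owner -> set of model ids

    def stake(self, owner, model_id):
        self.grants.setdefault(owner, set()).add(model_id)

    def revoke(self, owner, model_id):
        self.grants.get(owner, set()).discard(model_id)

    def contributors(self, model_id):
        """Owners whose data a given model is permitted to use."""
        return {o for o, models in self.grants.items() if model_id in models}

registry = ConsentRegistry()
registry.stake("alice", "recsys-v1")
registry.stake("bob", "recsys-v1")
registry.stake("bob", "ad-targeting")
registry.revoke("bob", "ad-targeting")  # consent is reversible
```

In an actual web3 deployment, something like this would presumably live on a data ledger with cryptographic identity rather than in-memory Python, but the granting/revoking semantics are the point.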

5 To empower individuals to make decisions, more creative and productive forms of transparency for ML models will be required. In the previous point, we suggested that we can latch onto web3 principles to manifest a truer form of consent for individuals over how their digital data might be used, especially as it pertains to ML applications. It bears mentioning that the push for such consent predates the web3 movement. Nevertheless, providing a truer form of consent relies heavily on giving individuals sufficient information to make the decision that's best for them based on their own calculus. As it pertains to ML models, more creative and accessible forms of transparency would seem to be required. We're not talking so much about the explainable differences between linear models and neural networks, but rather about things like what the model is specifically used for and on whom.
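A hypothetical sketch of what such accessible transparency might record: not the model's internals, but the plain facts an individual would need before consenting. The `ModelDisclosure` type and all field values below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelDisclosure:
    """Hypothetical record of the accessible facts about a model:
    what it is used for, and on whom, rather than how it works."""
    model_id: str
    purpose: str       # what the model is specifically used for
    audience: str      # on whom its predictions are applied
    data_used: tuple   # categories of personal data it consumes
    retention: str     # how long contributed data is kept

card = ModelDisclosure(
    model_id="recsys-v1",
    purpose="rank media recommendations",
    audience="opted-in platform users",
    data_used=("watch history", "ratings"),
    retention="12 months",
)
```

This is close in spirit to existing "model card" proposals, pared down to the consent-relevant facts a data owner would weigh.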

6 Supplying data to ML models should be rewarded (e.g., in the form of crypto tokens). An important mechanism used to support and maintain many decentralized web3 technologies is incentivization of participants. For example, in blockchains, validator nodes or miners are incentivized to support and maintain the validity of the network in return for the native crypto token. Along similar lines, users who supply their digital data for the betterment of a service or an ML technology should be incentivized. Incentivization for one's data is not new, of course (e.g., fill out a survey for a $20 Amazon gift card), but in web3, this can be done automatically and at a larger scale. A real-world example is how one can use the Brave browser and be rewarded in Basic Attention Token (BAT) for supplying browsing data/behaviours.
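The automated, at-scale version of "fill out a survey, get a gift card" could be as simple as a proportional payout rule. This is a hypothetical sketch; the function, contributor names, and token amounts are invented, and a real scheme would also weigh data quality, not just volume.

```python
def reward_contributors(contributions, reward_pool):
    """Hypothetical sketch: split a fixed token reward pool among
    data contributors in proportion to the number of accepted
    records each supplied."""
    total = sum(contributions.values())
    if total == 0:
        return {owner: 0.0 for owner in contributions}
    return {owner: reward_pool * n / total
            for owner, n in contributions.items()}

# alice supplied 3x as many records as bob, so she receives 3x the tokens.
payouts = reward_contributors({"alice": 300, "bob": 100}, reward_pool=40.0)
```

In a web3 setting, a rule like this could run as a smart contract and pay out automatically on each training round.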

Consenting to view Brave Ads, which are surfaced by intelligence built on one's browsing behaviour data. In exchange for this data, one receives BAT directly in their Brave wallet.

Of course, contributing data in such a way shouldn’t come at the expense of risking the leaking of one’s sensitive information; and so, in this world, privacy-preservation techniques in AI/ML would still be heavily relied upon.

7 The decentralization of AI and the search for its place within web3 represent an opportunity to revamp existing interfaces and enrich relationships between the general public and AI systems as they exist today. If anything, this point, above all the others, encapsulates the main takeaway. As we endeavor to create a new web under the concept of decentralization and other web3 ideals, we are presented with the opportunity to rethink (or question) the ways in which existing technologies, like AI, are currently created, deployed, and used by society. Can AI be made more welcoming and less intimidating by giving people sufficient information, as well as the choice to participate (e.g., through the supply of their data) in the building of certain AI systems? Would rewarding such efforts lead to a healthier, more productive relationship between people and AI? Could this, in turn, lead to opportunities to build even more accurate, safe, and trusted AI applications? These are but a few of the questions we should be considering so as not to miss this opportunity to evolve AI and improve its relationship with society.