12 Areas of Convergence in Which Blockchain Can Foster Better AI
This is the second issue of Overtheblock’s Tech Convergence Series, which explores the opportunities lying at the intersection of exponential technologies such as Artificial Intelligence (AI) and Distributed Ledger Technologies (DLTs). In this post, we focus on how DLT may contribute to the growth of Artificial Intelligence.
The recent developments in AI and the push towards its mass adoption are stirring both fear and hope, fueling new and challenging questions: is this fast technological progress driving us towards a dystopian reality governed by sentient algorithms whose inner workings are obscure to most? How can we limit the risks associated with the massive and uncontrolled use of sensitive personal data? Will Artificial Intelligence be able to scale up to the computing power needed to solve real-world problems? This post presents a fresh perspective on how Blockchain could address many of AI’s shortcomings, helping to lay the foundations for a future research agenda on DLT for AI.
A convergence framework to unlock AI decentralization
In the previous issue of this series, we analyzed in depth the technological stacks of AI and DLT. We then showcased our Framework for Exponential Technologies Convergence, which develops from the intersection of the two stacks.
In this issue, we focus on the intersections where DLT can act as a game-changer for AI and discuss several topics of convergence (Figure 2).
AI model sharing incentives
Data scientists and developers can share their AI models on a public ledger and earn cryptocurrency as remuneration. Tokens may be leveraged to generate rewards when a model is used or rated by others, as well as to sell rights to the future revenues the model may generate. This approach can stimulate data scientists to increase their models’ quality while also providing funding opportunities for the development of promising new models [1]. Moving AI computations to the Blockchain and splitting them across several network nodes transfers part of the work to the community, rewarding participants for their work through tokens and incentive mechanisms.
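As a minimal illustration of such an incentive scheme, the sketch below uses plain Python as an in-memory stand-in for a ledger (all names, such as `ModelMarket`, are hypothetical, not a real smart-contract API): publishing registers a model under its content hash, and every use transfers tokens from the consumer to the model’s owner.

```python
from dataclasses import dataclass, field

@dataclass
class ModelListing:
    """An AI model registered on a (hypothetical) public ledger."""
    owner: str
    model_hash: str      # content hash identifying the published model
    price_per_use: int   # tokens charged per inference or download
    usage_count: int = 0

@dataclass
class ModelMarket:
    """Toy in-memory stand-in for a token-incentive smart contract."""
    balances: dict = field(default_factory=dict)
    listings: dict = field(default_factory=dict)

    def publish(self, owner: str, model_hash: str, price: int) -> None:
        self.listings[model_hash] = ModelListing(owner, model_hash, price)

    def use_model(self, user: str, model_hash: str) -> None:
        listing = self.listings[model_hash]
        price = listing.price_per_use
        assert self.balances.get(user, 0) >= price, "insufficient tokens"
        # Reward the model owner: tokens flow from consumer to publisher.
        self.balances[user] -= price
        self.balances[listing.owner] = self.balances.get(listing.owner, 0) + price
        listing.usage_count += 1

market = ModelMarket(balances={"alice": 100})
market.publish(owner="bob", model_hash="sha256:model-v1", price=10)
market.use_model(user="alice", model_hash="sha256:model-v1")
print(market.balances)  # {'alice': 90, 'bob': 10}
```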
On-chain AI
AI models can be uploaded to the Blockchain so that smart contracts and DApps can access and execute simple AI models (e.g., simple linear regressions, classification, or clustering tasks). This approach could help make AI more transparent by decentralizing and open-sourcing AI algorithms, ensuring that people can choose from multiple AI providers. Programmable blockchain platforms enable smart-contract-based programming models for decentralized AI applications, ensuring AI agents’ self-execution based on predefined terms and conditions [2]. An AI coded in a DAO, for example, can provide specific smart contracts that are allowed to perform only predefined and limited actions, and nothing more. In this way, DLT provides transparency and visibility of AI decisions to all participating AI agents on the network; hence it becomes difficult for AI agents to alter or repudiate those decisions [3].
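Since most smart-contract virtual machines lack floating-point arithmetic, simple on-chain models are typically evaluated with fixed-point integers. The following plain-Python sketch (illustrative only, not tied to any specific platform) shows linear-regression inference as a contract might compute it:

```python
SCALE = 10**6  # fixed-point scale: contract VMs usually have no floats

def onchain_linear_predict(weights, bias, features):
    """Linear-regression inference in integer arithmetic, as a smart
    contract could execute it; every value is pre-scaled by SCALE."""
    acc = bias
    for w, x in zip(weights, features):
        acc += (w * x) // SCALE  # rescale each product back to SCALE precision
    return acc

# Model y = 0.5*x1 + 2.0*x2 + 1.0, evaluated at (x1, x2) = (4, 3)
w = [int(0.5 * SCALE), int(2.0 * SCALE)]
b = int(1.0 * SCALE)
x = [4 * SCALE, 3 * SCALE]
print(onchain_linear_predict(w, b, x) / SCALE)  # -> 9.0
```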
Data accountability & data provenance
Machine Learning and Artificial Intelligence models are created, trained, and used by different entities. The entity that curates the data used for a model is frequently different from the entity that trains it, which in turn differs from the trained model’s end-user. The end-user needs to trust the received AI model, which requires information about the steps of the training process and the data sources. Publicly auditable smart contracts deployed on a blockchain can be used to encode data usage policies and provenance-tracking information in a privacy-friendly way, thus helping AI address the data protection requirements of sensitive data processing while supporting data accountability and provenance tracking [4][5].
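A minimal sketch of provenance tracking in plain Python (a toy hash-chained log standing in for what a publicly auditable smart contract would store; all names are illustrative):

```python
import hashlib
import json
import time

def record_provenance(ledger, actor, action, data_ref):
    """Append a provenance entry whose hash chains to the previous one,
    so any later modification of the history is detectable."""
    prev_hash = ledger[-1]["entry_hash"] if ledger else "0" * 64
    entry = {
        "actor": actor,        # who touched the data or model
        "action": action,      # e.g. 'curate', 'train', 'deploy'
        "data_ref": data_ref,  # content hash of the dataset or model
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    ledger.append(entry)
    return entry

ledger = []
record_provenance(ledger, "data-curator", "curate", "sha256:dataset-v1")
record_provenance(ledger, "ml-team", "train", "sha256:model-v1")
# The end-user can now walk the chain to audit who produced the model
# they received, and from which data.
```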
Remote attestation on Trusted Execution Environments
It is essential to ensure that only genuine machine learning models are deployed, meaning that the executed code has not been tampered with and operates as intended by the developer [6]. A DLT running on a Trusted Execution Environment (TEE) can be exploited in such a scenario. TEEs are secure areas of a CPU that provide an isolated execution environment for secure computations, enabling the execution of intensive tasks while preserving data confidentiality and integrity throughout the computation [7]. Inside this execution environment, the application running in the TEE can generate a proof attesting that its software is executing on specific, trusted hardware, together with a certification report stored on the ledger. This feature is called remote attestation.
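The sketch below illustrates the shape of remote attestation in plain Python. It is illustrative only: real TEEs (e.g., Intel SGX) use asymmetric keys rooted in the hardware manufacturer, whereas here a shared HMAC key stands in for the hardware-fused secret.

```python
import hashlib
import hmac

TEE_KEY = b"device-unique-attestation-key"  # stand-in for a hardware-fused key

def attest(code: bytes) -> dict:
    """Inside the TEE: measure (hash) the loaded code and sign the
    measurement, producing an attestation report."""
    measurement = hashlib.sha256(code).hexdigest()
    signature = hmac.new(TEE_KEY, measurement.encode(), hashlib.sha256).hexdigest()
    return {"measurement": measurement, "signature": signature}

def verify_report(report: dict, expected_measurement: str) -> bool:
    """On the verifier/ledger side: check the signature and that the
    measured code matches what the developer published."""
    expected_sig = hmac.new(TEE_KEY, report["measurement"].encode(),
                            hashlib.sha256).hexdigest()
    return (hmac.compare_digest(report["signature"], expected_sig)
            and report["measurement"] == expected_measurement)

code = b"model-serving binary"
report = attest(code)
print(verify_report(report, hashlib.sha256(code).hexdigest()))  # True
```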
Computational integrity
Integrity is a necessary criterion for trust in AI applications. Performing the training of machine learning models within smart contracts could ensure such computational integrity [4][8]. In practice, this should be implemented by executing smart contracts off-chain in TEEs, which can provide the relatively high computational performance needed to train machine learning models (especially if backed by GPUs) while preserving privacy and confidentiality. Notable current research aims at scaling smart contract execution outside a distributed ledger while maintaining integrity [9][10].
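As a minimal illustration of how an off-chain executor could make such training verifiable, the plain-Python sketch below (hypothetical names, no specific DLT API) binds the training code, the input dataset, and the resulting model into a single commitment that can be posted on-chain and re-checked by any auditor holding the artifacts:

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def training_commitment(code: bytes, dataset_hash: str, model: bytes) -> str:
    """Commitment the off-chain (TEE) executor posts on-chain: it binds
    together the training code, the input data, and the produced model."""
    payload = {
        "code": sha256_hex(code),
        "dataset": dataset_hash,
        "model": sha256_hex(model),
    }
    return sha256_hex(json.dumps(payload, sort_keys=True).encode())

def audit(code: bytes, dataset_hash: str, model: bytes, onchain: str) -> bool:
    """Anyone with the published artifacts can recompute and compare."""
    return training_commitment(code, dataset_hash, model) == onchain

commit = training_commitment(b"train.py", "sha256:dataset-v1", b"weights")
assert audit(b"train.py", "sha256:dataset-v1", b"weights", commit)
```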
DLT-based federated learning for AI model computation
An ML model inference intended to personalize the content on a platform requires data about the user to tailor its recommendations to them. If such a model inference is executed locally, the user does not need to share their data while still getting personalized recommendations. However, if no user shares data with the platform, it is challenging for the platform operator to train ML models using traditional methods [11]. Federated Learning is a machine learning technique that trains algorithms across multiple decentralized computational units, enabling robust learning models without sharing data and thereby addressing critical issues related to data sharing and data security. In such a scenario, DLT can offer an infrastructure of decentralized computational nodes that is not vulnerable to inference attacks [12], providing increased data confidentiality, AI model ownership, auditability of the training process and, in general, more trustworthy AI [13].
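A minimal federated-averaging sketch in Python/NumPy (illustrative only; in a DLT-based setup each client would additionally notarize a hash of its update on the ledger, which is only hinted at in the comments). Each client fits a linear model on data that never leaves it; only the weights are shared and averaged:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training step (gradient descent on a linear
    model); the raw data (X, y) never leaves the client."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """Federated averaging: only locally trained weights are aggregated.
    In a DLT-based setup, each client would also notarize a hash of its
    update on the ledger to make the round auditable."""
    updates = [local_update(global_w, X, y) for X, y in clients]
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):  # three clients, each with a private local dataset
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + rng.normal(scale=0.01, size=50)))

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, clients)
print(np.round(w, 3))  # close to [ 2. -1.]
```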
Data markets and data monetization
In addition to the increasingly available computing power, another fundamental reason for the recent advancement of AI is the steady growth of available, digitized data. ML-based systems generally perform better (e.g., in classification accuracy) the more data they are trained on [14]. The problem is that, to date, a few big companies control large amounts of personal data, monetizing it while hiding how they feed their algorithms. DLT-based data markets could democratize access to such data silos [15], making it possible for users to choose which data they are willing to share, actually owning their data and directly monetizing it.
AI pipeline explainability, traceability, and auditability
Complex ML pipelines need to be transparent and explainable, allowing the end-user to understand how the final output is generated as a product of the different pipeline functionalities. Explainability in ML pipelines can be approached from the user and system perspectives: the former means understanding what can be done to help users comprehend learned models and inspect their applications (i.e., explainable AI); the latter means understanding how the learned models can be characterized and ultimately certified [16]. The training and validation process of learning models can go through several refinement iterations before reaching satisfactory prediction performance, with an evaluation performed at each epoch using different evaluation functions (e.g., a loss function). DLT could be exploited to store the result of this evaluation function on-chain at each epoch of the training process, serving as a validation and audit tool to certify the quality of AI models and improve their trustworthiness. Through Blockchain technology, immutable records of all the data and variables that AIs exploit for their decision-making processes simplify the entire auditing process. Recording decisions on a datapoint-by-datapoint basis on a Blockchain makes them far simpler to audit, increasing confidence that the record has not been tampered with.
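A toy sketch of the per-epoch notarization idea in plain Python (`TrainingAuditTrail` is an illustrative in-memory stand-in, not a real DLT API):

```python
import hashlib
import json

class TrainingAuditTrail:
    """Toy append-only log standing in for on-chain storage of per-epoch
    evaluation results, so a model's training history can be audited."""
    def __init__(self, model_id: str):
        self.model_id = model_id
        self.records = []

    def notarize_epoch(self, epoch: int, loss: float, val_accuracy: float):
        record = {"model": self.model_id, "epoch": epoch,
                  "loss": round(loss, 6), "val_acc": round(val_accuracy, 6)}
        # The hash is what would actually be written on-chain.
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.records.append(record)

trail = TrainingAuditTrail("classifier-v1")
for epoch, (loss, acc) in enumerate([(0.9, 0.61), (0.5, 0.78), (0.3, 0.86)]):
    trail.notarize_epoch(epoch, loss, acc)
print(len(trail.records))  # 3 auditable training checkpoints
```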
Staking-based data sharing
Poor-quality input data for training machine learning models is not acceptable from any point of view: unless an AI or ML model is fed the right data, it does not give accurate results. Generating high-quality, professional training data requires highly skilled annotators to carefully label information such as text, images, or videos, while maintaining an in-depth quality assessment in terms of unbiased decisions, testing, monitoring, and auditing [17]. Recent advances in DLT protocol research have led to the definition and implementation of programs that incentivize, within the protocol core, a supply of relevant and high-quality data assets in DLT-based data markets. These systems are based on a reward function that pays data providers in proportion to the token liquidity staked on a dataset, representing the confidence in the quality of that dataset. The reward is calculated according to various factors, from the volume of data consumption to the overall usage time. Some implementations, for example, allow users to earn tokens based on their portion of the stake multiplied by the weekly usage of the dataset [18]. Indeed, the larger the stakes and rewards placed on a dataset, the more significant its usage tends to be. This approach can be an excellent yardstick for evaluating data quality in a decentralized manner.
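The sketch below captures the stake-times-usage idea in plain Python. It is a hedged simplification inspired by schemes such as Ocean Data Farming [18]; the actual reward function differs in its details:

```python
def weekly_rewards(datasets, pool):
    """datasets: {name: {"usage": consumed_volume, "stakes": {user: amount}}}
    Each dataset receives a share of the reward pool proportional to its
    usage; within a dataset, stakers split that share pro rata to stake."""
    total_usage = sum(d["usage"] for d in datasets.values())
    payouts = {}
    for d in datasets.values():
        dataset_share = pool * d["usage"] / total_usage
        total_stake = sum(d["stakes"].values())
        for user, stake in d["stakes"].items():
            payouts[user] = payouts.get(user, 0) + dataset_share * stake / total_stake
    return payouts

pool = 1000  # tokens distributed this week
datasets = {
    "weather": {"usage": 300, "stakes": {"alice": 50, "bob": 150}},
    "traffic": {"usage": 100, "stakes": {"alice": 100}},
}
print(weekly_rewards(datasets, pool))
# {'alice': 437.5, 'bob': 562.5}
```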
Distributed data storage
As introduced above, the local computation of AI and ML models answers some of AI’s growing problems of recent years, and the ever-increasing attention to privacy and to the costs of the model training phase calls for a revolution of the current centralized paradigm. Previously, we presented recent studies that combine Blockchain with federated learning to resolve tensions between data privacy, ownership, exchange, and model privacy. More complex and experimental solutions are also arising that pair these techniques with decentralized storage and communication systems such as IPFS or Swarm [19][20]. A distributed storage module assisting the blockchain protocol is essential for distributing models trained with decentralized techniques such as federated learning in encrypted form: it allows a model to be identified and retrieved through a unique hash or fingerprint that, once notarized in a smart contract, maintains a unique association between the smart contract data and the model. The smart contract acts as the model-training infrastructure and must be deployed when the encrypted model is uploaded to the distributed storage solution. Its code governs the referenced model’s permissions, defines the data schema, and records the N different datasets processed by the computational providers in N sub-tasks; the aggregation of these independent models generates a global trained model. These providers can therefore use the distributed storage solution to download the starting model and subsequently upload the result of their training of that model, being remunerated for this work based on their contribution throughout the entire training process [21].
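A toy content-addressed store plus registry in plain Python (standing in for IPFS/Swarm and the associated smart contract; all names are illustrative, and real IPFS uses multihash CIDs rather than raw SHA-256 hex):

```python
import hashlib

class DistributedStore:
    """Toy content-addressed store standing in for IPFS or Swarm:
    blobs are stored and retrieved by the hash of their content."""
    def __init__(self):
        self._blobs = {}

    def add(self, blob: bytes) -> str:
        cid = hashlib.sha256(blob).hexdigest()
        self._blobs[cid] = blob
        return cid

    def get(self, cid: str) -> bytes:
        blob = self._blobs[cid]
        # Content addressing makes tampering self-evident.
        assert hashlib.sha256(blob).hexdigest() == cid, "content tampered"
        return blob

class ModelRegistry:
    """Stand-in for the smart contract that pins a model version to its
    content identifier and tracks who may update it."""
    def __init__(self):
        self.models = {}

    def register(self, name: str, cid: str, owner: str):
        self.models[name] = {"cid": cid, "owner": owner}

store = DistributedStore()
registry = ModelRegistry()
cid = store.add(b"<encrypted starting model>")
registry.register("loan-model-v1", cid, owner="0xABC")
assert store.get(registry.models["loan-model-v1"]["cid"]) == b"<encrypted starting model>"
```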
AI model ownership
Most complex ML and AI models, such as deep neural networks, are nowadays often used in a black-box manner. The extant literature on DLT for XAI (explainable AI) covers data provenance or computational integrity aspects of model training and inference, as mentioned above. In addition, attempts to define DLT-based federated learning systems revealed common requirements and characteristics shared across these studies. The most critical issue is ensuring AI model ownership and tracking its use: without it, it would not be possible to obtain confidential sharing of sensitive data, the pipeline-tracking process described above, or the ability to resolve potential bias and model-fairness issues [11]. Smart contracts allow the creation of ownership controls that grant access only to those with specific information or rights. The need to guarantee ownership also applies to data, for example, when exchanging information in the healthcare field across country-specific regulations and multiple regulatory regions. Since patients have complete ownership of their healthcare data in many countries, a blockchain-based healthcare information exchange (HIE) system could make it possible to retain ownership of the data in cross-regional contexts by implementing a patient-centric data model [24].
Proof-of-Useful-Work
Over the last years, plenty of consensus mechanisms have been released, each with its pros and cons [22]. As mentioned in our previous post of this series, one of the most notable drawbacks is the energy waste of Proof-of-Work (PoW), which dilutes the Blockchain’s value and hinders its further application. The literature has proposed approaches that fight this and other issues by leveraging AI-based consensus mechanisms (which we group under the name Proof-of-Useful-Work). Furthermore, this approach entails a considerable advantage on the other side of the coin: AI and ML models would obtain the computing power for their training process as a by-product of consensus, realizing models in less time and at a lower individual cost. Some examples worth mentioning again are Proof-of-Deep-Learning (PoDL) and Proof-of-Kernel-Work (PoKW). In the former, a valid proof for a new block can be generated if and only if a proper deep learning model is produced; in the latter, only a subset of nodes participates in the hash computation, and the system’s governance parameters, such as the ideal number of miners or the mining difficulty level, are regularly updated by ML [23].
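To make the PoDL idea concrete, here is a hedged plain-Python sketch (not the exact protocol from the literature) of a block-validity check in which the block header must commit to a submitted model and validators re-evaluate that model’s accuracy on a held-out test set:

```python
import hashlib

ACCURACY_THRESHOLD = 0.90  # minimum test-set accuracy for a valid proof

def block_is_valid(block_header: bytes, model_hash: str,
                   claimed_accuracy: float, recomputed_accuracy: float) -> bool:
    """Proof-of-Deep-Learning-style validity check (illustrative only):
    the header must commit to the model, the validators' re-evaluation
    must clear the threshold, and the miner's claim must match it."""
    commits_to_model = model_hash.encode() in block_header
    good_enough = recomputed_accuracy >= ACCURACY_THRESHOLD
    honest_claim = abs(claimed_accuracy - recomputed_accuracy) < 1e-3
    return commits_to_model and good_enough and honest_claim

model_hash = hashlib.sha256(b"trained-model-weights").hexdigest()
header = f"prev:00ab|model:{model_hash}|nonce:42".encode()
print(block_is_valid(header, model_hash, 0.931, 0.9312))  # True
```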
Final takeaways
We have discussed and analyzed twelve application fields where DLT can potentially have an impact on the future evolution of the AI ecosystem. The study reveals a predominance of applications in the research areas where Smart Contracts meet AI models and Data Fusion, such as explainable AI and data management (e.g., data provenance, data marketplaces, data monetization). Each of these topics has the potential to open an entirely new investigation strand aimed at addressing the transparency and scalability issues of AI while preserving the quality, authenticity, and ownership of data.
To summarize and look back at both sides of the convergence, as evidenced by the coverage intensity of the blue spots in Figure 3, the intersection of the entire DLT stack with the AI layer dedicated to Algorithms & Models represents the area where the two technologies’ mutual contribution seems to generate the most significant number of potential applications.
A number of projects in these areas still need to mature before we can predict where this convergence is headed and which research and application fields are the most promising. Nevertheless, we hope the reader can use these strategic insights to build a medium-to-long-term vision of the mutual evolution of these exponential technologies. The potential is limitless. It all depends on the capability of the next generation of innovators to build bridges between the DLT and AI communities and to open new paths towards the creation of a joint, multidisciplinary research agenda.
References
[1] W. Oscar, “AI on Blockchain — What’s the catch?”, 2018. [Online]. Available: https://hackernoon.com/how-cortex-brings-ai-on-the-blockchain-86d08922bb2a
[2] B. Xing, T. Marwala, “The Synergy of Blockchain and Artificial Intelligence”, 2018, Available at SSRN: https://ssrn.com/abstract=3225357.
[3] H. R. Hasan, K. Salah, “Combating deepfake videos using Blockchain and smart contracts”, 2019. IEEE Access,7, 41596–41606.
[4] K. Sarpatwar, R. Vaculin, H. Min, G. Su, T. Heath, G. Ganapavarapu, and D. Dillenberger, “Towards enabling trusted artificial intelligence via blockchain”, 2019, in Policy-Based Autonomic Data Governance. Cham, Switzerland: Springer, 2019, pp. 137–153.
[5] D. Preuveneers, V. Rimmer, I. Tsingenopoulos, J. Spooren, W. Joosen, and E. Ilie-Zudor, “Chained anomaly detection models for federated learning: An intrusion detection case study,” 2018, Appl. Sci., vol. 8, no. 12, p. 2663.
[6] M. Brandenburger, C. Cachin, R. Kapitza, and A. Sorniotti, “Blockchain and Trusted Computing: Problems, Pitfalls, and a Solution for Hyperledger Fabric”, 2018, arXiv:1805.08541.
[7] D. Lee, D. Kohlbrenner, S. Shinde, D. Song, and K. Asanović, “Keystone: An open framework for architecting TEEs” 2019, arXiv:1907.10119v2.
[8] J. Eberhardt and J. Heiss, “Off-chaining models and approaches to offchain computations”, 2018, in Proc. 2nd Workshop Scalable Resilient Infrastruct. Distrib. Ledgers (SERIAL), pp. 7–12.
[9] A. Miller, I. Bentov, S. Bakshi, R. Kumaresan, and P. McCorry, “Sprites and state channels: Payment networks that go faster than lightning”, 2019, in Proc. Int. Conf. Financial Cryptogr. Data Secur. Cham, Switzerland: Springer, pp. 508–526.
[10] J. Teutsch and C. Reitwießner, “A scalable verification solution for blockchains,” 2019, arXiv:1908.04756v1.
[11] K. D. Pandl, S. Thiebes, M. Schmidt-Kraepelin and A. Sunyaev, “On the Convergence of Artificial Intelligence and Distributed Ledger Technology: A Scoping Review and Future Research Agenda”, 2020, in IEEE Access, vol. 8, pp. 57075–57095, doi: 10.1109/ACCESS.2020.2981447.
[12] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov, “Exploiting unintended feature leakage in collaborative learning”, 2019, in Proc. IEEE Symp. Secur. Privacy (SP), pp. 691–706.
[13] D. Preuveneers, V. Rimmer, I. Tsingenopoulos, J. Spooren, W. Joosen, and E. Ilie-Zudor, “Chained anomaly detection models for federated learning: An intrusion detection case study”, 2018, Appl. Sci., vol. 8, no. 12, p. 2663.
[14] D. Mahajan, “Exploring the limits of weakly supervised pretraining”, 2018, in Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 181–196.
[15] G. A. Montes and B. Goertzel, “Distributed, decentralized, and democratized artificial intelligence”, 2019, Technol. Forecasting Social Change, vol. 141, pp. 354–358.
[16] Bernard Omidvar-Tehrani, Jean-Michel Renders, “Explainability Matters in Machine Learning Pipelines”, 2019. [Online]. Available: https://europe.naverlabs.com/blog/explainability-matters-in-machine-learning-pipelines/
[17] InsideBIGDATA, “How to Ensure Data Quality for AI”, 2019. [Online]. Available: https://insidebigdata.com/2019/11/17/how-to-ensure-data-quality-for-ai/
[18] T. McConaghy, “Announcing Ocean Data Farming”, 2020. [Online]. Available: https://blog.oceanprotocol.com/announcing-ocean-data-farming-26c036d12f20.
[19] A. ul Haque, M. S. Ghani and T. Mahmood, “Decentralized Transfer Learning using Blockchain & IPFS for Deep Learning”, 2020, International Conference on Information Networking (ICOIN), Barcelona, Spain, pp. 170–177.
[20] P. Ratadiya, K. Asawa, O. Nikhal, “A decentralized aggregation mechanism for training deep learning models using smart contract system for bank loan prediction”, 2020, Accepted at the Workshop on AI and Blockchains at the 29th International Joint Conference on Artificial Intelligence (IJCAI-PRICAI).
[21] L. Liu, C. Wu and J. Xiao, “Blockchain-Based Platform for Distribution AI”, 2019.
[22] Blockgenic, “Different Blockchain Consensus Mechanisms”, 2018, [Online]. Available: https://hackernoon.com/different-blockchain-consensus-mechanisms-d19ea6c3bcd6
[23] L.-N. Lundbæk , D. Janes Beutel, M. Huth, S. Jackson, L. Kirk, and R. Steiner, “Proof of kernel work: A democratic low-energy consensus for distributed access-control protocols”, 2018, Roy. Soc. Open Sci., vol. 5, no. 8, Aug., Art. no. 180422.
[24] G. J. Katuwal, S. Pandey, M. Hennessey, B. Lamichhane, “Applications of Blockchain in Healthcare: Current Landscape & Challenges”, 2018.
Please cite this post as follows: A. Favenza, G. Corrias, E. Ferro, “12 Areas of Convergence in Which Blockchain Can Foster Better AI”, Overtheblock Innovation Observatory, 2021. Retrievable at: https://medium.com/overtheblock/exponential-technologies-convergence-how-blockchain-can-enable-a-decentralized-ai-6bd89c4bae29
OverTheBlock is a LINKS Foundation initiative carried out by a team of innovation researchers under the directorship of Enrico Ferro. Its aim is to promote a broader awareness of the opportunities offered by the advent of exponential technologies in reshaping the way we conduct business and govern society.
We are chain agnostic, value-oriented, and open to discussion.