Blockchain for AI: DApps for SMEs and Decentralized AI

Can blockchain and Artificial Intelligence converge to build better use cases? Can SMEs use blockchain, IPFS, and DApps to improve their business models and operations? And how can blockchain help solve current Data Science problems?

Basem Dabbour
Game of Life
38 min read · Mar 30, 2020

Abstract

In this document, I explain blockchain technology and build a small blockchain back-end database using Ethereum development tools, with a decentralized application (DApp) as a layer on top of it. The InterPlanetary File System (IPFS) acts as a middle layer for streaming, sharing, and uploading data, so that peers can store, retrieve, and exchange data on the blockchain through the DApp platform. I offer solutions for using blockchain as a back-end database to enable more transparent, interoperable, secure, and reliable AI applications, in a future where the success of 5G and IoT depends on solving data storage, integrity, security, scalability, and transparency issues. I also shed light on how data scientists could use blockchain to integrate and share ML/DL models in DApps and blockchain applications, propose a solution for SMEs that combines blockchain and AI, and close with an open question about a current hot topic: decentralizing AI.

I. Introduction

Over the last 10 years, Artificial Intelligence and blockchain have become two of the most prominent, promising, and disruptive technologies in the world.

Blockchain technology has the potential to revolutionize the economy. It is a cryptographically secured distributed ledger running on a decentralized peer-to-peer network, which lets individuals and companies collaborate with trust and transparency. Consensus algorithms and protocols govern interactions across the network, without a third party in the middle or a central authority in control.

Blockchain automates payments in cryptocurrency (digital money) and also provides access to an immutable shared ledger of data, transactions, logs, and metadata in a completely decentralized, secure and trusted manner.

Moreover, nodes can come to an agreement by executing smart contracts that enforce agreed terms and conditions between them. Blockchain offers solutions to many industries with different use cases. For example, it can help authenticate news from reliable sources and fight the spread of fake news or deepfake videos that could mislead public opinion.

Artificial Intelligence (AI), on the other hand, gives machines intelligence and decision-making capabilities similar to those of humans. Models can be built, optimized, and deployed by applying Machine Learning and Deep Learning algorithms, using neural networks that absorb a huge amount of data, digest it, and turn it into predictions. The AI market is projected to grow to 14 trillion dollars by 2030, a forecast based on studies of the massive volume of data generated by mobile phones, sensors, IoT devices, social media, and web applications. We as humans are good at specificity but poor at sensitivity, and the reverse holds for AI.

Accordingly, AI helps SMEs automate many tasks and solve many real-world problems by correlating features of the data and finding hidden patterns that reveal more information and insight for further decisions. Today, however, we spend a lot of our time searching for datasets to build powerful models and data products for customers, and we rely on the integrity of data collected from many sources or servers around the world. In the same manner, the majority of ML/DL solutions rely on a centralized model that is trained, tested, and validated against datasets collected from various servers.

This is time-consuming. Moreover, because models learn from what we teach them, or teach themselves from our daily mined data, the centralized nature of AI opens the door to data tampering: our data can be hacked and manipulated before it is fed into our models. The modeling results may then be catastrophic, and the resulting predictions could be faked, altered, risky, or dangerous. We already live in a world where centralized data is exposed to vulnerabilities, espionage, and leaks, to the point where this has become routine.

The concept of decentralized AI is to unlock the power of data so that blockchain and AI benefit from each other’s strengths. It should appear in many SME use cases in the coming years.

While the mass adoption of blockchain depends on the number of nodes or participants, scalability, security, and privacy, AI depends on big data generated by humans and machines, on powerful libraries and algorithms to get the best out of that data, and on security to make sure that the data fed into the model has not been altered or tampered with. The primary contributions of this paper can be summarized as follows:

  • I give an overview of blockchain technology and some key features.
  • I explain the Ethereum blockchain, how nodes work, and how to build a private blockchain for SME use cases.
  • I discuss Ethereum development tools and how they can be used to build DApps.
  • I report and discuss using the InterPlanetary File System (IPFS) as a middle layer between DApps and the blockchain to store and exchange data between peers.
  • I identify blockchain features that future AI applications can leverage, and how Artificial Intelligence and blockchain can come together for better SME use cases.
  • I define decentralized AI, provide a solution for the Data Science team, and outline open research challenges regarding vulnerabilities or bugs in blockchain solutions.

II. Background

In this section, I give an overview of blockchain and Artificial Intelligence, along with a comparison between the types of blockchain and AI.

1. The problem of centralization

Many problems could be listed, but I will focus on 3 major ones that the whole world is facing today:

It’s slow. The biggest institutional weakness is speed. Policy changes at the institutional level are slow: creating new laws through a hierarchical process and implementing procedures can take a long time, on top of the time consumed by approvals and multiple rounds of verification for every relationship, contract, confirmation, and transaction.

According to a report from Accenture [6], most big banks still run systems that date back to the 1970s or even the 1960s. One example is bank transfers, which can take a few business days, or even weeks or months, to clear instead of a few minutes. This is due to outdated systems, internal policies, and government regulations that require transactions to be examined and analyzed before being processed.

Blockchain technology could provide faster mechanisms thanks to the design of the network. Since it is based on a consensus protocol [1], the community can address problems and decide on changes to the system.

It’s expensive. Traditional institutions are also very expensive. According to an analysis from LongHash [2], the transaction fees and monthly subscriptions that users pay to cover an institution's total cost of service are higher than crypto transaction costs. Banks charge high fees to process wire transactions, exchange or convert currencies, and even manage users' accounts. Insurance agencies require administrative fees. Online retailers charge credit card transaction fees for any purchase. In fact, any business is flooded with fees to pay everywhere. Instead of a central authority or third-party agency charging a high fee for verification, blockchain offers communities a cost-effective alternative in which smart contracts and transactions take place on a shared network, and connected nodes or users can pitch in to verify the transactions of others.

It’s subject to attacks. Cybercrime is on the rise, and it is now common to hear about major institutions being hacked [3] or suffering personal data breaches, which affects not only the credibility of those institutions but also the public. Centralized systems whose servers store our private data are exposed to bad actors looking to breach the network and do serious damage. This vulnerability, and institutions' inability to protect users and their data, is one of the reasons trust in institutions is declining. People are demanding more protection, or alternative solutions that put data in the hands of its owners instead of on a centralized server holding sensitive records on everyone.

The EU created GDPR to regulate how businesses, including small and medium enterprises, handle data, and to give individuals full access to and control over the use and maintenance of their personal data. This means GDPR might clash with the immutable nature of technologies like blockchain, even though both share the goal of protecting data.

One of the biggest hacks hit the US company Equifax in 2017 [4]: the attackers stole credit card information, social security numbers, full names, addresses, and payment histories for around 143 million consumers, and the company was later fined around 700 million dollars [5] as a result of lawsuits. Cambridge Analytica used an indirect approach [6], buying information from a developer who had connected his quiz application to the Facebook API, which granted access to the personal data of 87 million users. The data was sold to the company, which used AI models to analyze preferences and online behavior patterns for political purposes. More recently, a secretive company named Clearview violated social media terms and conditions, along with users’ privacy, by scraping over 3 billion images from Facebook, Twitter, and other centralized platforms to build facial recognition models, and sold its software to many law enforcement agencies and third parties [7].

Blockchain technology uses multiple layers of cryptography to protect user information, but some blockchains are more secure than others, depending on the infrastructure, the total number of nodes attached to the network, the architecture, and the consensus algorithm. Even if one piece of information belonging to one node or user were breached, the attacker would not gain access to anyone else’s information in the process, since each relationship, contract, and transaction is individually encrypted. Blockchain raises the level of security, but it is not 100% secure, and nothing is. Double spending [8] is one common problem: attackers target high-value transactions and try to spend the same amount more than once at the same time, like “buy one and get another for free!” without any promotion to justify it. More dangerous is the 51% attack [9], or majority attack, in which an attacker who controls more computational power than the rest of the network combined can reverse blocks and fork the chain so that the attacker's version becomes the accepted one, gaining the power to manipulate transactions or even bring the whole network to a halt.
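To make the double-spending problem concrete, here is a minimal Python sketch of the check a ledger has to perform. The `ToyLedger` class and the coin identifiers are hypothetical names invented for this illustration; real blockchains track unspent outputs or account balances and settle conflicts through consensus, which this sketch does not attempt to model.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLedger:
    """Hypothetical ledger that remembers which coins were already spent."""
    spent: set = field(default_factory=set)

    def apply(self, coin_id: str, recipient: str) -> bool:
        # Reject any transaction that reuses an already-spent coin.
        if coin_id in self.spent:
            return False
        self.spent.add(coin_id)
        return True


ledger = ToyLedger()
first = ledger.apply("coin-1", "alice")   # first spend of coin-1
second = ledger.apply("coin-1", "bob")    # attempt to spend coin-1 again
print(first, second)
```

The second call is refused because the same coin has already been recorded as spent. The hard part in a real network is getting all nodes to agree on which of two conflicting transactions came first.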

On the other hand, companies that have the resources to develop, maintain, train, and optimize AI solutions on centralized models are likely to widen the gap between themselves and the rest of the market. Because developing any AI model depends on the amount of data available, the current centralized business model and nature of AI create a “rich get richer” effect [10], in which only the big companies that dominate the market have access to large, labeled datasets stored on their centralized servers.

Fig. 1 AI Centralization Vectors

Many SMEs are left far behind the large companies due to the lack of data availability and sharing. The result is not only an intelligence problem and a fight to survive in the market, but also a data problem.

The Data problem. Large companies lack good mechanisms for sharing data with the data science community, or for giving data owners the option to contribute their own mined data with the right security and privacy guarantees.

The Model problem. A model is built by feeding data to it: the larger the preprocessed dataset, the better the model will be. This raises a question not only about the small amount of data a data scientist has in hand because of data centralization, but also about data integrity and authenticity. If the data was fake or tampered with, which is a high-risk scenario for fully automated solutions, then the whole model is built on false information, which might lead to catastrophic results and biased output. With 5G, imagine a fully autonomous, driverless Tesla on the street making wrong predictions based on tampered measurements streamed between cars, or IoT solutions and cutting-edge technologies connecting all devices. As the industry accelerates the development and deployment of such solutions, keeping the existing centralized AI models sharply increases the risk.

The Training problem. Models are usually trained by the same group that created them in the first place, which can lead to models that lack diverse data experience, are biased, and overfit frequently. Sharing the data and delegating the building and training of models to a decentralized network of data scientists who work to improve quality would reduce the recurring harm of centralization, lower the chances of Cambridge Analytica [11] or Clearview [12] scenarios happening again, and empower communities to contribute to building the future.

TABLE 1. Blockchain and Artificial Intelligence.

2. Blockchain

The 21st century is all about new technologies. With the increasing need for modernization in our day-to-day lives, people are more open to accepting and adopting them, from using a remote to control devices, to giving voice commands, to Artificial Intelligence and autonomous solutions. Technologies like augmented reality and IoT have gained pace in the past decade, and deep learning approaches have improved many applications, provided innovative solutions, and solved some of the root problems in our lives. Now a new player has joined the pack: blockchain technology, which has revolutionized financial systems and is disrupting many different industries around the world.

Back in 1964, Paul Baran presented a diagram [13] illustrating the difference between distributed and decentralized systems while he was working on a robust and nonlinear military communication network. The diagram caused some confusion, since the two concepts work in similar ways but are not the same thing.

Fig. 2 Flowchart diagram for simple transactions on Blockchain.

A common example that explains how blockchain, with its peer-to-peer decentralized network, can change our fundamental rules is the “family financial book”. Three scenarios illustrate the difference.

First, a modern family has one ledger book to store all members' expenses. It is kept in a specific location within the house, and only the father has access to it to register all expenses. In this centralized case, the father is prone to human error, like forgetting to add a transaction (payment). Since there is only one original ledger, the book could be damaged or lost, and the data stored in it would be lost forever. Naughty children could also find the secret location of the book and alter it, adding PlayStation games or a luxury item, or even delete some logs by tearing pages out to cover things kids might want to hide from their parents.

Second, multiple central owners (i.e. database servers) can be used, each of which usually stores a copy of the resources users can access. In our example, both parents are central owners (high availability, HA, active-active), forming a distributed system for the home ledger with one copy kept by each parent. This provides high availability and fast access to relevant data from different locations, and it tolerates the faults seen under centralization: if the first central owner fails, in our case the copy owned by the father, the mother can continue to provide data access. But it still has security risks. It remains vulnerable to the same children, who might get at both copies and make the same changes to each, which brings us back to the main problem of centralization. Distributed systems are less likely to fail, perform better and faster, and allow for a more diverse and flexible system than a centralized one. However, they can still crash, not to mention the high cost of maintaining the network.

Third, imagine that the parents have agreed with neighbors, friends, and work colleagues to form a decentralized peer-to-peer network in which each of them participates as an owner of the ledger. The financial ledger is replicated among all participants, or nodes, which stay connected to one another and keep their copies synchronized. This makes it much harder for the children to change every copy. If changes were made to some copies, the owners would compare their copy of the public shared ledger with the others to validate it, and pinpoint exactly where the change happened and which page (block) of the shared ledger was removed or altered. Despite the difficulty, real changes to a decentralized ledger are still possible: all those naughty children need is to know 51% of the owners who share the public ledger, get access to their copies, and change them. This creates a tipping point where the remaining 49% of owners are forced to accept the fraudulent changes, because a 51% majority of participants (voters) holds them, and to update their records accordingly so that everyone is on the same page, the children's page.

In the simplest terms, the classical definition of blockchain is a data structure that holds transactional records while ensuring security, transparency, and decentralization: a time-stamped series of immutable records of data, managed by a cluster of computers that is controlled by no single authority. It is a chain of records stored in the form of blocks that are chained together cryptographically.
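The third family-ledger scenario, where copies are compared and the majority version wins, can be sketched in a few lines of Python. This is only an illustration of the analogy: the function name and the sample ledger pages are made up, and real proof-of-work networks measure "majority" by computational power rather than by counting participants.

```python
from collections import Counter


def majority_version(copies):
    """Return the ledger version held by a strict majority of nodes, or None."""
    version, count = Counter(copies).most_common(1)[0]
    return version if count * 2 > len(copies) else None


honest = "page-7: groceries 40"
forged = "page-7: groceries 40, playstation 0"

# 2 of 5 copies altered: the honest version still wins.
minority_attack = [honest, honest, honest, forged, forged]
# 3 of 5 copies altered: the forged version becomes the accepted one.
majority_attack = [forged, forged, forged, honest, honest]

print(majority_version(minority_attack) == honest)
print(majority_version(majority_attack) == forged)
```

Once more than half of the copies carry the forged page, the same rule that protected the ledger now enforces the forgery, which is the point of the 51% example.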

A blockchain is a distributed ledger that is open to everyone on the network and allows all network participants to reach an agreement, known as consensus. Once a piece of information is stored on a blockchain, it is extremely difficult to change or alter, because each transaction is secured with a digital signature that proves its authenticity. Thanks to cryptographic hashing and digital signatures, the data stored on the blockchain is tamper-resistant: any change to it can be detected.
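As a rough sketch of why stored data is hard to alter unnoticed, the Python example below chains blocks by including each block's hash in its successor. The block layout (`index`, `data`, `prev_hash`) and the use of SHA-256 over JSON are assumptions made for this illustration, not the format of any particular blockchain, and digital signatures are left out.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over the block's canonical JSON form."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data):
    # The first block points at an all-zero placeholder hash.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev})


def is_valid(chain):
    # Every block must reference the hash of the block before it.
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
for entry in ["rent 900", "groceries 40", "school fees 200"]:
    append_block(chain, entry)

print(is_valid(chain))            # chain as originally written
chain[1]["data"] = "groceries 4"  # tamper with an earlier block
print(is_valid(chain))            # the link stored in block 2 no longer matches
```

Changing an earlier block changes its hash, so the next block's stored `prev_hash` stops matching. To hide the change, an attacker would have to rewrite every later block as well, on a majority of the copies.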

2.1 Types of Blockchain

There are 4 different types of blockchain: public, private, consortium, and hybrid [15, 16, 17].

2.1.1 Public or Permissionless Blockchain

A public blockchain is designed to securely cut out the middleman in any exchange of assets. It does this by grouping peer-to-peer transactions into blocks, where each transaction is verified and synced with every node affiliated with the blockchain before it is written into the shared ledger.

Anyone can join the network as a node, help maintain the ledger, and have the chance to append to it. Anyone can also send and validate transactions, which are viewable by anyone on the network.

2.1.2 Private or Permissioned Blockchain

A private blockchain is designed for an individual or an organization, and it is completely permissioned: to join the network, one has to obtain permission from the network administrators.

Access rights differ depending on the user's role: some users have full access and others have restricted access. A private blockchain does not offer the same decentralization benefits as its public counterpart. It is similar to a centralized system, except that it inherits blockchain's advantages.

Since it is private, all transactions and their verifications are carried out internally. To limit the chance of security violations, private blockchains control who may join, which reduces the number of potential malicious actors compared with a public network.

In addition, the company can choose who has read access to its blockchain’s transactions, allowing for greater privacy than a public blockchain. A private blockchain is a better fit for more traditional businesses.

2.1.3 Consortium or Federated Blockchain

A consortium blockchain is an amalgamation of public and private blockchains. In this type, power does not reside with a single entity; the network runs under the leadership of a group of entities. Essentially, it is a permissioned blockchain that lets developers restrict certain aspects, such as write and append privileges, while leaving others public, such as read privileges.

Consortium blockchains are fast and provide higher scalability and transaction security, which is why they are widely used in the banking sector. The consensus mechanism is managed by a preselected set of nodes, which can come from all the entities forming the consortium.

2.1.4 Hybrid Blockchain

A hybrid blockchain is a mix of public and private blockchains: a public blockchain makes the ledger accessible to every single person in the world, while a private blockchain running in the background controls access to modifications of the ledger. It attempts to use the best of both permissioned and permissionless blockchains, which means controlled access and freedom at the same time.

A hybrid blockchain is not open to everyone. It is customizable: its members can decide who can participate and which transactions are made public for verifiers, while still offering blockchain features such as immutability, integrity, transparency, and security. This suits companies that work closely with their stakeholders. Participants who are granted access to the hybrid blockchain share equal rights to make and confirm transactions, while their identity is kept private from the other participants to protect user privacy.

A user's identity is revealed only to the counterparty of a direct transaction. This is handled by companies and organizations that carry out the KYC process, which ensures that transactions cannot be carried out by anonymous users who are not known to the hybrid blockchain network.

Hybrid blockchain could be the best solution for SMEs, or even big organizations, that want secure background transactions with business partners while also sharing product information with consumers on an open ledger.

3. Artificial Intelligence

Artificial Intelligence is the broader concept of machines carrying out tasks smartly, whether by following programmed commands and rules or by being trained to perform certain tasks. Machine Learning means that the machine learns by itself from past data, with the help of mathematics, statistics, and probability, to create a model that predicts what will happen in the future. It keeps learning from new datasets, using different algorithms to parse them, learn from them, and make decisions based on what it has learned. Deep Learning is a subset of Machine Learning. It emerged because huge amounts of data require models that teach themselves using deep artificial neural network layers, which are loosely inspired by human brain cells. Data Science is a combination of AI, ML, DL, mathematics, and statistics.
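To illustrate "learning from past data to predict the future" in its simplest form, the sketch below fits a straight line with ordinary least squares in plain Python. The ad-spend and sales numbers are invented for the example, and a real project would normally rely on a library rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Hypothetical past data: monthly ad spend vs. sales (made-up numbers).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, sales)
predicted = slope * 5.0 + intercept   # prediction for an unseen input
print(slope, intercept, predicted)
```

The model here is just two numbers learned from the data. If the training data were tampered with, the learned slope and every later prediction would silently change with it, which is the integrity concern raised earlier.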

4. Ethereum Blockchain

The Ethereum blockchain improves the design further. It shares many common elements with other open blockchains: a peer-to-peer network connecting participants, a Byzantine fault-tolerant consensus algorithm for synchronizing state updates (a proof-of-work blockchain), and the use of cryptographic primitives such as digital signatures and hashes, together with a digital currency (ether).

In Bitcoin, the reference implementation is developed by the Bitcoin Core open source project and implemented as the bitcoin client. In Ethereum, rather than a reference implementation, there is a reference specification. Transactions are hashed in a more compact and efficient structure called a Merkle-Patricia trie.

Moreover, the block header, which is generated by the miner as usual, contains not only the Merkle-Patricia root of the transactions but also the Merkle-Patricia root of the receipts, which are the transaction outputs, and the Merkle-Patricia root of the current blockchain state. Every block header in Ethereum therefore contains not just one Merkle tree root, but three, for three kinds of objects:

  • Transactions
  • Receipts (pieces of data showing the effect for each transaction)
  • State

Those trees will allow for a highly advanced light client protocol that allows light clients to verify answers to many queries:

  1. Has this transaction been included in a particular block?
  2. Tell me all instances of an event of type X (eg. a crowdfunding contract reaching its goal) emitted by this address in the past 30 days
  3. What is the current balance of my account?
  4. Does this account exist?
  5. Pretend to run this transaction on this contract. What would the output be?

The first four are straightforward to compute: the server finds the object, fetches the Merkle branch (the list of hashes going up from the object to the tree root), and replies to the light client with that branch. Query (1) is handled by the transaction tree, (2) by the receipt tree, and (3), (4), and (5) by the state tree.
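The Merkle-branch check that a light client performs can be sketched as follows. To keep it short, this example assumes a plain binary Merkle tree over SHA-256 in which an odd level is padded by repeating its last node; Ethereum itself uses Merkle-Patricia tries with Keccak-256, so the function names and tree layout here are illustrative only.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """All tree levels, from the hashed leaves up to the single root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # pad odd levels with the last node
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_branch(levels, index):
    """Sibling hashes from a leaf up to (not including) the root."""
    branch = []
    for level in levels[:-1]:
        sibling = index ^ 1
        branch.append((level[sibling], sibling < index))
        index //= 2
    return branch


def verify(leaf, branch, root):
    """Recompute the root from a leaf and its branch, as a light client would."""
    node = h(leaf)
    for sibling, sibling_is_left in branch:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


txs = [b"tx-a", b"tx-b", b"tx-c"]
levels = merkle_levels(txs)
root = levels[-1][0]                       # the value kept in the block header
proof = merkle_branch(levels, 2)           # branch for b"tx-c"
print(verify(b"tx-c", proof, root))
print(verify(b"tx-x", proof, root))
```

The light client only needs the root from the block header and a branch whose length grows with the logarithm of the number of transactions, not the full list of transactions.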

Fig. 3 An Ethereum improved block header and body

A trie, or prefix tree, is an ordered data structure used to store a dynamic set whose keys are usually strings. The root of the trie is the empty string, and all the descendants of a node share the prefix of the string associated with that node. The Merkle-Patricia trie is a data structure that combines a Patricia trie with a Merkle tree. It improves on the efficiency of a plain Merkle tree by storing keys using the PATRICIA algorithm (Practical Algorithm To Retrieve Information Coded in Alphanumeric) [18]. In Ethereum, a special hex-prefix (HP) encoding is used inside the block, so the alphabet has 16 characters, each of which is a nibble (1 nibble = 4 bits).
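The nibble alphabet and the hex-prefix packing can be illustrated with a short Python sketch. It follows the hex-prefix scheme as I understand it from the Ethereum specification: a flag nibble records whether the path has odd length and whether the node is a leaf. The function names are my own, so treat this as an approximation rather than a reference implementation.

```python
def to_nibbles(key: bytes):
    """Split each byte into two 4-bit nibbles (hex digits)."""
    nibbles = []
    for byte in key:
        nibbles.extend((byte >> 4, byte & 0x0F))
    return nibbles


def hex_prefix(nibbles, is_leaf: bool) -> bytes:
    """Pack nibbles into bytes, led by a flag nibble for parity and node type."""
    flag = 2 if is_leaf else 0
    if len(nibbles) % 2:
        # Odd length: the flag nibble (with its low bit set) shares the
        # first byte with the first path nibble.
        nibbles = [flag + 1] + list(nibbles)
    else:
        # Even length: the flag nibble is followed by a zero padding nibble.
        nibbles = [flag, 0] + list(nibbles)
    return bytes(16 * nibbles[i] + nibbles[i + 1] for i in range(0, len(nibbles), 2))


key = bytes.fromhex("12345f")
nibbles = to_nibbles(key)                  # [1, 2, 3, 4, 5, 15]
encoded_leaf = hex_prefix(nibbles, is_leaf=True)
print(nibbles, encoded_leaf.hex())
```

The flag nibble lets a path with an odd number of nibbles be stored in whole bytes without ambiguity, and it also distinguishes leaf nodes from extension nodes.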

5. Decentralized Application (DApp)

A Decentralized Application (DApp) is an application that is completely decentralized. To achieve this, the following aspects of the application need to be addressed:

  • Back-end software: smart contracts are used to store the business logic (program code).
  • Front-end software: the client-side interface of a DApp can use standard web technologies (HTML, CSS, JavaScript, etc.) and interact with Ethereum through a wallet such as MetaMask. The front-end is usually linked to Ethereum via the web3.js JavaScript library, which is bundled with the front-end resources and served to the browser by a web server.
  • Data storage: it can be centralized, such as a typical cloud database, or decentralized and stored on peer-to-peer platforms such as the InterPlanetary File System (IPFS); Swarm, Ethereum’s own content-addressable peer-to-peer storage system, similar to IPFS and built by the Ethereum Foundation as part of the Go-Ethereum suite of tools; or other projects such as Storj, FileCoin, and SiaCoin [19]. IPFS is widely used by developers, since it is considered a backbone of the third generation of the web (Web3.0) [20].
  • Name resolution.
  • Message communications.

Each of these aspects can be centralized, decentralized, or somewhere in between. A front-end can be developed as a web app or mobile app that runs on a centralized server or device, while the back-end and data storage can live on private servers and databases or on smart contracts and peer-to-peer storage.

There are 3 main advantages to creating a DApp that a traditional centralized architecture cannot provide [21]:

A. Resiliency. Because the business logic and the full process are controlled by a smart contract, a DApp back-end is fully distributed and managed on the blockchain platform itself. Unlike an application deployed on a centralized server, a DApp has no downtime and remains available as long as the platform is operating and the blockchain is working. A DApp is also more resilient against DDoS attacks.

B. Transparency. The nature of a decentralized application allows everyone to inspect the code to verify what it really does, and any interaction with the DApp is permanently stored on the blockchain.

C. Censorship resistance. Nowadays most corporate juggernauts install heavy tracking tools to follow users, and store private data, on a promise of protection, on centralized servers managed and controlled only by their own experts. With a DApp, however, as long as a user has access to an Ethereum node, the user will always be able to interact with the DApp without interference from any centralized control, service provider, or even the owner of the DApp’s smart contract. None of them can alter the code once it is deployed on the network. The P2P network keeps the DApp running even if one or more nodes are hacked or shut down.

6. InterPlanetary File System (IPFS)

The InterPlanetary File System (IPFS) is an open-source project: a decentralized, content-addressable storage system that distributes stored objects between peers in a peer-to-peer network. It is content-addressable in the sense that each piece of content (e.g. a file, image, or video) is hashed, and the hash is used to identify that particular file among the many shared on the network. Any file can be stored and then retrieved from any IPFS node by requesting it by its hash. IPFS builds on multiple forerunner technologies, such as distributed hash tables (DHT), BitTorrent, Git, and SFS, and was inspired by them to provide enhanced solutions for hypermedia data sharing [22].
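Content addressing can be illustrated with a toy in-memory store in Python. This is a simplified sketch: the `ContentStore` class is hypothetical and uses a bare SHA-256 hex digest as the address, whereas real IPFS splits files into chunks and identifies them with multihash-based content identifiers.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the SHA-256 of the content."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        address = hashlib.sha256(content).hexdigest()
        self._objects[address] = content
        return address

    def get(self, address: str) -> bytes:
        content = self._objects[address]
        # The requester can re-hash what it received to check integrity.
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("content does not match its address")
        return content


store = ContentStore()
addr = store.put(b"training-set-v1")
same = store.put(b"training-set-v1")    # identical content, identical address
other = store.put(b"training-set-v2")   # any change yields a new address
print(addr == same, addr == other)
```

Because the address is derived from the content rather than from a server location, any node holding the bytes can serve them, and the requester can verify them without trusting that node.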

Fig. 4 Comparing data sharing in IPFS to centralized client-server models.

IPFS aims to replace HTTP as the protocol of choice for the delivery of web applications. Instead of storing a web application on a single server, its files are stored on IPFS and can be retrieved from any available IPFS node rather than from a centralized server.

Fig. 5 Storing a file in the IPFS Peer-to-Peer network.

IPFS is an important component of the Web3.0 infrastructure, which aims to replace the current centralized internet infrastructure with a decentralized one. Storing arbitrary data directly on the blockchain, however, is very expensive, so IPFS is the most suitable medium for storing and sharing data in a decentralized way. IPFS allows for distributed storage that is resistant to tampering: data stored on the IPFS network cannot be altered without changing its identifier, which is a cryptographic hash of the data (“Qmsx….”). That is why IPFS is turning into a preferred storage platform for DApps.

TABLE 2.1. Comparisons among Public, Private, Consortium and Hybrid Blockchain.
TABLE 2.2. Comparisons among Public, Private, Consortium and Hybrid Blockchain.
TABLE 2.3. Comparisons among Public, Private, Consortium and Hybrid Blockchain.
TABLE 2.4. Comparisons among Public, Private, Consortium and Hybrid Blockchain.

III. Challenges for Decentralized AI

Fig. 6 Current common solutions for Decentralized AI.

There are many problems to address, questions to ask and challenges to overcome before a decentralized AI business model can be achieved.

The following group of available technologies and techniques can be used to tackle the main problems of decentralized AI and offer solutions accordingly:

1. Runtime: The underlying infrastructure and the preferred runtime for decentralized AI solutions are blockchains (Distributed Ledgers and smart contracts) that will establish consensus rules and mechanisms between participants in a transparent and scalable way.

2. Data Privacy: Symmetric and asymmetric encryption can be useful for decentralized AI use cases. However, they clash with privacy requirements because they demand full trust between the parties, who must exchange the keys used to secure the communication. This does not offer the best level of privacy and security, since anyone holding a key can decrypt the data and gain full access to sensitive information. For that reason, many decentralized AI systems and projects have started to adopt some of the most advanced cryptographic techniques from academic research and implement them in their own solutions to achieve a high level of privacy, security, and efficiency. There are three important security and privacy methods:

2.1 Homomorphic Encryption:

Homomorphic encryption allows data to be shared with different parties while maintaining the same level of privacy. It enables AI models (e.g. machine learning models) to be executed against an encrypted dataset without decrypting it: certain computations are applied to the ciphertext (the user data) without knowledge of the private key, producing an encrypted result (the model output) that is also ciphertext. However, homomorphic encryption is very expensive to implement, and so is running computations on homomorphically encrypted data.

Fig. 7 Homomorphic Encryption technique on a dataset.

Homomorphic encryption can take numbers, text or a dataset and turn them into gibberish using a public key, and then turn that gibberish back into the same numbers, text or dataset using a secret key; without that key it cannot be decoded. It therefore allows different nodes (e.g. data scientists, researchers, etc.) to modify the encrypted information in specific ways and apply some AI tasks without revealing or reading the information itself.
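As a rough illustration of computing on ciphertext, the following sketch implements a toy version of the Paillier cryptosystem, which is additively homomorphic. Paillier is used here only as a well-known example and is not necessarily the scheme any of the projects mentioned rely on. The primes and random values are tiny, fixed demo choices and completely insecure.

```javascript
// Toy Paillier cryptosystem with tiny primes (insecure, illustration only).
// BigInt keeps the modular arithmetic exact.

// Modular exponentiation: base^exp mod m (square-and-multiply).
function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

// Modular inverse via the extended Euclidean algorithm (assumes gcd(a, m) = 1).
function modInv(a, m) {
  let [oldR, r] = [a % m, m];
  let [oldS, s] = [1n, 0n];
  while (r !== 0n) {
    const quotient = oldR / r;
    [oldR, r] = [r, oldR - quotient * r];
    [oldS, s] = [s, oldS - quotient * s];
  }
  return ((oldS % m) + m) % m;
}

const gcd = (a, b) => (b === 0n ? a : gcd(b, a % b));

// Key generation from two small, fixed primes (hypothetical demo values).
const p = 61n, q = 53n;
const n = p * q;              // public modulus
const n2 = n * n;
const g = n + 1n;             // common simple choice of generator
const lambda = ((p - 1n) * (q - 1n)) / gcd(p - 1n, q - 1n); // lcm(p-1, q-1)
const mu = modInv(lambda, n); // valid because g = n + 1

// Encrypt message m with randomness r (r must be coprime to n).
const encrypt = (m, r) => (modPow(g, m, n2) * modPow(r, n, n2)) % n2;

// Decrypt with the private values lambda and mu.
const decrypt = (c) => (((modPow(c, lambda, n2) - 1n) / n) * mu) % n;

// Homomorphic addition: multiplying ciphertexts adds the plaintexts.
const addEncrypted = (c1, c2) => (c1 * c2) % n2;

const cipherFive = encrypt(5n, 17n);
const cipherSeven = encrypt(7n, 23n);
const decryptedSum = decrypt(addEncrypted(cipherFive, cipherSeven)); // 12n
```

The party that multiplies the two ciphertexts never sees 5 or 7, yet the key holder decrypts their sum. Fully homomorphic schemes extend this idea to arbitrary computations, at the much higher cost noted above.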

2.2. Generative Adversarial neural networks — GANs Encryption:

GAN-based cryptography was pioneered by Google in a 2016 research paper titled “Learning to Protect Communications with Adversarial Neural Cryptography” [23], which proposes using neural networks to discover new forms of encryption and decryption that protect communication channels from attackers. In the setup, Alice wants to send a plaintext P to Bob and shares a secret key k with him, so that the communication stays hidden from others such as Eve, who can observe what is sent.

Fig. 8 Generative Adversarial neural network communication.

Adversarial training teaches Alice and Bob to communicate successfully while Eve, an eavesdropper, tries to spy on the conversation or commit illegal actions. By analogy, in decentralized AI, GAN cryptography can allow different nodes (e.g. data scientists, researchers, etc.) to exchange secured datasets and machine learning models in a way that is resilient even to the most sophisticated attacks.

2.3 Secure Multi-Party Computations (xMPC):

xMPC provides a cheaper alternative to homomorphic encryption and shares its key characteristic: it allows different nodes (e.g. data scientists, researchers, etc.) to express and apply mathematical operations and computations without having access to the underlying data. Enigma deployed xMPC in its blockchain.
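One simple building block behind multi-party computation is additive secret sharing. The sketch below is an assumption-laden toy (honest parties, no networking, non-cryptographic randomness) and does not represent Enigma's protocol. It only shows how a sum can be computed without any party seeing the individual inputs.

```javascript
// Additive secret sharing over a prime field: a toy form of secure
// multi-party computation (honest parties, no networking).
const PRIME = 2147483647; // 2^31 - 1, field modulus (demo choice)

// Split a secret into `parties` random shares that sum to it mod PRIME.
function share(secret, parties) {
  const shares = [];
  let running = 0;
  for (let i = 0; i < parties - 1; i++) {
    const s = Math.floor(Math.random() * PRIME); // not cryptographically secure
    shares.push(s);
    running = (running + s) % PRIME;
  }
  shares.push((((secret - running) % PRIME) + PRIME) % PRIME);
  return shares;
}

// Recombine shares (or partial sums) into the value they encode.
const reconstruct = (shares) => shares.reduce((a, b) => (a + b) % PRIME, 0);

// Three data owners each hold a private value (hypothetical numbers).
const privateInputs = [120, 75, 310];
const allShares = privateInputs.map((v) => share(v, 3));

// Party j receives the j-th share of every input and only adds those up.
const partialSums = [0, 1, 2].map((j) =>
  allShares.reduce((acc, shares) => (acc + shares[j]) % PRIME, 0)
);

// Publishing the partial sums reveals the total, never the individual inputs.
const total = reconstruct(partialSums); // 505
```

Each share on its own looks like a random number, so a party learns nothing from the shares it receives. Only the combination of all partial sums reveals the aggregate.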

3. Federated Learning:

Federated Learning [24]: Traditional machine learning programs rely on a centralized model for training, in which a group of servers runs a specific AI model against training and testing datasets. Centralized training works well for many current use cases and scenarios, but it becomes challenging when a large number of nodes (e.g. data scientists, researchers, etc.) use and improve the model continuously. The limitation, as already discussed in this thesis, becomes sharper with 5G, IoT and other advanced technologies that connect many cutting-edge devices with each other, whose data needs to be interpreted and processed for model optimization and regularization.

In federated learning, each individual endpoint or user contributes to the training of a machine learning model in its own autonomous way, providing information to the model so that a federated body of knowledge is built up from the data shared by many endpoints.

Federated Learning was proposed and pioneered by Google researchers in two papers [25, 26] published in 2017 as an alternative to centralized AI training: a shared global model is trained, under the coordination of a central server, from a federation of participating devices. Google started using the federated technique in its Vision API and in Google Natural Language Processing (NLP) for auto-correction and next-word prediction in the Gboard app, and is also working on a TensorFlow version that can run in a mobile app to support federated learning models. The steps, as described by Google, are:

  • A subset of existing clients is selected, each of which holds the current AI model.
  • Each client in the subset computes an updated model based on its local data.
  • The model updates are sent back from the selected subset of clients to the server.
  • The server aggregates these model updates to construct an improved global model using the FederatedAveraging algorithm. In its simplest form, known as FederatedSGD, each client takes one step of gradient descent on the current model using its local data and contributes the result to the model optimization process.
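The steps above can be sketched on a deliberately tiny problem. The following example assumes a one-parameter linear model y = w * x and three made-up client datasets. The function names `localUpdate` and `federatedAverage` are illustrative and do not come from any federated learning library.

```javascript
// Federated averaging on a 1-D linear model y = w * x.
// Each client holds private (x, y) pairs; only weights leave the client.

// Local training: a few gradient-descent steps on mean squared error.
function localUpdate(w, data, learningRate, steps) {
  for (let s = 0; s < steps; s++) {
    let grad = 0;
    for (const [x, y] of data) grad += 2 * (w * x - y) * x;
    w -= learningRate * (grad / data.length);
  }
  return w;
}

// Server step: average client weights, weighted by local dataset size.
function federatedAverage(updates) {
  const totalExamples = updates.reduce((count, u) => count + u.numExamples, 0);
  return updates.reduce(
    (w, u) => w + (u.weight * u.numExamples) / totalExamples,
    0
  );
}

// Hypothetical client datasets, all generated from y = 3x.
const clients = [
  [[1, 3], [2, 6]],
  [[3, 9]],
  [[0.5, 1.5], [1.5, 4.5], [2.5, 7.5]],
];

let globalWeight = 0;
for (let round = 0; round < 50; round++) {
  const updates = clients.map((data) => ({
    // steps = 1 corresponds to the single-step FederatedSGD case
    weight: localUpdate(globalWeight, data, 0.05, 1),
    numExamples: data.length,
  }));
  globalWeight = federatedAverage(updates);
}
```

After the rounds complete, `globalWeight` approaches 3 even though the server never sees a single (x, y) pair. Raising `steps` above 1 lets each client do more local work per round, which is the general FederatedAveraging setting.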

IV. AI and Blockchain

Blockchain technology could merge with Artificial Intelligence to revolutionize how organizations work in manufacturing, transportation, retail, finance, entertainment, arts, education and more.

The existing AI market is increasingly controlled by tech giants and social media juggernauts such as Facebook, Google, IBM, Microsoft and Tencent, all of which offer centralized or cloud-based AI solutions and APIs. Their current business models give users little control over their private data and over AI products. This tends to lead to monopolization of the AI market and the capture of all competitive opportunities, which causes unfair pricing, misuse of consumer data, and a lack of transparency and interoperability, and limits the participation of start-ups and smaller companies in AI innovation.

1. AI for Blockchain

Artificial Intelligence could be used in many ways to assist in implementing blockchain technology, or even to help guard the database and its smart contracts against hacking or manipulation by analyzing anomalous behaviour in the transaction network.

The following aspects describe what AI could do to address some of the blockchain issues or improve other characteristics [27]:

  • Sustainability: AI can help reduce the large amounts of energy and computing power spent on mining blocks. It has already been shown to optimize energy consumption and cooling in data centers; for example, Google used DeepMind machine learning models to reduce energy usage in its data centers. The same approach could be applied to blockchain, since it uses computing power to mine blocks and to confirm and validate transactions [28].
  • Scalability: One of the obstacles to a scalable blockchain is the size of the total ledger stored on a single node or computer, which can run to many GBs. With a 1 MB block mined every 10 minutes, 525,600 minutes (one year, not ten) add 525,600 / 10 = 52,560 blocks, i.e. 52,560 / 1,000 = 52.56 GB; added to an existing 85 GB, that gives 137.56 GB. For other blockchains whose blocks take only 5 seconds to create, the ledger might grow to TBs after a few years. AI can introduce new decentralized learning systems such as federated learning, or data sharding techniques, to make the blockchain ecosystem more efficient.
  • Security: AI can detect intrusions at the blockchain application layer. Although the level of security in the blockchain itself is high, the application layers are not as secure, as a couple of DAO hacks have shown. Machine learning models could help secure application deployments and close the insecure gaps in the overall system structure.
  • Privacy: AI can improve the performance of the hash function. The privacy issues around mining and owning the personal data of the population raise regulatory and strategic concerns and call for new ways of improving privacy. For example, homomorphic encryption [29] reduces computational complexity and increases speed and throughput by performing operations and computations directly on encrypted data, without ever needing to decrypt it, which preserves data privacy. The Gini and Endor blockchains use homomorphic encryption in their solutions for the economy and for automotive predictive analytics.
  • Efficiency: AI can predict the likelihood that a node will fulfill a certain mining task. The current problem is the poor distribution and organization of the available resources, which tends to increase costs. A system might be able to compute and predict the likelihood of a specific node being the first to perform a certain task, such as mining or confirming transactions, giving other nodes the opportunity to stand down and cut the total costs. This would not only save energy but also reduce network latency and ease data backlog problems by allowing faster transactions.
  • Hardware: AI can enhance the design of mining hardware for higher overall performance. The Chinese company Bitmain wants to convert mining nodes into a neural network for training AI models, since AI requires a lot of computation [30].
  • Talent Shortage: AI can form a multi-agent system that generates virtual distributed-ledger agents. Blockchain resources are currently in short supply; AI will help automate the physical resources by creating virtual agents that read and write transaction data from blocks.
  • Data Gatekeeper: AI can help manage open data intelligently. Our data resources will be available and stored on the blockchain, and protocol customers will be able to request to buy them directly from the data owners (us). AI systems will help companies and individuals grant access, track data usage, and make sense of what our personal data is used for, how, and where.
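The ledger-growth estimate in the Scalability point can be checked with a small helper. This is a back-of-the-envelope sketch that assumes every block is completely full and takes 1 GB as 1,000 MB. The 5-second chain with 1 MB blocks is a hypothetical configuration chosen only to show the order of magnitude.

```javascript
// Back-of-the-envelope ledger growth, assuming every block is full.
function ledgerGrowthGB(blockSizeMB, blockIntervalSeconds, years) {
  const secondsPerYear = 365 * 24 * 60 * 60;
  const blocks = (years * secondsPerYear) / blockIntervalSeconds;
  return (blocks * blockSizeMB) / 1000; // 1 GB taken as 1,000 MB
}

const oneYear = ledgerGrowthGB(1, 600, 1);   // 1 MB every 10 minutes: 52.56 GB
const tenYears = ledgerGrowthGB(1, 600, 10); // same chain over ten years: 525.6 GB
const fastChain = ledgerGrowthGB(1, 5, 1);   // hypothetical 1 MB every 5 seconds
```

Under these assumptions a 10-minute chain grows by about 52.56 GB per year, while the hypothetical 5-second chain would add roughly 6.3 TB in a single year.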

2. Blockchain for AI (and Data Science)

Fig. 9 Taxonomy of blockchain for AI [31].

Blockchain could be used in many ways to empower Artificial Intelligence applications across a broad range of industries and to help in tracking, understanding, and explaining decisions made by AI. The following are some points on how the latest AI applications could benefit from using blockchain:

1) Blockchain can help in Explainable AI (XAI) and AI foundation [44], the first of the top strategic technology trends for 2018 according to Gartner research [31]. Most of us have only a narrow view of how deployed AI systems make the decisions behind the results used in different fields. Blockchain might therefore be a breakthrough in designing trustworthy and transparent AI algorithms, assisting in troubleshooting and testing to understand why an algorithm reaches a specific decision.

2) Blockchain can help in Digital Twins, the fourth of the top strategic technology trends for 2018 according to Gartner [31]. Digital twins are developed by experts in data science and applied mathematics, who research the physics underlying the systems being imitated and use that data to develop models that simulate the real-world original in digital space before actual devices and solutions are built and deployed [33]. Computer programs take real-world data about physical objects or systems as inputs and produce predictions or simulations of how that physical object or system will be affected by those inputs [34]. Blockchain offers trust, reliability, and provenance for how IoT, AI, and analytics are optimized.

3) Blockchain can help in Automated Machine Learning (AML), a new trend that aims to automate the whole machine learning process, from data acquisition to knowledge management, and to change how prediction models are built by removing the need for data scientists, in order to reduce the resources required and speed up application development. It starts with TPOT, developed by EpistasisLab: an AutoML library in Python built on top of scikit-learn that analyzes the data and helps determine which models, features, and hyperparameters are more effective than others [35]. Blockchain offers immutability, availability, security, privacy, scalability, and permanence in this field, especially for a company like DataRobot, a leading player in AML whose business model is to automate the data science job role.

4) Blockchain can help in Hybrid learning models and in Lean and Augmented Data Learning, which combine different machine learning models to reach better-informed decisions. A hybrid blockchain, which combines the benefits of public and private blockchains, could be used for example to connect local machine learning models with external models to share information, data or consensus for better optimization. It offers high value for applications with low data availability by enabling transfer learning among different AI applications, ensuring the availability of relevant and accurate data.

5) Blockchain can help data scientists and machine learning engineers by evaluating and exchanging machine learning models on the Ethereum blockchain [38]. Smart contracts can be created on the blockchain that offer a reward for training on a dataset and returning the results, which allows users to train models in a trustless blockchain network. This creates a market that matches parties who are good at solving machine learning problems with organizations that have problems to solve with AI: an organization creates a smart contract for its problem and publishes it on the network, which launches a competition to build a better model each time and makes AI more accessible to organizations. The Deepbrainchain project is a decentralized neural network built on the NEO blockchain with a hybrid consensus of Delegated Proof of Stake (DPoS) and Proof of Identity (PoI). The core of the project is to use blockchain to supply computational power to AI companies in exchange for DBC reward tokens paid by the system, while AI companies only need to pay 30% in …

Blockchain can also be a backend database for many solutions. One example is the IP-based physical security products challenge run by ONVIF, a leading global standardization initiative, to bring new solutions for resolving global security issues. One of its POC projects was the cam-X project, which combines deep learning with IPFS and the Ethereum blockchain: a camera with an object detection deep learning model installed on it streams live video, captures images of moving objects, and reports unusual traffic back to the source by hashing those images and storing them on IPFS and the Ethereum blockchain [37]. Another example is Growth from Knowledge (GFK), which built a prototype that creates a hashed fingerprint of every relevant analytical component (data, model, and result) of a data-driven project and registers each step of the component linkage as a data property of a transaction in a public blockchain database such as Ethereum or a private blockchain such as Hyperledger Fabric.
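The fingerprinting idea described for the GFK prototype can be sketched roughly as follows. This is not GFK's implementation: the field names, the use of SHA-256, and the sample strings are all assumptions made for illustration, and the resulting record stands in for the payload that would be attached to a blockchain transaction.

```javascript
const { createHash } = require("crypto");

// SHA-256 fingerprint of a string or Buffer, as a hex string.
const fingerprint = (content) =>
  createHash("sha256").update(content).digest("hex");

// Build a provenance record that links data, model and result by hash.
// Field names are hypothetical; only hashes are recorded, not the artefacts.
function buildProvenanceRecord(data, model, result) {
  const record = {
    dataHash: fingerprint(data),
    modelHash: fingerprint(model),
    resultHash: fingerprint(result),
  };
  // A single linkage hash commits to all three components at once.
  record.linkHash = fingerprint(
    record.dataHash + record.modelHash + record.resultHash
  );
  return record;
}

// Check later that a set of artefacts still matches a registered record.
function verifyProvenance(record, data, model, result) {
  return buildProvenanceRecord(data, model, result).linkHash === record.linkHash;
}

// Made-up artefacts standing in for a dataset, a model and a result.
const registered = buildProvenanceRecord("age,income\n34,52000", "weights:[0.4,1.2]", "score=A");
const untouched = verifyProvenance(registered, "age,income\n34,52000", "weights:[0.4,1.2]", "score=A");
const tampered = verifyProvenance(registered, "age,income\n34,99000", "weights:[0.4,1.2]", "score=A");
```

Because only hashes are registered, the artefacts themselves can stay private, yet anyone holding them can later prove that they are the exact data, model and result the record refers to.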

V. Experience and Results

The user or the owner of the file will use the DApp to upload a file and store it on blockchain and IPFS.

1. The app needs to capture and process the file. The CaptureFile() function loads the file, and FileReader() with readAsArrayBuffer() reads it and converts it, whether it is an image, document, video or JSON file, into a buffer of uint8 values. The React constructor() initializes the component state, and setState() saves the resulting buffer in that state so that it can be stored on IPFS once the form or file is submitted through the onSubmit() function.

Fig. 10 DApp for SMEs workflow.

2. The app has to connect to IPFS to store the file, either by running a local IPFS node on the computer and connecting it to the DApp, or by using an Ethereum and IPFS API provider such as INFURA, a gateway that can act as a blockchain node as a service or an IPFS node as a service. The connection between the node and the DApp is made with a client package called ipfs-http-client. The ipfs.add() function then adds the buffer held in state, i.e. the file itself, to IPFS.

3. The app has to retrieve the hash of the file from IPFS by declaring a variable, fileHash = result[0].hash, which holds the hash returned by IPFS and is used to fetch the file in the DApp.

4. The app has to store the file on the backend layer, the blockchain, by permanently storing fileHash in a smart contract on the Ethereum blockchain. Migration.sol is a built-in smart contract that comes with the Truffle box framework and is used to migrate contracts to the blockchain and upgrade their state. File.sol is the smart contract that stores on the blockchain the fileHash of a particular file that has been stored on IPFS. The state variable fileHash is defined and used in a write function {fileHash = _fileHash} to publicly set fileHash and in a read function {return fileHash} to publicly get the fileHash from the blockchain. The 2_deploy_contracts.js script has Truffle's migration component migrate the smart contract File.sol to the blockchain by creating a variable with the same name as the contract [const File = artifacts.require("File")] and deploying the contract on the blockchain.

5. The chai library is used inside File.test.js to compare the fileHash passed to the write function with the one returned by the read function and to make sure the results are the same in the fetching process.

6. The frontend page is connected to the blockchain using Web3.js, the Ethereum JavaScript API, so that the DApp can store the file permanently on the Ethereum blockchain. A ReactJS lifecycle method, componentWillMount, calls the loadWeb3() function that connects the DApp to the blockchain.

7. Fetch the information from the blockchain into the DApp using the ReactJS state object. The LoadBlockchainData() function uses {const accounts = await web3.eth.getAccounts()} to get the Ether account of the owner who uploaded the file. web3.eth is used to interact with Ethereum accounts, and web3.eth.net is used to get the network ID of the node {const networkID = await web3.eth.net.getId()}.
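The core of steps 2 to 5 can be condensed into one function. The sketch below is not the project's actual source: `uploadAndRegister` and the two in-memory stand-ins are hypothetical, and the fake hash is not a real IPFS identifier. The `result[0].hash` shape follows the article, and other versions of ipfs-http-client may return a different structure. The `contract.methods.<name>(...).send()` and `.call()` pattern is the Web3.js 1.x contract interface.

```javascript
// Upload flow with the IPFS client and contract passed in as parameters,
// so the same function works with real clients or with test stand-ins.
async function uploadAndRegister(ipfs, contract, account, buffer) {
  // Steps 2-3: add the bytes to IPFS and read back the content hash.
  const result = await ipfs.add(buffer);
  const fileHash = result[0].hash;

  // Step 4: persist the hash in the File contract's state.
  await contract.methods.write(fileHash).send({ from: account });

  // Step 5: read it back and confirm that both values match.
  const stored = await contract.methods.read().call();
  if (stored !== fileHash) throw new Error("hash mismatch after write");
  return stored;
}

// In-memory stand-ins (hypothetical) that mimic the shapes used above.
function makeFakeIpfs() {
  return {
    add: async (buffer) => [{ hash: "Qm" + Buffer.from(buffer).toString("hex") }],
  };
}

function makeFakeContract() {
  let fileHash = "";
  return {
    methods: {
      write: (value) => ({ send: async () => { fileHash = value; } }),
      read: () => ({ call: async () => fileHash }),
    },
  };
}

const demoUpload = uploadAndRegister(
  makeFakeIpfs(),
  makeFakeContract(),
  "0x0000000000000000000000000000000000000000", // placeholder account
  new Uint8Array([1, 2, 3])
);
```

Passing the clients in as arguments keeps the flow testable without a running IPFS node or blockchain. In the DApp itself, the real ipfs-http-client instance and the deployed File contract would take the place of the stand-ins.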

Fig. 11 Decentralized Application to store files on IPFS and Ethereum blockchain
Fig. 12 Hashed value of Beuth_logo image stored on Ethereum Blockchain.

This is a simple decentralized web page application for storing files. The DApp is connected to a MetaMask user wallet with the address “0xacE9100807205604c47C9cb96dE2b06315E82224”. The owner of the digital asset chooses a file and submits it to be stored on IPFS and the Ethereum blockchain after paying a small amount in gas fees. More specifically, smart contracts handle the work of hashing and migrating the content to the Ethereum blockchain.

The following are samples of uploaded files in different formats and sizes, stored on local IPFS and INFURA nodes before the hash was fetched and stored on a private blockchain.

Beuth Logo (.png):

https://ipfs.io/ipfs/Qmc9gRFgCjNHxahHxCCb7MoLQKn5p5vn9LhfC2yAKYsakn

https://ipfs.infura.io/ipfs/Qmc9gRFgCjNHxahHxCCb7MoLQKn5p5vn9LhfC2yAKYsakn

Beuth_txt and JSON file:

https://ipfs.io/ipfs/QmdTvifWmuv41myZSLKRfW75NmagY9AvweMFsMa3tjejsa

https://ipfs.infura.io/ipfs/QmdTvifWmuv41myZSLKRfW75NmagY9AvweMFsMa3tjejsa

EITCO pdf:

https://ipfs.io/ipfs/QmQa1F8TGHygmBivQyRJuUmc4XZ4GMP7mrteMQ72QiBHpj

https://ipfs.infura.io/ipfs/QmQa1F8TGHygmBivQyRJuUmc4XZ4GMP7mrteMQ72QiBHpj

Thinking emoji (.GIF):

https://ipfs.io/ipfs/QmanPHnhJUzCb7RfxiZBc64TG51Nk5nR4PwjKHFSwG6sTf

https://ipfs.infura.io/ipfs/QmanPHnhJUzCb7RfxiZBc64TG51Nk5nR4PwjKHFSwG6sTf

Video (.mp4 and .MOV), Object Detection and Computer Vision:

https://ipfs.io/ipfs/QmP5Y2Qp8PNm4da3UR3VczSXZWYPYoY8CrKBV4Nwz2hgU9

https://ipfs.infura.io/ipfs/QmP5Y2Qp8PNm4da3UR3VczSXZWYPYoY8CrKBV4Nwz2hgU9

https://ipfs.io/ipfs/QmNff6eFLmJGZ7Hriek7SNCr73JsvMXFNnL6DYFvfXMWLx

https://ipfs.infura.io/ipfs/QmNff6eFLmJGZ7Hriek7SNCr73JsvMXFNnL6DYFvfXMWLx

VI. Conclusion and Future Work

In this thesis, I explained and reviewed current blockchain technology by building a decentralized application on top of IPFS and a private Ethereum blockchain (using Geth) to store and retrieve data. I also showed how SMEs, including their data science teams, could benefit from blockchain technology, and reviewed the current state of the art in using blockchain features for AI and vice versa.

I gave an overview of how blockchain and decentralized storage work by building a DApp prototype as a POC, together with a detailed comparison of common blockchain implementations and use cases in terms of blockchain types and infrastructure, consensus protocols, and decentralized AI operations. I also described how blockchain could revolutionize new AI, IoT, and automated solutions by listing features and solutions for many industrial problems.

However, smart contracts are the key to successful blockchain transactions and implementations because they automate the workflow with code, and code has bugs. Smart contracts therefore have bugs too, and attackers could use zero-day exploits against fully automated AI solutions, leading to catastrophic disasters beyond our understanding, as we have barely begun to adopt blockchain and to understand smart contracts, their engineering, and their implications for the current world.

Comparing it with a centralized ecosystem, I can see that blockchain offers better solutions for SMEs and solves many of their problems: it removes the intermediary, provides better communication protocols, builds direct trust with customers, and enhances business operations. It is also important to know, however, that blockchain is not 100% secure (nothing is).

50 million dollars were stolen from the Ethereum DAO, a digital VC fund, and more bugs surfaced in the Parity wallet [39]: 30 million dollars [40] and 360 million dollars [41] were affected in Parity wallets because of bugs in smart contract code. IOTA was hacked several times and recently issued a statement telling users to stop using its Trinity wallet after a hijack in which 10 users lost their money through stolen seeds [42].

More attacks happened along the way as blockchain 2.0 was adopted en masse to build DApps. Together with the lack of understanding of the complex system behind it, this pushed a group of researchers to publish a paper on Ethereum system security, covering vulnerabilities and potential attacks.

TABLE 3. The total number of Ethereum vulnerabilities.

Of the 44 vulnerabilities identified in Ethereum systems, only 6 have been eliminated, 25 could be avoided by developing a new approach, and 13 are still open with no solution provided so far [43].

Further research should be done on merging both technologies and on the implications before broader implementation. Even so, decentralized AI built on blockchain characteristics would offer more secure solutions than centralized AI, especially for fully automated AI where machines are the decision-makers; efficiency, scalability, and transparency, however, remain open questions.

Using blockchain as a backend for autonomous AI solutions will raise the difficulty of hacking and reduce the attack rate. Blockchain will also solve many SME problems, such as finding financing, scaling operations, processing payments, and many other current challenges.

With all these concerns, vulnerabilities and attack scenarios, it will take time to understand the potential of blockchain and exactly what solutions it provides to the market, and to learn how it works well enough to fill all the gaps (bugs) and prevent malicious attacks. However, nothing will happen without taking a step forward and accepting the risk of migrating to new infrastructure.

Programs such as the Blockchers project, part of the European Horizon 2020 programme, and the Blockstart project, launched by the leading European innovation consultancy Bax&Company, will facilitate blockchain adoption across European SMEs and start-ups by providing business support and by identifying and testing business opportunities to build real-world use cases, helping to increase the mass adoption of blockchain technology in traditional sectors.

Fig. 13 DApp for SMEs built on top of IPFS and Blockchain for Data Science.

In the future, I plan to improve this research on Blockchain and Artificial Intelligence in several ways. The improvements will be on the infrastructure side: building a hybrid blockchain and enhancing the DApp so that it not only stores and retrieves data from IPFS and the Ethereum blockchain but also supports marketplaces for protocol customers such as SMEs, customers, programmers and data scientists.

Figure 13 gives a glimpse of how this will work. For example, a cancer researcher working in a hospital wants to integrate a model that predicts whether a patient might develop cancer by looking at that patient's X-ray images. Step by step:

1) Protocol customers (e.g. SMEs, hospitals) create a smart contract containing all the requirements to be fulfilled, set a budget, and use the DApp API to deploy it in a smart contract marketplace.

2) Machine learning programmers, with the help of distributed nodes, develop algorithms that either get data from patients or other people who have access to the DApp, who can upload their own data with their approval and post it in a dataset marketplace, or use homomorphic encryption to train the model on patients' local devices, using local computational power, without exposing the actual local data or revealing identities to the public. The encrypted result or model is then stored in an ML/DL models marketplace.

3) The data science team uses a federated learning approach, with access to the dataset and ML/DL model marketplaces, to aggregate data and mini models and average them in order to build the global model for further predictions, visualization and analysis, and to deliver the final report that meets the protocol customers' requirements.

4) Payment in tokens is distributed among data owners, machine learning engineers, and data science teams once certain conditions are met in the programmable smart contract.
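The payout rule in step 4 could look something like the sketch below once the contract's conditions are satisfied. Everything here is a hypothetical illustration: the role names, the basis-point split, and the `settle` function are assumptions, and a production version would live in a smart contract rather than in JavaScript.

```javascript
// Toy settlement logic for the marketplace contract described above.
// Shares are in basis points (1/100 of a percent) and must total 10,000.
function settle(budgetTokens, sharesBps, conditionsMet) {
  if (!conditionsMet) return {}; // nothing is paid until the conditions hold
  const totalBps = Object.values(sharesBps).reduce((a, b) => a + b, 0);
  if (totalBps !== 10000) throw new Error("shares must sum to 10000 bps");

  const payouts = {};
  for (const [party, bps] of Object.entries(sharesBps)) {
    payouts[party] = Math.floor((budgetTokens * bps) / 10000); // whole tokens
  }
  return payouts;
}

// Hypothetical split between the three roles named in step 4.
const split = { dataOwners: 5000, mlEngineers: 3000, dataScienceTeam: 2000 };
const payouts = settle(1000, split, true);  // 500 / 300 / 200 tokens
const unpaid = settle(1000, split, false);  // {} while conditions are unmet
```

Expressing the split in basis points and rounding down to whole tokens mirrors the integer arithmetic that smart contract languages typically require.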

“This work is a part of the PABlo project which is funded by IFAF Berlin”

