“Mars Public Talk” №133 | Allan Zhang: Using Blockchain to Drive Development of Artificial Intelligence
On July 28, Mars Finance published a public talk with DxChain co-founder Allan Zhang on using blockchain to drive the development of artificial intelligence. The interview follows.
The blockchain is playing a critical role in the explosion of high-quality data, which is an opportunity for small and medium-sized projects and companies.
Artificial intelligence and blockchain are undoubtedly the hottest technologies today. Artificial intelligence represents exceptional productivity, and blockchain represents a new set of production relations.
So, what will happen when artificial intelligence meets blockchain?
In a public talk on July 28, DxChain founder Allan Zhang discussed using blockchain to drive the development of artificial intelligence.
He believes that blockchain today lacks a genuinely high-performing public chain for providing storage and computing. Zhang and his team founded DxChain, aiming to build the next-generation technology public chain centered around blockchain storage and computing, and provide on-demand computing and storage services based on the blockchain technology, thereby reducing costs. DxChain wants to create a decentralized parallel-computing environment that supports big data and machine learning to accelerate the development of artificial intelligence.
Read on for Zhang’s full speech, edited by Mars Finance for clarity and brevity.
First: The relationship between blockchain and artificial intelligence
Let’s first take a look at the relationship between artificial intelligence and blockchain.
The development of artificial intelligence depends on data, computing power and storage. Regardless of how advanced the algorithm is, AI needs the support of underlying infrastructure.
Interestingly, AI and blockchain are somewhat antagonistic, as Peter Thiel and Reid Hoffman have expressed in a recent interview.
AI is very centralized and sits in the hands of a few companies, mainly Google, Apple, Facebook and Amazon (“GAFA”) and China’s Internet giants Baidu, Alibaba and Tencent (“BAT”).
The main reason is that the data is in their databases. For startups and other projects, the centralization of AI opens the door to all kinds of abuse. We won’t go into the details of the Facebook case here.
The emergence of blockchains offers a technical way to address the data monopoly issue.
The idea is roughly this: we are all financially motivated to provide data, and we are more willing to share sensitive data (expenses, health information) when we know that it will be kept private and secure (through decentralization and secure computing). Over time, such a market will accumulate more data, and higher-quality data, than the datasets controlled by GAFA.
Small and medium-sized artificial intelligence vendors also have the opportunity to obtain high-quality data at a lower cost. This is the best way to break the data monopoly in the traditional Internet age. The blockchain will bring about the explosion of high-quality data, which is an opportunity for small and medium-sized companies.
In addition to data, artificial intelligence requires computing power and storage, which is a high bar for small and medium-sized companies with limited financial resources.
Blockchain can bring decentralized computing power and on-demand storage services, which can reduce costs by motivating miners to contribute computing power and storage space. This allows small and medium-sized artificial intelligence companies to build their own big data and machine learning platforms and applications at lower costs.
DxChain uses blockchain technology to build a data trading market, enabling companies that need high-quality data to obtain it at lower cost and with greater efficiency. Users don’t have to worry about privacy leaks and can also earn a financial return, creating a win-win situation.
At the same time, DxChain strives to make the sharing of computing power and storage space a reality with blockchain technology, thereby reducing the costs of computing power and storage and providing a decentralized parallel-computing environment that supports big data and machine learning to accelerate the development of artificial intelligence.
Second: The DxChain architecture
Next, I will explain in detail how DxChain combines blockchain and AI.
We provide decentralized big data analysis and machine learning computations. It wasn’t easy. To achieve this goal, we have made some innovations in the infrastructure to solve the current storage and computing bottlenecks.
DxChain believes a single chain cannot meet the demands of storage, computation, and privacy. So it adds a data side chain and a computation side chain, in a way akin to the Lightning Network. The master chain runs smart contracts and manages the data chain and the computing chain, while the two side chains are responsible for storage and computing respectively.
DxChain Network uses an account-based model to store transaction and asset information, which includes account states, transactions across accounts and receipts. The master chain uses an Ethereum-compatible data structure composed of hash-linked blocks. The data is stored on all the nodes of the network.
The data side chain is built on a P2P distributed file storage system and stores non-asset information. The computation side chain is designed to complete computational tasks, which are based on real business demands. A computing unit can read data from the data side chain and write the result back to it.
After a task is completed, its final state is stored on the master chain via a smart contract. The intermediate states and task-level transaction information are kept on the side chains. The data chain and the computing chain interoperate with each other through the chains-on-chain micro-services, which carry data and messages. They communicate with the master chain through the smart contracts of DxChain Network.
We call this architecture the “chains-on-chain” model. Unlike Bitcoin, whose primary purpose is financial transactions, this design aims to provide computing and storage services for big data and machine learning. The side chains are designed for efficiency and scalability while meeting the needs of specific business scenarios.
The DxChain master chain keeps costs low, and the side chains enable efficient computing and data storage. The different chains communicate through smart contracts, ultimately forming a holistic service architecture. The data side chain and the computing side chain each have their own consensus algorithm.
Also, writing the active transaction of the side chain to the master chain can support the asset transfer between two chains. The side chains and the master chain use the same token, and the side chains can also have their own tokens.
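The task flow described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SideChain` and `MasterChain` classes, the `run_task` function, and the use of a SHA-256 digest as the "final state" are all hypothetical and are not taken from DxChain's code.

```python
import hashlib
import json
from dataclasses import dataclass, field


def digest(obj) -> str:
    """Deterministic SHA-256 hash of a JSON-serializable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class SideChain:
    """Hypothetical side chain: keeps intermediate, task-level records."""
    name: str
    records: list = field(default_factory=list)

    def append(self, record: dict) -> str:
        self.records.append(record)
        return digest(record)


@dataclass
class MasterChain:
    """Hypothetical master chain: keeps only final task states."""
    final_states: dict = field(default_factory=dict)

    def commit(self, task_id: str, result_hash: str) -> None:
        # Stands in for the smart-contract call that records a final state.
        self.final_states[task_id] = result_hash


def run_task(task_id, numbers, data_chain, compute_chain, master):
    # The input is recorded on the data side chain.
    data_chain.append({"task": task_id, "input": numbers})
    result = sum(numbers)  # placeholder for a real computation
    # Task-level detail stays on the computation side chain.
    result_hash = compute_chain.append({"task": task_id, "result": result})
    # Only the final state reaches the master chain.
    master.commit(task_id, result_hash)
    return result


data_chain = SideChain("data")
compute_chain = SideChain("compute")
master = MasterChain()
total = run_task("task-1", [1, 2, 3], data_chain, compute_chain, master)
```

The point of the split is that the master chain holds one small record per task, while the bulkier inputs and intermediate records live on the side chains.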
Third: Decentralized computing services
Regarding computing power, many recent developments in AI have been facilitated by a significant boost in computing power, which is the result of better utilization of existing hardware and the development of new high-performance hardware built specifically for AI (for example, Google’s TPU).
DxChain provides computing services for big data and machine learning, and it addresses the computing-power problem through decentralization in two ways: sharing unused resources, and using them efficiently for specific tasks.
The computing power DxChain provides differs from Bitcoin’s. It is used not only for network security but also for real business needs. Rather than just issuing a digital currency, DxChain delivers a decentralized computing environment.
To verify the correctness of computations, DxChain proposes two consensus algorithms: the verification game and Provable Data Computation (PDC).
The verification game provides a framework to validate the correctness of a computation procedure, while PDC provides a statistical scheme to find the correct answer from a set of untrusted nodes, with only a small probability of a successful attack.
The verification game is an interactive system with three main roles: Solver, Challenger (the verifier), and Judge. It can attest to the correctness of a computation procedure without wasting much computing power.
A Solver is a miner who offers a solution to a given task, and a Challenger is one who disagrees with the Solver’s answer. The Judges, who always compute correctly, use minimal computation bandwidth.
The verification game does not trust or rely on the reputation of its participants or on any trusted party in the system. Both the Solver and the Challenger must put down a deposit to take part in a task, and any faulty player loses that deposit. Over time, this penalty mechanism tends to eliminate untrustworthy players.
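The dispute flow can be sketched as follows. This is a minimal sketch, not DxChain’s implementation: the toy `step` function, the `Player` class, and the binary search over intermediate states (an assumption borrowed from interactive verification designs, used here to show how a Judge can get away with re-executing a single step) are all hypothetical.

```python
from dataclasses import dataclass

MOD = 1_000_003


def step(state: int) -> int:
    """One step of a toy computation (hypothetical): state -> next state."""
    return (state * 3 + 1) % MOD


def run_trace(start: int, n_steps: int, faulty_at=None) -> list:
    """Return all intermediate states; optionally corrupt one step."""
    trace = [start]
    for i in range(n_steps):
        nxt = step(trace[-1])
        if i == faulty_at:
            nxt = (nxt + 1) % MOD  # simulate a wrong computation at step i
        trace.append(nxt)
    return trace


def find_disputed_step(a: list, b: list) -> int:
    """Return an index i where the traces agree at i - 1 and differ at i.

    Assumes a[0] == b[0] and a[-1] != b[-1].
    """
    lo, hi = 0, len(a) - 1  # invariant: a[lo] == b[lo], a[hi] != b[hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if a[mid] == b[mid]:
            lo = mid
        else:
            hi = mid
    return hi


@dataclass
class Player:
    name: str
    deposit: int


def verification_game(solver, challenger, solver_trace, challenger_trace):
    """The judge re-executes one disputed step; the wrong party loses its deposit."""
    if solver_trace[-1] == challenger_trace[-1]:
        return None  # the answers agree, so there is no dispute
    i = find_disputed_step(solver_trace, challenger_trace)
    correct = step(solver_trace[i - 1])  # the only step the judge computes
    loser = challenger if solver_trace[i] == correct else solver
    loser.deposit = 0
    return loser.name


solver = Player("solver", 10)
challenger = Player("challenger", 10)
honest = run_trace(7, 64)
cheat = run_trace(7, 64, faulty_at=20)
loser = verification_game(solver, challenger, cheat, honest)
```

Here the Solver submitted the corrupted trace, so the Solver forfeits its deposit. If the roles were swapped and the Challenger disputed a correct answer, the Challenger would lose instead.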
In Provable Data Computation, a computational task is broadcast through the network and X nodes perform the job. The first answer that Y of those nodes produce identically is chosen as the valid answer.
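The selection rule for Provable Data Computation can be illustrated with a short sketch. The function name `provable_data_computation` and the `quorum` parameter (the Y in the description) are hypothetical names chosen for this example, and the sketch ignores networking, deposits, and rewards.

```python
from collections import Counter


def provable_data_computation(answers, quorum):
    """Return the first answer reported identically by `quorum` nodes, else None.

    `answers` is the sequence of results in the order nodes report them.
    """
    counts = Counter()
    for answer in answers:
        counts[answer] += 1
        if counts[answer] >= quorum:
            return answer
    return None


# X = 5 nodes report; Y = 3 matching answers are required.
reports = [42, 41, 42, 17, 42]
accepted = provable_data_computation(reports, quorum=3)
```

In this run, the value 42 is accepted once the third matching report arrives. If no answer ever reaches the quorum, the function returns `None` and the task would have to be retried or escalated.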
DxChain also integrates Hadoop to achieve decentralized computing. The core Hadoop elements are the job tracker, the task tracker, and the worker, built on MapReduce. Akin to Hadoop, DxChain Network has two designated roles, the D-Job Tracker and the D-Task Tracker, which perform two different tasks. A miner receives the incentive if it honestly executes the job it committed to; otherwise, the miner loses its deposit.
MapReduce is a centralized design, in which the job tracker manages cluster resources and job scheduling. The task tracker on each agent manages the tasks on its node and communicates with the job tracker.
DxChain Network, by contrast, is a decentralized system, which makes real-time communication between nodes difficult. In Hadoop, the job tracker learns whether a node is alive from its activity state, and if nodes running active tasks die, it must reassign their tasks to new nodes. In DxChain Network, there is no need to check the state of the task nodes: redundant copies of each computation run on different nodes, so one or a few nodes going offline or dying does not affect the final result.
When a node completes a computation, the job tracker will send the result to the computing side chain through verification game or Provable Data Computation. The computing side chain saves the work assignment information and results.
Fourth: Decentralized data storage services
We believe that for the purpose of AI training, you need to create your own data. DxChain’s economic model can encourage users to upload data and solve the problem of where AI data comes from.
In addition to providing a decentralized computing environment, DxChain is a decentralized storage network that stores files, computation results, and all kinds of intermediate computation states.
The data side chain is built on a P2P distributed storage network, such as IPFS or Swarm. The chain works as an incentive layer and does not need to store the data itself. Data and files are divided into small pieces and kept in the P2P network.
Meanwhile, the meta information and the hash of each piece are stored on the chain. This is known as the file state, and it similarly uses the Merkle Patricia Tree structure. DxChain has also designed a cross-chain URI for the file itself so that the data can be easily accessed across the network and chains.
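To make the idea of a "file state" concrete, here is a minimal sketch of splitting a file into pieces and keeping only hashes as metadata. It is a simplification under stated assumptions: it uses a plain binary Merkle tree rather than the Merkle Patricia Tree mentioned above, and the function names, the tiny 4-byte piece size, and the layout of the returned dictionary are all hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split_into_pieces(data: bytes, piece_size: int) -> list:
    """Split a file's bytes into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def merkle_root(leaf_hashes: list) -> bytes:
    """Fold a list of leaf hashes into a single root hash."""
    if not leaf_hashes:
        return sha256(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def file_state(data: bytes, piece_size: int = 4) -> dict:
    """Metadata kept on the chain: piece hashes plus a root, never the data."""
    pieces = split_into_pieces(data, piece_size)
    hashes = [sha256(p) for p in pieces]
    return {
        "size": len(data),
        "piece_hashes": [h.hex() for h in hashes],
        "root": merkle_root(hashes).hex(),
    }


state = file_state(b"hello dxchain!!")
tampered = file_state(b"hello dxchain!?")
```

Changing a single byte changes the hash of the piece that contains it and therefore the root, while the hashes of the untouched pieces stay the same. That is what lets a small on-chain record vouch for a large off-chain file.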
Between the data chain and P2P storage network, DxChain also has a virtual logical layer, which includes a storage task giver, miners for importing and exporting files and a verifier.
Since DxChain uses a decentralized approach, miners who provide data storage need a consensus mechanism to drive incentives and secure the network.
Proof of Spacetime (PoSt) is used as the consensus method for the data storage chain to validate the provision of storage. The data side chain manages storage tasks and is also connected to the master chain, for giving storage miners incentives, and to the computing side chain, for storing computation states. Its advantages include faster settlement times, lower transaction fees, faster transactions, stronger privacy, and transparency.
Proof of Spacetime is well-suited for a decentralized network because it improves on Provable Data Possession, which allows a client that has stored data on an untrusted server to verify that the server still holds the original data without retrieving it. With Provable Data Possession, however, a client must keep sending challenges to the server to verify that the server is storing the files over a continuous period.
Proof of Spacetime, on the other hand, can prevent Sybil attacks algorithmically and ensure that the system is complete and secure. As long as an honest prover stores the file, it can always produce valid proofs that convince a verifier, and the scheme resists adversarial attacks.
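The challenge-and-response idea behind storage proofs can be illustrated with a deliberately simplified sketch. This is not Provable Data Possession or Proof of Spacetime, both of which are considerably more involved. It only shows a client checking that a server still holds a file without downloading it. All function names are hypothetical, and the scheme assumes the client precomputes a limited number of challenges before handing the data over.

```python
import hashlib
import os


def respond(nonce: bytes, data: bytes) -> str:
    """Server side: the response can only be computed with the full data."""
    return hashlib.sha256(nonce + data).hexdigest()


def prepare_challenges(data: bytes, count: int) -> list:
    """Client side, before upload: precompute (nonce, expected response) pairs."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        challenges.append((nonce, respond(nonce, data)))
    return challenges


def audit(challenge, server_data: bytes) -> bool:
    """Client side, later: send one unused nonce and check the reply."""
    nonce, expected = challenge
    return respond(nonce, server_data) == expected


original = b"sensor readings 2018-07-28"
challenges = prepare_challenges(original, count=3)
honest_ok = audit(challenges[0], original)
cheat_ok = audit(challenges[1], original[:-1])  # the server lost a byte
```

Because each nonce is random and used once, the server cannot answer from a cached response. The drawback this toy shares with repeated challenges in general is that the client has to keep auditing, which is the burden the text says Proof of Spacetime is meant to reduce.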
The Proof of Spacetime consensus can also be publicly verifiable to protect privacy and prevent other malicious behaviors. It can also enable a prover to prove a statement to a verifier without revealing anything about the statement in a zero-knowledge proof protocol.
Fifth: Privacy protection
To make the decentralized AI market really work, you need to ensure that any data provided by individuals and companies is handled in a completely private manner. This brings us to the privacy issue.
The industry has adopted a variety of practices to strengthen data privacy protection on blockchains, such as homomorphic encryption and multi-party computation, which use cryptography to protect privacy. SGX is another widely adopted technique, which relies on hardware-based encryption.
DxChain uses a more practical solution — encrypting critical data information for privacy protection with a robust data processing capability.
Because data stored on the chain is structured, we can perform fine-grained operations on it. For example, in a spreadsheet with a column of people’s names, we encrypt that critical column and leave the other information in the clear, instead of encrypting the entire file. This is called data model-based data encryption.
In addition, DxChain uses differential privacy. Differential privacy methods reduce the chance that a single user’s record skews query results in a way that allows information to be traced back to that user. If users want to provide data only for statistical analysis, such as calculating a mean and standard deviation, DxChain Network has a tool that helps them apply differential privacy before submitting files to the network.
Another method is Miner Storage Encryption: each data piece is encrypted with the storage miner’s public key on the local machine. This protects against intrusion by network attackers, since they do not know the miner’s private key.
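As an illustration of what a differentially private statistic looks like, here is a minimal sketch of the standard Laplace mechanism applied to a mean. It is not DxChain’s tool: the function names, the clipping bounds, the sample heart-rate values, and the choice of epsilon are assumptions made for the example, and the number of records is treated as public.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values, lower, upper, epsilon, rng=None):
    """Epsilon-differentially-private mean of values clipped to [lower, upper]."""
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    # Changing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


heart_rates = [62, 71, 58, 90, 77, 66, 84, 73]
noisy = private_mean(heart_rates, lower=40, upper=200, epsilon=0.5,
                     rng=random.Random(0))
```

A smaller epsilon adds more noise and gives stronger privacy, and a larger dataset needs less noise for the same epsilon, which is why the approach suits aggregate statistics rather than individual lookups.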
A large file will be split into many small pieces using different strategies, as only gaining access to a small fragment of a big file would not disclose much information.
Sixth: Market prospects
Let’s talk about the market prospects of combining AI and blockchain.
A decentralized market may be a novel approach to creating AI, but anything that comes out of it still needs to solve real problems to achieve commercial success. From that perspective, I want to speak of some vertical industries (industrial, genomic, financial, etc.).
DxChain is infrastructure, so its implementation is especially critical for SMEs and startups. Many SMEs are short of high-quality data, so they have to purchase it from larger companies. Yet many high-quality samples owned by those companies are never made available to the market.
By using the data model of DxChain Network to standardize data, vendors can open their APIs to one another and exchange and share data, which is very appealing to companies thirsty for high-quality data, especially artificial intelligence companies.
DxChain is also a data trading platform. The user can define which data to trade, as well as the price of the transaction. This will benefit both the data consumer and the supplier.
Beyond that, the maintenance cost of computing and storage is also high. By building a decentralized big data and machine learning network, DxChain allows artificial intelligence vendors to reduce costs and develop their own machine learning platforms and applications. Their data is stored on the storage miners’ disks, and miners share bandwidth to minimize data storage and network traffic costs.
For example, in the healthcare industry, smart devices can provide users with remote diagnosis and benefit more people. However, many low-income people cannot afford them, smart medical devices are fragmented and hard to integrate, and user data can easily be misused.
As a decentralized big data and machine learning network, DxChain Network could enable ecosystem developers to build their own big data and machine learning platforms on top of it.
With DxChain Network, the cost of storing data and managing traffic will be dramatically reduced, which could help lower healthcare premiums.
The data collected through fitness trackers, mobile apps, smartwatches and other devices linked to the network are encrypted and stored on the blockchain in a compliant and secure way. Users can also exchange their data for economic profits.
Finally, based on data collected by medical devices, healthcare providers can build their own artificial intelligence technology to monitor patient health and share critical vital signs with the community.
In general, with DxChain, different industries can build their own big data and machine learning platforms and applications, and can significantly reduce the development cost of artificial intelligence. DxChain can help developers in different industries to obtain more high-quality data at a lower price as well as the computing power and storage, ramping up the development of artificial intelligence.
Seventh: Discussion on the future of the industry
The combination of AI and decentralization brings a lot of possibilities.
Fred Ehrsam wrote in a recent blog post that blockchain can provide an interesting organizational model to help various AI robots collaborate in a transparent manner.
Traveling is a good example: You can have a robot buy a flight ticket. If there is a delay, another robot can predict the possibility of misconnection and propose another route, while the first robot can change the reservation. All of these can be done automatically in the backend in real time, altogether eliminating the friction we humans might have.
The SingularityNET project is another intriguing example: an ambitious, complex project made up of many parts. To demonstrate how various AIs can work together to form a single brain, the team uses Sophia, a Hanson Robotics robot driven by SingularityNET. The video demo reminds people of the TV series “Westworld.”
Fred Ehrsam pointed out in his blog post that a future in which AI operates in a completely autonomous manner is possible, which is precisely the idea of the decentralized autonomous organization (DAO): a completely machine-enabled decentralized organization with limited human intervention. For example, imagine an utterly decentralized version of Uber where AI controls autonomous vehicles. With a giant feedback loop inside, the Uber system will learn how to dispatch vehicles, efficiently transport people, handle various logistics tasks, and combine a variety of skills and complexities into a self-operating business.
However, this sort of AI DAO is a bit scary. If an organization is genuinely decentralized and autonomous, it is unclear how it could suspend its services when something goes wrong. This is not as simple as pulling the plug.
Now the Q&A session:
Q1: What about other projects in this space?
A1: Morpheo is a project that uses blockchain to analyze medical data. The project is not purely decentralized but uses a trusted cloud platform to store data and perform computing. The blockchain is used in this project as an incentive mechanism.
The machine learning algorithms in this platform are open sourced, but the uploaded data is private. Individuals can upload data and run each algorithm on data once. They can, therefore, determine the performance of different algorithms.
Since data storage has not yet been solved by blockchain technology, this hybrid approach is used by many blockchain projects, not just AI projects.
Besides, many platforms for deep learning and trusted hardware have been integrated with blockchains.
Q2: Can deep learning run on blockchains now?
A2: Deep learning has become very popular in recent years, and many blockchain projects want to touch upon this technique. There are many open source frameworks for deep learning, such as Caffe, MXNet, and so on. The core of deep learning is accelerating matrix operations with GPUs. The current mainstream approach among blockchain projects is to split the matrix operations into small tasks and then assign them to different compute nodes.
The problem with this approach is that such a specially designed blockchain system is only suitable for a specific type of deep learning and scales poorly. Also, deep learning works best on image and video processing.
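The idea of splitting a matrix operation into small tasks can be sketched as follows. This is only an illustration: it runs the tasks one after another in a single process, uses plain Python lists instead of GPUs, and the function names and the row-block partitioning are assumptions for the example rather than any project’s actual scheme.

```python
def split_rows(n_rows: int, n_tasks: int) -> list:
    """Partition row indices 0..n_rows-1 into contiguous (start, stop) blocks."""
    size = -(-n_rows // n_tasks)  # ceiling division
    return [(s, min(s + size, n_rows)) for s in range(0, n_rows, size)]


def multiply_block(a_rows, b):
    """Work done by one compute node: a block of rows of A times all of B."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def distributed_matmul(a, b, n_tasks):
    """Split A @ B into row-block tasks and reassemble the results in order."""
    result = []
    for start, stop in split_rows(len(a), n_tasks):
        result.extend(multiply_block(a[start:stop], b))  # one task per block
    return result


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
product = distributed_matmul(a, b, n_tasks=2)
```

Each task needs its own rows of A but all of B, so the data that must be shipped to every node grows with the size of B. That communication cost is one reason this style of splitting fits some workloads much better than others.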
Q3: Is it better to combine trusted hardware and blockchain?
A3: The combination of trusted hardware, such as a TEE, with blockchain has attracted a lot of attention, and some projects do this now. This type of project is actually better suited to real privacy protection. I have tried simple machine learning algorithms on SGX, such as the K-means algorithm. When the data is under 400 MB, it takes about 5 minutes. Once the data exceeds a certain size, such as 1.2 GB, it needs to run for 2 days. At present, SGX has strict memory constraints, so the performance is not particularly good, and the running time on a single machine is unpredictable. There are not many programs written for SGX, even in large companies.
In a decentralized environment, performance is further limited by the network.
Q4: Is the combination of AI and blockchain a fake demand?
A4: Both AI and blockchain are buzzwords. You cannot simply put them together; that creates a fake demand. I have seen a lot of unrealistic projects. Simply using a blockchain to perform AI is a fictitious demand. For example, only users who can afford high-end GPUs can keep their graphics cards running for eight hours, and we don’t want to spend that much time on parameter tuning. I believe things like model training should not be done on the blockchain.
The most significant demand for AI is using standard algorithms to explore the value of data, so applying a model to real-world data could be a practical use. You need data to train models, so data collection is the most critical part. DxChain is designed to perform a full set of analysis steps: data collection, data cleaning, data analysis, and reporting.
Q5: What are the real-life applications of AI and blockchain, and when will we see them?
A5: Applying AI on a blockchain requires a lot of storage and computing, so it suits applications that are data-intensive and where the amount of computation is reasonable. Traditional centralized applications do not work well in such a scenario.
I believe AI and blockchain will have a considerable influence in the medical field and the IoT industry. DxChain is currently working in this direction and hopes that other projects will work together to solve the problems of building such a platform.
Allan Zhang, founder and CEO of DxChain and CEO of Trustlook, is a serial entrepreneur with extensive management experience. He has over 10 years of experience in cybersecurity, 8 years of experience in digital currencies, and 5 years of experience in blockchain investments. He was an early engineer at Palo Alto Networks and conducted advanced research at Lucent and nCircle.
QuarkChain CMO Yazhen Xiang is responsible for marketing, operations, and public relations at QuarkChain. He previously worked for Argus, LinkedIn, and Wish. He obtained his master’s degree at Johns Hopkins and his bachelor’s degrees at Shanghai Jiao Tong University.
BlockVC founder Kevin Xu is a QuarkChain consultant. He previously worked at large financial and financial technology companies such as Credit Suisse, Singular Silver, and Dingming Finance. He has participated in ETF product design and industrial fund management, with a fund of over RMB 1 billion. Today, Xu’s blockchain assets under management exceed 200 million US dollars. His portfolio companies include QuarkChain, NKN, Celer Network, IoTeX, DxChain, VeChain, Republic Protocol, CertiK, etc.
Xu received his master’s degree in Computational Statistics and Machine Learning from University College London (UCL).
About DxChain: a decentralized big data and machine learning network powered by a computing-centric blockchain.