Blockchain: Rebalancing & Amplifying the Power of AI and Machine Learning (ML)

Published in JustStable, Jun 13, 2018

Deeptech & Blockchain, Licensed from Adobe Stock

Overview

Artificial intelligence (AI) and machine learning (ML) are powerful technologies that can vastly increase the capabilities of computing systems. Rather than being constrained by fixed, explicitly programmed rules, systems built on these technologies can dynamically learn, find patterns, adapt, predict, and evolve.

Many modern AI and ML techniques, particularly deep learning, use artificial neural networks (ANNs): learning systems loosely inspired by the way neurons in the human brain work. To become proficient at specific tasks, ANNs require large sets of training data and substantial computing power. For example, if an ANN is going to learn to recognize humans or cats, it needs a large dataset of labeled images of humans or cats, and a powerful set of servers to process the information quickly.
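To make “learning from training data” concrete, the sketch below trains the smallest possible ANN, a single artificial neuron (equivalent to logistic regression), by gradient descent. It is a minimal illustration under stated assumptions: the two numeric features and the cat/not-cat labels are invented, whereas a real image recognizer would use many layers, raw pixels, and far more examples.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, labels, epochs=2000, lr=0.5):
    """Fit weights w and bias b so that sigmoid(w . x + b) approximates the 0/1 labels."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Forward pass: predicted probability that x belongs to class 1.
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # For log loss, the gradient w.r.t. the pre-activation is simply (p - y).
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x) -> int:
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical toy data: two invented features per example; 1 = "cat", 0 = "not cat".
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
Y = [1, 1, 1, 0, 0, 0]

w, b = train_neuron(X, Y)
print([predict(w, b, x) for x in X])  # matches Y once training has converged
```

Each pass nudges the weights to reduce prediction error on the labeled examples; deeper networks with more data scale up the same principle, which is why data volume and compute matter so much.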

The rise of AI and ML has therefore closely tracked two trends of the last decade: the exponential increase in computing power afforded by cloud computing, and the flood of Big Data that has become available. Large US tech players such as Google, Facebook, and Amazon have built large centralized cloud data centers, pushed the envelope in AI and ML research, and stored vast quantities of user data. A similar pattern is now emerging in China, with tech giants Tencent (games, communication), JD.com and Alibaba (e-commerce), and Baidu (search) entering the AI world; most of China’s major tech companies are also involved with autonomous vehicles. The flood of data, meanwhile, stems from the vast increase in people using connected apps, and from devices and systems that continuously record user actions and sensor data (IoT).

Since ANNs require large quantities of data for training purposes, tech-savvy players in many industries have allocated significant resources to create powerful data centers and acquire large data sets. Key industries that are leading the charge in these areas include:

· Marketing Personalization/Advertising (e.g. Facebook, Google, Tencent, JD.com, Alibaba)

· Financial Trading (e.g. Wall Street)

· Healthcare (e.g. Google, Apple, Amazon)

· Autonomous Vehicles (e.g. Tesla, Waymo/Google, GM/Cruise, Uber, Baidu, Tencent, Didi)

· Cybersecurity/Defense/Surveillance (e.g. world governments)

As AI and ML have become more powerful, new types of ANN have been developed for specific tasks. These include: 1) convolutional neural networks (CNNs) for image identification and categorization, 2) recurrent neural networks (RNNs) for natural language processing (NLP), and 3) generative adversarial networks (GANs) for mimicking human creative capabilities (e.g. art, music, writing).

Current Challenges with AI/ML

Organizations actively working with AI and ML in business and government face a host of challenges rooted in structural issues. The issues recognized by the AI/ML community include:

· Widespread invasion of user privacy

· Siloed and unshareable data

· Biased or manipulated AI programs

· Hackable datasets

· Lack of dataset transparency

A common cause of all of these issues is that data for AI/ML training and usage is centrally stored, and is owned and controlled by the group that collects it. I previously analyzed and discussed these issues in my article Blockchain: Securing Trust and Identity.

This centralized structure for both data storage and data ownership is directly responsible for each of the challenges noted above. Specifically:

· Owners of large central datastores have a strong profit motive to invade user privacy and to use AI/ML to find patterns and relationships that can be directly monetized (e.g. Facebook)

· Owners of large central datastores are incentivized to grow their datastores (silos) and are disinclined to share data, which they see as a competitive advantage

· Datasets are rarely complete, and a lack of sharing and collaboration means that approaches and training sets carry some form of bias (e.g. design bias, sampling bias)

· Central datastores are a massive attractor for hackers since they yield millions of user records per break-in; they are a single point of failure with a high reward for bad actors

· Datasets and AI/ML algorithms are hidden inside large organizations, making it impossible for outsiders to understand their structure and logic; they are black boxes

So while groups active in AI/ML prefer a centralized structure in which they own the data, a more decentralized, secure, and transparent structure in which users own their own data would be more advantageous to society as a whole. Since decentralization, transparency, security, and self-sovereign ownership are the hallmarks of blockchain, combining the two technologies has the potential to generate significant benefits.

Advantages of Blockchain + AI

In the previous section I noted that combining blockchain with AI/ML can provide significant benefits and overcome some of the key challenges facing the AI/ML community. Let’s break down the components and uses of AI/ML and see where blockchain can add the most value.

Data

As noted above, data is critical for training AI/ML algorithms. The current paradigm has given us siloed, unshared datastores that are vulnerable to attack. The quality and integrity of the data are also often less than optimal. To be useful, data should be accurate, immutable, and complete. However, data collection mechanisms can be flawed, data recording can be error-prone, and datasets may contain sample bias due to incompleteness. Consider the manual collection and recording of any type of data: customs information at ports, patient vitals at hospitals, or product information within a supply chain. In all these cases, manual collection and recording relies on a small group of largely unsupervised people.

Recording data on a blockchain, especially when combined with IoT, can provide a much better solution. First, data is not recorded on a blockchain until it goes through a consensus mechanism. The consensus mechanism pulls in a set of incentivized players (public nodes) or stakeholders (permissioned nodes) to verify the data before it is recorded. Next, the recorded data is time-stamped, cryptographically signed, and effectively immutable. It is therefore, by design, auditable, transparent, and secure. Adding IoT devices that write directly to the blockchain can, in certain cases (e.g. supply chain, healthcare, logistics), increase both the frequency and the accuracy of the data being recorded.
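The recording pipeline in the paragraph above (validation by a set of nodes, then time-stamped, signed, hash-linked storage) can be illustrated with a toy ledger. This is a minimal sketch, not how any production blockchain works: `ToyLedger` is a hypothetical class, the “consensus” is a simple two-thirds vote among in-process validator functions, and an HMAC with a per-writer secret stands in for the public-key signatures (e.g. ECDSA) that real chains use. What it does show is why editing an earlier record is detectable.

```python
import hashlib
import hmac
import json
import time

GENESIS = "0" * 64

class ToyLedger:
    """Toy append-only ledger: vote, then store time-stamped, signed, hash-linked blocks."""

    def __init__(self, validators, writer_keys):
        self.validators = validators    # hypothetical node checks: record -> bool
        self.writer_keys = writer_keys  # writer id -> secret (stand-in for public keys)
        self.blocks = []

    @staticmethod
    def _payload(block) -> bytes:
        fields = ("data", "writer", "timestamp", "prev_hash")
        return json.dumps({k: block[k] for k in fields}, sort_keys=True).encode()

    def _sign(self, writer: str, payload: bytes) -> str:
        # HMAC stands in for a real digital signature in this sketch.
        return hmac.new(self.writer_keys[writer], payload, hashlib.sha256).hexdigest()

    def append(self, data: dict, writer: str) -> bool:
        # Toy consensus: accept only if at least two thirds of validators approve.
        votes = sum(1 for check in self.validators if check(data))
        if 3 * votes < 2 * len(self.validators):
            return False
        block = {
            "data": dict(data),
            "writer": writer,
            "timestamp": time.time(),
            "prev_hash": self.blocks[-1]["hash"] if self.blocks else GENESIS,
        }
        payload = self._payload(block)
        block["signature"] = self._sign(writer, payload)
        block["hash"] = hashlib.sha256(payload).hexdigest()
        self.blocks.append(block)
        return True

    def verify(self) -> bool:
        # Recompute links, hashes, and signatures; editing any earlier block breaks them.
        prev_hash = GENESIS
        for block in self.blocks:
            payload = self._payload(block)
            if (block["prev_hash"] != prev_hash
                    or hashlib.sha256(payload).hexdigest() != block["hash"]
                    or not hmac.compare_digest(self._sign(block["writer"], payload),
                                               block["signature"])):
                return False
            prev_hash = block["hash"]
        return True

def in_range(record: dict) -> bool:
    # Hypothetical validation rule: a plausible sensor temperature.
    return -40 <= record["temp_c"] <= 85

# Three nodes vote; the third is faulty and rejects everything.
ledger = ToyLedger([in_range, in_range, lambda r: False], {"sensor-1": b"demo-secret"})
print(ledger.append({"temp_c": 4.5}, "sensor-1"))  # True: 2 of 3 votes
print(ledger.append({"temp_c": 500}, "sensor-1"))  # False: 0 of 3 votes
print(ledger.verify())                             # True
ledger.blocks[0]["data"]["temp_c"] = 9.9           # tamper with history
print(ledger.verify())                             # False
```

Note that the vote only checks that a reading follows the validators’ rules (here, a plausible temperature range); it cannot by itself prove that the sensor was accurate.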

Centralized data is often incomplete because it is compiled by a single entity for only a narrow set of use cases (consider Facebook, Google, etc.). This approach can lead to both design bias and sample bias in the dataset. Sharing data among a broader set of entities can create larger and more diverse datasets that span a wider range of use cases. Consider, for example, that many of the most widely used medical datasets draw mostly on data from middle-income white males. Articles and studies have also pointed out sample and design bias in many common AI programs, from facial recognition (e.g. Microsoft and IBM) to word matching and translation (e.g. Google).

Centralized datasets are often not shared because their ‘owners’ worry about security and competitive advantage. Since partial datasets most likely suffer from significant bias, the competitive advantage is probably illusory, but this is a perception issue that blockchain cannot solve directly. Regarding security, however, blockchain can make data much more shareable, since data recorded on permissioned chains and on the most widely used public chains (e.g. Bitcoin, Ethereum) is extremely difficult to tamper with.

In a world where users own their data, private and public entities will be able to build highly diverse and more complete datasets that can greatly reduce bias and form a foundation for truly useful AI/ML. In such a scenario, a dataset can be planned first, and data then purchased from owners whose profiles fit the plan. For example, a dataset meant to be demographically balanced against a country’s population (e.g. 50% women, 50% men, 20% aged 18–34, 40% minorities) can be specified up front and then assembled through targeted purchases.
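The planning step above is mostly arithmetic: turn target proportions into per-group quotas, then measure how far a candidate dataset falls short of them. A minimal sketch, assuming each data owner’s profile is a hypothetical dict of attributes and that the targets are marginal (per-attribute) shares like those in the example, not joint ones; the 1,000-record size and attribute names are illustrative.

```python
from collections import Counter

def plan_quotas(total: int, targets: dict) -> dict:
    """Turn target shares, e.g. {("gender", "woman"): 0.5}, into record counts."""
    return {group: round(total * share) for group, share in targets.items()}

def shortfall(records: list, quotas: dict) -> dict:
    """How many more records each planned group still needs (0 once met)."""
    counts = Counter((attr, value) for r in records for attr, value in r.items())
    return {group: max(0, need - counts[group]) for group, need in quotas.items()}

# Illustrative marginal targets for a 1,000-record dataset.
targets = {
    ("gender", "woman"): 0.50,
    ("gender", "man"): 0.50,
    ("age_band", "18-34"): 0.20,
    ("minority", True): 0.40,
}
quotas = plan_quotas(1000, targets)
# → {('gender', 'woman'): 500, ('gender', 'man'): 500, ('age_band', '18-34'): 200, ('minority', True): 400}

# A skewed starting dataset (hypothetical profiles of data already purchased).
records = ([{"gender": "man", "age_band": "35-54", "minority": False}] * 600
           + [{"gender": "woman", "age_band": "18-34", "minority": True}] * 150)
print(shortfall(records, quotas))
# → {('gender', 'woman'): 350, ('gender', 'man'): 0, ('age_band', '18-34'): 50, ('minority', True): 250}
```

The shortfall tells the dataset builder exactly which profiles to seek out next, which is the purchasing step the plan depends on.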

AI Program Interaction

Today more than 52% of web traffic is generated by bots, programs that traverse the Internet and carry out some type of function. A growing number of these bots use AI; they include voice-command bots, chatbots, virtual assistants, and other automated helper programs that can learn and evolve.

It is projected that, over time, human interaction with the Internet and Web will decrease as humans delegate many tasks to bots. This is already happening, with assistants such as Alexa, Siri, and Cortana handling an increasing share of web traffic. There are also many bots in the social media and messaging ecosystem, including Facebook, Telegram, WeChat, and WhatsApp.

Just as blockchain will soon be used to create an identity and trust layer for the human Internet (Blockchain: Securing Trust & Identity), the same can be done for the bot-based Internet. This is hugely valuable, as more than half of the bots online are malicious rather than helpful. If bots can be clearly identified with an immutable identity on the blockchain, malicious bots will have far less ability to wreak havoc, especially if metadata and records about each bot’s behavior can be built up over time.

As bot traffic increases, the main paradigm online will switch from human-bot interaction to bot-bot interaction. Simply put, bots will begin interacting more with each other than with humans. Consider a scenario where each human has a virtual assistant (like the recent Google demo) and all of the virtual assistants begin interacting with each other to gather information, schedule appointments, purchase products, etc.

In such a future, bots will need to be able to query each other for identity and then look up history and ratings before interacting. As I’ve noted above, this information needs to be on the blockchain to be secure and to create trust. Sophisticated bots will also need to be able to query each other’s algorithms and training data. Querying this information will enable bots to better ‘understand’ each other and then ‘decide’ whether they want to form a working relationship. A bot could, for example, choose to work only with other bots that use advanced algorithms and unbiased, complete training data.
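The lookup-then-decide step described above might look like this. A minimal sketch, assuming a hypothetical on-chain registry modeled as a plain dict keyed by bot ID, with each entry holding past ratings and flags for whether the bot’s algorithm and training data have been published for audit. It also assumes the other bot has already proven control of its ID (for example by signing a challenge with its registered key); that step is not shown, and the thresholds are illustrative.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class BotRecord:
    """Hypothetical registry entry; on a real chain this would be read from ledger state."""
    ratings: list = field(default_factory=list)  # scores from 1 to 5 left by past counterparties
    algorithm_published: bool = False
    training_data_audited: bool = False

def willing_to_interact(registry: dict, bot_id: str, min_ratings: int = 10,
                        min_avg: float = 4.0, require_audit: bool = True) -> bool:
    """Decide whether to work with bot_id, assuming it has already proven its identity."""
    record = registry.get(bot_id)
    if record is None:
        return False  # no registered identity, so no history to rely on
    if len(record.ratings) < min_ratings or mean(record.ratings) < min_avg:
        return False  # too little history, or a poor track record
    if require_audit and not (record.algorithm_published and record.training_data_audited):
        return False  # this bot's policy: only work with auditable algorithms and data
    return True

registry = {
    "assistant-alice": BotRecord([5, 4, 5, 5, 4, 5, 4, 5, 5, 4], True, True),
    "new-seller-bot": BotRecord([5, 5], algorithm_published=False),
}
print(willing_to_interact(registry, "assistant-alice"))   # True
print(willing_to_interact(registry, "new-seller-bot"))    # False: only 2 ratings
print(willing_to_interact(registry, "unregistered-bot"))  # False: not in the registry
```

Keeping the policy thresholds as parameters lets each bot’s owner decide how much history and transparency to demand.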

Today, most AI programs are black boxes because their data and programming are hidden from view. In a future dominated by bot-bot interactions, a higher level of transparency will be required, and blockchain is well suited to storing the data that will let bots audit each other more easily. Blockchain can also be used to incentivize auditing interactions, for example by rewarding greater transparency with tokens.

Better Control of Human-Bot and Bot-Bot Interactions

In a world dominated by human-bot and bot-bot interactions, interactions may easily spin out of control due to the black box nature of most AI programs. Examples of troubling AI behavior include Microsoft’s chatbot Tay becoming racist (humans gaming bots) and Facebook bot-bot interactions creating sub-languages (bots running wild). To impose order and greater control on bot-bot interactions, or even human-bot interactions, it may be prudent to use another core blockchain technology: smart contracts.

Smart contracts specify the rules of interaction, limit the possible outcomes, and do not execute unless certain conditions have been met. Additionally, smart contracts can be checked via formal verification, which mathematically proves that the code satisfies its specification and so rules out whole classes of errors and bugs. This extra logic layer can impose parameters and structure on otherwise free-form machine learning algorithms to reduce the potential for unexpected or problematic results.
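A smart contract’s guard conditions can be sketched in a few lines. This is Python rather than an on-chain language such as Solidity, and `PurchaseContract` with its fields is hypothetical: it accepts only allowed items, a price within a human-set cap, and a settlement before the deadline, and it can settle only once. `bounded_check` exhaustively tests those properties over a small input range; that is bounded testing, a much weaker stand-in for formal verification, which proves properties for all inputs.

```python
from dataclasses import dataclass

@dataclass
class PurchaseContract:
    """Hypothetical human-written rules constraining what two bots may agree to."""
    allowed_items: frozenset
    max_price: int       # in the smallest currency unit, e.g. cents
    deadline: int        # e.g. a block height or Unix time
    settled: bool = False

    def execute(self, item: str, price: int, now: int) -> bool:
        # Nothing happens unless every precondition holds.
        if self.settled or now > self.deadline:
            return False
        if item not in self.allowed_items or not (0 < price <= self.max_price):
            return False
        self.settled = True  # the outcome is limited to a single settlement
        return True

def bounded_check(max_price: int = 20, deadline: int = 5) -> bool:
    """Try every price/time pair in a small range and check the contract's limits."""
    for price in range(-5, max_price + 10):
        for now in range(deadline + 3):
            c = PurchaseContract(frozenset({"ride"}), max_price, deadline)
            ok = c.execute("ride", price, now)
            if ok and (price > max_price or price <= 0 or now > deadline):
                return False  # an outcome outside the agreed limits
            if c.execute("ride", price, now):
                return False  # a second settlement must never succeed
    return True

contract = PurchaseContract(frozenset({"ride", "delivery"}), max_price=2500, deadline=100)
print(contract.execute("ride", 3000, now=10))  # False: over the price cap
print(contract.execute("ride", 2000, now=10))  # True
print(contract.execute("ride", 2000, now=11))  # False: already settled
print(bounded_check())                         # True
```

Because the bots can only call `execute`, whatever their learned behavior proposes, the outcome stays inside the limits a human wrote down.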

Smart contracts would clearly not be appropriate in every case, because unconstrained creativity can lead to serendipitous new solutions and breakthroughs. However, they could be used in the many cases where parties want to constrain the outcome in a well-defined way, for example in business contracts, legal contracts, and securities agreements.

I suggest that smart contracts remain an area that humans write and control. Because they can act as a check on AI, it is prudent for the control of bot-bot interactions to be human-mediated rather than machine-mediated.

Use Cases

Some key use cases where the power of AI/ML + blockchain becomes obvious can be seen in the following areas:

Healthcare: Personalized Medicine

As my colleague Radhika Iyengar-Emens laid out in a previous article, blockchain combined with self-sovereign identity and AI/ML can liberate medical data, unlocking silos and creating new capabilities. For example, each doctor could use an automated medical assistant (an AI/ML medical bot) to query a patient’s personal health record (PHR), current vitals and symptoms, and relevant population health data to quickly diagnose the patient’s condition and recommend optimal treatment regimens. The doctor would then use the bot’s suggestions to create a personalized treatment plan optimized for the patient.

Such an approach is impossible today: few people have a PHR due to security concerns, population health data is siloed and full of selection bias, and medical bots are primitive and untrained due to lack of quality data.

Transportation: Autonomous Vehicles on Demand

Much has been written about the self-driving future. In many scenarios, self-driving cars will roam the streets picking up passengers on demand. However, many important challenges must be overcome before this is possible, including secure identity, secure authentication, protection from vehicle hacking, fully autonomous vehicles (SAE Level 5), and large bias-free datasets. These challenges could be overcome by combining blockchain technology (immutability, ultra-security, decentralization, and smart contracts), self-sovereign identity, unbiased Big Data, and AI/ML. Incorporating blockchain technology can help ensure that cars are not hijacked remotely, the correct passengers are picked up and charged for rides, identities are not stolen, human preferences are prioritized, and networks become more efficient over time.

Each car and each person will likely have a bot assistant that interacts with the others for scheduling, payment, routing, and car preferences. These systems will interact via smart contracts and learn from each other to better customize future rides based on system feedback (e.g. efficiency, cost/time, vehicle usage) and human feedback (e.g. likes, dislikes, preferences). Biometrics (facial recognition, fingerprints) combined with decentralized car and human identity (private-key cryptography) will reduce the success of hacking by removing single points of failure. Shared, unbiased big datasets will be required for obstacle categorization and avoidance, as well as for culturally and gender-appropriate human interaction.

Logistics and Supply Chains: Fully Automated Systems

In supply chains and logistics systems today there are many different actors, including suppliers, transporters (e.g. truck, train, plane), warehouses, customs, insurers, consolidators, and buyers. Each of these players holds only a small piece of information about a shipment (e.g. first/last step, previous/next step, arrival/departure time, or product condition). As a result, significant problems can ensue if errors are made or bad actors become involved in a process.

Key challenges for logistics and supply chains stem from a lack of process transparency, inefficient communication between players, and the inclusion of players that may not be trustworthy. Problems include missed hand-offs, bad actors that steal products or money, lost records or paperwork, fraudulent or counterfeit products, weather delays, etc.

A combination of blockchain, IoT, and AI/ML can ensure that all players maintain a real-time record of an entire transaction, communicate effectively, and have their tasks accurately recorded and optimized over time. Key blockchain platforms for the supply chain and logistics industries have developed RFID-based IoT devices that scan products and record actions directly to the blockchain. This gives an immediate and immutable track-and-trace capability that identifies the root cause of any errors or problems, as well as which players performed as agreed.
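Track and trace largely comes down to comparing the recorded hand-offs with the agreed route. A minimal sketch, assuming each RFID scan has already been written to the ledger as a hypothetical `Scan` record (shipment ID, custodian, timestamp) and that the route is a simple ordered list of custodians; all names and values are illustrative.

```python
from typing import NamedTuple, Optional

class Scan(NamedTuple):
    shipment_id: str
    custodian: str   # who scanned the item, e.g. "customs"
    timestamp: int   # e.g. Unix time reported by the RFID reader

def first_discrepancy(shipment_id: str, expected_route: list, scans: list) -> Optional[str]:
    """Compare the recorded hand-offs for one shipment with the agreed route."""
    ordered = sorted((s for s in scans if s.shipment_id == shipment_id),
                     key=lambda s: s.timestamp)
    for step, expected in enumerate(expected_route):
        if step >= len(ordered):
            last = ordered[-1].custodian if ordered else "(no scans)"
            return f"missed hand-off after {last}: no scan by {expected}"
        if ordered[step].custodian != expected:
            return f"step {step}: expected {expected}, scanned by {ordered[step].custodian}"
    return None  # every agreed step was recorded, in order

route = ["supplier", "truck-17", "customs", "warehouse-B", "buyer"]
scans = [
    Scan("SH-1", "supplier", 100),
    Scan("SH-1", "truck-17", 200),
    Scan("SH-1", "customs", 350),
    Scan("SH-2", "supplier", 120),  # a different shipment, ignored
]
print(first_discrepancy("SH-1", route, scans))
# → missed hand-off after customs: no scan by warehouse-B
```

Because every scan is time-stamped and attributed, the first break in the chain points directly at the party responsible for the missing or unexpected step.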

Adding AI/ML bots, robots, and self-driving vehicles could theoretically fully automate a supply chain or logistics process, with all transactions steered by smart contracts. The AI/ML bots could analyze transactions both in real time and historically to correct prior errors and improve efficiency. Environmental factors such as temperature and humidity could also be monitored via IoT devices when transporting perishable items like food or ‘sensitive’ items like expensive artwork. The net result would be much lower product and shipping costs, fewer errors, fewer bad actors, and new, more profitable business models.
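For the environmental monitoring case, a contract-style rule can flag every IoT reading outside an agreed range and attribute it to whoever had custody at that moment. A minimal sketch: the 2 to 8 °C range is an illustrative cold-chain assumption, and the readings and custody log are hypothetical inputs that would come from the ledger.

```python
from bisect import bisect_right

def excursions(readings, custody, low=2.0, high=8.0):
    """Return (timestamp, temp_c, custodian) for each reading outside [low, high].

    readings: list of (timestamp, temp_c) pairs from an IoT logger
    custody:  list of (start_timestamp, custodian) pairs, sorted by start time
    """
    starts = [start for start, _ in custody]
    flagged = []
    for ts, temp in readings:
        if not (low <= temp <= high):
            # The custodian is whoever took over most recently at or before ts.
            idx = bisect_right(starts, ts) - 1
            holder = custody[idx][1] if idx >= 0 else "unknown"
            flagged.append((ts, temp, holder))
    return flagged

custody = [(0, "supplier"), (100, "truck-17"), (300, "warehouse-B")]
readings = [(50, 4.0), (150, 9.5), (250, 7.0), (320, 1.5)]
flagged = excursions(readings, custody)
print(flagged)  # → [(150, 9.5, 'truck-17'), (320, 1.5, 'warehouse-B')]

# A hypothetical contract term: pay the carrier only if no excursion occurred.
payment_released = not flagged
```

Tying each excursion to a custodian is what lets a smart contract settle insurance claims or penalties automatically instead of through a manual dispute.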

Conclusion

AI and ML technologies have significant potential to add tremendous value to human society. However, as I’ve noted, the current AI/ML paradigm, based on centrally controlled systems, biased datasets, and disenfranchised users, is fatally flawed, and cases of abuse are rampant. Blockchain technologies, including ultra-secure and immutable ledgers, strong consensus mechanisms, decentralization, and self-sovereign identity, have tremendous potential to rebalance and improve AI/ML algorithms.

Additionally, as the Internet moves away from human-human interactions toward mostly bot-bot interactions, we may be able to ward off a dystopian future by using blockchain-based smart contracts to logically predefine and constrain outcomes.

In summary, blockchain + AI/ML + Big Data is a much stronger combination than any of these technologies alone, and the power of the combination will likely be revolutionary for many fields.

Acknowledgements: Thanks to Radhika Iyengar-Emens, my colleague at DoubleNova Group, for her contributions.