Decentralized ML: Part II (Blockchain Basics)

Vivek Khimani
Published in Coinmonks
8 min read · Aug 18, 2020

The second part of the decentralized machine learning series introduces the basic concepts of blockchain, a tool used for decentralization.

Photo by Launchpresso on Unsplash

This is the second part of a series of blog posts on Decentralized Machine Learning. Throughout the series, I will introduce the field of decentralized machine learning, explain the need for decentralization, cover some tools and techniques used to achieve it, and end with a detailed account of the current research status and open problems in the field. I haven’t yet decided how many parts the series will have, so stay tuned for more information.

Join my Quora space for regular news and updates about decentralized machine learning. Drawing on my experience building and researching decentralized machine learning systems, I will be posting original content about the latest progress in academia and industry. Visit my personal website for more details about my education, skills, experience, and credentials.

When we think about decentralization in general, blockchain is undoubtedly the technology that first comes to mind. However, before digging deeper into the basic details and architecture of the blockchain, it’s important to clear up a few myths that people usually hold about it.

Photo by Wendelin Jacober from Pexels

The most important one is the distinction between Bitcoin and blockchain. Although cryptocurrency has emerged as one of the major applications of blockchain in recent years, it is only one application and does not capture the overall concept. Moreover, a lot of people like to think of blockchain as a database that cannot be edited. Although blockchain is an immutable structure capable of storing data, it does not offer what modern databases do: relational features, querying, and easy scalability. Such simplifications are good starting points for gaining an intuition about how blockchain works and where it is used, but it’s important to understand the broader idea in order to apply blockchain to areas like decentralized machine learning.

At the most basic level, blockchain, as the name suggests, is a data structure that can be represented as a chain of blocks. Its significance arises from the fact that each block can store data and that the blocks are connected (and ordered) in a chain-like structure.

Let me describe an analogy that I frequently use to explain the basic idea of blockchain. Visualize a blank, hardbound notebook that is used in an office to record employee attendance. As a rule, each employee has to enter their name, check-in time, and signature when entering the office every morning. The book initially has nothing written in it, and all the employees are allowed to make entries. Entries are made on the current page, and employees can start a new page only when the previous page is completely filled. However, there are a few conditions:

  • There is no authorized (or centralized) person keeping track of the records entered in the book
  • Once an employee enters the data, they are not allowed to scratch it out or modify the values
  • Instead of a central authority, everyone present in the office when an employee makes an entry will verify whether the check-in time entered is correct. The entry is validated only if at least 67% of the people present in the office approve the check-in time
  • As the book is hardbound, tearing any page from the book will destroy the entire book and the system will collapse

As you would notice, maintaining such an attendance protocol would completely eliminate the need for a centralized authority to track attendance. When a centralized authority (an individual or a team) is in charge, an employee can compromise the system by convincing that authority to lie in their favor. But it will be extremely hard for someone to convince 67% of the employees to lie in their favor. Such a system is called a decentralized system: it can be run independently by the participants without a centralized authority supervising it. Even without that authority, it’s extremely hard (and almost impossible in the case of a production-level blockchain) to breach the system.
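The 67% approval rule from the attendance analogy can be sketched in a few lines of Python. This is only an illustration of the analogy, not a real consensus protocol; the function name `is_entry_valid` and the witness names are made up for the example, and integer arithmetic is used so the threshold comparison is exact.

```python
APPROVAL_PERCENT = 67  # threshold taken from the attendance analogy


def is_entry_valid(votes: dict[str, bool], percent: int = APPROVAL_PERCENT) -> bool:
    """Return True if at least `percent`% of the people present approve.

    `votes` maps each witness (hypothetical names) to True (approve)
    or False (reject).
    """
    if not votes:
        # Nobody is present to verify the entry, so it cannot be validated.
        return False
    approvals = sum(votes.values())
    # Compare approvals / len(votes) >= percent / 100 without floats.
    return approvals * 100 >= percent * len(votes)


# Three of four witnesses approve: 75% >= 67%, so the entry is accepted.
print(is_entry_valid({"asha": True, "ben": True, "chen": True, "dev": False}))
# Two of three witnesses approve: about 66.7% < 67%, so it is rejected.
print(is_entry_valid({"asha": True, "ben": True, "chen": False}))
```

Note that the decision depends only on the people present, so no single participant can approve an entry alone once more than one witness is in the room.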

Blockchain Structure

Now, let’s see how this naively designed attendance recording system maps onto the concept of blockchain. In a blockchain, the blocks are similar to pages, and the participants, who constitute a network, are similar to employees. As in the office setting, participants store data in the most recent block (page), and no new block is started until the current block is completely filled. Once a block is filled with data, it’s added to the chain, and a subset of participants called miners is responsible for verifying the data. A real blockchain also has a strong consensus protocol, which helps it uphold the truthfulness of the data and pay the miners for their contribution. Because the data is stored in sequentially attached blocks, with each block typically recording a cryptographic hash of the one before it, deleting or modifying a block breaks the chain and the tampering becomes evident.
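To see why modifying an earlier block breaks the chain, here is a minimal sketch in Python. It assumes each block records a SHA-256 hash of its predecessor, which is how real blockchains typically link blocks; the class names, the `BLOCK_CAPACITY` of three entries, and the entry format are all hypothetical choices for illustration. Mining, consensus, and networking are deliberately left out.

```python
import hashlib
import json
from dataclasses import dataclass, field

BLOCK_CAPACITY = 3  # hypothetical: how many entries fit on one "page"


@dataclass
class Block:
    index: int
    prev_hash: str  # hash of the previous block: the "binding" of the book
    entries: list = field(default_factory=list)

    def hash(self) -> str:
        """Hash the block's full contents, including the link to its parent."""
        payload = json.dumps(
            {"index": self.index, "prev_hash": self.prev_hash, "entries": self.entries},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()


class Chain:
    def __init__(self) -> None:
        # The first block has no parent, so it gets a placeholder hash.
        self.blocks = [Block(index=0, prev_hash="0" * 64)]

    def add_entry(self, entry: str) -> None:
        current = self.blocks[-1]
        if len(current.entries) >= BLOCK_CAPACITY:
            # The current block is full: start a new one that records its hash.
            current = Block(index=current.index + 1, prev_hash=current.hash())
            self.blocks.append(current)
        current.entries.append(entry)

    def is_valid(self) -> bool:
        """Check that every block still matches the hash stored by its successor."""
        return all(
            later.prev_hash == earlier.hash()
            for earlier, later in zip(self.blocks, self.blocks[1:])
        )


chain = Chain()
for name in ["asha 09:01", "ben 09:03", "chen 09:04", "dev 09:10"]:
    chain.add_entry(name)

print(chain.is_valid())                   # the untouched chain verifies
chain.blocks[0].entries[0] = "asha 08:00"  # tamper with a filled block
print(chain.is_valid())                   # the stored hash no longer matches
```

In this sketch only filled blocks are protected, because the last block has no successor holding its hash yet. Real systems address this with additional blocks and consensus among participants.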

Photo by Moose Photos from Pexels

As mentioned before, the blockchain architecture is currently used extensively for developing cryptocurrencies. In that case, the blocks store transaction data and miners are responsible for verifying the authenticity of the transactions. However, the Ethereum blockchain also allows users to store custom data on the blockchain and build fully decentralized applications. It provides a Turing-complete language called Solidity, which can be used to write smart contracts whose methods can be called by other participants in the network. Solidity allows users to store custom data (strings, integers, mappings, etc.) and execute basic operations (arithmetic, loops, etc.) on the blockchain.

However, it’s worth noting that the functionality provided by Solidity is extremely limited compared to modern programming languages, because participants need to pay for data storage and operations on the blockchain. For example, if a participant calls a smart contract method deployed on the blockchain that iterates through an array of strings, performs some operation, and stores the result on the blockchain, the caller pays the costs unless the smart contract specifies otherwise. Smart contracts can also execute transactions on the blockchain, which means I can pay or receive money from any participant with a valid Ethereum address.

Despite the many limitations and scalability issues of the current smart contract programming environment, it’s possible to build extremely creative decentralized apps, and there are extensive APIs and tools available for connecting to smart contracts from an external environment (web3 being the most popular one). For example, a very popular beginner project is building an e-commerce platform on blockchain to ensure transparency between buyers and sellers. We will be using a similar set of blockchain development tools to build and deploy decentralized machine learning apps.

As it’s extremely difficult to cover all the tools and concepts in a single post, I will be sharing resources towards the end and will also explain different components when I discuss some interesting research findings in the upcoming post.
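The pay-per-operation idea described above can be made concrete with a toy model in plain Python. This is not Solidity and does not use any Ethereum library; the unit costs, the `Account` class, and the `store_uppercased` method are all hypothetical and chosen only to show that the participant who calls a method is the one charged for its computation and storage.

```python
from dataclasses import dataclass

# Hypothetical unit costs, chosen only for illustration; real fees differ.
COST_PER_OPERATION = 1
COST_PER_STORED_ITEM = 20


@dataclass
class Account:
    address: str  # placeholder label, not a real Ethereum address
    balance: int


class InsufficientFunds(Exception):
    """Raised when the caller cannot cover the cost of the call."""


def store_uppercased(caller: Account, words: list[str], storage: list[str]) -> int:
    """Toy 'contract method': iterate over strings, transform, and store them.

    The caller pays one operation fee and one storage fee per word.
    Returns the total cost charged.
    """
    cost = len(words) * (COST_PER_OPERATION + COST_PER_STORED_ITEM)
    if caller.balance < cost:
        # Refuse the whole call so that no partial result is stored.
        raise InsufficientFunds(f"need {cost}, have {caller.balance}")
    caller.balance -= cost
    storage.extend(word.upper() for word in words)
    return cost


alice = Account(address="alice", balance=100)
contract_storage: list[str] = []
charged = store_uppercased(alice, ["block", "chain"], contract_storage)
print(charged, alice.balance, contract_storage)
```

Because every loop iteration and every stored value costs the caller something, contract authors have a strong incentive to keep on-chain logic small and move heavy computation elsewhere.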

In a real blockchain, there are a lot of intricate details, like consensus protocols, compensation for the miners, block addresses, and chaining, which I won’t be able to cover in this post. There are also various loopholes in the naively designed attendance protocol discussed above (who verifies the entry of the first employee, how practical and scalable the verification approach is, etc.). I am also aware that advanced automated systems already exist to track employee attendance and check-in times. Still, the example is sufficient to convey the overall idea. I will provide relevant links and resources towards the end to help you expand on what we covered in this post.

Photo by Pixabay from Pexels

Finally, as promised in the previous post, here’s a bunch of online resources (blogs, videos, online courses, etc.) that will help you dig deeper into the world of blockchain:

Understanding the basics of blockchain, blockchain development, and smart contract programming will be really helpful in our journey of exploring decentralized machine learning. Despite the limitations, we will see how researchers frequently use blockchain to achieve decentralization in complex machine learning systems.

In the next few posts, I plan to introduce the tools and techniques that can be used to achieve decentralization in a machine learning setting. In addition to the introduction, I will provide blogs and resources toward the end of each article to help you expand on the information provided. Look out for Decentralized ML: Part III (Ethereum). Thanks for reading!

Again, if you wish to get regular news and updates about decentralized machine learning, please follow my space on Quora. Based on my experience as a researcher, I will be sharing original content relevant to industrial and academic progress in the field of decentralized machine learning.

Vivek Khimani
Coinmonks

Software engineer at r2c (Semgrep), a security startup. Building TorchFL, a Python library for bootstrapping federated learning experiments. Ex-Meta (Facebook).