Substrate in a nutshell

Dmitriy Kashitsyn
Parity Technologies
Oct 4, 2018

Today we’ll be talking about Substrate, a library that can help you build your own custom blockchain. Substrate is created by Parity Technologies and provides the basis for Polkadot.

A substance or layer that underlies something, or on which some process occurs (Oxford Dictionary).

As the name suggests, Substrate may be used to grow or build something on it. In our case, we can build blockchains, or, in the case of Polkadot, a whole family of blockchains.

Software is all about abstractions.

The history of computer science shows that we’re constantly getting more and more abstract: from discrete logic to integrated circuits and CPUs; from machine code to assembly, from assembly to C, then to C++, Rust, Haskell and so on.

The same happens with programming APIs. For example, these days almost no one writes for the web in plain HTML.

“Every problem may be solved by introducing yet another layer of abstraction. Except for a problem of too many abstraction layers… for that we invent frameworks.” — rephrasing of Andrew Koenig’s FTSE.

Each abstraction level tries to solve a particular problem. So what problem was Substrate created to solve? It turns out that, before you start implementing your brand new blockchain solution, you have a lot of topics to think about:

  • Why on earth would one need yet another blockchain?!
  • Various crypto primitives: encryption, signatures, RNG¹, etc.
  • Consensus algorithms and fault tolerance voting.
  • Proof of Waste, Proof of Stake, Proof of Authority?
    Or maybe something entirely different?
  • Block structure and efficient storage, message serialization.
  • P2P networking, peer discovery, block and transaction gossip.
  • State machine, execution runtime, smart contracts.
  • Light client support

Although Substrate does not address the first item, it can help you deal with the rest by providing existing implementations that were designed, written, and tested with great care, based on our years of experience implementing blockchains.

Sure, you can implement all of those tasks by hand, but then you’d probably end up with an ad-hoc, poorly tested, and not-very-documented solution, to say the least². Not to mention that usually it’s considered a very bad idea to design or implement cryptographic algorithms yourself, unless you’re a cryptography expert and you really know what you’re doing.

So, by providing generic implementations of typical algorithms, Substrate allows you to concentrate on the essence of your project: the business logic of the chain, i.e., its state machine.

Let’s walk through the most important parts of a blockchain and see what Substrate may offer.

Blockchain as persistent storage

The sole purpose of any blockchain is to provide a way to store and mutate data in a verifiable and globally persistent way, meaning that all parties have a zero-trust way to check and agree on which values should be considered current at any point in time. Moreover, once such data is sealed, it should be persistent and, depending on the consensus, impossible to modify.

This property is widely used in cryptocurrencies, where persistent storage contains account keys and their current balances. However, it should be noted that cryptocurrency is not the only possible blockchain application. Basically, almost every system that requires globally coherent, persistent storage and a verifiable transaction history may be implemented using a blockchain in one way or another.

Substrate provides efficient storage that is very easy to use and that is tightly integrated with the WebAssembly (Wasm) runtime.

Blockchain as a function

In order to update the chain state and alter its storage according to pending operations, we need to have a point where decisions are made.

Such decision points may be expressed as a function that takes the current state and a set of pending operations and yields another state that should be considered the new current state of the chain. In the blockchain world, such a function is called the state transition function, or STF for short.
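To make the idea concrete, here is a minimal sketch of a state transition function in Rust. It assumes a toy chain whose state is a map of account balances and whose pending operations are transfers. The names State, Transfer, and apply_block are purely illustrative and are not part of Substrate's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical chain state: account name -> balance.
type State = BTreeMap<String, u64>;

/// Hypothetical pending operation: move `amount` from one account to another.
#[derive(Debug, Clone)]
struct Transfer {
    from: String,
    to: String,
    amount: u64,
}

/// Toy state transition function: takes the current state and a set of
/// pending operations and returns the next state. The input state is left
/// untouched. Transfers that cannot be applied are skipped.
fn apply_block(current: &State, pending: &[Transfer]) -> State {
    let mut next = current.clone();
    for tx in pending {
        let from_balance = next.get(&tx.from).copied().unwrap_or(0);
        if from_balance < tx.amount {
            continue; // insufficient funds, skip this transfer
        }
        // Debit the sender first so that a self-transfer nets out to zero.
        next.insert(tx.from.clone(), from_balance - tx.amount);
        let to_balance = next.get(&tx.to).copied().unwrap_or(0);
        match to_balance.checked_add(tx.amount) {
            Some(credited) => {
                next.insert(tx.to.clone(), credited);
            }
            None => {
                // The credit would overflow: undo the debit and skip.
                next.insert(tx.from.clone(), from_balance);
            }
        }
    }
    next
}
```

Because the function depends only on its inputs, every node that starts from the same state and applies the same operations arrives at the same result.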

Substrate lets you define such a function in a very manageable and portable way. Much like JavaScript that’s executed on a web page, you may write a set of functions, collectively called the runtime, that act as an STF. Additionally, such an implementation is portable: it does not depend on the processor architecture, operating system, or browser, and is not platform-dependent in any other way.

In fact, even the underlying technologies of Substrate are closely related. Substrate uses WebAssembly as the lingua franca of its runtime, the very same technology that is now being integrated by major companies like Mozilla, Google, and Apple as a faster but still compatible alternative to JavaScript for the web.

Safety and speed

Having your chain logic and smart contracts written in Wasm means you will have the best tools out there to execute your logic in a fast and reliable way. But Substrate has a way to execute your code even faster — and without any virtual machine overhead.

One of the most revolutionary parts of Substrate is that the runtime image containing the STF is stored, among other payload, right on the chain. That means that the runtime, and with it the whole chain’s business logic, may be updated in a secure and verifiable way. Moreover, since both Substrate and its Runtime Module Library³ are written in the Rust programming language, they can be compiled to both native code and Wasm.

At any point in time, client software has two copies of the compiled runtime: one that’s compiled in the software natively, and one that is a Wasm image to be executed in a VM. When executing runtime functions, the client software checks whether the on-chain Wasm version of the runtime matches its compiled-to-native, built-in version. When it does, the client software then delegates execution of the runtime functions to the native code version.

Forkless upgrades

When the runtime image gets updated on the chain, some clients will not have updated their software yet. In that case, their node would execute the correct version of the runtime by interpreting it on Substrate’s integrated Wasm virtual machine. So, in any case, all nodes on the network are always able to synchronise the chain correctly (albeit at different levels of efficiency), thus preventing a chain fork.
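The choice between native and Wasm execution described above can be sketched in a few lines of Rust. This is only an illustration of the decision, not Substrate's actual code: RuntimeVersion, its fields, Executor, and select_executor are assumed names, and a real client compares more information than a name and a number.

```rust
/// Hypothetical identifier of a runtime build (illustrative fields only).
#[derive(Debug, Clone, PartialEq, Eq)]
struct RuntimeVersion {
    name: String,
    version: u32,
}

/// Where the runtime functions will be executed.
#[derive(Debug, PartialEq, Eq)]
enum Executor {
    /// The runtime compiled into the client binary.
    Native,
    /// The on-chain Wasm image, run in the integrated Wasm VM.
    Wasm,
}

/// Use the native runtime only when it matches the on-chain one.
/// Otherwise fall back to the on-chain Wasm image, so that a node which
/// has not been upgraded yet still follows the same chain.
fn select_executor(native: &RuntimeVersion, on_chain: &RuntimeVersion) -> Executor {
    if native == on_chain {
        Executor::Native
    } else {
        Executor::Wasm
    }
}
```

In this sketch an outdated node never has to stop: it simply loses the speed advantage of native execution until its software is updated.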

Networking

A blockchain depends on having lots of participants communicating over a network. The typical solution is to use peer-to-peer (P2P) technologies for such communication, and Substrate is no exception here. P2P is a common name for a set of technologies used to create decentralized networking applications.

The main idea is to have a self-sustaining network environment where every participant (usually called a node) is capable of operating within a network without any prior configuration or interaction with an authority.

To ensure that nodes can join or leave the network at any time without affecting overall network connectivity, Substrate uses the Rust implementation of libp2p, a promising network stack that has everything needed to set up a decentralized network environment.

Custom messages

In the simplest case, you don’t even need to think about networking, because Substrate does it all for you. You just provide the state transition function of your blockchain and leave all network interaction to Substrate. However, if your blockchain requires custom messages to be sent, you can customize and extend the network subsystem by providing a specialization of the network protocol that defines custom messages and their handling logic.

Consensus

Having a state transition function that allows you to move from one state to another is good, but not enough. You also need a way for all nodes to agree on what the next state should be.

As the owner of a bank account, the last thing you want is a situation in which you and your bank disagree on the amount of money you have in your account. Blockchain allows parties to reach consensus without trusting each other (hence zero-trust), even in the presence of malicious participants who are actively trying to break the system and steal your money.

This is done using a consensus algorithm with a property called Byzantine Fault Tolerance (BFT). If a system is BFT, it means that the nodes can reach consensus even if some fraction of them behave arbitrarily badly, including collusion, withholding messages, and being offline. BFT consensus systems achieve resistance against varying degrees of networking issues, where messages can be reordered or delayed. Some BFT consensus systems are designed such that when nodes misbehave (e.g. voting for two blocks at a time), they can be punished and have stake slashed on-chain.

For every consensus engine supported in Substrate, there will be a runtime module designed for handling proofs of misbehavior. The repercussions of evaluating a proof of misbehavior can be determined by the runtime.
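As an illustration of what a proof of misbehavior might look like, the following Rust sketch detects the example mentioned above, a node voting for two different blocks in the same round. All names here (Vote, Equivocation, find_equivocations) and the use of a plain u64 as a block identifier are assumptions made for the example. They do not describe Substrate's runtime modules.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical vote cast by a validator in a given round.
/// `block` stands in for a real block hash.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Vote {
    validator: u32,
    round: u64,
    block: u64,
}

/// Hypothetical proof of misbehavior: two votes by the same validator
/// in the same round for different blocks.
#[derive(Debug, PartialEq, Eq)]
struct Equivocation {
    first: Vote,
    second: Vote,
}

/// Scan votes in order and report every vote that conflicts with the
/// first vote seen from the same validator in the same round.
fn find_equivocations(votes: &[Vote]) -> Vec<Equivocation> {
    let mut first_seen: HashMap<(u32, u64), &Vote> = HashMap::new();
    let mut proofs = Vec::new();
    for vote in votes {
        match first_seen.entry((vote.validator, vote.round)) {
            Entry::Vacant(slot) => {
                slot.insert(vote);
            }
            Entry::Occupied(slot) => {
                let first = *slot.get();
                // Repeating the same vote is harmless; only a different
                // block in the same round counts as misbehavior.
                if first.block != vote.block {
                    proofs.push(Equivocation {
                        first: first.clone(),
                        second: vote.clone(),
                    });
                }
            }
        }
    }
    proofs
}
```

What happens to the offending validator once such a proof is found, for example slashing its stake, is a separate decision that the runtime makes.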

It has been mathematically proven that such protocols remain viable as long as more than two thirds of the nodes in the network are not malicious and operate according to the protocol. This is one of the reasons why it is important to have a lot of nodes in the network.
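The two-thirds threshold can be made concrete with a little arithmetic. A common formulation of the classic bound is that a network of n nodes tolerates f faulty ones as long as n >= 3f + 1. The Rust helpers below are a sketch under that assumption; the function names are illustrative.

```rust
/// Largest number of faulty nodes tolerated among `n` nodes,
/// assuming the classic bound n >= 3f + 1.
fn max_faulty(n: u32) -> u32 {
    if n == 0 {
        0
    } else {
        (n - 1) / 3
    }
}

/// Number of matching votes needed to decide: all nodes minus the
/// tolerated faulty ones. For n >= 1 this is the smallest count that
/// is strictly greater than two thirds of `n`.
fn quorum(n: u32) -> u32 {
    n - max_faulty(n)
}
```

For example, a network of 4 nodes tolerates 1 faulty node and needs 3 matching votes, while a network of 10 nodes tolerates 3 and needs 7.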

Consensus is a critical part of any blockchain application. Luckily, Substrate provides an existing implementation of BFT consensus that can be used almost out of the box.

Depending on your usage scenario, you may either use the existing block authoring logic or provide your own. In the latter case, you may use the generic BFT implementation and adapt it to your needs.

Substrate will continue to evolve and have more consensus algorithms added to its repertoire, including the GHOST-based Recursive Ancestor Deriving Prefix Agreement (GRANDPA) finality gadget developed at the Web3 Foundation.

Also, since Substrate is a fully extensible and customizable framework, it is possible to define your own custom consensus algorithm. In fact, Substrate is so flexible that it may support solutions that are not based on the classic blockchain architecture. For example, we are researching how to address the blockchain throughput problem by designing a consensus that is not based on the traditional block concept.

Light client support

Early blockchain implementations were designed in such a way that every node in the network maintained the complete blockchain database locally. This is now referred to as a full client, meaning that the client has everything it needs to operate as a network node.

Full clients, also known as full nodes, are important for chain security. But as the blockchain grows, the client’s database becomes larger and larger. Currently, mainstream cryptocurrencies have databases of several hundred gigabytes.

When a full node is initialized, the first thing it needs to do is synchronize with the rest of the network. For security reasons, such a node cannot just download the database from a random node “as is.” Instead, it needs to build its own database from scratch by replaying all transactions since genesis (the very first block of the chain). In addition to being computationally expensive, this task requires a huge amount of data to be transferred across the network.
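Replaying from genesis amounts to applying the state transition function to every block in order, starting from the genesis state. The generic Rust sketch below shows that shape. The function name replay and its parameters are assumptions for illustration, and a real node would also validate each block before applying it.

```rust
/// Rebuild the current state by applying a state transition function
/// (`stf`) to every block in order, starting from the genesis state.
/// `S` is the state type and `B` the block type; both are left generic.
fn replay<S, B>(genesis: S, blocks: &[B], stf: impl Fn(S, &B) -> S) -> S {
    blocks.iter().fold(genesis, stf)
}
```

The cost grows with the full length of the chain, because every block ever produced has to be downloaded and processed. As a toy usage, if the state is a running total and each block is a list of deposits, replaying the blocks [1, 2], [] and [10] from a genesis total of 0 yields 13.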

I think everyone would agree that downloading half of a terabyte of data to your mobile phone only to buy a sandwich is… impractical if not ridiculous. That’s why, almost right from the start, blockchain developers started to think about a way to reduce the costs of maintaining the node, both in storage space and in network throughput.

In the end, the light client concept was born. In a nutshell, a light client is an operating mode of a blockchain node where only the most important data is stored locally and network interaction is reduced to a bare minimum while retaining an acceptable amount of security for almost all interactions anyone is likely to make.

Modest resource requirements finally allowed light client nodes to bootstrap easily and therefore be executed on mobile devices.

Unfortunately, it is a serious undertaking to integrate light client support into an existing blockchain. It is much easier to build light client support into a blockchain’s initial design. Substrate was specifically designed with light client support in mind, so blockchains built on Substrate have it right out of the box.

Conclusion

In this post we have skimmed through the key goals and features of the Substrate framework. To cover all aspects of Substrate would take a much longer article, but hopefully you now have a general understanding of the concepts. While this is not enough to start writing your code, at least now you should know what to expect.

For more information, check out the earlier “What is Substrate?” post by Jack Fransham. If you’d like to dig in and start implementing your blockchain, the best place to start is the Parity Substrate Wiki. Also, don’t hesitate to look at the source code, especially the ReadMe file.

Footnotes

  1. RNG stands for Random Number Generator. Not every RNG is suitable for blockchain applications.
  2. Substrate itself is currently in a very active phase of development. The code base is very volatile and documentation is a work in progress. We are actively populating the wiki, which is a great place to look for more details.
  3. Runtime module library is an optional set of Rust crates that deals with common tasks, like parameter serialization and call dispatch, and helps you build your runtime at minimal cost. This library is completely optional, so it’s perfectly fine to design your own runtime from scratch, or use any language that may be compiled to Wasm. Aside from Rust, currently only C and C++ support Wasm as a target architecture.

Originally published at www.parity.io on October 4, 2018.
