Supply chain use case: Merkle Tree “off-chain”

Louis Singer
ON-X Blockchain (Chain-Accelerator)
12 min readApr 10, 2020

Representing goods for blockchain-powered supply chains is a complicated challenge because the model must respect:

  • privacy: In many cases, it is dangerous (sometimes illegal too) to share data about the transfer of goods.
  • decentralization: no one should be in possession of the data, at the risk of rendering the blockchain useless.

In this article, we propose a model answering these problems.

Represent assets as Merkle trees

Asset model

Supply chains are about the manipulation of assets. We need to define and represent assets.

An asset is a good described by a type (a set of attributes like color, shape, serial number etc…) and can be a container for other assets.

Representation of an asset

An example of assets could be a box of medicines:

Box of medicines using asset model

These boxes of medicines can be transported on pallets. And these pallets bundled together in a batch. Thus, we’ll get a complex composition of assets:

A complex assets composition

What we call the supply chain is actually a sequence of asset transactions. As an asset can be a composition of assets, we need a method to represent them as linked values while ensuring immutability and integrity. All this without loss of performance. Fortunately for us, cryptography has already solved this problem.

Hash + tree data structure = Merkle tree

We’ll use Merkle trees to represents our assets. A Merkle Tree is an awesome cryptographic tool used to ensure complex data structure integrity. A Merkle tree is (often but not necessarily) a binary tree where :

  • The value of leaves is the hash of the piece of data which initialized them.
  • The value of a node is the hash of the concatenation of the values of these children.
source: https://github.com/miguelmota/merkletreejs/blob/master/diagrams/merkle-tree.png

Thus, if someone modifies one of the pieces of data using to calculate leaves, the Merkle root will change. So, I can verify the integrity of a large number of data only with a unique hash (the root of the tree).

Merkle trees are actually a representation of the shape of a dataset. With a (private) dataset, I can calculate its shape and check if it corresponds to the (public) merkle tree. In this way, I do not need to store the dataset on a centralized server to ensure its integrity.

Asset as a Merkle tree

We saw above that an asset is represented by his type and his (optional) content. In terms of a Merkle tree, we can represent an asset like the following:

Asset as a Merkle tree: level 0

However, the assets composing an asset can also be represented as a tree. A repetitive structure appears!

fAsset as a Merkle tree: level 100

We can define the asset Merkle root as A Merkle tree node where:

  • The left child is the Merkle leaf built with the asset type.
  • The right child is the Merkle root of the nodes of the asset’s content.

The assets contained in the “root” asset may have different types. The medicine box below contains pills for the day and pills for the night.

An asset containing two types of assets

We will modify our tree model to sort content by type. For each type, we will calculate the Merkle root of the assets of that type. Then the right child node will be computed with the list of root nodes computed for each type. Let’s see an example, the tree of a box containing 2 pills for nights and 3 pills for days:

Asset as a Merkle tree: level 1000

Let’s formalize all this with an algorithm.

algorithm getRootNode is
input:
asset A with type A.type and content A.assets (sort by type).
function h: a hash function.
merkleRoot: a function list_nodes --> merkle root node.
output: R, the root node of the asset A.
leafNodeType = Node(A.type)
if A.assets is null then
return
leafNodeType
else
rootsOfAssets = list()
for each a in A.assets do
rootsOfAssets.push(merkleRoot(a.content))
assetsNode = merkleRoot(rootsOfAssets)
return merkleRoot(list(leafNodeType, assetsNode))

Implement Asset Merkle tree with Javascript

The first thing to do is to implement our Merkle tree. All our design is based on the Merkle tree node. We define a node as an object described by three attributes:

  1. data: the node data. In the case of Merkle trees, the data of a node is either hash(data_child_left + data_child_right) or hash(data_child_left) if it is a leaf.
  2. left: the left child node if it exists.
  3. right: the right child node if it exists.

A leaf is a node with only one child (the left one in our case). So a node is considered as a leaf when his right node is null.

The Merkle tree implementation architecture will be node-centered. It means that a tree is represented by his root node. Thus, we just need to design a Node object and then write a set of functions manipulating the Nodes object.

Let’s code!

We decided to use javascript , but you can re-write the code with any programming languages. What’s important is the logic behind the code. If you want to do it in javascript here is the list of commands to configure the environment (Assuming you have installed NodeJS on your computer):

# create the project folder and move inside
mkdir merkle-tree-supply-chain
cd merkle-tree-supply-chain
# init as a node project with npm
npm init
# install babel as dev dependencies
npm i --save-dev @babel/core @babel/node @babel/preset-env
# create babelrc file with the preset-env
echo "{
\"presets\": [
\"@babel/preset-env\"
]
}" >> .babelrc
# create the src directory
mkdir src
# create the entrypoint file
touch src/index.js

Finally, edit the package.json file using your favorite text editor or IDE. Add the start script to the scripts object:

"scripts": {
"test": "echo \"Error: no test specified\" && exit 1",
"start": "npx babel-node src/index.js"
}

Then we can use npm run start to start our application. Now open the merkle-tree-supply-chain with your favorite IDE (mine is vs code). We can start coding!

The first thing to do is to create a hash function. I write it inside a specific file: src/hash.utils.js .

the hash function will be the function using to calculate all our hashes values. We used SHA256 from crypto-js but you can use different ones. The prefix is using to blind the data . Why? Because we handle easy-to-guess data like a number of pills. Without a prefix, it will be easy to guess the value of a number of pills hash (we can test all values and compare quickly).

Once we have our hash function, let’s implement our class node. I put the files specific to the Merkle tree in a MerkleTree directory. Let’s create the Node.class.js file inside it:

The constructor is just about testing the parameters. if we provide only left(as a String!) then we’ll create a leaf: that’s why I stored the hash the given data in leafData . If left and right are Nodes objects then we’ll create a regular node and assign left and right attributes. Note that we use a custom isString function imported from ./utils :

We also create two getters:

  • data : returns the leafData if the node is a leaf or hash(this.left.data + this.right.data) .
  • isLeaf : returns true if the node is a leaf false in other cases.

Later we’ll need to display our linked nodes, so I add two functions used to print the tree of a given node:

So the entire class should be:

We have a functional representation of a tree’s node. We must now write the function that builds the Merkle tree for us. I want to give it a list of nodes and the function should return me the Merkle root node (and so generates all the nodes between initial nodes and the Merkle root).

In my file src/utils , I create a new getMerkleRootNode function:

It’s a recursive function. Let’s explain why:

MerkleRoot(A, B, C) = MerkleRoot(hash(A + B), C) = hash(hash(A+B)+C)

In the example above, the Merkle root of the leaves [A, B, C] is also the Merkle root of the next layer [hash(A + B), C]. So we use a recursive function to move to the next tree layer if the root calculation is not possible directly. It lets to create all the intermediate nodes from the leaves to the root.

A recursive function is defined by its stop and continue cases. Here we stop function calls when:

  • nodes is an empty array: throw an error.
  • nodes is an array with 1 element: we return the given node (the Merkle root of 1 leaf = the given leaf).
  • nodes is an array containing 2 elements: we return a new node which is the parent of the two nodes in the array.

In other cases, we’ll continue the recursive execution. We need to compute the new layer and then call the function with the new nodes array. And so on until we get to a stop case. Small subtlety: in the case of an odd array of nodes, the last one is removed from the list. This node will not participate in the computation of the new nodes (the tree is binary!). It is an arbitrary choice that affects the outcome.

Here is a schema representing a call of the function getMerkleRootNode([A, B, C]) where A, B, and C are nodes instances.

The code below does the same thing:

Result: our awesome tree with three layers. Your hashes may be different than mine (we probably don’t use the same prefix).

.1170bda35de764c0713f31c1aa8256bcd223441f7755f3c838688ed7c7a81f2d
┗a1e1b72789a07cc25b0cd3b98e6796b25eda1404b48a4f9c6d35964f63f3e966
┗d167e45dad7d508b1095274c288b48ba241370546b0245d9689afb2ba9058f0e
┗55850f93f613afcc1a268c5d3d3b085c158be78c550cead28af697ed99c79a1c
┗dce9a5ad2bd86dd728b0da1e703eecdf7cf349adc31ac57111a630306277bfdf

Now we can handle Merkle trees. Let’s implement a new one, which will allow us to manipulate Assets. Create a new file named asset.class.js . As we saw before, an asset has two attributes: a type defining what the asset is and a set of assets = the asset’s content.

Let’s take a look at the whole class code:

Most of the functions are simple:

  • The constructor gets the asset type as an object. Note that we only store the type’s hash: we do not need the raw data. An asset can be empty so the initialAssets parameter is optional. The constructor sorts the asset’s content by type’s hash using the type property.
  • The function add properly add a new Asset to the instanced one content.
  • The getter isItem checks if the content asset is null. The constructor initialized it to null if the initialAssets parameter is an empty array.

And finally the getter node that returns the Merkle root node of the tree representing the asset.

This is the implementation of the algorithm described above. We use our very useful function getMerkleRootNode.

If the asset does not has content then we just return a leaf: return new Node(this.type). Else, we create a tree for each type of assets composing the content. Then take the created Merkle roots and create another tree: the assetsRootNode . And finally, we use this node and a leaf created from the type of asset to return the root of our instance: return getMerkleRootNode([new Node(this.type), assetsRootNode]) .

Let’s test it:

I get the following result:

As you can see, I found my box inside my batch 😊

Go further

At this stage, we are able to represent the assets’ composition. However, there are other parameters to take into account in order to manage our supply chain as well as possible:

  • A timestamp will define the creation order of the assets. We need to know when the assets have been created.
  • A signature will define who created the asset. When someone creates an asset (i.e a part of the tree), it signs it.

Signature concept (and more) explained in this article.

First of all, let’s add the three following functions based on the crypto module of Node. They’ll let us implement signature system:

Another thing to do is to modify our Node constructor. We do not want to hash the signature (if we do it, we’ll not be able to verify the signature later).

We only add an optional parameter: IsSignature . If it equals true we don’t digest the leaf’s data with hash function. Now we can create a Node in three different ways:

  • new Node(leftNode, rightNode) creates a node with two children.
  • new Node(data) creates a leaf and hash the data.
  • new Node(signature, null, true) create a leaf and do not hash the signature.

Then let’s modify our Asset constructor to implement timestamp and signature.

Now the constructor takes two new arguments:

  • The timestamp which defines when the asset is created.
  • A private key used to sign the asset.

We’ll create a new node called caracteristicsNode which is the Merkle root of all the leaves defining the Asset. In our case, the Assets has two characteristics: its type and its creationTimestamp . The characteristic node will be:

Depending on the use case context, others parameters (i.e leaves) can be added to the Caracteristics node.

Then we sign the characteristics node data to get the asset signature.

Finally, we created a new Asset’s member: its defnitionNode . It is a node where the left child is the characteristics node and its right child is the signature leaf.

This node will replace the type leaf used as the left child of the asset node. So we’ll replace all the new Node(this.type) by this.definitionNode.

The new Asset class looks like the following:

Here is a diagram showing an Asset’s node with and without a signature.

Then, if we test all of these with a simple example:

And here what we get as a result for the box:

We also see that the signature is verified:

Full source code: https://github.com/louisinger/merkle-tree-supply-chain

Why does it meet the blockchain-powered supply chain needs?

Let’s take a simple scenario:

In this situation, all the assets can be represented by the Merkle tree of the batch. It is a huge tree. You can print in on your console by cloning the repository and launch a custom npm script:

# the source code will be cloned on the working directory
git clone https://github.com/louisinger/merkle-tree-supply-chain.git
cd merkle-tree-supply-chain
# create the .env file
echo "HASH_PREFIX=randomgeneratedprefix" > .env
npm install
npm run start-demo

With a Merkle Tree representation, the sending of 10 pills = 1 box can be represented by the node of the box. If we take my Merkle tree, for instance, box 2 is represented by the following hash: 28f1f0b922421443a8b482e0c9d10d1ddac410bead3d07c3d98c3a55eda173ec.

Thus, the exchange of the whole box does not require 10 transactions (one for each pill) but only one. Because the tree provides the link between the root hash of the box 2 and the hash of the pills.

In addition, here we have taken the example of a three-level asset composition (batch -> box -> pill). However, in supply chain contexts, the assets handled and exchanged are often much more complex.

The tree-shaped asset lets stakeholders handle only the layer that they work with.

This article is a prerequisite for the following: we will see how to implement a complete supply chain process with Elements. Still in the use case of drug exchange. The Merkle tree representation of Assets will allow us to gain in cost and performance without losing the immutable and decentralized character of the blockchain.

The next step is to store the Merkle roots inside a blockchain. For this, we will see how bitcoin scripts work. We’ll return roots as transaction output with the opcode OP_RETURN .

See you!

--

--