# Merkle Trees in Elixir

A hash tree or **Merkle tree** is a tree in which every non-leaf node is labelled with the hash of the labels or values (in case of leaves) of its child nodes. Hash trees are useful because they allow efficient and secure verification of the contents of large data structures.

Hash trees are used in the IPFS file system, BitTorrent protocol, Git, Bitcoin, Ethereum, and a number of NoSQL systems like Apache Cassandra and Riak.

They are used for many kinds of verification, especially of large chunks of data.

### Building a Merkle Tree

Say we’re building a file sharing system such as Bittorent, which involves sending blocks of data over the network. We want to verify the integrity and authenticity of our blocks. Blocks should arrive uncorrupted, and not altered by a malicious third party along the way.

*How do we use the Merkle tree to efficiently do this?*

The above diagram shows a binary Merkle tree. Each `Node`

in a binary tree has two children.

`defmodule MerkleTree.Node do`

@moduledoc """

This module implements a tree node abstraction.

"""

defstruct [:value, :children]

@type t :: %MerkleTree.Node{

value: String.t,

children: [MerkleTree.Node.t],

}

end

We have some blocks of data `L1...L4`

.

A Merkle tree only contains hashes. To build our tree, we first create leaf nodes by hashing our data blocks with a `hash_function`

such as `SHA-2`

. This gives us 4 leaf nodes with values `hash(L1)...hash(L4)`

respectively.

`data_blocks = ["a", "b", "c", "d"]`

hash_function = &MerkleTree.Crypto.sha256/1

leaves = Enum.map(data_blocks, fn(block) ->

%MerkleTree.Node{

value: hash_function.(block),

children: [],

}

end)

Next, we need to recursively build parent nodes whose values are equal to the hashes of its immediate children’s values. At each step, the number of nodes we have to work with will halve until only a single root node remains.

We calculate the hash of parent node by hashing the concatenation of its children’s values: `hash(left_child.value <> right_child.value)`

.

Below is an excerpt of the tree building code:

`@doc """`

Builds a new binary merkle tree.

"""

@spec new(blocks, hash_function) :: root

def build(blocks, hash_function) do

leaves = Enum.map(blocks, fn(block) ->

%MerkleTree.Node{

value: hash_function.(block),

children: [],

}

end)

build_tree(leaves, hash_function)

end

defp build_tree([root], _), do: root # Base case

defp build_tree(nodes, hash_function) do # Recursive case

children_partitions = Enum.chunk(nodes, @number_of_children)

parents = Enum.map(children_partitions, fn(partition) ->

concatenated_values = partition

|> Enum.map(&(&1.value))

|> Enum.reduce("", fn(x, acc) -> acc <> x end)

%MerkleTree.Node{

value: hash_function.(concatenated_values),

children: partition

}

end)

build_tree(parents, hash_function)

end

In the above code, we have a public `build`

method that creates the initial `leaves`

leaf nodes from our `blocks`

, and passes it into the `build_tree`

private function. The `build_tree`

function recursively builds parent nodes until only a single `root`

node is left, at which point `root`

is returned.

Study the recursion in `build_tree`

carefully.

Note that the implementation above is more general as it can accept trees with any arity, not just binary. Instead of doing

left.value <> right.value, we use

Enum.reduceto concatenate any number of

children.

### Merkle Tree Verification

Looking at the above tree, we can see that only the root hash is necessary to verify that all the data blocks `L1..L4`

are valid. Consequently, the root hash can be sent through a trusted source.

We can then use this root hash to verify the integrity and authenticity of blocks. Once the root hash is available, the Merkle hash tree itself can be received from any non-trusted sources such as a peer in a P2P network. The received tree is then compared against the root hash. If the match fails, we move on to other peers and continue checking until we find a valid tree.

To verify if any *individual* data blocks are corrupted during transport or have been tampered with, we only need hashes on the `log(n)`

path of nodes from a corrupted leaf up to the root will differ. In other words, we only need to ensure that the hashes of adjacent siblings/uncles are the same along the way up the tree. If any don’t match, either the leaf node is invalid or a sibling node along the `log(n)`

path is invalid.

### Application of Merkle proofs in Bitcoin

In Bitcoin, every transaction has a hash associated with it.

In a block, all of the transaction hashes in the block are themselves hashed (sometimes several times — the exact process is complex), and the result is the Merkle root. In other words, the Merkle root is the hash of all the hashes of all the transactions in the block. The Merkle root is included in the block header. With this scheme, it is possible to securely verify that a transaction has been accepted by the network (and get the number of confirmations) by downloading just the tiny block headers and Merkle tree — downloading the entire block chain is unnecessary.

Merkle trees allows for you to verify transactions as needed and not include the body of every transaction in the block header, while still providing a way to verify the entire blockchain (and therefore proof of work) on every transaction.

### Application of Merkle proofs in IPFS

The InterPlanetary File System (IPFS) uses `merkledag`

- a more generalized form of a Merkle tree. You can read more about it here.

### Appendix: Elixir library

I’ve written a `merkle_tree`

implementation in pure Elixir, making it easy for you to start using Merkle proofs in your applications.

Add `merkle_tree`

to your list of dependencies in `mix.exs`

:

`def deps do`

[{:merkle_tree, "~> 1.1.0"}]

end

You can create trees like so:

`iex> f = MerkleTree.new ['a', 'b', 'c', 'd']`

%MerkleTree{blocks: ['a', 'b', 'c', 'd'], hash_function: &MerkleTree.Crypto.sha256/1,

root: %MerkleTree.Node{children: [%MerkleTree.Node{children: [%MerkleTree.Node{children: [],

value: "ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb"},

%MerkleTree.Node{children: [], value: "3e23e8160039594a33894f6564e1b1348bbd7a0088d42c4acb73eeaed59c009d"}],

value: "62af5c3cb8da3e4f25061e829ebeea5c7513c54949115b1acc225930a90154da"},

%MerkleTree.Node{children: [%MerkleTree.Node{children: [], value: "2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6"},

%MerkleTree.Node{children: [], value: "18ac3e7343f016890c510e93f935261169d9e3f565436429830faf0934f4f8e4"}],

value: "d3a0f1c792ccf7f1708d5422696263e35755a86917ea76ef9242bd4a8cf4891a"}],

value: "58c89d709329eb37285837b042ab6ff72c7c8f74de0446b091b6a0131c102cfd"}}

iex> f.root.value

58c89d709329eb37285837b042ab6ff72c7c8f74de0446b091b6a0131c102cfd

`iex> f = MerkleTree.new ['a', 'b', 'c', 'd'] %MerkleTree{blocks: ['a', 'b', 'c', 'd'], hash_function: &MerkleTree.Crypto.sha256/1, root: %MerkleTree.Node{children: [%MerkleTree.Node{children: [%MerkleTree.Node{children: [], value: "ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb"}, %MerkleTree.Node{children: [], value: "3e23e8160039594a33894f6564e1b1348bbd7a0088d42c4acb73eeaed59c009d"}], value: "62af5c3cb8da3e4f25061e829ebeea5c7513c54949115b1acc225930a90154da"}, %MerkleTree.Node{children: [%MerkleTree.Node{children: [], value: "2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6"}, %MerkleTree.Node{children: [], value: "18ac3e7343f016890c510e93f935261169d9e3f565436429830faf0934f4f8e4"}], value: "d3a0f1c792ccf7f1708d5422696263e35755a86917ea76ef9242bd4a8cf4891a"}], value: "58c89d709329eb37285837b042ab6ff72c7c8f74de0446b091b6a0131c102cfd"}} iex> f.root.value 58c89d709329eb37285837b042ab6ff72c7c8f74de0446b091b6a0131c102cfd`

### Appendix: Cryptographic hashing

The `merkle_tree`

pure Elixir implementation uses `SHA-2`

as the default `hash_function`

used and calls Erlang’s `:crypto`

module for its hash implementations.

`defmodule MerkleTree.Crypto do`

@moduledoc """

This module defines some cryptographic hash functions used to hash block

contents.

"""

@type algorithm :: :md5 | :sha | :sha224 | :sha256 | :sha384 | :sha512

@spec sha256(String.t) :: String.t

def sha256(data) do

hash(data, :sha256)

end

@spec hash(String.t, algorithm) :: String.t

def hash(data, algorithm) do

:crypto.hash(algorithm, data) |> Base.encode16(case: :lower)

end

end

### Conclusion

The Merkle hash tree is an incredible useful data structure used in a wide range of modern applications, such as peer-to-peer file sharing, cryptocurrencies, modern file systems, and more. They are used in many kinds of verification, especially of large chunks of data.

*Originally published at **yos.io** on May 19, 2016.*