Blockchain, Hash and Merkle-tree: data immutability and integrity, append-only database

Feb 13, 2022

Hash chaining and Merkle hash trees play an important role in many applications by providing data immutability and system integrity protection. This article walks through some real-life cases to show these hashing schemes in action:
1. The Git source control system.
2. Blockchains such as Hyperledger Fabric.
3. dm-verity, which measures Linux rootfs integrity as part of the chain of trust.
4. Certificate Transparency.
5. TPM (Trusted Platform Module) PCRs (Platform Configuration Registers) for platform integrity measurement.

Let’s start with a tool we use on a daily basis. Git chains its entire history together by hash: each commit is identified by a commit id, which is the hash of the commit’s content and metadata, and each commit’s metadata points to its parent commit by the parent’s commit id, as shown in Figure 1.

Figure 1: How Git commits are chained together

Git uses each commit’s hash digest in much the same way as a blockchain does (covered below). First, the hash serves as an integrity checksum: when a commit is retrieved, its hash can be recomputed and compared against the commit id to make sure they match. Second, the commit id is used as the database key for looking up the commit itself.

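One thing to note about Git: it provides history integrity, but not immutability in the way a blockchain ledger does. History can still be rewritten (for example with a rebase followed by a force push), although every rewritten commit and all of its descendants get new commit ids.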
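To make the chaining concrete, here is a minimal sketch of how a SHA-1 based Git repository names objects: the id is the SHA-1 of a small header (`<type> <size>` and a NUL byte) followed by the object body. The helper names `git_object_id` and `make_commit`, the author line, and the commit messages are made up for illustration, and the commit body is simplified.

```python
import hashlib
from typing import Optional


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0<body>', as used for Git object ids."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def make_commit(tree_id: str, parent_id: Optional[str], message: str) -> bytes:
    """Build a simplified commit body; author and timestamp are made-up values."""
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")  # this line is the hash link
    lines.append("author Alice <alice@example.com> 1644710400 +0000")
    lines.append("committer Alice <alice@example.com> 1644710400 +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


EMPTY_TREE = git_object_id("tree", b"")  # id of a tree with no entries

root_id = git_object_id("commit", make_commit(EMPTY_TREE, None, "initial commit"))
child_id = git_object_id("commit", make_commit(EMPTY_TREE, root_id, "second commit"))

# Rewrite the first commit: its id changes, and so does the id of its child,
# even though the child's own message and tree are unchanged.
forged_root_id = git_object_id(
    "commit", make_commit(EMPTY_TREE, None, "initial commit (edited)")
)
forged_child_id = git_object_id(
    "commit", make_commit(EMPTY_TREE, forged_root_id, "second commit")
)

print(root_id, "->", child_id)
print(forged_root_id, "->", forged_child_id)
```

Because the parent id is part of the hashed commit body, editing any ancestor produces a different id for every descendant. That is the integrity property; nothing here stops someone from publishing the rewritten chain, which is why Git alone does not give immutability.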

dm-verity used in Linux systems for system image integrity measurement

The Linux rootfs can be authenticated at boot time and at run time as part of the chain of trust using dm-verity, which provides integrity checking of block devices using the kernel Crypto API.

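A static Merkle hash tree is prebuilt over the read-only part of the rootfs, block by block. At run time, each block’s hash is computed when the block is accessed and verified against the Merkle hash tree. Checking blocks on demand, rather than hashing the whole image up front, is what keeps the performance cost low.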

**Figure 2:** Linux dm-verity Merkle hash tree on rootfs blocks

The following diagram illustrates how dm-verity fits into the secure boot flow of a Linux system. Note that the Merkle tree root hash is passed to the kernel via U-Boot’s authenticated boot arguments.

Figure 3: Embedded Linux secure boot flow with dm-verity (Source)
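The on-demand block check can be sketched in a few lines. This is a simplified binary Merkle tree, not the real dm-verity on-disk format (which packs many hashes into each hash block and adds a salt); the 4096-byte block size, the zero-digest padding, and all function names are assumptions made for this sketch.

```python
import hashlib

BLOCK_SIZE = 4096        # data block size, assumed for this sketch
ZERO_DIGEST = bytes(32)  # padding for levels with an odd number of nodes


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split_blocks(image: bytes) -> list:
    return [image[i:i + BLOCK_SIZE] for i in range(0, len(image), BLOCK_SIZE)]


def build_hash_tree(image: bytes) -> list:
    """Return all tree levels: leaf hashes first, the single root hash last."""
    level = [sha256(block) for block in split_blocks(image)]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [ZERO_DIGEST]
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify_block(block: bytes, index: int, levels: list, trusted_root: bytes) -> bool:
    """Check one block on access, trusting nothing except the root hash."""
    digest = sha256(block)
    for level in levels[:-1]:
        sibling_index = index ^ 1
        sibling = level[sibling_index] if sibling_index < len(level) else ZERO_DIGEST
        if index % 2 == 0:
            digest = sha256(digest + sibling)
        else:
            digest = sha256(sibling + digest)
        index //= 2
    return digest == trusted_root


# Five blocks of fake rootfs data; the tree is built once, offline.
image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(5))
levels = build_hash_tree(image)
trusted_root = levels[-1][0]  # in a real system this arrives via an authenticated channel
blocks = split_blocks(image)

print(verify_block(blocks[3], 3, levels, trusted_root))             # genuine block
print(verify_block(b"\xff" * BLOCK_SIZE, 3, levels, trusted_root))  # tampered block
```

Verifying one block touches only one hash per tree level, so the cost grows with the logarithm of the image size. The stored tree itself can live on untrusted storage, because any change to it makes the recomputed root differ from the trusted one.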

Certificate Transparency (CT) makes all CA-issued certificates publicly accessible and auditable by adding them to an append-only, cryptographically verifiable log server. Anyone can then query this log server to audit whether a given TLS certificate was legitimately appended.

**Figure 4:** Certificate Transparency auditing in TLS server authN flow

Figure 4 illustrates the high-level steps: (1) the domain certificate is issued by the CA and appended to the CT log server; (2) the client browser performs a TLS handshake with the domain server and queries the CT log server to audit the server certificate as part of the server authN process.

**Figure 5:** Using a Merkle hash tree to provide cryptographic assurance of CT log consistency (Source)

The CA submits every issued certificate to the log server, where CT appends the certificate to a dynamic Merkle tree, as displayed in Figure 5. Whenever an end user’s query comes in, CT can use this Merkle tree to provide an audit proof that the certificate in question has been included in the CT log.

Note: Depending on the log server’s threat model, the root hash of the CT log’s Merkle tree needs to be authenticated before it can be used to provide an audit proof, similar to the way dm-verity receives its Merkle root hash from U-Boot’s authenticated boot arguments. In practice, the CT log root hash can be signed using public-key cryptography (PKC) or written to an immutable blockchain ledger such as HLF, covered next.
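The audit proof mentioned above can be sketched as follows. The hashing follows the Merkle tree definition in RFC 6962 (a `0x00` prefix for leaves, `0x01` for interior nodes, left subtree sized to the largest power of two below the entry count). As a simplification for this sketch, each proof step carries an explicit side tag rather than deriving the side from the leaf index and tree size, and the entries are placeholder byte strings, not real certificates.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly less than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries: list) -> bytes:
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = split_point(n)
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def audit_path(index: int, entries: list) -> list:
    """Sibling hashes from the leaf up to the root, tagged with the sibling's side."""
    n = len(entries)
    if n == 1:
        return []
    k = split_point(n)
    if index < k:
        return audit_path(index, entries[:k]) + [("R", merkle_root(entries[k:]))]
    return audit_path(index - k, entries[k:]) + [("L", merkle_root(entries[:k]))]


def verify_inclusion(entry: bytes, path: list, root: bytes) -> bool:
    digest = leaf_hash(entry)
    for side, sibling in path:
        digest = node_hash(digest, sibling) if side == "R" else node_hash(sibling, digest)
    return digest == root


log_entries = [f"certificate-{i}".encode() for i in range(5)]  # placeholder entries
root = merkle_root(log_entries)  # must itself be authenticated, as noted above
proof = audit_path(2, log_entries)

print(verify_inclusion(log_entries[2], proof, root))         # entry that is in the log
print(verify_inclusion(b"forged-certificate", proof, root))  # entry that is not
```

The auditor only needs the entry, the short proof, and an authenticated root hash; it never has to download the whole log.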

Blockchains like Hyperledger Fabric (HLF) as an immutable append-only ledger

Figure 6 outlines how a read/write transaction proposal is requested by an HLF client application and processed by a peer node, which invokes the smart contract before the proposal response is signed and sent back to the client application. For a write transaction, the client application then needs to forward the signed proposal response to the ordering service, which orders transactions into blocks; the transactions are validated before the ledger (both the blockchain and the state database) is updated. At the same time, the gossip protocol is used to propagate the latest ledger state to other peer nodes on the HLF P2P network.

**Figure 6:** Hyperledger Fabric high-level system diagram (Source)

An important property of a blockchain like Hyperledger Fabric (HLF) is its resistance to change, which rests on two fundamental technical guarantees:

1. The ledger is append-only and immutable, guaranteed by cryptography.
2. The ledger (including the immutable blockchain and the state DB) is decentralized and replicated to every HLF peer node. No centralized server/DB maintains the ledger.

Figure 7 briefly illustrates how the HLF blockchain is chained together via cryptographic hashing, conceptually similar to how Git chains together its commit history. Each block stores the previous block’s hash in its block header, which links it to all preceding blocks and, through its own hash, to all successor blocks. To tamper with any single block, an attacker has to rewrite the entire chain that follows it.

**Figure 7:** Hyperledger Fabric block chaining (Source)

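Besides the previous block’s secure hash, block headers in many blockchains (Bitcoin, for example) also contain a timestamp, a nonce, and a Merkle tree root hash, where the Merkle tree efficiently represents all transactions forming the current block. A Fabric block header carries the block number, the previous block’s hash, and a hash of the current block’s transaction data.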
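The chaining described in this section can be sketched with a toy ledger. The `Block` fields, the JSON serialization, and the helper names are assumptions made for brevity; real block headers use their own fields and encoding.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_PREVIOUS_HASH = "0" * 64  # placeholder link for the first block


@dataclass
class Block:
    number: int
    previous_hash: str  # hash of the previous block's header
    data_hash: str      # hash over this block's transactions
    transactions: list


def hash_transactions(transactions: list) -> str:
    return hashlib.sha256(json.dumps(transactions).encode()).hexdigest()


def header_hash(block: Block) -> str:
    header = f"{block.number}|{block.previous_hash}|{block.data_hash}"
    return hashlib.sha256(header.encode()).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    previous_hash = header_hash(chain[-1]) if chain else GENESIS_PREVIOUS_HASH
    chain.append(
        Block(len(chain), previous_hash, hash_transactions(transactions), transactions)
    )


def first_invalid_block(chain: list):
    """Return the index of the first block that fails verification, or None."""
    for i, block in enumerate(chain):
        if block.data_hash != hash_transactions(block.transactions):
            return i
        expected = header_hash(chain[i - 1]) if i else GENESIS_PREVIOUS_HASH
        if block.previous_hash != expected:
            return i
    return None


chain = []
append_block(chain, ["alice pays bob 5"])
append_block(chain, ["bob pays carol 2"])
append_block(chain, ["carol pays dave 1"])
print(first_invalid_block(chain))  # untouched chain

# Tamper with block 1 only: its data hash no longer matches.
chain[1].transactions = ["bob pays mallory 200"]
print(first_invalid_block(chain))

# Fixing up block 1's own data hash just moves the failure to block 2,
# whose previous_hash still commits to the original block 1 header.
chain[1].data_hash = hash_transactions(chain[1].transactions)
print(first_invalid_block(chain))
```

To make the tampered ledger verify again, every later block would have to be rewritten, and because the ledger is replicated, that rewrite would have to happen on the other peers as well.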

TPM PCR bank (Platform Configuration Registers) for platform ROT integrity measurement

**Figure 8:** What is inside a TPM chip (Source)

The TPM, as part of a computing system’s hardware root of trust (ROT), plays important roles in security applications such as the following:

1. Direct Anonymous Attestation using an AIK (attestation identity key) to enable remote authN of a trusted computer.

2. Secure key store for key generation, storage, and access control of key usage.

3. Tracking platform integrity by taking and storing trusted system integrity measurements via TCG’s SRTM (static root of trust measurement) or DRTM (dynamic root of trust measurement).

PCRs on the TPM are a set of extend-only registers: they cannot be set directly to an arbitrary value. To store a new value in a PCR, the existing value is extended with the new value as follows, where || denotes concatenation: PCR[N] = HASHalg( PCR[N] || ArgumentOfExtend ).
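The extend operation is easy to simulate in software. The sketch below assumes a SHA-256 PCR bank that starts at all zeros, and it hashes each component first so that the value being extended is a fixed-size digest; the component names are placeholders and no real TPM is involved.

```python
import hashlib

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes for a SHA-256 bank


def pcr_extend(pcr_value: bytes, measurement: bytes) -> bytes:
    """PCR[N] = SHA-256(PCR[N] || measurement)."""
    return hashlib.sha256(pcr_value + measurement).digest()


def measure(components: list) -> bytes:
    """Extend a zeroed PCR with the digest of each component, in order."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = pcr_extend(pcr, hashlib.sha256(component).digest())
    return pcr


boot_chain = [b"bootloader image", b"kernel image", b"rootfs image"]  # placeholders
pcr = measure(boot_chain)
reordered = measure(list(reversed(boot_chain)))

print(pcr.hex())
print(reordered.hex())
```

The final value depends on every measurement and on the order in which they were extended, so a single PCR summarizes the whole sequence. There is no operation that takes a PCR back to an earlier value short of a reset.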

The hash values in these PCR[N] registers can be read through the TPM interface, and they are bound into TPM operations such as seal(), unseal(), and quote().

How can this be used for SRTM, and for binding a secret key to a particular measured integrity state?

SRTM (static root of trust measurement) happens during secure boot to establish a chain of trust up to the fully running system. A platform’s secure boot implementation measures each component at each boot stage and sends the resulting hash value to the TPM (e.g. boot ROM -> off-chip bootloader -> secure OS -> UEFI U-Boot -> Linux kernel -> rootfs -> user space).

The resulting PCR[N] integrity hashes can be compared with golden values provisioned in advance. If they match, the TPM can release a secret key for a cryptographic operation, e.g. the AES decryption key for FDE (full disk encryption). This ensures that only a trusted system validated by SRTM can access the key needed to decrypt the file system for proper system operation.
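The measure-then-unseal flow can be illustrated with a pure software simulation. `SimulatedTpm` and its methods are hypothetical names for this sketch, not a real TPM API, and the boot components and key are placeholders. A real TPM enforces the PCR check in hardware rather than in Python.

```python
import hashlib
import hmac


class SimulatedTpm:
    """Toy model of one PCR plus a secret sealed to a golden PCR value."""

    def __init__(self) -> None:
        self.pcr = bytes(32)
        self._sealed = None

    def reset(self) -> None:
        self.pcr = bytes(32)  # models a platform reboot

    def extend(self, measurement: bytes) -> None:
        self.pcr = hashlib.sha256(self.pcr + measurement).digest()

    def seal(self, secret: bytes, golden_pcr: bytes) -> None:
        self._sealed = (secret, golden_pcr)

    def unseal(self) -> bytes:
        secret, golden_pcr = self._sealed
        if not hmac.compare_digest(self.pcr, golden_pcr):
            raise PermissionError("PCR does not match the golden value")
        return secret


def measured_boot(tpm: SimulatedTpm, components: list) -> None:
    """Each boot stage is hashed and extended into the PCR before it runs."""
    tpm.reset()
    for image in components:
        tpm.extend(hashlib.sha256(image).digest())


trusted = [b"bootloader v1", b"kernel v1", b"rootfs v1"]  # placeholder images
tampered = [b"bootloader v1", b"kernel with rootkit", b"rootfs v1"]
disk_key = b"\x11" * 32                                   # placeholder AES key

tpm = SimulatedTpm()

# Provisioning: boot the known-good system once and seal the key to that state.
measured_boot(tpm, trusted)
tpm.seal(disk_key, golden_pcr=tpm.pcr)

# Normal boot: measurements match, so the key is released.
measured_boot(tpm, trusted)
print(tpm.unseal() == disk_key)

# Tampered boot: the PCR differs, so the key stays sealed.
measured_boot(tpm, tampered)
try:
    tpm.unseal()
except PermissionError as error:
    print("unseal refused:", error)
```

Because the tampered component is measured before it runs, it cannot undo its own measurement, and the disk key is never released to it.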

Thanks for reading; please feel free to share your thoughts or leave a comment.

Written by lei zhou