Photo by John Salzarulo on Unsplash

Introducing Precise-Proofs: Create & Validate Field-Level Merkle Proofs

Merkle trees are not just at the core of every blockchain. They can also be used to share a subset of a document with third parties, proving the authenticity of both the subset and the whole data structure while keeping the rest of the document private. Precise-proofs is our Go library that offers a standardized way to create Merkle trees and proofs for field-level data in complex data structures. In this article, we outline how we are building this standard and some of the thinking behind the technical decisions.

Vitalik’s exaggerating a bit in the quote above, but Merkle trees are at the very core of every blockchain project out there. They allow one to validate that any element in a large set of values is part of one Merkle root hash. In Bitcoin and Ethereum, Merkle trees are created for each block to verify that a specific transaction is part of a mined block. OpenTimestamps uses Merkle trees to stamp a number of documents in a single Bitcoin transaction. Centrifuge OS allows its users to collaborate on business documents and uses Merkle trees and proofs for data validation. As an example, when suppliers receive purchase orders or send invoices to their customers, the documents are hashed and these hashes (the Merkle root hashes) are committed to Ethereum to ensure the data validity of these exchanges.

If this is the first time you hear about Merkle trees, I recommend reading the article below.

The immediate business partners of a specific transaction are not the only ones who want to share data with each other on Centrifuge OS. Customs brokers, insurance companies, financing providers, and auditors might need access to a subset of fields but not necessarily the entire document. Anybody interacting with a document needs to verify the authenticity of their subset of the document and check it against the published Merkle root hash.

Ordinarily, Merkle trees are created with the leaves representing whole objects such as transaction IDs, binary data chunks, or document hashes. For our use case, finer-grained hashing is needed, because with only a hash of the entire document, a subset of fields can’t be verified as originating from the larger document.

Creating a Merkle tree out of individual fields, rather than hashing only the entire document, gives precise control over which fields one wishes to share and create proofs for. Sharing just parts of a document with a third party is as simple as creating Merkle proofs for the fields to be shared. Precise-proofs is our approach to building these Merkle trees out of field-level data.

For illustration purposes let’s use the example of a simple invoice that omits nested structures:

{
  "Amount": "$1500",
  "InvoiceDate": "2018-03-01",
  "DueDate": "2018-07-01",
  "Country": "USA",
  "Supplier": "Arbor Tree Inc",
  "Buyer": "ACME Paper Inc",
  "Status": "APPROVED"
}

If you create a Merkle tree based on this document, it will have the following structure:

As a side note: in the example above, the last leaf sits one level higher than the other leaves because the tree has an odd number of leaves.
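The construction described above can be sketched in Go as follows. This is an illustrative sketch, not the precise-proofs implementation: the function names, the use of SHA-256, the alphabetical ordering of fields, and the rule that an unpaired last node is promoted unchanged to the next level (inferred from the side note above) are all assumptions. The per-field salt is left out to keep the focus on the tree shape.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// leafHash hashes one field. The per-field salt is omitted here to keep the
// focus on the tree shape.
func leafHash(property, value string) []byte {
	sum := sha256.Sum256([]byte(property + value))
	return sum[:]
}

// hashPair hashes the concatenation of a left and a right child hash.
func hashPair(left, right []byte) []byte {
	h := sha256.New()
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// sortedLeaves returns one leaf hash per field, ordered alphabetically by
// field name so the tree is built the same way every time.
func sortedLeaves(doc map[string]string) [][]byte {
	names := make([]string, 0, len(doc))
	for name := range doc {
		names = append(names, name)
	}
	sort.Strings(names)
	leaves := make([][]byte, 0, len(names))
	for _, name := range names {
		leaves = append(leaves, leafHash(name, doc[name]))
	}
	return leaves
}

// merkleRoot pairs hashes level by level until one root remains. An unpaired
// last node is promoted unchanged (an assumption, see the text above).
func merkleRoot(level [][]byte) []byte {
	if len(level) == 0 {
		return nil
	}
	for len(level) > 1 {
		next := make([][]byte, 0, (len(level)+1)/2)
		for i := 0; i+1 < len(level); i += 2 {
			next = append(next, hashPair(level[i], level[i+1]))
		}
		if len(level)%2 == 1 {
			next = append(next, level[len(level)-1])
		}
		level = next
	}
	return level[0]
}

func main() {
	invoice := map[string]string{
		"Amount":      "$1500",
		"InvoiceDate": "2018-03-01",
		"DueDate":     "2018-07-01",
		"Country":     "USA",
		"Supplier":    "Arbor Tree Inc",
		"Buyer":       "ACME Paper Inc",
		"Status":      "APPROVED",
	}
	root := merkleRoot(sortedLeaves(invoice))
	fmt.Println("root:", hex.EncodeToString(root))
}
```

With seven leaves, the first round produces three pair hashes plus the promoted seventh leaf, which is why that leaf ends up one level above the others.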

A standardized proof format

To create a proof for the field DueDate, one creates the following Merkle proof:

{
  Property: "DueDate",
  Value: "2018-07-01",
  Salt: "YTEyMzEyMzEyMzFhc2RmYXNkZjIzMTIxMjMxYXNkZjMyMTQ5M2FzZGZ",
  Hashes: [{"left": hashC}, {"left": hash1}, {"right": hash5}]
}

To validate the proof, first calculate the hash of the field (hashD) by concatenating the property name, value, and salt and hashing the result. The salt prevents anyone from using a rainbow table, or simply guessing likely values, to find the pre-image of a leaf hash. hashD is then concatenated with hashC and hashed again, and the same step is repeated with each remaining hash in the proof until a root hash is obtained. If the calculated root matches the published Merkle root, the value of the field is proven to be part of the document.
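The validation steps above can be sketched in Go as follows. This is a minimal sketch and not the precise-proofs API: the `Proof` and `Sibling` types, the `Validate` function, SHA-256 as the hash function, and plain string concatenation of property, value, and salt (following the Hash(property+value+salt) formula used in this article) are all assumptions made for illustration. The sibling hashes in `main` are stand-in values, not hashes of a real document.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// Sibling is one step of the proof path. Exactly one of Left or Right is set,
// naming the side on which the sibling hash sits.
type Sibling struct {
	Left, Right []byte
}

// Proof mirrors the fields of the proof object shown above.
type Proof struct {
	Property string
	Value    string
	Salt     string
	Hashes   []Sibling
}

// hashPair hashes the concatenation of a left and a right child hash.
func hashPair(left, right []byte) []byte {
	h := sha256.New()
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// leafHash computes Hash(property+value+salt) for the proven field.
func leafHash(p Proof) []byte {
	sum := sha256.Sum256([]byte(p.Property + p.Value + p.Salt))
	return sum[:]
}

// Validate recomputes the root from the leaf and the sibling hashes and
// compares it with the published root.
func Validate(p Proof, root []byte) bool {
	current := leafHash(p)
	for _, s := range p.Hashes {
		switch {
		case s.Left != nil && s.Right == nil:
			current = hashPair(s.Left, current)
		case s.Right != nil && s.Left == nil:
			current = hashPair(current, s.Right)
		default:
			return false // malformed step: both or neither side set
		}
	}
	return bytes.Equal(current, root)
}

func main() {
	// Stand-in sibling hashes for illustration only.
	stub := func(s string) []byte {
		sum := sha256.Sum256([]byte(s))
		return sum[:]
	}
	hashC, hash1, hash5 := stub("C"), stub("1"), stub("5")

	p := Proof{
		Property: "DueDate",
		Value:    "2018-07-01",
		Salt:     "YTEyMzEyMzEyMzFhc2RmYXNkZjIzMTIxMjMxYXNkZjMyMTQ5M2FzZGZ",
		Hashes:   []Sibling{{Left: hashC}, {Left: hash1}, {Right: hash5}},
	}
	// The root a publisher would have committed for this path.
	root := hashPair(hashPair(hash1, hashPair(hashC, leafHash(p))), hash5)

	fmt.Println(Validate(p, root)) // true
	p.Value = "2018-08-01"
	fmt.Println(Validate(p, root)) // false
}
```

One design point worth noting: plain concatenation without separators or length prefixes can make different (property, value, salt) triples produce the same input string, so a production format would likely need an unambiguous encoding.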

A Few Important Considerations When Developing a Standard For Proof Generation

Schemas Are Important

Without knowledge of what the schema looks like, a proof can be deceiving. Here’s an anti-pattern:

{
ApprovalDate: "2018-07-01",
ApprovalRevocationDate: "2018-07-02"
}

A proof that ApprovalDate is 2018-07-01 would not convey whether the approval has since been revoked. Without knowing the schema of the document, simply requesting a proof for the approval date is not enough. Therefore, documents must have a predefined schema that is known to all parties.

Dot Notation & Tree Layout

The hash for a field must include the property name in dot notation along with the value and the salt. Providing the property name in the proof makes it easy to see which field the proof is for. Because the size and layout of the tree vary, you cannot deduce the property name from a leaf's position in the tree. Without adding the property to the hash, it would be impossible to verify which field the proof is for.

Hash(property+value+salt)

The property name is described using the dot notation:

{
  Property: "lineItems[2].Amount",
  Value: "$100",
  Salt: "YTEyMzEyMzEyMzFhc2RmYXNkZjIzMTIxMjMxYXNkZjMyMTQ5M2FzZGZ",
  Hashes: [{"left": hashC}, {"left": hash1}, {"right": hash5}]
}
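To illustrate how dot-notation property names such as `lineItems[2].Amount` could be derived, here is a small Go sketch that flattens a nested document into property/value pairs. It is not taken from the precise-proofs library, which does not yet support nested documents: the `flatten` function, the zero-based list index, and the sample document are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// flatten walks a decoded document and records every scalar value under its
// dot-notation property name, for example "lineItems[2].Amount".
func flatten(prefix string, v any, out map[string]string) {
	switch t := v.(type) {
	case map[string]any:
		for key, child := range t {
			name := key
			if prefix != "" {
				name = prefix + "." + key
			}
			flatten(name, child, out)
		}
	case []any:
		for i, child := range t {
			// Zero-based index is an assumption of this sketch.
			flatten(fmt.Sprintf("%s[%d]", prefix, i), child, out)
		}
	default:
		out[prefix] = fmt.Sprint(t)
	}
}

func main() {
	doc := map[string]any{
		"Buyer": "ACME Paper Inc",
		"lineItems": []any{
			map[string]any{"Amount": "$900"},
			map[string]any{"Amount": "$500"},
			map[string]any{"Amount": "$100"},
		},
	}

	fields := map[string]string{}
	flatten("", doc, fields)

	// Print in sorted order so the output is stable across runs.
	names := make([]string, 0, len(fields))
	for name := range fields {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s = %s\n", name, fields[name])
	}
}
```

Each resulting property name would then be hashed together with its value and salt to form one leaf of the tree.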

Field Order

Preserving field order is important to make sure a Merkle tree gets constructed the same way every time. Therefore, there is no support for unordered lists. Maps are always ordered alphabetically. Field labels are case-insensitive and should always be spelled in lowercase.
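The ordering rules above could be applied with a helper like the one below. This is a sketch under assumptions rather than the library's behavior: the `canonicalLabels` name is hypothetical, and rejecting labels that collide after lowercasing is one possible way to handle case-insensitive labels.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// canonicalLabels lowercases every field label and returns the labels in
// alphabetical order. It returns an error if two labels differ only in case,
// since they would map to the same leaf name.
func canonicalLabels(doc map[string]string) ([]string, error) {
	seen := make(map[string]bool, len(doc))
	labels := make([]string, 0, len(doc))
	for key := range doc {
		label := strings.ToLower(key)
		if seen[label] {
			return nil, fmt.Errorf("labels collide after lowercasing: %q", label)
		}
		seen[label] = true
		labels = append(labels, label)
	}
	sort.Strings(labels)
	return labels, nil
}

func main() {
	labels, err := canonicalLabels(map[string]string{
		"DueDate": "2018-07-01",
		"Amount":  "$1500",
		"Buyer":   "ACME Paper Inc",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(labels) // [amount buyer duedate]
}
```

Sorting matters in Go in particular because iteration order over a map is not specified, so building leaves directly from a map range would not be reproducible.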

Null Values

The absence of a value has to be verifiable as well. Therefore null values, empty lists etc. are also added as leaves to the tree.

List Values

To be able to prove that the list is X elements long or that the number of items shared is the full set of items, we also add the length to the tree:

{
  Property: "lineItems.length",
  Value: "2",
  Salt: "YTEyMzEyMzEyMzFhc2RmYXNkZjIzMTIxMjMxYXNkZjMyMTQ5M2FzZGZ",
  Hashes: [{"left": hashC}, {"left": hash1}, {"right": hash5}]
}
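The following Go sketch shows one way the leaves for a list could be generated so that its length, including the empty case, is always provable. The `Leaf` type, the `listLeaves` function, and the zero-based element index are assumptions for illustration; list support is not yet part of the library.

```go
package main

import (
	"fmt"
	"strconv"
)

// Leaf is a property/value pair that would be salted and hashed into the tree.
type Leaf struct {
	Property string
	Value    string
}

// listLeaves emits a "<name>.length" leaf followed by one leaf per element.
// An empty or nil list still yields the length leaf, with the value "0".
func listLeaves(name string, items []string) []Leaf {
	leaves := []Leaf{{Property: name + ".length", Value: strconv.Itoa(len(items))}}
	for i, item := range items {
		leaves = append(leaves, Leaf{
			Property: fmt.Sprintf("%s[%d]", name, i),
			Value:    item,
		})
	}
	return leaves
}

func main() {
	for _, leaf := range listLeaves("lineItems", []string{"$1400", "$100"}) {
		fmt.Printf("%s = %s\n", leaf.Property, leaf.Value)
	}
	// An empty list is still represented by its length leaf.
	fmt.Println(listLeaves("attachments", nil))
}
```

Because the length is its own leaf, a verifier who receives proofs for the length and for every element can confirm that no items were withheld.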

Using Precise Proofs

First Go Implementation

Along with this article, we are releasing the first version of an implementation in Go that supports creating & validating proofs for arbitrary structs. The library is in active development and so far only a limited feature set has been implemented. Most notably, support for nested documents and lists is not yet available in the library but is in the works.

A simple example of how to use the library:

Outlook

Sharing only subsets of documents in a way that can be verified for authenticity is a central part of Centrifuge OS. The precise-proofs library is the first implementation of these proofs and will be extended with support for nested data structures and more data types. In a follow-up to this article, we will talk more about how we use Ethereum to store the Merkle roots, which makes it possible to verify these proofs against a public ledger.

We invite you to take a look at the library, play around with it, submit an issue or pull request on GitHub, or reach out to me at lucas@centrifuge.one any time if you have questions.

