A simple and efficient storage structure for arbitrarily large data sets with selective disclosure of contents and proof of authenticity.

Rob Hitchens
Sep 9, 2018 · 7 min read

… and with a method for designing smart contract logic contingent on proven external states.

“person holding passport” by Agus Dietrich on Unsplash

In case the hash concepts are new, check out this gentle introduction: https://simple.wikipedia.org/wiki/Cryptographic_hash_function

Generally speaking, it’s a bad idea to store anything too large in a blockchain, owing to the high cost of on-chain storage. A common solution to this problem is to store only the hash of a large object in a smart contract, and then store the bulk of the object elsewhere.

Without delving into specific use-case details, suffice it to say that a smart contract can store a great deal of information about an object’s origin, lineage, and the approval/on-boarding process that led to the on-chain immortalizing of a certain hash that everyone will recognize as “authentic”. An observer in possession of an object simply hashes the object and then uses the smart contract to confirm authenticity and to inspect other interesting details.
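As a minimal sketch of this common approach, the Python below hashes a whole document and looks the hash up in a registry. The `registry` dictionary is a hypothetical stand-in for the smart contract, and SHA-256 over a canonical JSON encoding is an assumption made for illustration; a real registry would fix its own hash function and encoding.

```python
import hashlib
import json

def document_hash(document: dict) -> str:
    """Hash a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical stand-in for the on-chain registry: hash -> metadata.
registry = {}

def register(document: dict, issuer: str) -> None:
    """Record the hash of an approved document along with its issuer."""
    registry[document_hash(document)] = {"issuer": issuer}

def is_authentic(document: dict) -> bool:
    """An observer holding the full object re-hashes it and checks the registry."""
    return document_hash(document) in registry

license_doc = {"name": "Alice", "dateOfBirth": "01/01/1984"}
register(license_doc, issuer="Motor Vehicle Registry")

tampered = dict(license_doc, dateOfBirth="01/01/1990")
print(is_authentic(license_doc))  # True
print(is_authentic(tampered))     # False
```

Note that the observer needs the entire object to recompute the hash, which is the limitation discussed next.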

The following description addresses a limitation of the common approach described above.

Problem:

Imagine Alice has objects such as 1) a driver’s license, 2) a passport and 3) a birth certificate. Each document is represented by a JSON object that includes all the data fields present. Each such JSON object is issued by an authoritative registry, and we can imagine a smart contract that records the hashes of authentic documents. Nothing gets on that list without following the strict on-boarding process of the contract.

Imagine Alice wants to show Bob her date of birth without disclosing any further personal details.

  1. Since each form of ID contains a date of birth, she has three possible ways to reveal this detail (her birthday) to Bob.
  2. Since each form of ID is easily authenticated by a hash on a blockchain, Bob would be able to consult the appropriate registry which would confirm the authenticity of any document Alice decided to reveal.
  3. For this to work, Alice would have to give Bob the entire object. Alice faces a dilemma. She has three objects to choose from, but each contains more information than she wants to reveal: her home address on her driver’s license, her travel history in her passport, or her place of birth on her birth record.
  4. Alice can say these documents contain a certain date of birth, but Bob cannot independently verify that Alice is telling the truth unless she reveals one of these documents in every detail, which Alice doesn’t want to do.
  5. In fact, Alice can’t prove anything without selecting one of the documents and disclosing it to Bob, in its entirety. Again, Alice doesn’t want to do that.

How can Alice prove she is telling the truth about a single field? How can that be done without migrating all of the important fields to expensive on-chain storage in the registry contracts?

Solution:

Consider a simple JSON object

{
  "name": "Alice",
  "dateOfBirth": "01/01/1984",
  "placeOfBirth": "Oceana"
}

Alice will prove that the document contains “dateOfBirth”: “01/01/1984” without disclosing the entire JSON object by using something called a Merkle Proof.

A Merkle Proof proves the existence of a value within a dataset, which is exactly what we want. If you’re not familiar with Merkle Proofs, have a look over here: https://www.quora.com/Cryptography-How-does-a-Merkle-proof-actually-work

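For a Merkle Proof to work, we need a Merkle Tree. A Merkle Tree recursively hashes pairs of values until only one hash is left, known as the Merkle Root. Conveniently, our data already comes in pairs: each key and its corresponding value. In the labels used below, AB, CD and EF are the hashes of the three key:value pairs (name, dateOfBirth and placeOfBirth), ABCD is the hash of AB and CD, and the Merkle Root is the hash of ABCD and EF.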
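To make the tree concrete, here is a small Python sketch that computes a Merkle Root for the three-field sample document. The hashing convention is an assumption chosen for illustration: SHA-256 throughout, each key:value pair hashed as hash(hash(key) + hash(value)), and the unpaired third node combined directly with the node above its neighbours. The names AB, CD, EF and ABCD stand for the pair hashes and their parent.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest (an assumed choice; any cryptographic hash would do)."""
    return hashlib.sha256(data).digest()

def pair_hash(key: str, value: str) -> bytes:
    """Hash one key:value pair: the key hash and value hash are hashed together."""
    return h(h(key.encode("utf-8")) + h(value.encode("utf-8")))

document = {
    "name": "Alice",
    "dateOfBirth": "01/01/1984",
    "placeOfBirth": "Oceana",
}

AB = pair_hash("name", document["name"])
CD = pair_hash("dateOfBirth", document["dateOfBirth"])
EF = pair_hash("placeOfBirth", document["placeOfBirth"])

ABCD = h(AB + CD)    # parent of the first two pairs
root = h(ABCD + EF)  # Merkle Root, used as the document ID

print(root.hex())
```

Changing any key or value changes the root, so the 32-byte root commits to every field at once.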

Alice proves the birthday claim is authentic by providing the root hash (document ID) and two more hashes.

In case the proving process isn’t clear, Alice wants to prove that “dateOfBirth”: “01/01/1984” exists in an authentic document.

  1. Starting with that pair, anyone can derive the hash of the two values she says are there (CD).
  2. Alice also supplies AB as part of her proof.
  3. With this extra information, anyone can compute ABCD.
  4. All that’s missing is EF, so Alice supplies that as well.
  5. Now, anyone can compute the root hash, which is the unique identifier for the document in the on-chain registry.
  6. The registry can confirm that document ID (Merkle Root) is an authentic document.
  7. So, Alice has convincingly demonstrated that there is a certain key:value pair in a verifiably authentic document.
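The steps above can be sketched in Python as follows. This is an illustration under assumed conventions (SHA-256, pairs hashed as hash(hash(key) + hash(value))), and `registered_roots` is a hypothetical stand-in for the on-chain registry. The verifier only ever sees the claimed pair plus the two hashes AB and EF.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def pair_hash(key: str, value: str) -> bytes:
    return h(h(key.encode("utf-8")) + h(value.encode("utf-8")))

# Issuer side: the full document is known and its root gets registered.
AB = pair_hash("name", "Alice")
EF = pair_hash("placeOfBirth", "Oceana")
root = h(h(AB + pair_hash("dateOfBirth", "01/01/1984")) + EF)
registered_roots = {root}  # hypothetical stand-in for the registry contract

def verify_claim(key: str, value: str, ab: bytes, ef: bytes, registry: set) -> bool:
    """Verifier side: rebuild the root from the claim and the two supplied hashes."""
    cd = pair_hash(key, value)         # step 1: hash the claimed pair
    abcd = h(ab + cd)                  # steps 2-3: combine with AB
    candidate_root = h(abcd + ef)      # steps 4-5: combine with EF
    return candidate_root in registry  # step 6: ask the registry

print(verify_claim("dateOfBirth", "01/01/1984", AB, EF, registered_roots))  # True
print(verify_claim("dateOfBirth", "01/01/1990", AB, EF, registered_roots))  # False
```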

Because cryptographic hashes are one-way and collision-resistant, Alice could only produce such a proof by working from the contents of the original document (or from hashes derived from it). There is no known economically viable method of forging this sort of mathematical demonstration of knowledge of the contents of the document.

Consider what would happen if the topology and values of the Merkle Tree were part of the JSON object itself. Simplified:

{
  "merkleTree": [
    "0x123…",
    "0xabc…",
    "0x456…",
    "0xdef…"],
  "name": "Alice",
  "dateOfBirth": "01/01/1984",
  "placeOfBirth": "Oceana"
}

The root node is a suitable key to include in a blockchain because it sums up all of this in a single 32-byte word. It takes up the same amount of space as a simple hash of the object, but it’s more useful.

Since the Merkle Tree is fully determined by the data, given an agreed ordering of the fields and an agreed hashing convention, it can be computed on the fly. There is no need to actually stuff the Merkle Tree details inside the objects.

The main requirement is to organize the details you want to be separately and selectively disclosable into Merkle Trees and rely on the Merkle Root instead of simple document hashes when using smart contracts to authenticate off-chain objects.

Now, Alice can construct a Merkle Proof showing that there is a field:value pair “dateOfBirth”: “01/01/1984” in the document with a certain ID. Importantly, she does not need to reveal the entire contents of the document to the observer. For the observer, it’s enough to see that it is a legitimate piece of ID, confirmed by a proper authority (which is a smart contract), and the data set does indeed contain the one piece of information (key:value pair) Alice wishes to disclose.
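For documents with any number of fields, the tree and the proof can be computed on the fly. The Python sketch below is one possible construction, not a standard: it assumes SHA-256, leaves taken in the document's field order, and an unpaired node being carried up a level unchanged. The function names (`build_levels`, `make_proof`, `verify`) are hypothetical, and each proof element records whether the sibling sits to the left or the right.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(key: str, value: str) -> bytes:
    return h(h(key.encode("utf-8")) + h(value.encode("utf-8")))

def build_levels(leaves: list) -> list:
    """Return every level of the tree, leaves first, root level last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [h(cur[i] + cur[i + 1]) for i in range(0, len(cur) - 1, 2)]
        if len(cur) % 2 == 1:
            nxt.append(cur[-1])  # unpaired node is carried up unchanged
        levels.append(nxt)
    return levels

def make_proof(levels: list, index: int) -> list:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            side = "left" if sibling < index else "right"
            proof.append((side, level[sibling]))
        index //= 2
    return proof

def verify(key: str, value: str, proof: list, root: bytes) -> bool:
    node = leaf(key, value)
    for side, sibling in proof:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root

document = {"name": "Alice", "dateOfBirth": "01/01/1984", "placeOfBirth": "Oceana"}
keys = list(document)
levels = build_levels([leaf(k, document[k]) for k in keys])
root = levels[-1][0]

proof = make_proof(levels, keys.index("dateOfBirth"))
print(len(proof))                                          # 2
print(verify("dateOfBirth", "01/01/1984", proof, root))    # True
print(verify("dateOfBirth", "01/01/1990", proof, root))    # False
```

For the three-field sample the proof has two elements, matching the "two more hashes" Alice supplies alongside the root.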

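Hashes are one-way functions, so a Merkle Proof doesn’t directly reveal the contents of the rest of the document. One caveat: a field with only a few plausible values could be guessed by hashing candidates and comparing, unless each leaf also includes a random salt.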

Encrypting the Source Document

Astute readers will have noticed that up until now, Alice’s document has been exposed (See? It’s right up there ^). Consider what happens when the document is encrypted so that only Alice can decrypt it, or protected so only Alice can see it.

Alice herself has no problems constructing Merkle Proofs for any key:value pair she cares to disclose to others. Observers can see that she is working from authentic source documents and providing Merkle Proofs for everything she says. Even so, document contents are out of reach to others unless Alice decides to reveal them.

In effect, Alice would be proving:

“I have a birth certificate with this unique identifier. You can confirm authenticity by checking the birth registry smart contract yourself. I have access to the details of this birth certificate because it belongs to me. On this birth certificate, the “dateOfBirth” is “01/01/1984”. Here’s a Merkle Proof. Your own mathematicians will confirm there is no viable alternative explanation for its existence.”

In summary

This method facilitates the efficient storage of objects of any size and facilitates the selective disclosure of discrete details of the contents of such objects.

This method is agnostic about storage infrastructure and blockchain of choice. We use blockchain smart contracts to register authentic documents, describe document origins, provide the history of the issuance process, the signers, and so on. Merkle Roots can be used as unique identifiers for documents or document versions or as attributes of documents known by some other key.

We use inexpensive persistent data stores to store the bulky details. We store the Merkle Roots of authentic documents on blockchains instead of document hashes. This method supports selective disclosure of parts of the objects at the key:value pair level.

This method is compatible with any encryption or access-control scheme aimed at keeping the source document out of public view. The only requirement is that the prover must have access to correct information. In the simplest scenario, Alice can see the entire document, so she can compute the Merkle Tree and construct proofs of its contents. It’s not even strictly necessary that Alice can see the whole document, provided she has the extra node hashes she needs for her proofs.

Onward

This method potentially allows smart contract logic to access state information stored off-chain. A state change (or other logic) can be contingent on the user providing a Merkle Proof of a value that is not, itself, stored in the expensive smart contract state.

In Ethereum (Solidity), it might look something like this (pseudo):

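// Pseudocode: isMerkleProof and eighteenYearsAgo are assumed to be defined elsewhere in the contract.
modifier only18Plus(uint dateOfBirth, bytes32[] memory proof) {
    // The claimed date of birth must be proven to belong to a registered document.
    require(isMerkleProof("dateOfBirth", dateOfBirth, proof));
    // With dates as timestamps, an earlier value means an older person.
    require(dateOfBirth <= eighteenYearsAgo);
    _;
}

function forAdultsOnly(uint dateOfBirth, bytes32[] memory proof) public only18Plus(dateOfBirth, proof) …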

The real world is not quite so simple but it is interesting to consider that contract logic could be conditional on presenting proofs of external states.
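The same gate can be prototyped off-chain. The Python sketch below mirrors the idea under stated assumptions: dates of birth are Unix timestamps, leaves are hashed with SHA-256 as hash(hash(key) + hash(value)), and the proof is a list of (side, sibling) tuples. The names `is_merkle_proof`, `eighteen_years_before` and `for_adults_only` are hypothetical and only echo the pseudocode.

```python
import hashlib
from datetime import datetime, timezone

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(key: str, value: int) -> bytes:
    # Integers are encoded as 32 signed big-endian bytes (an assumed encoding).
    return h(h(key.encode("utf-8")) + h(value.to_bytes(32, "big", signed=True)))

def is_merkle_proof(key: str, value: int, proof: list, root: bytes) -> bool:
    node = leaf(key, value)
    for side, sibling in proof:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root

def eighteen_years_before(now: datetime) -> int:
    try:
        cutoff = now.replace(year=now.year - 18)
    except ValueError:  # Feb 29 with no matching day 18 years earlier
        cutoff = now.replace(year=now.year - 18, day=28)
    return int(cutoff.timestamp())

def for_adults_only(date_of_birth: int, proof: list, root: bytes, now: datetime) -> str:
    if not is_merkle_proof("dateOfBirth", date_of_birth, proof, root):
        raise PermissionError("invalid Merkle Proof")
    if date_of_birth > eighteen_years_before(now):
        raise PermissionError("under 18")
    return "welcome"

# A two-leaf document: the date of birth plus one undisclosed field.
dob = int(datetime(1984, 1, 1, tzinfo=timezone.utc).timestamp())
other = h(b"undisclosed field")
root = h(leaf("dateOfBirth", dob) + other)
proof = [("right", other)]

now = datetime(2018, 9, 9, tzinfo=timezone.utc)
print(for_adults_only(dob, proof, root, now))  # welcome
```

The logic never needs the undisclosed field itself, only its hash.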

Update

As luck would have it, OpenZeppelin now has a Merkle Proof example, here: https://github.com/OpenZeppelin/openzeppelin-solidity/blob/v2.2.0/contracts/cryptography/MerkleProof.sol
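As I read that version of the contract, it avoids left/right flags by always hashing the smaller of two sibling hashes first, so a proof is just a list of hashes. The Python below is a rough sketch of that sorted-pair idea, not a port of the library: it uses SHA-256 where the contract uses keccak256, so its hashes will not match on-chain values, and the leaf encoding is made up for illustration.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def hash_pair(a: bytes, b: bytes) -> bytes:
    """Hash two nodes in sorted order so the verifier needs no position flags."""
    return h(a + b) if a < b else h(b + a)

def verify(proof: list, root: bytes, leaf: bytes) -> bool:
    computed = leaf
    for sibling in proof:
        computed = hash_pair(computed, sibling)
    return computed == root

# Illustrative leaves; the "key:value" byte encoding is an assumption.
l0 = h(b"name:Alice")
l1 = h(b"dateOfBirth:01/01/1984")
l2 = h(b"placeOfBirth:Oceana")

root = hash_pair(hash_pair(l0, l1), l2)
proof_for_l1 = [l0, l2]

print(verify(proof_for_l1, root, l1))                           # True
print(verify(proof_for_l1, root, h(b"dateOfBirth:01/01/1990")))  # False
```

The tree must be built with the same sorted-pair rule for such proofs to verify.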

Rob Hitchens is a Canadian smart contract design consultant, co-founder of Ethereum smart contract auditor Solidified.io and a courseware co-author and mentor of Ethereum, Hyperledger Fabric, Hyperledger Sawtooth Lake, Corda, Quorum and Tezos bootcamps by B9lab.


https://about.me/hitchens
