Storing a large amount of arbitrary data on Ethereum efficiently

James
Published in Coinmonks
Dec 20, 2022

Gas is always a big topic in smart contract development. Every bit of data stored on the blockchain costs gas, so we prefer not to store data on chain unless it is necessary. Still, there are cases where we need to store a lot of data. For example, a whitelist NFT mint needs the whitelisted addresses to be available on chain.
LooksRare employs a merkle tree approach to “store” such data on chain. The data itself is not uploaded to the chain. Instead, only the merkle root of the data is stored, and the contract uses it to verify that the data submitted by users is valid.

    /**
     * @notice Check whether it is possible to claim and how much based on previous distribution
     * @param user address of the user
     * @param treeId id of the merkle tree
     * @param amount amount to claim
     * @param merkleProof array with the merkle proof
     */
    function _canClaim(
        address user,
        uint8 treeId,
        uint256 amount,
        bytes32[] calldata merkleProof
    ) internal view returns (bool, uint256) {
        // Compute the node and verify the merkle proof
        bytes32 node = keccak256(abi.encodePacked(user, amount));
        bool canUserClaim = MerkleProof.verify(merkleProof, treeParameters[treeId].merkleRoot, node);

        if (!canUserClaim) {
            return (false, 0);
        } else {
            // Subtract what the user has already claimed from this tree
            uint256 adjustedAmount = amount - amountClaimedByUserPerTreeId[user][treeId];
            return (true, adjustedAmount);
        }
    }

To verify that a user is whitelisted with the allocated amount, only one value, merkleRoot, needs to be stored on chain.
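To get a feel for the savings, here is a rough back-of-the-envelope sketch. It assumes about 20,000 gas to write one fresh 32-byte storage slot (the exact figure depends on the EVM rules in force) and a hypothetical whitelist of 10,000 addresses. The function names are made up for illustration.

```python
# Assumed cost of writing one new 32-byte storage slot (zero to non-zero).
# Roughly 20,000 gas; treat it as an order-of-magnitude figure.
GAS_PER_NEW_SLOT = 20_000


def storage_gas(num_slots: int) -> int:
    """Approximate gas needed to write `num_slots` fresh storage slots."""
    return num_slots * GAS_PER_NEW_SLOT


def proof_length(num_leaves: int) -> int:
    """Number of 32-byte hashes in a proof for a balanced tree of this size."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integers only.
    return (num_leaves - 1).bit_length()


whitelist_size = 10_000  # hypothetical whitelist size

print(storage_gas(whitelist_size))   # one slot per address: 200000000
print(storage_gas(1))                # only the merkle root: 20000
print(proof_length(whitelist_size))  # hashes each claimer sends as calldata: 14
```

The trade-off is that each claimer now supplies a short proof as calldata, but the proof grows only logarithmically with the size of the whitelist.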

When users make a claim, they need to provide a merkle proof along with their address and the allocated amount to claim. The contract hashes the address and amount into a leaf node, combines it with the hashes in the merkle proof, and checks whether the result matches the stored merkle root. If they match, the user is proven to be whitelisted with the correct allocation.

Let’s dive deep to see how it works…

Generating the merkle root

We ignore the allocated amount for now for the sake of simplicity. A merkle root is calculated by hashing the leaf nodes (the keccak hash of an address in this case) in pairs, then hashing the resulting hashes in pairs again, and so on until a single hash remains. After obtaining the merkle root, we store it in the smart contract. That costs far less gas than storing all of the leaf nodes.

[Figure: Creating the root hash by hashing leaf nodes recursively]
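The pairwise hashing described above can be sketched off-chain in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact tooling any project uses: `hashlib.sha3_256` stands in for Keccak-256 (the two are not identical, so the output will not match on-chain hashes), pairs are sorted before hashing in the style of OpenZeppelin's MerkleProof library, an unpaired node is carried up unchanged, and the addresses are made up.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in hash. Ethereum uses Keccak-256, which differs from the
    # standardized SHA3-256 used here; swap in a real Keccak-256 to match
    # on-chain results.
    return hashlib.sha3_256(data).digest()


def hash_pair(a: bytes, b: bytes) -> bytes:
    # Sort the pair first so the result does not depend on left/right order
    # (an assumption modeled on sorted-pair merkle libraries).
    return h(a + b) if a <= b else h(b + a)


def merkle_root(leaves: list) -> bytes:
    """Hash leaves pairwise, level by level, until one hash remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(hash_pair(level[i], level[i + 1]))
            else:
                next_level.append(level[i])  # odd node is carried up unchanged
        level = next_level
    return level[0]


# Five hypothetical 20-byte addresses: 0x00...01 through 0x00...05.
addresses = [bytes.fromhex(f"{i:040x}") for i in range(1, 6)]
leaves = [h(addr) for addr in addresses]
root = merkle_root(leaves)
print(root.hex())  # the single 32-byte value that would be stored on chain
```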

Verifying merkle proof provided by the user

Here is where the complexity lies. The user has to submit a merkle proof, which is generated off-chain by the service provider. The contract hashes the user’s address into a leaf node, then hashes it together with each entry of the merkle proof in turn to see whether it arrives at the same merkle root. Because it is computationally infeasible to craft a merkle proof for an input that is not in the tree, a matching result means the input is valid.

[Figure: Verifying an address with the corresponding merkle proof]
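Here is a small off-chain sketch of both halves: the service provider building a proof, and the check the contract performs. The same assumptions apply as for any illustration of this kind: `hashlib.sha3_256` is only a stand-in for Keccak-256, pairs are sorted before hashing, and the helper names (`build_levels`, `get_proof`, `verify`) and addresses are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in for Keccak-256 (SHA3-256 gives different digests).
    return hashlib.sha3_256(data).digest()


def hash_pair(a: bytes, b: bytes) -> bytes:
    # Sorted-pair hashing, so a proof needs no left/right flags.
    return h(a + b) if a <= b else h(b + a)


def build_levels(leaves: list) -> list:
    """Return every level of the tree, from the leaves up to the root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = []
        for i in range(0, len(cur), 2):
            if i + 1 < len(cur):
                nxt.append(hash_pair(cur[i], cur[i + 1]))
            else:
                nxt.append(cur[i])  # odd node is carried up unchanged
        levels.append(nxt)
    return levels


def get_proof(levels: list, index: int) -> list:
    """Collect the sibling hash at each level for the leaf at `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1  # neighbour within the same pair
        if sibling < len(level):
            proof.append(level[sibling])
        index //= 2
    return proof


def verify(proof: list, root: bytes, leaf: bytes) -> bool:
    """Recompute the root from a leaf and its proof, then compare."""
    computed = leaf
    for sibling in proof:
        computed = hash_pair(computed, sibling)
    return computed == root


addresses = [bytes.fromhex(f"{i:040x}") for i in range(1, 6)]  # hypothetical
leaves = [h(addr) for addr in addresses]
levels = build_levels(leaves)
root = levels[-1][0]

proof = get_proof(levels, 2)             # proof for the third address
outsider = h(bytes.fromhex("ff" * 20))   # an address not in the tree

print(verify(proof, root, leaves[2]))    # True
print(verify(proof, root, outsider))     # False
```

Only `verify` has an on-chain counterpart; building the levels and extracting proofs happens off-chain, which is why the contract never needs the full list.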

Going further: also storing the allocated amount

Instead of having only users’ addresses as leaf nodes, we can do more. For example, each leaf node can store the hash of a user’s address together with the allocated amount. That way, both address and amount live in the same node, and we still generate only one merkle root.

The users then need to submit one more thing: the allocated amount. The contract first hashes the user’s address with the corresponding amount, then hashes the result along the merkle proof to check it against the merkle root.

[Figure: Verifying address and allocated amount with the corresponding merkle proof]
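As a rough sketch of what changes, the leaf becomes the hash of the packed address and amount, mirroring `keccak256(abi.encodePacked(user, amount))` in the contract: 20 address bytes followed by the amount as a 32-byte big-endian integer. Again `hashlib.sha3_256` is only a stand-in for Keccak-256, and the two users, their amounts, and the helper names are invented for the example.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in for Keccak-256 (SHA3-256 gives different digests).
    return hashlib.sha3_256(data).digest()


def encode_packed(user: bytes, amount: int) -> bytes:
    # Same byte layout as abi.encodePacked(address, uint256):
    # 20 address bytes, then the amount as 32 big-endian bytes.
    if len(user) != 20:
        raise ValueError("address must be 20 bytes")
    return user + amount.to_bytes(32, "big")


def make_leaf(user: bytes, amount: int) -> bytes:
    return h(encode_packed(user, amount))


def hash_pair(a: bytes, b: bytes) -> bytes:
    # Sorted-pair hashing (assumption, as in sorted-pair merkle libraries).
    return h(a + b) if a <= b else h(b + a)


def verify(proof: list, root: bytes, leaf: bytes) -> bool:
    computed = leaf
    for sibling in proof:
        computed = hash_pair(computed, sibling)
    return computed == root


# A tiny two-leaf tree with hypothetical users and allocations.
alice = bytes.fromhex("11" * 20)
bob = bytes.fromhex("22" * 20)
leaf_alice = make_leaf(alice, 100)
leaf_bob = make_leaf(bob, 250)
root = hash_pair(leaf_alice, leaf_bob)

# With two leaves, Alice's proof is simply Bob's leaf.
print(verify([leaf_bob], root, make_leaf(alice, 100)))  # True
print(verify([leaf_bob], root, make_leaf(alice, 999)))  # False: wrong amount
```

Because the amount is part of the leaf, a user who claims a different amount produces a different leaf, and the proof no longer leads to the stored root.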

In theory, we can store anything that can be hashed and fits in a leaf node.

Possible use cases:

  • Storing a prediction by hashing the statement and putting it in a leaf node
  • Storing more complex data structures, like {address, amount, time, criterion1, criterion2, …}
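Both use cases come down to choosing how to turn the data into a leaf. A minimal sketch, assuming a UTF-8 encoded statement for the prediction and fixed-width fields for the record so that field boundaries stay unambiguous; the field choice, function names, and sample values are hypothetical, and `hashlib.sha3_256` again stands in for Keccak-256.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in for Keccak-256 (SHA3-256 gives different digests).
    return hashlib.sha3_256(data).digest()


def prediction_leaf(statement: str) -> bytes:
    """Leaf for a free-text prediction: the hash of its UTF-8 bytes."""
    return h(statement.encode("utf-8"))


def record_leaf(address: bytes, amount: int, timestamp: int) -> bytes:
    """Leaf for a structured record, using fixed-width fields."""
    if len(address) != 20:
        raise ValueError("address must be 20 bytes")
    packed = address + amount.to_bytes(32, "big") + timestamp.to_bytes(32, "big")
    return h(packed)


statement = "Team A wins the final"  # hypothetical prediction
leaf_1 = prediction_leaf(statement)
leaf_2 = record_leaf(bytes.fromhex("11" * 20), 100, 1671494400)

print(leaf_1.hex())
print(leaf_2.hex())
```

Whatever encoding is chosen, the contract has to rebuild the leaf from the submitted fields in exactly the same way, otherwise a valid proof will not verify.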
