Rollup Diff Compression

Blagoj
Privacy & Scaling Explorations

--

Application-level compression strategies to reduce the L2 data footprint on L1

Building on @barryWhiteHat's previous work on rollup diff compression, we have investigated how the data footprint of a rollup can be further compressed for the airdrop use case, using Reddit's airdrop as an example.

Below is the tl;dr; you can read the full text here.

Problem description

Commitments sent from L2 to L1 are usually already highly compressed by the rollup system. We refer to this as compression on the state level.

A commitment in this context is a collection of L2 transactions for a certain time interval (we can think of this as a block), along with the previous and new state Merkle roots of the L2 chain.
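For intuition, here is a minimal sketch of such a commitment in Python; the field names and layout are illustrative assumptions on our part, not the commitment format of any particular rollup.

```python
from dataclasses import dataclass

# Illustrative sketch only: field names and layout are assumptions,
# not the commitment format of any particular rollup.
@dataclass
class Commitment:
    prev_state_root: bytes  # Merkle root of the L2 state before this batch
    new_state_root: bytes   # Merkle root of the L2 state after this batch
    tx_data: bytes          # data for the batched L2 transactions

def commitment_size(c: Commitment) -> int:
    """Rough L1 footprint: two 32-byte roots plus the batch data."""
    return len(c.prev_state_root) + len(c.new_state_root) + len(c.tx_data)
```

The two roots are constant size, so any meaningful reduction has to come from shrinking the batch data itself.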

What we're focusing on is how to further reduce the commitment size for airdrops, or any similar use case where the application's business logic sends large amounts of similar data (amounts) to a large number of users (addresses). In other words, we have explored how to further minimize the commitment size by compressing the data on the application level.

Methodology

The size of the airdrop data can be significantly reduced by using standard data compression techniques. We've experimented with batch minting, data grouping, representation reduction, and bitmap compression algorithms (based on run-length encoding). These methods produce a size reduction that scales linearly with the input data. We have also explored the idea of achieving constant compression, where the result is independent of the size of the original airdrop data, by referencing previous rounds of airdrop distributions.
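As a rough illustration of two of these ideas (our own sketch, not the exact encodings used in the report): grouping recipients by amount stores each amount once per group instead of once per recipient, and run-length encoding a recipient bitmap collapses long runs of identical bits.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def group_by_amount(airdrop: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group (recipient_index, amount) pairs so each distinct amount is
    stored once for its whole group of recipients."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for recipient, amount in airdrop:
        groups[amount].append(recipient)
    return groups

def rle_encode_bitmap(bitmap: List[int]) -> List[Tuple[int, int]]:
    """Run-length encode a 0/1 eligibility bitmap as (bit, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for bit in bitmap:
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            runs.append((bit, 1))
    return runs

# Example: three recipients share amount 100, one receives 250.
airdrop = [(0, 100), (1, 100), (2, 250), (3, 100)]
print(dict(group_by_amount(airdrop)))          # {100: [0, 1, 3], 250: [2]}
print(rle_encode_bitmap([1, 1, 1, 0, 0, 1]))   # [(1, 3), (0, 2), (1, 1)]
```

Both transformations shrink the data roughly in proportion to how repetitive it is, which is why the resulting savings are linear in the input size.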

Case study

As a case study for this problem, we used the Reddit Community Points airdrop data for the r/CryptoCurrency and r/FortNiteBR subreddits. It is an example of a complex, multi-round airdrop distribution with a large number of samples.

Reddit's airdrop is essentially a way to reward users of particular subreddits for their activity on the platform. After the end of each month, every user who earned karma points during that month is eligible to claim ERC20 tokens on the Rinkeby testnet.

We’ve used the data from the first 13 months/distributions of Reddit’s airdrop.

Results

By using batching and data grouping, as well as strategies involving run-length encoding, we managed to achieve linear compression of 61.68% (r/FortNiteBR) and 50.89% (r/CryptoCurrency) relative to the size of the original dataset. Compression that yields a constant-size result, independent of the input data, is currently not feasible for the dataset we explored, but it might become possible in the long term under probabilistic assumptions.

Our current findings show that, by using data compression algorithms, the commitment data for certain use cases can be reduced substantially, which translates into lower gas usage and overall cost reductions. A lot remains unexplored, such as additional strategies and algorithms for size minimization, as well as other use cases and applications eligible for this kind of compression.
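As a hypothetical illustration of how referencing a previous distribution round could work (an assumption on our part, not the exact scheme evaluated in the report), a new round can be encoded as a repeat bitmap over the previous round's recipients plus the list of genuinely new recipients; when most recipients repeat, the bitmap itself compresses well with RLE.

```python
from typing import List, Set, Tuple

def encode_round_as_diff(prev_recipients: List[int],
                         new_recipients: Set[int]) -> Tuple[List[int], List[int]]:
    """Encode a new round relative to the previous one: a repeat bitmap over
    the previous round's recipient list, plus the recipients appearing for
    the first time. Highly repetitive bitmaps compress well (e.g. with RLE)."""
    repeat_bitmap = [1 if r in new_recipients else 0 for r in prev_recipients]
    fresh = sorted(new_recipients - set(prev_recipients))
    return repeat_bitmap, fresh

prev_round = [10, 11, 12, 13]
this_round = {11, 12, 14}
print(encode_round_as_diff(prev_round, this_round))  # ([0, 1, 1, 0], [14])
```

Only if the overlap between rounds is consistently large would such a reference-based encoding approach a constant-size result, which is where the probabilistic assumptions come in.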

Conclusion

The conclusion is that by using batching and data compression strategies, in particular RLE-based algorithms, it is possible to drastically decrease the application data size (and thus gas usage). In the long term, by relying on probabilistic assumptions, compression that produces constant-size results might also be achievable.
