Compacting ERC20 Logs

Thomas Jay Rush
Coinmonks
Dec 14, 2021

I have zero time, so I herein produce a very quick take on something I’ve been thinking about for a long time. This is more a thought experiment than anything else.

Fact: Very, very, very many of the transactions on the Ethereum blockchain are simple ERC20 token transfers.

So what?

What If ERC20 Tokens were Native to the Chain?

Every token transfer that appears on the chain uses the standard ERC20 transfer function (or transferFrom which, as we see below, can be safely ignored).

The transfer function is defined in the ERC20 standard as:

which translates into a four-byte signature of 0xa9059cbb.
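For reference, the ERC20 standard declares the function as transfer(address _to, uint256 _value) returning a bool, so the call data is the four-byte code followed by two 32-byte words. Here is a minimal sketch of pulling those pieces apart. The function name and the sample call data are made up for illustration; they are not taken from a real transaction.

```python
TRANSFER_SELECTOR = "a9059cbb"  # the four-byte code quoted above

def decode_transfer_input(input_hex: str) -> dict:
    """Split the input field of a transfer(address,uint256) call into its parts."""
    data = input_hex[2:] if input_hex.startswith("0x") else input_hex
    data = data.lower()
    # 8 hex chars of selector plus two 64-hex-char (32-byte) arguments
    if len(data) != 8 + 64 + 64 or data[:8] != TRANSFER_SELECTOR:
        raise ValueError("not a plain ERC20 transfer call")
    to_word = data[8:72]        # recipient, left-padded with zeros to 32 bytes
    amount_word = data[72:136]  # amount as a 32-byte big-endian integer
    return {"to": "0x" + to_word[24:], "amount": int(amount_word, 16)}

# Made-up call data: send 1000 base units to address 0x1111...1111
example = "0xa9059cbb" + "00" * 12 + "11" * 20 + format(1000, "064x")
decoded = decode_transfer_input(example)
print(decoded)
```

Everything the recipient and amount fields of the log will later repeat is already sitting in these 68 bytes.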

A typical ERC20 transfer, as returned by the RPC, looks something like this:

And, the log generated by a transfer looks like this:

If you look closely at this, there’s a huge amount of redundant information.

Coloring Stuff Makes it More Obvious

Looking at the same gobbledygook with the bytes colored to make things clearer gives:

We see…

for the transaction and…

for the log. Notice anything?

Every piece of the token transfer’s receipt is already present in the transaction data except status.

Here’s an Idea

What if we removed the redundant information from the events that the standard requires and turned the log for transfers into a special case native to the chain?

The log for a transfer could look something like this, for example:

In other words, all we really need to know about an ERC20 token transfer is whether or not it succeeded. All the other information is already included in the transaction.
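To make the redundancy concrete, here is a rough sketch of rebuilding the full Transfer log from nothing but the transaction. The field names (from, to, input, address, topics, data) follow the usual JSON-RPC shapes, and the topic constant is the well-known hash of Transfer(address,address,uint256). The function name and the sample transaction are hypothetical. It also assumes a well-behaved token that emits exactly one standard Transfer event with the sender as the "from" address; tokens that charge fees or emit extra events would not fit this picture.

```python
# keccak256("Transfer(address,address,uint256)"), the standard ERC20 event topic
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def rebuild_transfer_log(tx: dict) -> dict:
    """Rebuild the expected Transfer log from the transaction's own fields."""
    data = tx["input"][2:].lower()
    if len(data) != 136 or data[:8] != "a9059cbb":
        raise ValueError("not a plain ERC20 transfer call")
    to_word, amount_word = data[8:72], data[72:136]
    from_word = tx["from"][2:].lower().rjust(64, "0")  # pad sender to 32 bytes
    return {
        "address": tx["to"],  # the token contract that was called
        "topics": [TRANSFER_TOPIC, "0x" + from_word, "0x" + to_word],
        "data": "0x" + amount_word,
    }

# Hypothetical transaction, for illustration only
tx = {
    "from": "0x" + "aa" * 20,
    "to": "0x" + "bb" * 20,
    "input": "0xa9059cbb" + "00" * 12 + "cc" * 20 + format(5, "064x"),
}
log = rebuild_transfer_log(tx)
print(log)
```

Under those assumptions, the only thing the function cannot supply is whether the transfer succeeded, which is exactly the one bit the compact log would carry.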

Obvious Objections

It’s too hard to change now. All the dApps would break.

This is true. A lot of things would break. But I’m suggesting we think forward to the next 50 years, not the last five.

It’s too hard. All the client code would have to be modified.

This is also true. See my previous response and see below where I make a quick back-of-the-envelope calc on how much space this might save.

It slows down queries.

This is also true. It would slow down queries because the node software is highly optimized for delivering log-related queries.

Currently, a dApp sends the transaction and then queries for the log, which is probably why the redundant information is included. Presumably, though, the dApp was the source of the transaction, so it already has the information needed to reconstruct the full log.

In the case of an after-the-fact, off-chain scrape, the transactional information is most likely already available as well, because many off-chain scrapes scan the blocks and the transactions before scanning particular logs.

Benefits

The size of data stored by a node is decreased.

Even if this doesn’t make it into the protocol level by becoming a special case native primitive, the observation leads us to conclude that we could greatly reduce the size of the data on the machine’s hard drive. We’ve calculated an estimate below using very rough back-of-the-envelope calculations.

The total number of bytes transferred over the wire is cut in half.

The amount of space “on the wire” is infinite, isn’t it? What does high traffic even mean?

It means there are too many bytes trying to jam their way onto the wire. So every little bit counts, and this could lower the number of bytes going over the wire significantly.

Smaller data means more “regular people” can run nodes.

The whole goal of everything I do is to make running a local node easier. One of the biggest complaints about running a node is how much disk space it takes up. This would lower that amount and thereby allow more people to run more nodes.

How Much Space Might This Save?

We ran the following commands from the TrueBlocks command-line tool chifra:

This produced a file (file.txt) with data from around 24,000 transactions, extracting only the input fields. This represents about 200 blocks randomly sampled from blocks between 3,000,000 and 13,000,000.

That data looks like this:

Not amazing, but fairly interesting.

We ran the following command against that data file and found 1,578 different four-byte codes in the 24,379 records, 10 of which appear in more than 100 transactions.

Using the Ethereum Four Byte Directory, we find this information for the 10 most frequently appearing functions:
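The same tally can be sketched in a few lines of Python. This is not the command we ran, just an illustration. It assumes the file holds one hex-encoded input field per line, and that an empty input ("0x") marks a straight ETH transfer. The function name and the sample lines are made up.

```python
from collections import Counter

def count_four_bytes(lines):
    """Tally the leading four-byte code of each hex-encoded input field."""
    counts = Counter()
    for line in lines:
        value = line.strip().lower()
        if not value:
            continue  # skip blank lines
        if value.startswith("0x"):
            value = value[2:]
        # An empty input (plain ETH transfer) ends up under the key "0x"
        counts["0x" + value[:8]] += 1
    return counts

# Made-up sample: two token transfers, one plain ETH transfer, one other call
sample = [
    "0xa9059cbb" + "00" * 64,
    "0x",
    "0xa9059cbb" + "00" * 64,
    "0x095ea7b3" + "00" * 64,
]
counts = count_four_bytes(sample)
for code, n in counts.most_common():
    print(code, n)
```

Against a real file, the call would be something like count_four_bytes(open("file.txt")), since a file object yields its lines one at a time.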

Or, stated as percentages:

So 91% of all the transactions we sampled were either a straight-up transfer of ETH or an ERC20 token transfer.

Hand Waving

We ran the following command against the same set of blocks:

and summed the result to find that the 200 blocks we sampled take up 5,132,592 bytes (5 MB) on the hard drive. Extending that out across the 13,800,429 blocks at the time of this writing, we get an estimated size of (5,132,592 / 200) * 13,800,429 = 354,159,857,410 bytes, or about 350 GB for the block data alone.

A very rough guess is that there is as much log data (which isn’t stored as part of the blocks) as there is block data, and if we add 350 GB to 350 GB we get 700 GB, which is within an order of magnitude of the known chain size (2 TB).
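The extrapolation is just average bytes per sampled block times the number of blocks on the chain. A quick sketch of that arithmetic, using only the numbers quoted above (the function name is made up):

```python
def estimate_chain_bytes(sample_bytes, sample_blocks, total_blocks):
    """Scale the average size of the sampled blocks up to the whole chain."""
    return sample_bytes / sample_blocks * total_blocks

estimate = estimate_chain_bytes(5_132_592, 200, 13_800_429)
print(f"{estimate:,.0f} bytes, or about {estimate / 1e9:.0f} GB")
```

This treats the 200 sampled blocks as representative of the whole chain, which is a big assumption given how much block sizes have changed over time.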

So, let’s use 350 GB as the size of just the logs.

Computing 350 GB * .3836 * .1 (because we can decrease the size of a transfer log to 1/10 its current size), we get about 13.4 GB. Is that a lot? Not really….

If we replaced all the transfer logs with a simple boolean showing success or failure, and picked up the remainder of the data from the transaction that spawned the transfer, we could decrease the size of the data on the hard drive by about 15 GB, or 1% of the total (assuming 1.5 TB in total).

Conclusion: Not worth the effort!

→[Correction — 12/30/2021]

I made a mistake in the above calc. It should have used a value of .9, not .1, since we are decreasing the size to 1/10 its original size. So 350 GB * .3836 * .9 would save 120.834 GB. That’s actually quite a lot, so the conclusion changes: it might be worth it.

→[Correction — 12/30/2021]
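The difference between the two calcs is easier to see side by side: multiplying by .1 gives what is left of the transfer logs after compaction, while multiplying by .9 gives what is saved. A small sketch using the same numbers as the text (350 GB of logs, a .3836 share for transfers, shrinking each log to 1/10 its size); the function name is made up.

```python
def transfer_log_sizes_gb(log_gb, transfer_share, shrink_to):
    """Return (GB left after compaction, GB saved) for the transfer logs."""
    transfer_gb = log_gb * transfer_share
    kept = transfer_gb * shrink_to
    saved = transfer_gb * (1 - shrink_to)
    return kept, saved

kept, saved = transfer_log_sizes_gb(350, 0.3836, 0.1)
print(f"transfer logs after compaction: {kept:.3f} GB")
print(f"space saved: {saved:.3f} GB")
```

The two results add back up to the full size of the transfer logs, which is a handy sanity check on which factor belongs where.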

Support Our Work

TrueBlocks is totally self-funded from our own personal funds and a few grants from The Ethereum Foundation (2018), Consensys (2019), Moloch DAO (2021), and most recently Filecoin/IPFS (2021).

If you like this article or you simply wish to support our work go to our GitCoin grant https://gitcoin.co/grants/184/trueblocks. Donate to the next matching round. We get the added benefit of a larger matching grant. Even small amounts have a big impact.

If you’d rather, feel free to send any token to our public Ethereum address at trueblocks.eth or 0xf503017d7baf7fbc0fff7492b751025c6a78179b.

Thomas Jay Rush
Blockchain Enthusiast, Founder TrueBlocks, LLC and Philadelphia Ethereum Meetup, MS Computer Science UPenn