Why Dagger-Hashimoto for Ethereum?


This is the ninth in a series of posts where we discuss the core concepts behind the Blockchain, Bitcoin and Ethereum. At Verify, we’re building a reputation protocol on the Ethereum blockchain and are publishing these posts in an effort to share our knowledge with the wider crypto community.

We have seen that the proof of work in bitcoin is about computing power. This led to the rise of application-specific integrated circuit (ASIC) machines. ASICs are specialized hardware designed for a single purpose, in this case mining bitcoins and nothing else.

ASIC mining machines were built with a focus on raw processing power and parallelism. They did one thing only: compute SHA-256. That made it practically impossible for ordinary computers to compete in bitcoin mining, resulting in greater centralization around groups of miners that had the resources to purchase large quantities of these ASIC rigs. Ethereum wanted to solve this mining centralization problem.

With Ethereum, the proof of work is still about how fast you can reach a target; however, it was designed with limiting ASIC machines in mind. The designers wanted ordinary machines to participate in mining and thereby reduce any influence ASIC machines could have on the network. The way they tackled the issue was by adding memory to the mix: you need a lot of memory to generate a valid hash in Ethereum. Since ASIC machines are essentially designed to get several processing units working in parallel on the same small memory, they cannot cope with the memory-intensive operations Ethereum demands. This renders ASIC machines almost useless for Ethereum mining. It does not mean that no ASIC machine could ever generate a valid hash; it just means such machines are much harder to build (an ASIC with plenty of memory is essentially a CPU).

In Ethereum, regular users can just buy off-the-shelf memory sticks and plug them into their machines; ASIC manufacturers can also add memory, but it ends up costing them more.

It is also argued that by allowing commodity hardware you basically open the network to more attacks; in reality this has not happened, and it seems miners prefer to operate legitimately by the rules Ethereum’s designers have set.

The hashing algorithm initially proposed was the “Dagger” algorithm, which takes a region of memory and fills it with pseudo-random data generated from the nonce and the header of the block. The data is processed in rounds, and each round includes data from the previous round. The name Dagger comes from DAG, which stands for Directed Acyclic Graph. DAGs are like directed trees (the connections between nodes have a direction) with an extra feature: a node can have multiple parents.

So the direction in a DAG is never circular: it only goes one way, and you can’t come back to the same node along the same path. That is what Dagger is all about; using previously generated data, and only moving forwards, to generate newer data. To achieve this you need to keep the data in memory rather than storing it on disk and reading it back every time.
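To make the idea more concrete, here is a minimal toy sketch in Python. The function name, the rounds/width parameters and the parent-selection rule are all assumptions for illustration, not the actual Dagger specification: memory is filled round by round with pseudo-random data derived from the block header and nonce, and each new node mixes more than one parent from the previous round.

import hashlib

def toy_dagger_dataset(block_header: bytes, nonce: int, rounds: int = 4, width: int = 8):
    # hypothetical toy version; the real Dagger parameters are far larger
    seed = hashlib.sha256(block_header + nonce.to_bytes(8, "big")).digest()
    # round 0: initial nodes derived directly from the seed
    dag = [[hashlib.sha256(seed + i.to_bytes(4, "big")).digest() for i in range(width)]]
    for r in range(1, rounds):
        prev = dag[r - 1]
        row = []
        for i in range(width):
            # each new node mixes two parents from the previous round;
            # edges only point forward, so the graph stays acyclic
            parent_a = prev[i % width]
            parent_b = prev[(i * 7 + 3) % width]
            row.append(hashlib.sha256(parent_a + parent_b).digest())
        dag.append(row)
    return dag  # the whole structure stays in memory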

Even though Dagger was a memory-hard algorithm, it did not alone meet the main goal of being ASIC-resistant. It is practically impossible to be 100% ASIC-resistant, but you can make ASICs much harder to build, and memory hardness is only one part of that; you still need a way to tackle CPU parallelism as well (this was brought up by Sergio). Dagger did not tackle parallelism, and for that reason it was dropped.

So there was a need to make further modifications to Dagger; then came Dagger-Hashimoto. Dagger-what? Let me explain Hashimoto first.

To understand what it does, let us start with its name. Hashimoto stands for hash, shift and modulo, which are the main operations in the algorithm. The Hashimoto algorithm is meant to be I/O-bound in order to resist ASIC machines.

The proof of work in bitcoin is generated using SHA-256 over the nonce, previous_block_hash and merkle_root. When Thaddeus (the author of Hashimoto) examined how the proof of work was generated, he realized that the SHA-256 step was where he had to make his modifications, since the other parameters can’t be controlled: the previous_block_hash is fixed and can’t be changed, the merkle_root depends on the transactions so nothing can be done there, and the nonce simply changes to generate new hashes. Those are all essential and can’t be changed.

ASIC machines execute SHA-256 very efficiently, so that hash was the starting point for the algorithm. Thaddeus’s proposition was to add another step to the generation of the proof of work: once a hash is generated from the previous_block_header, merkle_root and nonce, it is used to derive another hash, and that second hash is what is compared against the target, i.e. if it beats the target you have a valid proof of work.

Let us consider:

hash_1 = sha256(previous_block_header, merkle_root, nonce)

With hash_1, transactions are selected uniformly from the blockchain (think of it as random selection with an equal probability for each transaction); Thaddeus used 64 transactions.

The second operation in Hashimoto is shifting, followed by modulo operations. hash_1 is shifted right by i bits, where i increments with every transaction. This produces a new value which we will call “shifted_hash”.

N = 64  # number of transactions
tx_ids = []
for i in range(N):
    shifted_hash = hash_1 >> i  # shift i bits to the right

Next, the modulo of the shifted value is taken to choose the block from which a transaction will be picked.

block = shifted_hash % number_of_blocks  # pick a block from shifted_hash

The transaction is chosen the same way we chose the block, except this time we take the modulo with the block’s total number of transactions, let us call it all_block_transactions.

transaction = shifted_hash % block.all_block_transactions

Now for that transaction we obtain its id and shift it left by i bits. We keep the result in the list tx_ids[].

tx_ids.append(get_txid(transaction) << i)

Now tx_ids[] holds all N transaction ids. The next operation is to XOR them all together (functools.reduce from the standard library does the folding):

tx_ids_all_XORd = functools.reduce(lambda i, j: int(i) ^ int(j), tx_ids)

Now tx_ids_all_XORd is used as input to generate the final output. The nonce is shifted left towards the most significant bits, giving shifted_nonce.

tx_ids_all_XORd and shifted_nonce are then XORed to produce the final output.

final_output = tx_ids_all_XORd ^ shifted_nonce

The final_output is then compared with the target: values smaller than the target are accepted as proof of work.
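As a simple illustration (is_valid_proof is a hypothetical helper, and it assumes final_output and target are plain integers), the acceptance check boils down to:

def is_valid_proof(final_output: int, target: int) -> bool:
    # a nonce is a valid proof of work only if the final output falls below the target
    return final_output < target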

Because it draws data from the blockchain, the mining node must hold the full dataset/blockchain locally rather than relying on some shared central node. But even though Hashimoto addressed the issues related to ASIC machines, it carried a computational overhead, since it uses 256-bit arithmetic.

Dagger-Hashimoto came to merge the best of both worlds:

1) the DAG approach of Dagger to generate the dataset, instead of using the blockchain;

2) the hash, shift and modulo approach of Hashimoto.

Instead of the blockchain used in Hashimoto, Dagger-Hashimoto uses a custom-generated dataset of about 1 GB, and that dataset is regenerated only every N blocks. The dataset is produced using the Dagger algorithm. This eliminates the per-block data generation that the original Dagger algorithm required, making the scheme more memory-oriented than computation-oriented.
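A minimal sketch of that epoch-based refresh, with made-up names (get_dataset, EPOCH_LENGTH, DATASET_ITEMS) and a toy dataset size standing in for the roughly 1 GB real one, might look like this:

import hashlib

EPOCH_LENGTH = 100        # assumed number of blocks between dataset refreshes ("N blocks")
DATASET_ITEMS = 1 << 15   # toy size; the real dataset was on the order of 1 GB

_cache = {"epoch": None, "dataset": None}

def get_dataset(block_number: int, seed: bytes):
    # regenerate the Dagger-style dataset only when a new epoch starts,
    # then reuse it for Hashimoto-style lookups on every block in the epoch
    epoch = block_number // EPOCH_LENGTH
    if _cache["epoch"] != epoch:
        epoch_seed = hashlib.sha256(seed + epoch.to_bytes(8, "big")).digest()
        _cache["dataset"] = [
            hashlib.sha256(epoch_seed + i.to_bytes(4, "big")).digest()
            for i in range(DATASET_ITEMS)
        ]
        _cache["epoch"] = epoch
    return _cache["dataset"]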

Dagger-Hashimoto has since undergone several improvements and essentially evolved into what is now called “Ethash”. We will be exploring Ethash in future posts. Stay tuned.

Here is the Python code for the Hashimoto loop described earlier:

import functools

N = 64  # number of transactions to sample
tx_ids = []
hash_1 = sha256(previous_block_header, merkle_root, nonce)
for i in range(N):
    shifted_hash = hash_1 >> i                                   # shift i bits to the right
    block = shifted_hash % number_of_blocks                      # pick a block
    transaction = shifted_hash % block.all_block_transactions    # pick a transaction within it
    tx_ids.append(get_txid(transaction) << i)                    # store its id, shifted left by i bits
tx_ids_all_XORd = functools.reduce(lambda i, j: int(i) ^ int(j), tx_ids)
final_output = tx_ids_all_XORd ^ shifted_nonce
