Scaling a Blockchain vs Scaling a Tangle

Pointing out possible weaknesses in the DAG scalability

On its face, the DAG (Directed Acyclic Graph) design used by distributed ledgers like IOTA and Byteball, looks like an amazing innovation. Instead of bulky blocks used as transaction containers in existing blockchain designs, the DAG builds a graph of transactions which reference older transactions and thus can confirm transactions immediately when they are received by a node instead of having to wait for the next block.

Anyone who tried to use a blockchain, will quickly appreciate that fast confirmation times of transactions, is an advantage over having to wait for transactions to be grouped into a block before being able to rely on their state or balance changes.

And indeed the DAG confirms transactions quickly as long as the DAG node already knows about the two transactions approved by this transaction. But synchronizing the state between nodes seems to be a major issue for existing DAG implementations, for example, IOTA currently relies on a single coordinator node while Byteball relies on 12 witness nodes all controlled by the developer himself to checkpoint the state of the DAG.

But why is the synchronization of a DAG more difficult than the synchronization of a blockchain? Simply put, since the blockchain state is modified by every block while the DAG state is modified by every transaction.

As someone who spent endless hours working on blockchain load testing and scaling, I can testify that decentralized networks operating under heavy load will naturally generate forks even when all participants are honest and work by the protocol rules. I can also testify that switching to a better fork is one of the most heavy operations a node needs to perform since it has to undo changes made by existing transactions in the popped off blocks and apply the changes made by the transactions included in the blocks of the better fork.

However, I claim, that an advantage of a blockchain over DAG is that a blockchain is only sensitive to the order of blocks not to the order of transactions and this fact makes it no less scalable than a DAG.

As I explained in my previous post about scaling, in a distributed network, different nodes will see transactions at different times and in a different order, and the higher the rate of transactions the more frequent these ordering differences will occur.

For a blockchain this is not a serious problem since block generation is made artificially difficult thus limiting average block generation frequency to 15 seconds on Ethereum, 1 minute on NXT/Ardor and 10 minutes on Bitcoin. This means that even under heavy load, forks caused by conflicting blocks are infrequent and when they do occur, nodes has time to perform the large processing required to switch to a better fork in order to synchronize the state with another node and before yet more blocks are generated.

In a DAG implementation however, the latency of transaction propagation and the possible order differences is causing several problems while transaction throughput increases and nodes start getting transactions submitted by other nodes for which one or two of the approved transactions are not yet known to them and therefore will be unable to add them to the DAG.

DAG advocates state that you have to have a large number of simultaneous transactions in order to remove the coordinators and whiteness nodes requirement, in response, I would like to point out several weaknesses in the DAG implementation which I predict will surface on a DAG network which accepts 1000 simultaneous transactions per second and perhaps much less.

In a 1000 TPS network, when a remote DAG node receives transactions, say, a second after they were submitted (recall that the speed of light is final and that the internet operates much slower than the speed of light) this remote node will already lag 1000 transactions behind other more central nodes in the network, this will cause several problems

(1) This node is almost certain to not be able to process some of the transactions immediately because of missing approved transactions seen by the submitter node, which still propagate through the network, thus will have to keep them in a possibly ever growing unconfirmed pool which will drain the node resources.

(2) When finally all approved ancestors of a sub tangle of a transaction will arrive at the node, these transactions may approve transactions which are already buried many levels of nesting behind the current tips of the DAG and when the load reaches some tipping point the DAG will start cloud like expansion in all directions with ever growing number of tips representing an ever growing number of transactions without approval.

(3) Assuming a transaction is executed when it’s added to the DAG. Transaction execution order will differ significantly between nodes thus rendering the DAG useless for some applications which require some guarantees about order of execution like votes in polls which need to arrive before the poll ends.

(4) This will also prevent pruning of the DAG since there won’t be a stable state which nodes can snapshot (in the absence of a single coordinator node) in order to prune all prior transaction history.

To summarize, I suspect that the ability of a DAG to confirm transactions as they arrive causes it to become more sensitive to transaction propagation latency and the order in which transactions arrive at a node, which under load may cause accumulation of unconfirmed transactions and growth of the DAG into a cloud like shape where acceptance of transactions referring to old tips is not an exception but the norm and ever increasing number of tips remain unapproved.