Tangle Pathway

A space-efficient sub-graph description of IOTA’s tangle.

How best to describe what I am trying to by quoting the problem statement from Serguei Popov?

From the #tangle-math discord channel on 20–11–2018:

Serguei Popov [IF]11/20/2018: Let me call it “semipermanode problem”. It is of course difficult to store all the transaction since the beginning, since it requires a lot of space. On the other hand, many of those transactions need not be stored, because they are almost/completely useless (e.g. spam). Assume that you want to store only a specific set of transactions: say, the historical data from the temperature sensors captured in the center of Munich (which is of course important and needs to be stored forever since you believe that many people or entities will need it in the future). To prove that that data is genuine, you don’t need to store the whole Tangle; you just need a connected (to the “present time”) subgraph of the Tangle which contains the transactions of the interest. So, question: how to efficiently build-and-maintain connected subgraphs which contain “interesting” information? Also, we may need to have some “quick time-traversing chains” (so that it’s possible to go “quickly” to the past along them), but those can be issued by the semipermanode owners. And, the semipermanodes can charge something for access to the data. “In Bitcoin, you pay for transactions, but accessing the ledger is free. In IOTA, the transactions are free, but you may have to pay for accessing the ledger (in case you don’t want to store it yourself)” — this looks as a reasonable general principle.

The challenge consists of multi sub-problems, one of them is actual data-storage. For that I started looking more into the inner workings of IRI and created a couchbase database adapter to have more fine-grained control over the ‘permanence’ of data storage. You can read the article here.


As Dr Popov’s says is that there is a need for connected subgraphs and this is the focus of the encoding mechanism I am describing here.

Specifically: “A space efficient path description”

The space efficient part in here is extremely important. Because if you want to store a specific transaction it would be ridiculous if you need 100x the space to describe what you actually want to store.

A requirement as well is that given a node that wants to obtain the data can request and validate the transactions immediately from a node holds the data (with a connected subgraph).

To achieve this I identified 4 unique descriptors that can describe the entire tangle. The benefit of just 4 descriptors is that each descriptor only requires 2 bits:

  • B[0,1] = follow the branch
  • T[1,0] = follow the trunk
  • Y[1,1]= follow the trunk, remember me.
  • E[0,0] = End, go to latest remembered Y, follow the Branch.

The encoder must keep track of which transactions it ‘visited’ in order to prevent walking the paths multiple times.

Example sub-graph(left=trunk, right=branch): yEyyttbEbEytEbbE

Given a start transaction the description will look as followed (I use the letters for explanation but in reality these are their bit representations)


To give a visual idea on the efficiency of this encoding please look through the end of this video: https://www.youtube.com/watch?v=iOuwzFS4AiY

The path that is being walked in this video is described below and decodes 2698 nodes(so *2 edges) in just 709 bytes. (However given the re-occurrence of certain patterns on large graphs, gzip compression could even further reduce the storage requirements.)

See the first lengthy ‘yEyE’ = the long tail in the beginning of the video.


(note that there are no B and T because it is a full description of the graph with all trunk and branch transactions loaded not only a path)


Discord: Olaf van Wijk#1273

Twitter: https://twitter.com/@ovanwijk