Tsunami in the Lightning Network, and half of the network soon disabled?
I have previously investigated the Lightning Network by running health checks to see which nodes are actually there. The TL;DR is that 2/3 of the nodes are zombies. I guess you could call that a 67% attack against the Lightning Network, but it still works quite well and payments are routed. The different routing algorithms in the implementations all seem to be able to push payments through most of the time.
There are some theoretical analyses claiming that there is only a certain probability of getting a certain amount of money from point A to point B, but most of those studies include nodes that have only opened test channels to try this new stuff out.
It's like being curious about fiat money, getting a prepaid VISA card and depositing 1 USD on it: your chances of paying for a 2 USD coffee with it are extremely low. That doesn't make you conclude that USD will never work for buying coffee.
As Andreas M. Antonopoulos states, route discovery is not even part of the LN specification, and implementations are free to use whatever algorithm they see fit. For now, routing is source-based only, so the sender decides which routes to try. Running Dijkstra or Bellman-Ford on the sender's local view of the network is the most common approach. This video is a bit old, but still relevant and a must-see. Fast-forward to 8m16s for the source-routing part, but really, please watch the entire video.
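To illustrate what "Dijkstra on a local view of the network" means for source routing, here is a minimal Python sketch. It assumes the sender has turned its gossip data into a dict mapping each node to a list of (neighbour, fee) pairs, and it treats the fee as a fixed per-channel weight, which is a simplification: real implementations also weigh CLTV deltas, capacity and amount-dependent fees. The function name `find_route` is hypothetical.

```python
import heapq


def find_route(graph, source, target):
    """Cheapest path from source to target in the sender's local view.

    graph maps node -> list of (neighbour, fee) edges (assumed format).
    Returns the list of nodes on the path, or None if target is unreachable.
    """
    dist = {source: 0}
    prev = {}
    queue = [(0, source)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry, a cheaper path was already settled
        visited.add(node)
        if node == target:
            # Walk the predecessor links back to the source.
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return list(reversed(path))
        for neighbour, fee in graph.get(node, []):
            new_cost = cost + fee
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                prev[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    return None
```

Because the sender picks the whole route itself, every node can run its own variant of this search; that is exactly why the specification can leave the algorithm open.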
Layer1 and Layer2
The bitcoin blockchain and lightning are both peer-to-peer networks where information is spread to all nodes.
In Layer1, the blockchain network, every single coffee bought is spread to all nodes, verified and recorded for all eternity. That is of course very useful if you or anyone else is interested in your coffee consumption habits.
By adding a second layer on top of the blockchain, only the opening and the closing of a channel have to be stored on the blockchain. No matter how many coffees you buy, nobody except you knows how much coffee you drink. Neither does the seller, unless he does KYC on you every time you buy a coffee. If you still absolutely want everyone to know about your coffee habits, you can use third-party solutions such as Facebook.
I like to think of the blockchain as a fleet of big, fortified aircraft carriers, from which smaller and faster aircraft can take off and land.
Roger Ver, a fan of the Lightning Network, compares LN to flying cars that will surpass the trains in Tokyo.
There are many other analogies, and more and more cryptocurrencies are looking at the Lightning Network as their Layer2 solution. It is not reserved for BTC; it is a common network for all currencies that want to join.
There seems to be a consensus in all parts of the crypto sphere that Layer2 is needed and needs to mature further, whereas Layer1, the blockchain, should be considered a distributed database reserved for long- and medium-term storage of larger values, where you can afford to wait for blockchain confirmations.
Now back to the subject. I thought it was time to investigate the gossip protocol used in the Lightning Network. I was curious what I would find, and even a bit scared. Would there be a lot of gossip? What would happen if I flooded the network with updates?
So, luckily enough, I've already built my own Lightning Network implementation, which I use to collect raw data for https://www.robtex.com/lightning/, but so far I've only used it for passively collecting data and updating a database.
Initial sync
Initial sync is performed when you connect to a node and [optionally] request a full sync of all channels. There is a newer, much more efficient format for this than the one I use, but I guess the contents are the same; I've been too lazy to implement it in my BOLT implementation. No big deal really, since the network is still rather small, so even the old uncompressed full sync is fine with my ADSL/ADHD combo.
I had my node connect to random nodes and discovered that some implementations send all channels they have ever seen, even channels that were closed a long time ago.
Some implementations send the channels sorted by channel ID, others sorted by last update time. That makes it rather easy to fingerprint them. I'll do that some other time.
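As a rough idea of how such a fingerprint could start, the Python sketch below checks whether the channels in an initial sync arrived sorted by short channel ID, by update timestamp, by both, or by neither. It is a hypothetical heuristic, not the author's method; it assumes each received update is reduced to a `((block, tx_index, output_index), timestamp)` pair. Python compares tuples element by element, which matches the numeric order of short channel IDs.

```python
def is_sorted(values):
    """True if values never decrease."""
    return all(a <= b for a, b in zip(values, values[1:]))


def classify_order(updates):
    """Guess the sort order of an initial sync (hypothetical heuristic).

    updates: list of ((block, tx_index, output_index), timestamp) tuples
    in the order they were received.
    """
    scids = [scid for scid, _ in updates]
    stamps = [ts for _, ts in updates]
    by_scid = is_sorted(scids)
    by_time = is_sorted(stamps)
    if by_scid and by_time:
        return "ambiguous"  # too few or too regular samples to tell
    if by_scid:
        return "by_channel_id"
    if by_time:
        return "by_timestamp"
    return "unknown"
```

A real fingerprint would need more signals than ordering alone, for example whether long-closed channels are included at all.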
Normal operation
After the initial sync, you only get updates when something actually changes, for instance when a channel goes down or up, permanently or temporarily. Nodes also broadcast updates where nothing but the timestamp has changed, to make sure the network keeps knowing about them. Some implementations seem to do this every 24 hours, sending a burst with all of their channels.
The log below shows such a burst of updates. The channel format I use is the most common one in LN: block number, transaction index within the block, and output index. I added a final number to show whether it was the left or the right side of the channel that was updated, since each side is updated and signed individually by its own node. Here they are all 0, meaning the updates were sent by the node with the lowest ID; otherwise it would have been 1.
GOT update btc:531568:2154:1:0 DELTA 86400 AGE 7
GOT update btc:530373:1163:0:0 DELTA 86400 AGE 7
GOT update btc:530981:2243:0:0 DELTA 86400 AGE 7
GOT update btc:531040:1169:0:0 DELTA 86400 AGE 7
GOT update btc:531994:904:1:0 DELTA 86400 AGE 7
GOT update btc:533805:986:0:0 DELTA 86400 AGE 7
GOT update btc:531010:1286:0:0 DELTA 86400 AGE 7
GOT update btc:530838:1880:1:0 DELTA 86400 AGE 7
GOT update btc:530838:1923:1:0 DELTA 86400 AGE 7
A DELTA of 86400 means exactly 24 hours had passed since this side of the channel was last broadcast. AGE 7 means it took 7 seconds for the message to reach my node through the network, assuming the sending node's clock is in sync. Most of them seem to be.
Another extract from the log shows a couple of channels that appear to be flapping:
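To make the log notation concrete, here is a minimal Python sketch. `encode_scid` packs block, transaction index and output index into the 8-byte short_channel_id (3 bytes block height, 3 bytes transaction index, 2 bytes output index, as in BOLT 7). `parse_log_line` assumes exactly the log format shown above, and `delta_and_age` shows how DELTA and AGE relate to the update timestamps; all three names are hypothetical.

```python
import re


def encode_scid(block, tx_index, output_index):
    """Pack the three parts into a 64-bit short_channel_id (BOLT 7 layout)."""
    return (block << 40) | (tx_index << 16) | output_index


# Assumed format: "GOT update btc:<block>:<tx>:<out>:<side> DELTA <s> AGE <s>"
LOG_LINE = re.compile(
    r"GOT update btc:(\d+):(\d+):(\d+):([01]) DELTA (\d+) AGE (\d+)"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    block, tx, out, direction, delta, age = map(int, m.groups())
    return {"scid": encode_scid(block, tx, out), "direction": direction,
            "delta": delta, "age": age}


def delta_and_age(update_ts, previous_ts, received_at):
    """DELTA: seconds since the previous update for the same channel side.
    AGE: seconds from the update's own timestamp until we received it
    (only meaningful if the sender's clock is in sync)."""
    return update_ts - previous_ts, received_at - update_ts
```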
GOT update btc:527455:179:1:0 DELTA 7 AGE 29
GOT update btc:527455:179:1:0 DELTA 22 AGE 29
GOT update btc:535105:2130:0:0 DELTA 22 AGE 10
GOT update btc:527455:179:1:0 DELTA 8 AGE 27
GOT update btc:527455:179:1:0 DELTA 12 AGE 39
Here we can clearly see that some [better] dampening is needed: channel 527455:179:1 isn't feeling very well, and even if the originating node floods the network, it should be throttled by the relaying nodes. To an extent, it is. I am not very worried there; there are not many bytes per second on the network despite a few flapping channels.
So, I wanted more action. I deliberately sent updates for a channel every second to multiple nodes in the network, just to see what would happen. I did that by fabricating channel updates for a channel between nodes B and C, while my own node A pretended to just forward them, and then waited for the echo to come back another way.
I hoped I could start a tsunami of updates flowing back to my probing node, but I failed completely. I only received a subset of the updates, arriving on average 20 seconds apart. The network handled it just fine. This was a very interesting experiment with a very boring, but positive, outcome.
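One simple way a relaying node could dampen a flapping channel is a per-channel-side rate limit. The Python sketch below is a hypothetical illustration, not what any particular implementation does: it forwards at most one update per (short channel ID, direction) every `min_interval` seconds and drops the rest.

```python
class UpdateThrottle:
    """Forward at most one channel_update per channel side per min_interval seconds.

    Hypothetical sketch; min_interval is an assumed tuning parameter.
    """

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_forwarded = {}  # (scid, direction) -> time of last forward

    def should_forward(self, scid, direction, now):
        key = (scid, direction)
        last = self.last_forwarded.get(key)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous forward: drop it
        self.last_forwarded[key] = now
        return True
```

Dropping updates outright means the newest state of a flapping channel can be lost until its next update; a more careful design would keep the latest suppressed update and relay it once the interval has passed.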
100% support for the disable bit
Now some even greater news!
The anatomy of an update according to the current BOLT specification is unfortunately quite skinny, limited to the following fields:
[64:signature]
[32:chain_hash]
[8:short_channel_id]
[4:timestamp]
[2:flags]
[2:cltv_expiry_delta]
[8:htlc_minimum_msat]
[4:fee_base_msat]
[4:fee_proportional_millionths]
However, most of that info is not relevant to this story anyhow. We will focus on the flags field, which looks like this:
Bit position  Name       Meaning
0             direction  Direction this update refers to.
1             disable    Disable the channel.
Bit 0 only signifies which side actually makes the update. The message is signed by the side sending it, so it is more a qualifier of the message than a flag for the channel.
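For readers who want to see the bits in practice, here is a minimal Python sketch that unpacks the payload with the field layout listed above (big-endian, after the 2-byte message type) and decodes bit 0 and bit 1 of the flags field. It only covers this older layout; later revisions of the specification split the flags into separate fields, so treat the format string as an assumption tied to the version described here. `parse_channel_update` is a hypothetical name.

```python
import struct

# Layout as listed above: signature, chain_hash, short_channel_id, timestamp,
# flags, cltv_expiry_delta, htlc_minimum_msat, fee_base_msat,
# fee_proportional_millionths. Big-endian, no padding: 128 bytes in total.
CHANNEL_UPDATE = struct.Struct(">64s32sQIHHQII")


def parse_channel_update(payload):
    """Decode a channel_update payload (without the message type prefix)."""
    (_signature, _chain_hash, scid, timestamp, flags, cltv_expiry_delta,
     htlc_minimum_msat, fee_base_msat,
     fee_proportional_millionths) = CHANNEL_UPDATE.unpack_from(payload)
    return {
        "short_channel_id": scid,
        "timestamp": timestamp,
        "direction": flags & 0x1,         # bit 0: which side sent the update
        "disabled": bool(flags & 0x2),    # bit 1: the disable flag
        "cltv_expiry_delta": cltv_expiry_delta,
        "htlc_minimum_msat": htlc_minimum_msat,
        "fee_base_msat": fee_base_msat,
        "fee_proportional_millionths": fee_proportional_millionths,
    }
```

Signature verification is deliberately left out; a real node must check the signature against the sending node's key before trusting any of these fields.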
The only real flag defined so far is the "disable" bit. A node uses it to tell the network that a channel is down, most likely because the other node is offline. That is very useful knowledge when trying to find working paths. Now LND, probably the most common implementation, joins the other two implementations: its latest master supports signaling it.
If my calculations and estimates of dead nodes are correct, this means that probably half of the channels on the network will be signaled as disabled. That is excellent news, since those channels can now be avoided before routing. I've already upgraded my own node, and I see slightly more than 50% disabled channels, even though I've already detected and removed all zombies older than three weeks.
Consider that 50% of the channels are not usable. If you find a path containing 2 channels, that means a 50% × 50% = 25% chance of success. With 3 channels it is 12.5%, and with 4 channels 6.25%. Now that the disable feature will soon be enabled on all nodes in the network, the sender no longer has to try those dead ends in vain. This will have a huge impact on path finding and on payment speed.
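A rough Python sketch of that bookkeeping: drop channels whose newest update is older than three weeks, then count how many of the remaining ones signal the disable bit. The data shape (`{scid: {direction: (timestamp, disabled)}}`) and the rule that a channel counts as disabled if either side sets the bit are assumptions made for illustration; the post does not say exactly how its numbers were counted.

```python
THREE_WEEKS = 21 * 24 * 3600  # seconds


def prune_and_count(channels, now, max_age=THREE_WEEKS):
    """Remove zombie channels and return (live_channels, disabled_fraction).

    channels: {scid: {direction: (timestamp, disabled)}}, each channel with
    at least one known side (assumed data shape).
    A channel is a zombie if even its newest update is older than max_age.
    A live channel counts as disabled if any side sets the disable bit.
    """
    live = {
        scid: sides for scid, sides in channels.items()
        if now - max(ts for ts, _ in sides.values()) <= max_age
    }
    if not live:
        return live, 0.0
    disabled = sum(
        1 for sides in live.values() if any(flag for _, flag in sides.values())
    )
    return live, disabled / len(live)
```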
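The arithmetic above is simply the usable fraction raised to the number of hops. The short Python sketch below reproduces it and adds the expected number of attempts, 1/p, which assumes each attempt is an independent path with the same success chance. That independence is a simplifying assumption; real retries tend to reuse channels.

```python
def path_success_probability(usable_fraction, hops):
    """Chance that every channel on a path is usable, assuming each channel
    is usable independently with the same probability."""
    return usable_fraction ** hops


for hops in (2, 3, 4):
    p = path_success_probability(0.5, hops)
    print(f"{hops} channels: {p:.2%} success, ~{1 / p:.0f} attempts on average")
# 2 channels: 25.00% success, ~4 attempts on average
# 3 channels: 12.50% success, ~8 attempts on average
# 4 channels: 6.25% success, ~16 attempts on average
```

Once disabled channels are filtered out before path finding, the usable fraction among the remaining candidates moves towards 1, which is where the speed-up comes from; channel balances still remain a gamble.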
We still have to gamble, using trial and error, on channel balances when making payments. I hope we can add better support for that in the near future too. But support for the disable bit in all implementations is the best step forward in a long time! A small step for LND, a giant leap for the Lightning Network. And we are just getting started!