Lightning Network: Some Graph Theory Metrics — Part 2 (Practical Guide)

Stelios Rammos
Published in Analytics Vidhya · 9 min read · Dec 17, 2019

Introduction

This article is part of a series aimed at analysing the Lightning Network topology using graph theory concepts. In the first article, I went over a set of metrics that are helpful in monitoring the growth of the network and the health of its connections, such as the diameter and radius of the graph, its completeness, its transitivity and more. In this one, we will go over the code for computing those metrics yourself when running a Lightning node. If you aren’t familiar with the metrics mentioned above, you might want to have a look at the first part over here. Otherwise, let’s get straight into it!

After reading this article, you will be able to monitor the growth of the network yourself and gain some valuable insights into its connectivity. Some of these insights can be very useful if you want to become a valuable routing node and contribute to the smooth functioning of the network.

Heads up: you will need your own lightning node running the LND implementation. If you are not already set up, check out the following guide to get your node up and running from scratch.

Computing the network metrics

Pre-requirements

  1. A lightning node running LND
  2. A working version of Python (I use Python 3.7.5)
  3. The following python libraries: Pandas, Numpy, graph-tool, NetworkX
  4. Optional: Jupyter Lab

Note: practically everything can be done with either NetworkX or graph-tool alone; however, I have found graph-tool to be much faster for some calculations, as its core is written in C++ with a Python wrapper. On the other hand, NetworkX feels more Pythonic and you might find it easier to use.

Note 2: graph-tool can’t be installed using pip and you will have to install the dependencies by hand or using a package manager and then compile it yourself. It can be a tedious process and the compilation takes a long time so go make yourself a nice cup of coffee.

Note 3: for simplicity’s sake, we assume the graph is undirected. However, both these libraries can deal with directed graphs too. It just needs a little bit more processing when converting the JSON graph to another format. If you would like me to make another post going over that process let me know!

Creating our project environment

Let’s first create a project directory where we will save the graph data:

$ mkdir ln-graph-stats
$ cd ln-graph-stats

Then, we need to create a virtual environment for it (optional, recommended).

$ python3 -m venv ln-venv

Tip: If you are going to use graph-tool, I recommend installing it in your system packages instead of in your virtual environment; that will save you a lot of hassle. In that case, run the following command instead (after having installed and compiled graph-tool):

$ python3 -m venv ln-venv --system-site-packages

Finally activate your virtual environment and we are good to go!

$ source ln-venv/bin/activate

Getting the graph data

To get the graph data we first need to ensure we have an instance of LND up and running. Once it’s done, open a terminal tab and run the following command:

$ lncli describegraph > /full/path/to/ln-graph-stats/lngraph.json

Reading the graph data in python

The graph data will be in JSON format. This is great for storing the data, but not so great for processing it, so we will want to parse it into a format we can work with. I will show you how to parse the JSON into a Pandas DataFrame, a NetworkX graph and a Graph-Tool graph. Each has its own trade-off between speed and convenience, and I will let you decide which you prefer to work with. I will always specify which one I am using for each statistic.

Pandas DataFrame
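A minimal way to do this, assuming the lngraph.json file exported in the previous step (I will name the two DataFrames df_nodes and df_channels, the latter matching the variable used further down):

import json
import pandas as pd

with open('lngraph.json', 'r') as f:
    graph_json = json.load(f)

# One row per node, one row per channel
df_nodes = pd.DataFrame(graph_json['nodes'])
df_channels = pd.DataFrame(graph_json['edges'])

# lncli exports capacities as strings, cast them to integers (satoshis)
df_channels['capacity'] = df_channels['capacity'].astype(int)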

NetworkX Graph
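A minimal sketch for NetworkX, assuming the same lngraph.json file and the variable name nxgraph. Note that an nx.Graph collapses duplicate channels between the same two nodes into a single edge:

import json
import networkx as nx

with open('lngraph.json', 'r') as f:
    graph_json = json.load(f)

# Undirected graph: one vertex per node, one edge per channel
nxgraph = nx.Graph()
for node in graph_json['nodes']:
    nxgraph.add_node(node['pub_key'], alias=node.get('alias', ''))
for edge in graph_json['edges']:
    nxgraph.add_edge(edge['node1_pub'], edge['node2_pub'], capacity=int(edge['capacity']))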

Graph-Tool Graph
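And a minimal sketch for Graph-Tool, again assuming lngraph.json and naming the graph gtgraph (the name used in the snippets further down). Here duplicate channels become parallel edges, which we will prune later on:

import json
import graph_tool as gt

with open('lngraph.json', 'r') as f:
    graph_json = json.load(f)

gtgraph = gt.Graph(directed=False)

# Map each node's pub_key to a graph-tool vertex
vertices = {node['pub_key']: gtgraph.add_vertex() for node in graph_json['nodes']}

# Add one edge per channel; duplicate channels become parallel edges
for edge in graph_json['edges']:
    if edge['node1_pub'] in vertices and edge['node2_pub'] in vertices:
        gtgraph.add_edge(vertices[edge['node1_pub']], vertices[edge['node2_pub']])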

And now, the metrics!

Okay, great! We now have the option to save our graph in any of three different formats. Each format has its own advantages, and which one you use largely depends on the type of analysis you wish to do. For the sake of this practical guide I will showcase all three of them for different metric calculations. However, as mentioned above, you are free to pick between Graph-Tool and NetworkX depending on whether you prefer speed or ease of use. Keep in mind that Graph-Tool might not offer all the functions NetworkX does.

Average and quantiles

If you check out the BitcoinVisuals website, you will notice most of the statistics show the average, as well as the following quantile values: 0.9, 0.5 and 0.1. The average is sometimes misleading and the quantile values can give a better idea of the actual distribution of values across the network. Since we will be reproducing the average/quantile for most statistics, you can make use of this useful little function to save some time and lines of code.
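Here is a minimal version of that helper, named get_basic_stats since that is how it is called in the snippets below; it simply prints and returns the average and the 0.9/0.5/0.1 quantiles of an array of values:

import numpy as np

def get_basic_stats(values, name):
    # Average plus the 0.9, 0.5 and 0.1 quantiles of an array of values
    average = np.mean(values)
    percentiles = np.quantile(values, [0.9, 0.5, 0.1])
    print('Average {}: {}'.format(name, round(float(average), 2)))
    print('Quantiles (0.9, 0.5, 0.1) for {}: {}\n'.format(name, percentiles))
    return average, percentiles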

Nodes (Pandas DataFrame)

I decided to group some of the metrics together in a way that makes the most sense practically and code-wise. So, in this bit we will look at all the metrics related to the network nodes that don’t require any fancy graph library, just some good old pandas DataFrames. We will compute the total number of nodes, both with and without channels, the number of channels per node and the total capacity per node. Also, since we will be looking at the channel policies anyway, we will compute the total number and percentage of enabled channels per node.

The following function adds these columns to our nodes DataFrame: ‘num_enabled_channels’, ‘num_channels’, ‘percent_enabled_chan’ and ‘total_node_capacity’, which is all we need to compute the statistics mentioned above.
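A sketch of such a function, using the field names returned by lncli describegraph (node1_pub, node2_pub, node1_policy, node2_policy, capacity) and the hypothetical name add_node_channel_stats:

def add_node_channel_stats(df_nodes, df_channels):
    # Tally channel count, enabled-channel count and total capacity per pub_key
    num_channels, num_enabled, total_capacity = {}, {}, {}
    for _, chan in df_channels.iterrows():
        for pub, policy_key in [(chan['node1_pub'], 'node1_policy'),
                                (chan['node2_pub'], 'node2_policy')]:
            num_channels[pub] = num_channels.get(pub, 0) + 1
            total_capacity[pub] = total_capacity.get(pub, 0) + int(chan['capacity'])
            policy = chan.get(policy_key)
            # A channel counts as enabled for a node if its policy exists and is not disabled
            if isinstance(policy, dict) and not policy.get('disabled', False):
                num_enabled[pub] = num_enabled.get(pub, 0) + 1
    df_nodes = df_nodes.copy()
    df_nodes['num_channels'] = df_nodes['pub_key'].map(num_channels).fillna(0).astype(int)
    df_nodes['num_enabled_channels'] = df_nodes['pub_key'].map(num_enabled).fillna(0).astype(int)
    df_nodes['percent_enabled_chan'] = (
        df_nodes['num_enabled_channels'] / df_nodes['num_channels']
    ).fillna(0)
    df_nodes['total_node_capacity'] = df_nodes['pub_key'].map(total_capacity).fillna(0).astype(int)
    return df_nodes

graph_nodes = add_node_channel_stats(df_nodes, df_channels)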

After calling the above function and assigning the return value to your graph_nodes variable, you can call graph_nodes.head() to ensure that the columns were added properly. The output should look something like this:

If the columns got appended properly and the values look correct, we can now just use built-in pandas functions to retrieve the information we want.
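For example, something along these lines (the variable names total_cnt_nodes, active_nodes and inactive_nodes are my own):

# Total number of nodes, with and without enabled channels
total_cnt_nodes = len(graph_nodes)
active_nodes = graph_nodes[graph_nodes['num_enabled_channels'] > 0]
inactive_nodes = graph_nodes[graph_nodes['num_enabled_channels'] == 0]
print('Total nodes: {}'.format(total_cnt_nodes))
print('Active nodes: {}'.format(len(active_nodes)))
print('Inactive nodes: {}\n'.format(len(inactive_nodes)))

# Channels per node and capacity per node, for active nodes only
get_basic_stats(active_nodes['num_channels'].values, 'channels per node')
get_basic_stats(active_nodes['total_node_capacity'].values, 'capacity per node')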

Notice, in the code snippet above, I make a distinction between active and inactive nodes (nodes without any enabled channels). A channel is marked as disabled when it is not a viable option for routing (e.g. it lacks sufficient capacity), so that nodes try to avoid routing through it. Counting only the enabled channels lets us see the ‘usable’ capacity of the node and network. The output of the code should look as follows (your values will probably differ):

Channels (Pandas DataFrame)

Next, let’s see how to retrieve the number of channels on the network, alongside the number of duplicate channels. This could also be done with either of the two graph libraries, but it is very straightforward and still relatively quick to get these values directly from the DataFrame.
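A minimal sketch, treating any extra channel between the same pair of node public keys as a duplicate (num_unique_channels is reused later for the completeness measure):

# Total number of channels, and how many of them are duplicates
# (more than one channel between the same pair of nodes)
total_cnt_channels = len(df_channels)
num_unique_channels = len(df_channels.drop_duplicates(subset=['node1_pub', 'node2_pub']))
num_duplicate_channels = total_cnt_channels - num_unique_channels
print('Total channels: {}'.format(total_cnt_channels))
print('Unique channels (node pairs): {}'.format(num_unique_channels))
print('Duplicate channels: {}'.format(num_duplicate_channels))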

The output should look something like this:

Network Capacity (Pandas DataFrame)

Getting the total network capacity is really straightforward as well: once you have your df_channels DataFrame, simply run df_channels.capacity.sum().

Capacity Per Channel (Pandas DataFrame)

For the capacity per channel, we will use our get_basic_stats function again, as follows:

values = df_channels.capacity.values
average, percentiles = get_basic_stats(values, 'capacity per channel')

Distance Measures (Graph-Tool Graph)

Before we can measure the distances in the graph we must ensure the graph is connected, otherwise we will get the wrong distance measures.

Note: I found NetworkX to be very slow for computing the diameter and radius of the graph and avoided using it from there onwards.

If the graph is not connected, we should retrieve the graph’s connected components and analyse them separately. As we can see below, the graph is mostly one large connected component. However, we also observe several smaller components of 2 to 5 nodes, which are most likely nodes just looking to transact with one another rather than route payments; the channels between them are known as private channels.

If you wish to perform network analysis, it makes more sense to keep only the largest connected component, as we can assume the private channels do not want to participate in the routing process anyway. We should also remove the duplicate edges, which might skew some of the other measures. Both steps are shown in the snippet below.
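A sketch of those two steps with Graph-Tool, assuming the gtgraph built earlier (note that remove_parallel_edges lives in graph_tool.stats in the versions I used; in more recent releases it may have moved):

import graph_tool as gt
import graph_tool.topology
import graph_tool.stats

# Inspect the connected components: the histogram gives the size of each one
components, hist = gt.topology.label_components(gtgraph)
print('Largest component sizes: {}'.format(sorted(hist, reverse=True)[:10]))

# Keep only the largest connected component...
largest = gt.topology.label_largest_component(gtgraph)
gtgraph = gt.Graph(gt.GraphView(gtgraph, vfilt=largest), prune=True)

# ...and drop duplicate (parallel) edges, which would skew the distance measures
gt.stats.remove_parallel_edges(gtgraph)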

Now, we can use the below function to retrieve the average distance, the diameter and the radius of the graph.

If you set the pseudo_diameter parameter to True, the function will use the fast diameter approximation algorithm provided by Graph-Tool; you can read more about it in the documentation here. Setting the return_dist parameter to True will also return a DataFrame with the distances between all vertices.
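A sketch of such a function, here called get_distance_stats, that runs one shortest-distance computation per vertex and derives the eccentricities from it (the exact implementation details are my own):

import numpy as np
import pandas as pd
import graph_tool as gt
import graph_tool.topology

def get_distance_stats(graph, pseudo_diameter=False, return_dist=False):
    # One BFS per vertex: row i holds the hop distances from vertex i to every other vertex
    all_dists = [gt.topology.shortest_distance(graph, source=v).a for v in graph.vertices()]
    dist_matrix = np.array(all_dists)

    # Eccentricity of a vertex is its largest distance to any other vertex
    eccentricities = dist_matrix.max(axis=1)
    average_distance = dist_matrix[dist_matrix > 0].mean()
    radius = eccentricities.min()

    if pseudo_diameter:
        # Fast approximation of the diameter provided by Graph-Tool
        diameter, _ = gt.topology.pseudo_diameter(graph)
    else:
        diameter = eccentricities.max()

    print('Average distance: {}'.format(round(float(average_distance), 2)))
    print('Diameter: {}  Radius: {}'.format(diameter, radius))

    if return_dist:
        return average_distance, diameter, radius, pd.DataFrame(dist_matrix)
    return average_distance, diameter, radius

get_distance_stats(gtgraph, pseudo_diameter=True)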

Completeness Measures (Panda DataFrame)

Completeness is simple to calculate given that we have already counted the total number of unique channels (in the Channels section above). We just need the ratio between that number and the total number of possible channels in the graph, which is n*(n-1)/2, where n is the number of vertices. In code, that looks like:

# Completeness measure: density of the graph
max_num_channels = (total_cnt_nodes*(total_cnt_nodes-1))/2
completeness = num_unique_channels/max_num_channels

Clustering Measures (Graph-Tool Graph)

The clustering measures consist of the graph transitivity and the individual node transitivities. Graph-Tool has functions to compute both: global_clustering() for the graph transitivity and local_clustering() for the vertex transitivities.

import graph_tool.clustering
# Transitivity: the fraction of possible triangles in the graph that are actually closed.
transitivity, sd = gt.clustering.global_clustering(gtgraph)
print('Graph transitivity: {}\n'.format(round(transitivity, 3)))
# Local clustering coefficient: how interconnected a node's peers are with each other.
transitivities = gt.clustering.local_clustering(gtgraph).a
get_basic_stats(transitivities, 'node transitivities')

Connectivity Measures (Graph-Tool and NetworkX Graph)

For the connectivity measures, I had to use both Graph-Tool and NetworkX given that Graph-Tool doesn’t have a function to compute cut edges (to the best of my knowledge). If you’ve been using Graph-Tool all along, you can use the function in the NetworkX Graph section.
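A minimal sketch with NetworkX, using nx.articulation_points for cut vertices and nx.bridges for cut edges (the helper name get_connectivity_stats is my own); run it on the undirected nxgraph built earlier:

import networkx as nx

def get_connectivity_stats(nxgraph):
    # Cut vertices (articulation points): nodes whose removal disconnects the graph
    cut_vertices = set(nx.articulation_points(nxgraph))
    # Cut edges (bridges): channels whose removal disconnects the graph
    cut_edges = list(nx.bridges(nxgraph))
    print('Number of cut vertices: {}'.format(len(cut_vertices)))
    print('Number of cut edges: {}'.format(len(cut_edges)))
    return cut_vertices, cut_edges

get_connectivity_stats(nxgraph)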

Summary

In summary, this article goes over the code needed to compute useful metrics for analysing the topology of the Lightning Network. We saw how to convert the JSON LN graph into formats we can work with more easily, using three different tools: Pandas DataFrames, NetworkX graphs and Graph-Tool graphs, chosen depending on ease of use and performance. You can also find all the code in this article as a Jupyter Notebook on my Github, over here.

If you enjoyed this article or if there is anything you would like to discuss about its content, feel free to leave a comment or send me a message through one of the channels listed below.

As I said in the beginning, this article is part of a series of articles around the Lightning Network and its graph. The goal is to help people get more familiar with the graph topology and give some point of reference for channel management. Here are some ideas I have for the next article:

  1. Compute more metrics used in graph theory to analyse the network topology
  2. Automate the metric calculations to create a dashboard-like experience

I would like to hear what you are most interested in and I’m also open for collaborations on any of these as I am myself still learning about this technology.

Let’s socialise!

Don’t hesitate to contact me over any of the following platforms:

Twitter: @Stelios_Rms

Email: stelio.rammos@gmail.com

LinkedIn: https://www.linkedin.com/in/stelios-rammos-675382149/

Blog: http://www.blog.steliosrammos.com
