Drawing graphs with Python in 2019

Ludvig Hult
7 min read · Jul 30, 2019


So you find yourself fitting a vector autoregressive (VAR) model by regression. The question is: how does the fitted model behave? Maybe you can plot some graphs? Make predictions?

My idea is to visualize the model weights as directed graphs, so how do you do that in Python?

In this article I look at some libraries for that purpose, with comparisons and figures.

Motivation

The VAR model

yᵢ₊₁ = ∑ₖAₖyᵢ₋ₖ

yᵢ ∈ ℝⁿ for i=1..tₘₐₓ, where n is the number of measured components

You measure some vector variable y at different points in time i. You have reason to believe that the value of y at the next time step is a linear function of its values at previous time steps. Therefore you introduce matrices Aₖ that hold these coefficients.

The parameters of the model are thus those A-matrices. I use a non-mathy way to name those parameters. For example, if component i of the y-vector corresponds to the measurement of protein MCP-2 (CCL8), and component j holds data for protein CXCL10 (IP10), the (i,j)-component of the matrix Aₖ says how much the measurement of CXCL10 (IP10) k time steps ago affects the value of MCP-2 (CCL8) in the next measurement. I call MCP-2 (CCL8) the To explain, since that is the value we try to predict/explain. Similarly, CXCL10 (IP10) is the Explainer. The value of k is called the Lag.
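The prediction step of such a model can be sketched in a few lines of NumPy. This is only an illustration: the function name var_predict and the numbers are made up, and I assume the lags are counted so that the first matrix multiplies the most recent observation (shift the index if your convention differs).

```python
import numpy as np

def var_predict(A, history):
    """Predict the next y from lagged coefficient matrices.

    A       : array of shape (p, n, n); A[k - 1] is the matrix for lag k
              (assumed convention: row = To explain, column = Explainer)
    history : array of shape (t, n) with t >= p, oldest row first
    """
    A = np.asarray(A, dtype=float)
    history = np.asarray(history, dtype=float)
    p = A.shape[0]
    # history[-k] is the observation k steps before the one we predict.
    return sum(A[k - 1] @ history[-k] for k in range(1, p + 1))

# Hypothetical two-component system with two lags.
A = np.array([
    [[0.5, 0.0],
     [0.2, 0.0]],    # lag 1: component 0 affects both components
    [[0.0, 0.0],
     [0.0, -0.3]],   # lag 2: component 1 affects itself
])
history = np.array([[1.0, 2.0],    # older observation
                    [3.0, 4.0]])   # most recent observation
y_next = var_predict(A, history)
print(y_next)
```

Each row of a matrix collects everything that feeds into one To explain component, and each column collects everything one Explainer feeds into.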

Model fitting and understanding the result with heatmaps

I have used the lasso method with cross-validation for fitting the model, which drives most entries in the A matrices to exactly 0. To understand the behavior of the fitted model I then pose questions like:

Which components of the y-vector are predictive of future values of y?

How many lags (time steps) does it take before the “memory” in the system forgets previous values of y?

Both questions are answered by the same result: which coefficients are non-zero, and at which lags?
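A fit of this kind could look roughly like the sketch below. It is not the code behind the figures in this post: it assumes scikit-learn's LassoCV as the lasso implementation, the helper name fit_var_lasso is made up, and the data is simulated. It also uses the default K-fold splits, which ignore the time ordering, so treat the cross-validation part as a simplification.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def fit_var_lasso(Y, p, cv=5):
    """Fit A_1..A_p one target component at a time with cross-validated lasso.

    Y : array of shape (t, n), one row per time step
    Returns an array of shape (p, n, n); entry [k - 1, i, j] is the effect
    of component j at lag k on component i.
    """
    Y = np.asarray(Y, dtype=float)
    t, n = Y.shape
    # Regressors for target row s: y[s-1], y[s-2], ..., y[s-p] side by side.
    X = np.hstack([Y[p - k : t - k] for k in range(1, p + 1)])
    targets = Y[p:]
    A = np.zeros((p, n, n))
    for i in range(n):
        # No intercept, to match the model equation (an assumption).
        model = LassoCV(cv=cv, fit_intercept=False).fit(X, targets[:, i])
        A[:, i, :] = model.coef_.reshape(p, n)
    return A

# Simulated data from a sparse, hypothetical VAR with 3 components and 2 lags.
rng = np.random.default_rng(0)
n, p, t = 3, 2, 400
A_true = np.zeros((p, n, n))
A_true[0, 0, 0] = 0.6    # component 0 depends on itself at lag 1
A_true[1, 1, 2] = -0.5   # component 1 depends on component 2 at lag 2
Y = np.zeros((t, n))
for s in range(p, t):
    Y[s] = sum(A_true[k - 1] @ Y[s - k] for k in range(1, p + 1))
    Y[s] += rng.normal(size=n)

A_hat = fit_var_lasso(Y, p)
print(np.round(A_hat, 2))
```

The non-zero pattern of A_hat is what the rest of the post tries to visualize.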

The first way to visualize it is to simply list the coefficients. Below you can find one such result for running VAR on some protein data time series. The values themselves are not important in the image, since they come from an unvalidated regression, but the format of the output is what matters.

Example result from a fitted VAR model. The (i,j) entry of the k-lagged matrix, (Aₖ)ᵢⱼ, is for example -0.13686 where i=MCP-2 (CCL8), j=CXCL10 (IP10) and k=3. The actual numbers in this image are not realistic.

Such a list, however, is poor at displaying patterns. The next improvement is to make heatmaps. Take a look below!
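A listing like this can be produced from the coefficient array with a small helper. The sketch below is one way to do it; the function name nonzero_coefficients and the array layout (lag, To explain, Explainer) are assumptions, and apart from the -0.13686 entry quoted above the numbers are invented.

```python
import numpy as np

def nonzero_coefficients(A, names, tol=1e-12):
    """Return (to_explain, explainer, lag, value) rows for non-zero entries."""
    rows = []
    for k, A_k in enumerate(np.asarray(A, dtype=float), start=1):
        # Row index i is the To explain, column index j is the Explainer.
        for i, j in zip(*np.nonzero(np.abs(A_k) > tol)):
            rows.append((names[i], names[j], k, float(A_k[i, j])))
    return rows

names = ["BCAN", "CXCL10 (IP10)", "MCP-2 (CCL8)"]
A = np.zeros((3, 3, 3))     # 3 lags, 3 proteins
A[2, 2, 1] = -0.13686       # lag 3: CXCL10 (IP10) -> MCP-2 (CCL8)
A[0, 1, 0] = 0.25           # lag 1: BCAN -> CXCL10 (IP10), invented value

rows = nonzero_coefficients(A, names)
for to_explain, explainer, lag, value in rows:
    print(f"{to_explain:15s} {explainer:15s} {lag:3d} {value: .5f}")
```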

These are the three matrices Aₖ for k=1,2,3. In this dummy data, BCAN seems to have a one-lag impact on many different proteins (seen as the first column in the AR(1) heatmap), and CXCL10 has an impact on many proteins, but at lag 3 (its column in the AR(3) heatmap).

Where heatmaps fall short is in showing clusters and feedback loops. To visualize those, we need some graph rendering library.
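Heatmaps of this kind can be drawn with matplotlib's imshow, one panel per lag. The following is a sketch under my own assumptions (the function name plot_lag_heatmaps, the colormap, and the example numbers are not from the original figures).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

def plot_lag_heatmaps(A, names):
    """One heatmap per lag; rows are To explain, columns are Explainer."""
    A = np.asarray(A, dtype=float)
    p = A.shape[0]
    # Symmetric color limits so that zero sits in the middle of the colormap.
    vmax = float(np.abs(A).max()) or 1.0
    fig, axes = plt.subplots(1, p, figsize=(4 * p, 4), squeeze=False,
                             constrained_layout=True)
    for k, ax in enumerate(axes[0], start=1):
        im = ax.imshow(A[k - 1], cmap="RdBu_r", vmin=-vmax, vmax=vmax)
        ax.set_title(f"AR({k})")
        ax.set_xticks(range(len(names)))
        ax.set_xticklabels(names, rotation=90)
        ax.set_yticks(range(len(names)))
        ax.set_yticklabels(names)
        ax.set_xlabel("Explainer")
        ax.set_ylabel("To explain")
    fig.colorbar(im, ax=list(axes[0]), shrink=0.8)
    return fig

names = ["BCAN", "CXCL10", "MCP-2"]
A = np.zeros((3, 3, 3))
A[0, :, 0] = [0.3, 0.2, -0.4]   # invented: BCAN affects everything at lag 1
A[2, :, 1] = [0.1, 0.5, -0.1]   # invented: CXCL10 affects everything at lag 3
fig = plot_lag_heatmaps(A, names)
# fig.savefig("var_heatmaps.png", dpi=150)
```

Sharing one color scale across the panels makes the lags comparable at a glance.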

Graph drawing in Python

In Python, the choice of graph handling libraries is somewhat limited. The most well known are:

  • Networkx. Unfortunately, this library is mostly focused on high-performance handling of large graphs and on algorithms for working with the graph logically. The drawing options are limited.
  • igraph. A C library with APIs for Python, R and Matlab. Also focused on performance. I have seen some nice plots drawn with igraph, but the maintainers only develop the library for Unix/Linux/Mac, so it is a little cumbersome to install if you’re on Windows.
  • pydot. An interface for creating DOT files, which Graphviz can render. There is also a self-proclaimed successor package, pydotplus. Both seem to be sparsely maintained.
  • Graphviz. A standalone tool for rendering graphs. It can handle the DOT format. Several different Python libraries use Graphviz as a rendering backend.
  • matplotlib. A standalone tool for rendering plots. Mature library. Not very user friendly (depending on what you are using it for).

Let’s take a look in some detail!

In all cases, there are quite a few options for configuring the pictures, and I will not provide exact code for my different images. Stack Overflow can probably help you out. :)

Networkx

networkx is quite simple to install (use pip!) and the API is nice. The documentation is quite good. The drawing facilities are a bit so-so. As is stated in the documentation:

NetworkX provides basic functionality for visualizing graphs, but its main goal is to enable graph analysis rather than perform graph visualization.

On the plus side, you do have quite a bit of control in rendering. You can render edges, nodes and labels separately. You have a positioning phase where you can select your layout engine of choice, and so on. And as a bonus: since most plotting I do is in matplotlib, I get the value of a known library.

With the normal rendering, using matplotlib, I cannot get the two-way arrows to show properly, nor can I get the reflexive graph edges to show (e.g. CXCL10 influencing CXCL10). This is not acceptable, since that means I cannot illustrate the diagonal of the Aₖ matrices. A previous post on Stack Overflow says one needs to turn to the Graphviz backend to render such graphs.

A graph rendered by networkx using matplotlib. Notice the difficulty of showing arrows where they go both ways (as in the IL-1ra to MIP-1b interaction).

Below, you can see how it looks using the Graphviz backend. This requires pydot to be installed, since networkx converts its graph into a pydot graph object and then renders it with Graphviz.
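The separate rendering steps described above can be sketched as follows. The helper names and the coefficients are invented. The connectionstyle argument bends the arrows so that the two directions of a pair do not overlap; whether that is enough for two-way arrows and self-loops depends on the installed networkx version, so take it as something to try rather than a guaranteed fix.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

def graph_from_matrix(A_k, names, tol=1e-12):
    """Directed graph with one edge Explainer -> To explain per non-zero entry."""
    G = nx.DiGraph()
    G.add_nodes_from(names)
    for i, j in zip(*np.nonzero(np.abs(A_k) > tol)):
        G.add_edge(names[j], names[i], weight=float(A_k[i, j]))
    return G

def draw_graph(G, pos, ax):
    """Draw nodes, labels and edges in separate steps."""
    weights = [G[u][v]["weight"] for u, v in G.edges()]
    nx.draw_networkx_nodes(G, pos, ax=ax, node_color="lightgrey", node_size=1500)
    nx.draw_networkx_labels(G, pos, ax=ax, font_size=8)
    nx.draw_networkx_edges(
        G, pos, ax=ax, node_size=1500,
        width=[1 + 3 * abs(w) for w in weights],
        edge_color=["tab:red" if w < 0 else "tab:blue" for w in weights],
        connectionstyle="arc3,rad=0.2",  # curved arrows (version dependent)
    )
    ax.set_axis_off()

names = ["BCAN", "CXCL10", "IL-1ra", "MIP-1b"]
A1 = np.zeros((4, 4))
A1[2, 3] = 0.4    # MIP-1b -> IL-1ra
A1[3, 2] = -0.3   # IL-1ra -> MIP-1b (two-way pair)
A1[1, 1] = 0.5    # CXCL10 -> CXCL10 (reflexive edge)
A1[1, 0] = 0.2    # BCAN -> CXCL10

G = graph_from_matrix(A1, names)
pos = nx.circular_layout(G)  # deterministic; can be reused for every lag
fig, ax = plt.subplots(figsize=(5, 5))
draw_graph(G, pos, ax)
# fig.savefig("ar1_graph.png", dpi=150)
```

Computing pos once and passing it to every drawing call keeps the nodes in the same place for all lag matrices.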

Terrible, unstyled graph from pydot through networkx

Right off the bat, I can tell you that rendering such plots is WAY more tricky. The unstyled graph above was my first attempt. Actually, I then gave up and used pydot directly instead of going through networkx as an intermediate. That gave me quite a bit more control, at least.

Pydot (and Graphviz)

So the first thing to know is that pydot really is just a programmable API to the structure of a DOT document. So all properties of objects and so on are only the ones permissible in DOT. To create a good DOT file with pydot, I needed to learn some more about DOT. These resources below helped a lot!

Next, pydot is used to call one of the programs in Graphviz. Graphviz contains many different rendering programs. dot is the standard one, but since it is really designed for hierarchical, acyclic data it is not that useful to me. The user guide gives an exposition, so read it! Otherwise Graphviz is mostly unintelligible. There are alternative layout engines, and it seems that neato with some tuned edge weights etc. gives a result that is usable for me. Not beautiful, but usable. And that is funny, considering it is optimized for undirected graphs.

My largest issue here is that I have too little control over the layout. It gets quite terrible, as you can see. Options can be passed as properties on nodes and edges, but also as command-line options to Graphviz.

Still, Graphviz does deal with all the things that networkx could not. What I cannot do is force the layout so that the node positions are the same across all AR matrices. From the documentation I understand it is possible, but I have not figured out how.

A graph rendered by Graphviz through pydot. Notice that all the information that must be in the graph is in the graph. It is still quite terrible. Notice that the interactions between IL-1ra and MIP-1b have such small coefficients that the arrows are almost white and very thin.

I also want to note that Graphviz is the backend for other libraries I use. Several libraries in Structural causal modelling use Graphviz (actually, this library even uses networkx+pydot+graphviz), and so does the AST output from sympy.

igraph

Since this adventure took longer than expected, I will have to look into igraph some other day. Still, my quick check on the net says that igraph is more or less “networkx for R”, with ports to Matlab and Python. The plot below is generated in igraph, and part of the documentation.

My general impression is that igraph can make plots with a quality similar to the previous libraries. I have checked, though, and it is able to draw reflexive edges.

The biggest problem I can see with igraph is that it is a little too complicated to install on Windows. The code snippets I’ve seen and their output all look okay.

Taken from https://igraph.org/python/doc/tutorial/tutorial.html

Conclusions

If you want to draw complicated graphs in Python: look at Graphviz and the pydot/pydotplus libraries. It is quite complicated and finicky. But it works.

If you are already tied to some library for other reasons (such as matplotlib for your other plots, or Graphviz as the backend of another library), you might want to pick the graph library that fits with it.

I can also imagine that anyone coming from the R community might want to use igraph.

So how was this? Informative? Readable? Let me know!

Ludvig Hult

PhD student in Machine Learning and Causal Inference. Former IT consultant experienced in BI and health economics.