Authoritarian by Proxy: using Neo4j and Python to calculate second order democracy indices from trade data

Kenneth Nielsen
Analytics Vidhya

--

The following is the first part of a little hobby project, I have been having fun with during my son’s naptime and in the evenings while I was/am on paternity leave. I started it for me to become better acquainted with Neo4j and the Cypher query language syntax, but I have chosen to do a sort of back-and-forth with Python to demonstrate how easy and powerful it is to use these together. Seeing as there are probably others out there with similar interests, I thought it worthwhile to try and present it in a relatively succinct form.

This will not be a “basic introduction to Neo4j” or the Cypher query language with a buildup from the bottom. We will, however, see examples of how to build a graph in Neo4j from Python, how to run Cypher queries from Python (and Neo4j) with comments on the structure of the queries and how to update existing nodes with calculated values from the graph.
In addition to learning something about Neo4j/Cypher, there will be some interesting findings along the way (at least they were to me), for example that authoritarian regimes are a more heterogenous bunch when it comes to the democraticness of their counterparties than are full democracies.
The full notebook (wip) which contains a lot more code than I will show below is available here.
At the end I explain how the ideas presented here are transferable to ESG scores and credit risk evaluations of companies.

Okay, enough about me and my motives, what is it we will be looking at actually? What is this about a second order democracy score, and why should I (meaning you) care?
Well, in case you haven’t heard about it before, the democracy score/index is an index compiled by the Economist Intelligence Unit which attempts to measure how democratic a country is based on five categories (with 60 underlying factors). The five categories are: Electoral process and pluralism, Functioning of government, Political participation, Political culture and Civil liberties. Based on the underlying factors each category is assigned a value between 0 and 10, the average is taken and BAM you have your democracy score.

One thing is the democracy score of a country in and of itself, but what about its role in the world?

Now it seems to me that, a country can be ever so democratic in its own right, but if it through its trading helps sustain draconian regimes or is heavily reliant on the same for exports, then isn’t that worth considering?
This got me to thinking “why don’t we construct a second order democracy index to reflect this?” The score could, naturally, tell us something about which countries should/could exercise influence on which regimes. But it could also be used to identify countries that are more exposed to potentially unstable rules of government. (By adding information on which goods are exchanged we could analyze which are most vulnerable to, e.g., oil becoming a less important commodity.)
To come up with this score, I have dived into the treasure trove of data that is available at the Observatory of Economic Complexity (oec) to source the export data for a long list of countries. This will be combined with the latest democracy scores from 2019, which I have found on wiki to finally calculate the second order democracy index. (Honestly, if you haven’t looked at oec before, I encourage you to go have a look, as there is a lot of cool data and visualizations.)

[…] looking at the trade patterns this way makes it more transparent to the rest of the (democratic) world how pressure could be applied and by which states

Based on the score we will look at

  • if there is a difference in trading behavior between different regime types,
  • if some authoritarian regimes are sufficiently dependent on a group of democracies that there could be a chance for these democracies to push the regime in a more democratic way, and
  • which democracies have the most non-democratic counterparties (weighted by trade)

Next time, we will use some of the functions available in Neo4j to determine centrality and communities and perhaps add some more data to the mix.

The Democracy Data

We start out by retrieving the information about the democracy score for a range of countries. I actually spent quite a while trying to do this with Beautiful Soup, but eventually gave up and opted for pandas’ builtin read_html function. So, if anyone out there has a nice code snippet to do this with Beautilful Soup for wiki tables that also contain < li > elements then let me know, as I have other ideas for similar projects where this will be needed!

democracy_rank = r’https://en.wikipedia.org/wiki/Democracy_Index'
demo_df = pd.read_html(democracy_rank, header=0)[2][:-1]

After some basic scrubbing and manipulation (which I will spare the reader) to allow merging with the oec-data, we end up with a dataframe where the head and tail looks like the following

We see that Norway takes the price as most democratic country followed by Iceland, while — perhaps not too surprising — North Korea flunks out, with the Democratic Republic of the Congo as a close “second”.
Besides Full democracies and Authoritarian governments there are also Flawed democracies and Hybrid regimes.
We can do a groupby on the region column and see how the different regions compare in terms of democracy score, i.e. we do

demo_df.groupby('Region[n 1]')['Score'].describe().reset_index()

We see that North America (which consists of just Canada and the US) has the highest average democracy score, while the Middle East & North Africa has the lowest. A quick count of the different forms of ruling based on region gives us a hint as to why the average scores come out as they do

Having had a quick look at the democracy-score data let us now turn to the data on trade.

The Trade Data

The data that we will be looking at here can be found at oec, and there is an API available to source it. I just couldn’t make it work based on the python libraries I could find, and seeing as I really wanted to work with the interface between Python and Neo4j and not spend my time updating code from a github repo, I opted for the old fashioned and very time consuming, way of downloading, unpacking and renaming csv files. Not very 2020 (or even 2014), I know!
As my ideas about what to analyze changed, the list of countries for which I wanted to gather data grew to include all the countries in the democracy data set. So, in hindsight I would have wished that I had spent juuust a bit more time looking at that API. Anyways…
Some countries in the democracy data cannot be found on oec (and vice versa). These are Luxembourg, Botswana, Belgium, Lesotho, Namibia, and Swaziland (or Eswatini). It seems that Belgium and Luxembourg have been added into Belgium-Luxembourg in the oec-data, but seeing as I could not readily split the values between the two I opted to leave them out for now.
For each of the 161 remaining countries we have 2018 export values in USD relative to each of their trade partners — the export value of one country obviously being the import value of the receiving country.
If we think in graph terms, this means that between each of these 161 countries and all their counterparties (some of which we don’t have democracy scores for) we can add export relationships, or edges, and we can weight them by the value of the export to a specific counterparty. But I am getting ahead of myself now.
Let’s take a peek at the trade data

We see here the first five of 221 rows and the first eight columns of 164 in total in the trade data. 221 rows because there are 221 countries and territories in the trade data. 164 columns because we have downloadet full export information on 161 countries that we have democracy scores for, plus the three columns “Complexity Index”, “Country ID” and “Country”.
The “Export_Value_[country]” columns should be read as, e.g., Denmark exports ~2*10⁹ USD worth to Japan. So when we get to the row that has Denmark in the Country column, the values in Export_Value_Denmark will be zero — just as we see for Germany here. (Don’t worry about the Complexity Index and Country ID columns by the way...)
The trade data is joined (on Country) with the democracy data and we are ready to construct our graph!

Creating Nodes and Relationships

First and foremost, you need to have a graph setup and running in Neo4j. I use the free desktop version of Neo4j and for this particular example the server version I am running is 3.5.6. I will be using py2neo for the interaction with Neo4j from Python.

from py2neo import Node, Relationship, Graph, Path
# Database Credentials
uri = "bolt://localhost:7687"
userName = "neo4j"
password = "test"
# Connect to the neo4j database server
g = Graph("bolt://localhost:7687", auth=(userName, password))

Now we can start creating nodes using the Node method. The recipe is pretty basic, you do Node(<node_type>, <key>=<value>, …, <key>=<value>). For our dataset we will start by creating country nodes as follows (df_cnt is the 221-row dataframe with the joined data)

create_nodes = []
for i in range(df_cnt.shape[0]):
node = Node('Country'
, name=df_cnt['Country'][i]
, country_id = df_cnt['Country ID'][i]
, complexity_index = df_cnt['Complexity Index'][i]
, democracy_score = df_cnt['Score'][i]
, regime_type = df_cnt['Regime type'][i]
)
create_nodes.append(node)

If we head on over to Neo4j from here and look at the browser we will see nothing! That is because, currently the nodes only exist in notebook-land. We have to use the create method of the Graph object to write them like so

for i in range(df_cnt.shape[0]):
g.create(create_nodes[i])

Now, if we look in Neo4j we will see something like

In Neo4j, if you hover over one of the nodes you can see the other properties we defined when we created the them. But currently there are no connections, which was to be expected as we haven’t created any.
To have some different nodes in there we will also make some based on the region a given country belongs to. However, as mentioned, some countries that were in the trade data set are not in the democracy data from where we got the region information. For these we will simply assign the region=’Unknown’. The same goes for the regime type.
To create the region-nodes we simply run

create_regions = []
for i in range(len(df_reg)):
node = Node('Region'
, name=df_reg[i])
create_regions.append(node)
for i in range(len(df_reg)):
g.create(create_regions[i])

With nodes in the database we can query the database using the run method of the Graph object. Let’s say we want to find Angola and have all the information on Angola returned, then we would do

g.run("MATCH (a:Country {name:'Angola'}) RETURN a").data()

and receive

In the above query we match the very simple pattern of finding a country-node, which we assign to ‘a’, where the property-key ‘name’ has the value ‘Angola’ and then we return everything on that node.
Next, we will add some relationships to the graph — right now it just consists of country and region nodes.
The logic in the below code snippet is basically to run through each of the countries in the country list, and then build relationships to all other countries based on the exports to these countries.

When the dust has settled we will find that there are 22,846 export relations in our data. Let’s sample five export relations just to have a look at how such relations are constructed. Running

list(np.random.choice(exp_rel, 5))

returns

(rerunning it will of course return something different).
What we see is Cypher query paths of the form start_node-relationship->end_node. Nodes are indicated by () and relations by [] and we see that the relationship type is “exports_to” and that there are two relationship keys, “export_value” and “export_value_pct”. The latter is just how much the exports make up of the total exports of the exporting country.
Now it is a matter running create statements similar to what we did for the country nodes and the relations will be present in Neo4j as well. And we can do the same-ish for the regions, but I won’t bother you with that here.

Calculating the 2nd Order Democracy Score

With the nodes and relationships in place let’s first do (in the Neo4j browser)

MATCH (c: Country {regime_type: 'Full democracy'})-[:belongs_to]->(r:Region) RETURN c, r

which should return

So this looks as we would expect: we have a number of full democracies along with the region they belong to. The next step is to calculate the second order democracy score. (We will actually calculate two different ones — one based one exports and one on imports.) This is where having the data stored in a graph db starts to really pay off! The code is as simple as the following

where we have used that the exports of one country are the imports of another. Running res.head() returns

If you execute the code directly in Neo4j it takes 277 ms. If we wanted to do the same in Python, getting the export score would be a breeze (see code snippet below). But to get the import score would be a bit more work (at least based on how I have structured the data).
One way of doing it is to sort the country column in the same way that the Export_Value_[Country] columns are ordered and then do a dot product. In code form we could end up with something like

Which in total takes around 500 ms and a lot more brain activity than the query in Neo4j.
Cool, so now we have the second order scores, but why not update the nodes in our graph to contain that information, so we can include it in queries going forward? To do this we execute (and we also round off the second order scores at the same time)

The operative “word” being the SET clause. Having run that, the nodes now have two additional properties, namely EXP_SECOND_ORDER_DSCORE and IMP_SECOND_ORDER_DSCORE.

Deriving Insights — Democraticness of Trade Partners

With the scores in place, let us plot the import versus export second order scores to get an overview of how the countries and government types are disributed

we can see that there is a tendency of full democracies having both scores in the middle to upper range, while authoritarian regimes are a bit more all over the place. We will come back to this a little later when we analyze if there is a statistical significant difference in the second order scores.

Which full democracies have the lowest second order export democracy scores?

We can also ask do a sort on second order export (or import of course) score for full democracies to figure out which ones are most reliant on less democratic counterparties. If we return the five worst, we see

So we find that Australia comes out at the bottom with New Zealand also in bottom five. Chile, Uruguay and Mauritius I am not too surprised by, simply given their geographical location and the scores of neighboring countries. But Australia and New Zealand surpised me a bit. What you find when you dive into the data is that for both these countries, China is the biggest export market by far — for Australia it is ~3 times bigger than number two, and for New Zealand it ~1.5 times bigger than number two.

Are there democracies that import a major part of an authoritarian regime’s exports?

Next, we could go looking for if there are any full or flawed democracies that make up a large part of a given authoritarian country’s exports, since we would then hope that some pressure could be exerted on the regime by such counterparties. However, seeing as other authoritarian regimes may also be major export markets, it is perhaps better to simply get all countries that import x % from a regime, where we will arbitrarily set x to 10. Why include this information? It is conceivable that such a regime might just up its exports to countries with a similar rule of government…
The Cypher query we will execute is

MATCH (a: Country {regime_type: 'Authoritarian'})-[e:exports_to]->(d:Country) WHERE e.export_value_pct>=10 RETURN a.name AS AUTHORITARIAN, e.export_value_pct, d.name as COUNTERPART, d.democracy_score as SCORE, d.regime_type as REGIME_TYPE ORDER BY AUTHORITARIAN, SCORE DESC

One “problem” with this is that, we would ideally like to only work with instances where at least one of the counterparties taking ≥ 10 % of exports off the hands of the authoritarian regime is democratic. But here we also get the cases where this is only true for authoritarian and/or hybrid regimes. To remedy this, after having loaded the data into a dataframe called df_t, I have simply removed the cases we don’t want in Python like so

Where in the final line of code I further limit df_t to contain the cases where flawed and full democracies (that import ≥ 10 % from the authoritarian... yadayadayada) together make up a bigger export market than do the authoritarian trading partners. OK! A lot of talking, let’s throw a “reset_index()” in there and the outcome is

There seem to be some candidates on the list here, but before we get too hopeful that some of these authoritarian countries can be swayed towards more democratic ways we would probably have to consider exactly what it is they are exporting… Aaaand in several cases it is either oil, metals or diamonds. Commodities that the democracies who import them don’t want to do without, meaning that they will probably not be too forward in their endeavours to change things in the authoritarian countries.
Also trading is in many cases a two-way street with some degree of co-dependence. (More on that next time, where we will count in/out going relations and add this info to our nodes when working with similarity scores and other graph-algos.) So if the democratic country threatens to reduce its imports chances are that such a step will simply be reciprocated.
Having said all that, I still think that looking at the trade patterns this way makes it more transparent to the rest of the (democratic) world how pressure could be applied and by which states. It should then be up the rest of the democratic countries to support those countries that would be applying the pressure.

Is there a difference in trading behavior between full democracies and authoritarian regimes?

The final thing we will look at in this part, is if authoritarian regimes tend to trade more with less democratic counterparties than full democracies do. To answer this, we will run a two-sample Kolmogorov-Smirnov test in Python between the second order scores for the countries in the two groups, but let’s first look at the summary stats for the two groups.

Eyeballing it, there seems to be a difference, and in fact, the test shows that at a 5 % significance level the hypothesis of no difference is rejected! Meaning that democracies tend to trade more with more democratic counterparties. This can of course of be down to historical and geographical reasons. So an interesting addon to this analysis could be something like what the stats look like for countries outside the same region, which we leave as an exercise to the reader.

Additional Data to Consider

One could easily extend on the work I have presented above, e.g., by adding literacy score, happiness indices, which country used to be a colony of which, and as mentioned in the intro (which I assume you have forgotten by now) information on which products are (mainly) traded between each country and so on. Especially the latter two would interesting from the perspective of understanding if there is still what you might call a proxy-colonial rule in place, or if former colonies tend to be more or less democratic than their former colonial rulers and not least how a collapse in oil would ripple through the trade system.

Related Use Cases

Environmental, Social and Governance Score
Unless you have been completely off the grid these past few years, you have probably noticed that companies more and more have to report a good Environmental, Social and Governance (ESG) score to be considered eligble for investing by many investors. (For those who have recently surfaced from an involuntary vacation in a North Korean prison, the ESG score basically says something about how much a company does to protect the environment, stays away from dodgy business practices and has a culture of equal opportunity.) But the ESG of a given company is one thing, what about the scores of its business relations? And their relations?

Credit Risk Valuation
If we were a bank we could have a highly similar interest in the companies we lend out money to. The interest rate that the companies pay on their loan is dependent on their credit risk/score, which in high level terms is a measure of likely it is that the company will be able to pay back the money owed. But just as above, a company may have a very good credit score based on how well it is doing now and in the past. However, suppose we had a view to the trade connections between the companies. With this we could calculate an average weighted second order credit score. We could even go further and analyze (first and second order) exposure to geographies, stability of regimes, reliance of suppliers/buyers on oil being what the world runs on, weather, etc. A relevant question is of course, why would the companies be willing to share such details with their bank(s)? Well, the bank could provide this exact sort of analysis to the company, for the company to hedge itself better — the company will be more robust and potentially receive a disount on interest rates and the bank in turn has a less volatile customer on its books (and may even be able to turn a profit on the analysis). The next question would be, couldn’t the company just do this analysis itself? The answer is “no”, as it doesn’t have access to the same amount of information as the bank does. (I am not saying the bank should sell information on the trading of competitors, rather it would be condensed results and guidance similar to market analysis they already provide.)

The above two cases were in fact what I would have liked to do, but needless to say such data is not that easy to come by, so instead I started looking for something with a similar air to it. I hope you see the similarities and that you enjoyed it, maybe even learned something and please leave a comment (constructive feedback always welcome!) or clap.
Next time we will dive a bit more into conventional graph algorithms.

--

--