Importing Panama Papers Data to Neo4j in Ubuntu

RH
May 3, 2017

1. Panama Papers

The Panama Papers data is a network (graph) data set describing the offshore tax structures of one of the biggest tax havens in the world. It is provided by the International Consortium of Investigative Journalists (ICIJ). In the next couple of posts, I will sum up my experience analyzing the data using Neo4j and Python.

2. Neo4j

Before we begin, we have to set up our graph database. Neo4j is a graph database developed by Neo Technology. To install it, follow the instructions described here. After a successful installation, you have to modify several settings before you start.

First, in “/etc/neo4j/neo4j.conf”, comment out the following line so that it reads as below.

#dbms.directories.import=/var/lib/neo4j/import
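
If you would rather not edit the file by hand, the same change can be scripted. The following is only a sketch: comment_out_import_dir is a helper name made up for this post, and it assumes GNU sed (the default on Ubuntu) and a root shell when it is pointed at the real config file.

```shell
# comment_out_import_dir is a hypothetical helper, not part of Neo4j.
# It prefixes an active dbms.directories.import line with '#'.
comment_out_import_dir() {
    local conf="$1"
    # The pattern is anchored at the start of the line, so a line that is
    # already commented out is left alone. '&' stands for the matched text.
    sed -i 's|^dbms\.directories\.import=|#&|' "$conf"
}

# Example, from a root shell:
#   comment_out_import_dir /etc/neo4j/neo4j.conf
```

Running it twice is harmless, since the second run no longer finds an uncommented line.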

Second, in “/etc/security/limits.conf”, add the following two lines. They raise the open file limits, as suggested in the official documentation.

root soft nofile 40000
root hard nofile 40000

Finally, in both “/etc/pam.d/common-session” and “/etc/pam.d/common-session-noninteractive”, add the following line.

session required pam_limits.so
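
The limits and PAM edits above can also be applied from a script without adding the same line twice. This is a minimal sketch, assuming a root shell; ensure_line is a helper name I made up, and it only recognizes a line that matches exactly, spacing included.

```shell
# ensure_line is a hypothetical helper: append LINE to FILE unless an
# identical line is already present.
ensure_line() {
    local file="$1" line="$2"
    # -x: match the whole line, -F: fixed string (no regex), -q: no output
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example, from a root shell:
#   ensure_line /etc/security/limits.conf 'root soft nofile 40000'
#   ensure_line /etc/security/limits.conf 'root hard nofile 40000'
#   ensure_line /etc/pam.d/common-session 'session required pam_limits.so'
#   ensure_line /etc/pam.d/common-session-noninteractive 'session required pam_limits.so'
```

After logging in again, “ulimit -n” prints the soft open file limit of the current shell, which is a quick way to see whether the new value is in effect.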

3. Download data

The shell script download.sh in the following repository downloads the data and creates a symbolic link so that Neo4j can load it easily. To run it, git clone the following repository and, in the home directory, run the script with “bash download.sh”.

If you prefer to perform this step manually, download the Mac version “panama-papers-mac-2017-04-18.tar.gz” from here. Untar the archive and either move the directory panama.graphdb (located in “/panama-papers/ICIJ Panama Papers/panama_data_for_neo4j/databases”) to “/var/lib/neo4j/data/databases” or create a symbolic link with the following shell command. That is essentially all the shell script does.

sudo ln -s /path/to/panama.graphdb /var/lib/neo4j/data/databases
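
A slightly more defensive version of the link step might look like the sketch below. link_database is a hypothetical helper name, not something taken from download.sh, and the -n flag assumes GNU ln.

```shell
# link_database is a hypothetical helper: link SRC (the extracted
# panama.graphdb directory) into DEST_DIR (Neo4j's databases directory).
link_database() {
    local src="$1" dest_dir="$2"
    if [ ! -d "$src" ]; then
        echo "not a directory: $src" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link to a directory as a file instead of following it
    ln -sfn "$src" "$dest_dir/$(basename "$src")"
}

# Example, from a root shell:
#   link_database /path/to/panama.graphdb /var/lib/neo4j/data/databases
```

Pass an absolute source path: a relative target is stored as given and resolved relative to the directory that holds the link, which usually leaves the link dangling.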

4. Check whether it really works

Before you start Neo4j, the configuration file has to be modified once more so that Neo4j loads panama.graphdb on startup. This can be done with the following shell commands.

cd /etc/neo4j
sudo sed -i '/dbms.active_database=/c\dbms.active_database=panama.graphdb' neo4j.conf
sudo neo4j restart
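
One caveat: the sed “c” command only rewrites a line that already exists, so if neo4j.conf has no active_database line at all (commented out or not), the command above silently does nothing. The following sketch covers that case as well. set_active_database is a helper name I made up, and it assumes GNU sed.

```shell
# set_active_database is a hypothetical helper, not a Neo4j command.
set_active_database() {
    local conf="$1" name="$2"
    if grep -q 'dbms\.active_database=' "$conf"; then
        # Replace every matching line, commented out or not.
        sed -i "/dbms\.active_database=/c\\dbms.active_database=$name" "$conf"
    else
        # No such line yet: append one.
        printf 'dbms.active_database=%s\n' "$name" >> "$conf"
    fi
}

# Example, from a root shell:
#   set_active_database /etc/neo4j/neo4j.conf panama.graphdb
#   grep 'active_database' /etc/neo4j/neo4j.conf   # check the result
```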

After that, open http://localhost:7474 in your favorite web browser and enter your password. If you have not set a password yet, the initial configuration page is shown instead. If everything went well, the Neo4j Browser opens with the Panama Papers database loaded.

By running the query

MATCH (n) RETURN distinct labels(n)

we can confirm that the Panama Papers data is now queryable on our system!
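
The same check can be done from the terminal. The sketch below assumes the transactional HTTP endpoint of Neo4j 3.x (/db/data/transaction/commit) and the default port; cypher_payload and run_cypher are helper names made up for this post, and the user name and password in the example are placeholders.

```shell
# cypher_payload and run_cypher are hypothetical helpers.
cypher_payload() {
    # Wrap one Cypher statement in the JSON body the endpoint expects.
    # Assumes the statement contains no double quotes or backslashes.
    printf '{"statements":[{"statement":"%s"}]}' "$1"
}

run_cypher() {
    local user="$1" password="$2" statement="$3"
    # POST the statement and print the raw JSON response.
    curl -s -u "$user:$password" \
         -H 'Content-Type: application/json' \
         -H 'Accept: application/json' \
         -d "$(cypher_payload "$statement")" \
         http://localhost:7474/db/data/transaction/commit
}

# Example (placeholder credentials):
#   run_cypher neo4j your-password 'MATCH (n) RETURN distinct labels(n)'
```

The response is JSON, with the label lists under “results” and any problems reported under “errors”.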

In the next post, we will analyze this data using Python’s py2neo and neo4jrestclient.
