Visualize Clinical Data in Graph Database in 20 Minutes

Hongping Liang
Nov 22, 2019 · 5 min read

We will use Neo4j to store and visualize some public clinical data from the Personal Genome Project.

In 20 minutes, you will see a graph like this:

I am using a Windows 10 machine with “Neo4j Desktop” installed locally.

Download Neo4j Desktop

Download the installer for “Neo4j Desktop” from the Neo4j website.

The installer, “neo4j-desktop-offline-1.2.3-setup.exe”, is small, so the download should finish quickly.

Install Neo4j

Follow the installer prompts and accept the defaults to finish the installation.

Start Neo4j

The first time Neo4j starts, it asks for the directory in which to store the application data, i.e. the database location. Take note of this directory, because the files to be imported must go in a directory under it. To make things easier, I created a Neo4jData directory directly under the C: drive (C:\Neo4jData).

Then follow the on-screen prompts to activate the software and finish the installation.

Create Database

Here is the UI for Neo4j Desktop. To create a database, click the “Add Graph” plus-sign icon at the bottom right.

Click “Create a Local Graph”

I named the database “PersonalGenomeData”, filled in a password, and clicked the “Create” button.

Start Database

To start the database, click the “Start” button.

Once the database has started, click the “Manage” button.

Then click the “Open Browser” button.

This is the graph browser, where you will run queries.

Now we are ready to load the data.

Clinical Data

We will use some public clinical data from the “Personal Genome Project” by Harvard University.

You can download the data from here

Download the following four files:

allergies.tsv
conditions.tsv
demographics.tsv
medications.tsv

Import the data into the database

Copy all four files into the import directory.

In my case, the import directory is:

C:\Neo4jData\neo4jDatabases\database-54e3a4f1-910f-46fe-b3ab-8ce256e9008a\installation-3.5.12\import\

If you installed Neo4j in a different directory, adjust the path accordingly. The database ID in the path will probably differ as well.
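Before importing anything, it can help to confirm that Neo4j can read a file from the import directory and to see how its columns are laid out. The query below is a small sketch of that check, not part of the original script: it only reads and returns rows and writes nothing to the database. The LIMIT of 5 is an arbitrary choice.

```cypher
// Preview the first five rows of the file without creating any nodes.
// Each row comes back as a list of strings, one entry per tab-separated column.
LOAD CSV FROM "file:///demographics.tsv" AS line FIELDTERMINATOR '\t'
RETURN line
LIMIT 5
```

If the first row returned turns out to be a header line rather than data, keep that in mind for the import commands, which read every row by position (line[0], line[1], and so on).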

Import Personal Information

The personal information is in “demographics.tsv” file.

Copy and paste the following command into the query field and hit the “Run” button on the right

LOAD CSV FROM "file:///demographics.tsv" AS line FIELDTERMINATOR '\t'
CREATE (person:Person {id: line[0], dob: line[1], gender: line[2], weight: line[3], height: line[4], blood_type: line[5], race: line[6]})

After execution, the output window will look like this:

Click the “Database” icon on the left to see more information about the data we just loaded.

Click “Person” to see a sample of the data we just loaded. Mouse over a person to see that person's properties at the bottom.

To change the color, click “Person(25)”; a color palette will appear at the bottom. Click a color to choose it. You can adjust the size of the circles too.

To change the label, click the triangle button on the right to open a property list. Click the second “Id” entry to display the person ID.
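The import can also be checked with a query rather than by eye. This sketch, which assumes the Person nodes were created with the properties used in the import command above, counts the nodes and the distinct id values. If the two numbers differ, some person id appears more than once in the file.

```cypher
// Count Person nodes and compare against the number of distinct ids.
MATCH (p:Person)
RETURN count(p) AS persons, count(DISTINCT p.id) AS distinct_ids
```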

Load allergy information: “allergies.tsv”

LOAD CSV FROM "file:///allergies.tsv" AS line FIELDTERMINATOR '\t'
MERGE (allergy:Allergy {name: line[1]})

Load medication information: “medications.tsv”

LOAD CSV FROM "file:///medications.tsv" AS line FIELDTERMINATOR '\t'
MERGE (medication:Medication {name: line[1]})

Load medical condition information: “conditions.tsv”

LOAD CSV FROM "file:///conditions.tsv" AS line FIELDTERMINATOR '\t'
MERGE (condition:Condition {name: line[1]})
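The relationship-loading steps that come next look up every Person by id and every Allergy, Medication, and Condition by name. On a small data set this is fast enough without help, but indexes on those properties are a common way to speed up such lookups. The statements below are an optional sketch, not part of the original script. They use the index syntax of the Neo4j 3.5 series (the version that appears in the import path earlier); newer Neo4j versions use a different form, so check the syntax against your version. Run each statement on its own.

```cypher
// Optional: index the properties used for lookups (Neo4j 3.5-style syntax).
// Run these one at a time.
CREATE INDEX ON :Person(id);
CREATE INDEX ON :Allergy(name);
CREATE INDEX ON :Medication(name);
CREATE INDEX ON :Condition(name);
```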

Load relationships between persons and allergies

LOAD CSV FROM "file:///allergies.tsv" AS line FIELDTERMINATOR '\t'
MATCH (person:Person {id: line[0]}),(allergy:Allergy {name: line[1]})
CREATE (person)-[:ALLERGIC_TO {severity:line[2], start_date:line[3], end_date:line[4]}]->(allergy)
CREATE (allergy)-[:AFFECT {severity:line[2], start_date:line[3], end_date:line[4]}]->(person)

Load relationships between persons and medications

LOAD CSV FROM "file:///medications.tsv" AS line FIELDTERMINATOR '\t'
MATCH (person:Person {id: line[0]}),(medication:Medication {name: line[1]})
CREATE (person)-[:TAKE {severity:line[2], start_date:line[3], end_date:line[4]}]->(medication)
CREATE (medication)-[:TAKEN_BY {severity:line[2], start_date:line[3], end_date:line[4]}]->(person)

Load relationships between persons and conditions

LOAD CSV FROM "file:///conditions.tsv" AS line FIELDTERMINATOR '\t'
MATCH (person:Person {id: line[0]}),(condition:Condition {name: line[1]})
CREATE (person)-[:COMPLAIN {start_date:line[2], end_date:line[3]}]->(condition)
CREATE (condition)-[:COMPLAIN_BY {start_date:line[2], end_date:line[3]}]->(person)
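To check that the relationships were created, a summary by relationship type can be useful. This is a sketch, not part of the original script; given the commands above, it should list ALLERGIC_TO, AFFECT, TAKE, TAKEN_BY, COMPLAIN, and COMPLAIN_BY.

```cypher
// Count relationships grouped by type, largest group first.
MATCH ()-[r]->()
RETURN type(r) AS relationship, count(r) AS total
ORDER BY total DESC
```

Because each load command creates a pair of relationships, one in each direction, the two types in a pair should show equal totals.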

Run Query

Run a simple query to see the data:

MATCH (n) RETURN (n)

Click on the nodes; all the data should load automatically. Change the color, size, and label accordingly.
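MATCH (n) RETURN (n) returns every node, which gets crowded as the data grows. A narrower query, sketched here under the assumption that the labels and relationship types were loaded exactly as written in the commands above, returns only persons and their allergies. The LIMIT of 50 is arbitrary.

```cypher
// Show only persons, their allergies, and the connecting relationships.
MATCH (p:Person)-[r:ALLERGIC_TO]->(a:Allergy)
RETURN p, r, a
LIMIT 50
```

Swapping in TAKE with Medication, or COMPLAIN with Condition, gives the corresponding view for medications or conditions.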

You can maximize the screen

Note: there are ways to merge all the queries together.
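One possible way to combine steps, offered as a sketch rather than as the script from the repository: the allergy nodes and the person-allergy relationships can be produced in a single pass over allergies.tsv. Use it instead of the two separate allergy commands, not in addition to them, or the relationships will be created twice.

```cypher
// Single pass: find the person, create the allergy node if it is missing,
// then create the relationships in both directions.
LOAD CSV FROM "file:///allergies.tsv" AS line FIELDTERMINATOR '\t'
MATCH (person:Person {id: line[0]})
MERGE (allergy:Allergy {name: line[1]})
CREATE (person)-[:ALLERGIC_TO {severity: line[2], start_date: line[3], end_date: line[4]}]->(allergy)
CREATE (allergy)-[:AFFECT {severity: line[2], start_date: line[3], end_date: line[4]}]->(person)
```

One difference from the separate commands: this version creates an Allergy node only for rows whose person id matches an existing Person, because the MATCH filters out the other rows before the MERGE runs.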

All the data and query script files are on GitHub: https://github.com/hongpingliang/clinical_data_graph

Next article: Human Genes Graph

Thanks for reading. Please leave your comments below.

Written by Hongping Liang

AWS Certified Solutions Architect, Hortonworks Certified Developer, Bioinformatics, The Jackson Laboratory
