Human Genes Graph

Hongping Liang
Dec 11, 2019 · 3 min read

We are going to load the more than 40,000 human genes into the Neo4j graph database, visualize them, and browse them by chromosome or chromosome region.

If you are not familiar with the Neo4j graph database, my previous article, “Visualize Clinical Data in Graph Database in 20 Minutes”, has detailed steps for installing Neo4j and running queries.

Original Gene File

The human gene data is a tab-delimited text file from www.genenames.org. Here is a sample:

Processed File

To map the relationships between genes, chromosomes, and chromosome regions, we parsed the chromosome column into three columns: chromosome number, chromosome arm, and chromosome region.

genes.csv
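The article does not show how the chromosome column was split, so here is a minimal Python sketch of that step. It assumes locations look like the HGNC cytogenetic bands 17q21.31 or Xp22.33, and it assumes the region is the arm plus its major band (17q21); the function name split_location and this definition of a region are hypothetical, not necessarily what produced the author's genes.csv.

```python
import re

# Hypothetical pattern for locations such as "17q21.31":
# chromosome (1-22, X or Y), arm (p or q), then the major band digits.
LOCATION_RE = re.compile(r"^(\d{1,2}|X|Y)(p|q)(\d+)")


def split_location(location: str) -> tuple[str, str, str]:
    """Split '17q21.31' into (chromosome, arm, region) = ('17', '17q', '17q21').

    The region keeps only the major band, dropping any '.' sub-band.
    Values that do not start with a recognizable band (for example
    'mitochondria') are returned unchanged in all three slots.
    """
    value = location.strip()
    match = LOCATION_RE.match(value)
    if match is None:
        return value, value, value
    chrom, arm, band = match.groups()
    return chrom, chrom + arm, chrom + arm + band
```

Appending the three returned values to each gene row would give the chromosome_num, chromosome_arm, and chromosome_region columns that the load script reads as line[13], line[14], and line[15].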

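From genes.csv, we extracted the unique mappings into two files: chromosome_arm.csv (chromosome number to chromosome arm) and arm_region.csv (chromosome arm to chromosome region). Each row also has a third column, which the load script stores as the cnt property.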
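A minimal Python sketch of the extraction, assuming the third column is the number of genes that share each pair (the article only calls it cnt) and that the files are written without a header row, since the load script reads columns by index. The helper names unique_mappings and to_csv are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Tuple

# (chromosome_num, chromosome_arm, chromosome_region) for one gene
Location = Tuple[str, str, str]


def unique_mappings(locations: Iterable[Location]) -> Tuple[Counter, Counter]:
    """Count genes per (chromosome, arm) pair and per (arm, region) pair."""
    locations = list(locations)  # iterated twice, so materialize generators first
    chromosome_arm = Counter((num, arm) for num, arm, _ in locations)
    arm_region = Counter((arm, region) for _, arm, region in locations)
    return chromosome_arm, arm_region


def to_csv(counter: Counter) -> str:
    """Render each pair and its count as a header-less CSV row: left,right,cnt."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for (left, right), cnt in sorted(counter.items()):
        writer.writerow([left, right, cnt])
    return buffer.getvalue()
```

Feeding it (row[13], row[14], row[15]) from every genes.csv row and writing the two strings to chromosome_arm.csv and arm_region.csv would give files in the shape the LOAD CSV statements expect.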

Import into the Graph Database

Copy the data files into the Neo4j import directory:

genes.csv
chromosome_arm.csv
arm_region.csv

Run the following Cypher statements to load all the data:

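CREATE CONSTRAINT ON (gene:Gene) ASSERT gene.approved_symbol IS UNIQUE;
USING PERIODIC COMMIT 5000
LOAD CSV from "file:///genes.csv" AS line
// MERGE on the unique key only (fast with the constraint above), then SET the other columns;
// this also avoids a MERGE error if an empty column is read as null
MERGE (gene:Gene {approved_symbol: line[0]})
SET gene.approved_name = line[1], gene.status = line[2], gene.previous_symbol = line[3], gene.previous_name = line[4],
    gene.synonymes = line[5], gene.chromosome = line[6], gene.accession_number = line[7], gene.omim_id = line[8],
    gene.refseq_id = line[9], gene.ensimble_id = line[10], gene.uniprot_id = line[11], gene.hgnc_id = line[12],
    gene.chromosome_num = line[13], gene.chromosome_arm = line[14], gene.chromosome_region = line[15];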
LOAD CSV from "file:///chromosome_arm.csv" AS line
MERGE (chromosome:Chromosome {chromosome_num: line[0]});
LOAD CSV from "file:///chromosome_arm.csv" AS line
MERGE (chromosomearm:ChromosomeArm {chromosome_arm: line[1]});
LOAD CSV from "file:///arm_region.csv" AS line
MERGE (chromosomeregion:ChromosomeRegion {chromosome_region: line[1]});
LOAD CSV from "file:///chromosome_arm.csv" AS line
MATCH (chromosome:Chromosome {chromosome_num: line[0]}),(chromosomearm:ChromosomeArm {chromosome_arm: line[1]})
CREATE (chromosome)-[:CHROMOSOME_ARM {cnt:line[2]}]->(chromosomearm);
LOAD CSV from "file:///arm_region.csv" AS line
MATCH (chromosomearm:ChromosomeArm {chromosome_arm: line[0]}),(chromosomeregion:ChromosomeRegion {chromosome_region: line[1]})
CREATE (chromosomearm)-[:ARM_REGION {cnt:line[2]}]->(chromosomeregion);
USING PERIODIC COMMIT 5000
LOAD CSV from "file:///genes.csv" AS line
MATCH (chromosomeregion:ChromosomeRegion {chromosome_region: line[15]}),(gene:Gene {approved_symbol: line[0]})
CREATE (chromosomeregion)-[:REGION_GENE {apponame:line[1]}]->(gene);

To visualize the data, run the following query:

MATCH (n) RETURN n
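The query above returns every node in the graph. To browse a single chromosome instead, one option is a parameterized query that follows the Chromosome, ChromosomeArm, ChromosomeRegion, and Gene nodes created by the load script. Below is a minimal Python sketch; GENES_ON_CHROMOSOME and genes_on_chromosome are hypothetical helpers. Because LOAD CSV reads every field as a string, the chromosome number is passed as a string.

```python
# Walks Chromosome -> ChromosomeArm -> ChromosomeRegion -> Gene using the
# labels, relationship types, and properties created by the load script.
# chromosome_num came from LOAD CSV, so it is stored as a string ("17", "X").
GENES_ON_CHROMOSOME = (
    "MATCH (c:Chromosome {chromosome_num: $num})-[:CHROMOSOME_ARM]->(a:ChromosomeArm)"
    "-[:ARM_REGION]->(r:ChromosomeRegion)-[:REGION_GENE]->(g:Gene) "
    "RETURN a.chromosome_arm AS arm, r.chromosome_region AS region, "
    "g.approved_symbol AS symbol ORDER BY arm, region, symbol"
)


def genes_on_chromosome(run, num):
    """Return (arm, region, symbol) tuples for one chromosome.

    `run` is any callable shaped like run(query, **parameters) that yields
    records indexable by column name, such as the run method of a session
    from the official neo4j Python driver.
    """
    return [
        (record["arm"], record["region"], record["symbol"])
        for record in run(GENES_ON_CHROMOSOME, num=num)
    ]
```

With the neo4j driver, you would open a session on a driver created by GraphDatabase.driver and pass its run method along with a chromosome such as "17". The connection URI and credentials depend on your installation. Taking a callable instead of a session keeps the helper easy to test without a running database.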

All the data and query script files are on GitHub: https://github.com/hongpingliang/gene_graph

AWS Certified Solutions Architect, Hortonworks Certified Developer, Bioinformatics, The Jackson Laboratory
