Visualizing Breast Cancer Data with Neo4j and GraphXR

Andrew Kamal
Neo4j Developer Blog
3 min read · Dec 18, 2019

A while ago, I entered the Global GraphHack challenge on Devpost. The main project I was trying to demo was a bioinformatics use case with Neo4j.

My use case was to take existing open source bioinformatics data and visualize it in a meaningful manner. Luckily, I found a sample repository on GitHub with CSV data I could export.

The inspiration for this project also came from a previous idea of mine, shown here: a RapidMiner model I was working on to mine and visualize cancer genomic case studies.

That project later merged into Cancer@Home, which allows other users to effectively visualize and mine cancer genomic case studies and garner more and more useful data. I wanted to build something quick and simple for the Global GraphHack, so I thought, why not start off with just a small data view model?

Utilizing GraphXR, I was able to load the sample CSV and create property tags for different IDs.

By property tags, I mean tags for the numerical values in the CSV rows (for example, 1.011 0/1) that are used for data views. These tags are important because they represent measurements an oncologist or computational biologist may want to query or categorize.
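To make the idea of property tags concrete, here is a minimal Python sketch of how one CSV row could be paired with named properties before loading. The column names (`id`, `diagnosis`, `radius`, `texture`) are my own assumptions for illustration, since the post does not name the columns; the sample values are taken from the property list further down.

```python
import csv
import io

# Hypothetical column names and types: the post does not name the CSV columns,
# so these labels are assumptions for illustration only.
COLUMNS = [("id", str), ("diagnosis", str), ("radius", float), ("texture", float)]


def row_to_properties(row, columns=COLUMNS):
    """Pair each CSV cell with a property tag and cast it to the given type."""
    return {name: cast(cell) for (name, cast), cell in zip(columns, row)}


# One sample row built from values that appear in the post.
sample = io.StringIO("842302,M,17.99,10.38\n")
records = [row_to_properties(row) for row in csv.reader(sample)]
print(records[0])
```

Keeping the ID as text and casting only the measurements to numbers means the measurements can later be sorted, binned, or colored by value.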

Next, I set different colors for organizational (categorical) purposes, to help the end researcher visualize the data more efficiently.

This means that instead of having to look at a CSV file with hundreds and hundreds of rows, they can view the data in a more meaningful way by looking at the numerical attributes by color in the GraphXR dashboard.
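GraphXR handles the coloring through its interface, but the underlying idea can be sketched in a few lines of Python: bin a numerical attribute into ranges and give each range a color. The thresholds and color names below are hypothetical, chosen only to illustrate the binning; they are not the settings used in the project.

```python
from bisect import bisect_right

# Hypothetical bin edges and colors, chosen only for illustration.
THRESHOLDS = [10.0, 15.0, 20.0]
COLORS = ["blue", "green", "orange", "red"]


def color_for(value):
    """Return the color of the bin that a numerical attribute falls into."""
    return COLORS[bisect_right(THRESHOLDS, value)]


# Sample values taken from the property list in the post.
for value in (8.589, 10.38, 17.99, 25.39):
    print(value, color_for(value))
```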

For reference, one can import a CSV file into Neo4j using the LOAD CSV command seen here, or manually load it into GraphXR using Add Graph and the exported data file, as seen here.
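As a rough sketch of the LOAD CSV route, the Python helper below builds a Cypher statement that creates one node per CSV row, matching cells to property names by position. The helper name, the file URL `file:///data.csv`, the `Sample` label, and the property names are all assumptions for illustration. The generated statement would still need to be run in Neo4j Browser or through a driver.

```python
def build_load_csv_query(file_url, label, columns):
    """Return a Cypher LOAD CSV statement that creates one node per CSV row.

    `columns` is a list of property names matched to CSV cells by position.
    Names are interpolated directly, so only pass trusted input.
    """
    assignments = ", ".join(
        f"{name}: row[{index}]" for index, name in enumerate(columns)
    )
    return (
        f"LOAD CSV FROM '{file_url}' AS row\n"
        f"CREATE (:{label} {{{assignments}}})"
    )


# Hypothetical file URL, label, and property names.
query = build_load_csv_query(
    "file:///data.csv", "Sample", ["id", "diagnosis", "radius"]
)
print(query)
```

Without `WITH HEADERS`, LOAD CSV exposes each row as a list of strings, so numerical cells would need a conversion such as `toFloat(row[2])` if they are to be stored as numbers.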


The picture above is a view of different IDs and property tags (i.e., IDs related to the original data set and tags for numerical categories), and the picture below is an example of a menu being integrated in GraphXR for the visualization dashboard.

Development Instructions:

  1. Install Neo4j Desktop Starter
  2. Install GraphXR app
  3. Run database via Neo4j Browser
  4. Load CSV data locally
  5. Set Property Tags
  6. Run Force in Layout
  7. Allow for Parametric View & Tree Distribution
  8. Run Enhanced Table
  9. Relationship Variables: 1001, 17.99, 2019 (Year), 842302, M
  10. Can allow f(x) connectors for data sourcing
  11. Aggregate: (nodePropValue,neighborPropValues) => nodePropValue
  12. Label (Visible)
  13. EROAD (Visible)/Set Color
  14. Load default properties from CSV: 0.006193, 0.006399, 0.01587, 0.03003, 0.04904, 0.05373, 0.07871, 0.1184, 0.1189, 0.1471, 0.1622, 0.2419, 0.2654, 0.2776, 0.3001, 0.4601, 0.6656, 0.7119, 0.9053, 1.095, 10.38, 1001, 122.8, 153.4, 17.33, 17.99, 184.6, 2019, 25.39, 8.589, 842302, M, Property 1, id
  15. Node Size Scale: 1 (in settings)
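The aggregate signature in step 11, (nodePropValue, neighborPropValues) => nodePropValue, takes a node's own property value plus the values of its neighbors and returns a new value for the node. The post does not say which aggregation was used, so the Python sketch below picks a simple mean as an assumption and applies it to a small hypothetical graph.

```python
from statistics import mean


def aggregate(node_prop_value, neighbor_prop_values):
    """Return a new node value: the mean of the node's value and its neighbors'.

    The mean is an assumed choice; any function with this shape would fit.
    """
    return mean([node_prop_value, *neighbor_prop_values])


# Hypothetical graph: property values per node and an adjacency list.
values = {"a": 1.0, "b": 2.0, "c": 6.0}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

updated = {
    node: aggregate(values[node], [values[n] for n in neighbors[node]])
    for node in values
}
print(updated)
```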

After developing this program, I was introduced to Neo4j Desktop and found out about GraphXR. I think that the ability to use simplified data visualization tools such as these is where the future is heading.

It is quite inspirational to be able to take a simple tool you discovered and start using it to represent data in a meaningful manner, rather than as just a bunch of numbers.

I know data can be used in both ethical and unethical ways. I believe that as data scientists, our goal should be to use data for ethical use cases. I think the next step in developing my project is creating queries for different aspects of the data and digging even deeper, beyond visualization.

That is my insight for now. I wish you all happy developing!
