A Network of Disorders and Disease Genes Data Visualization

Victoria (Hung-Ju) Chen
4 min read · Apr 18, 2019


Introduction

Nowadays, we live in an era of advanced science and technology, but at the same time, more and more complex diseases are emerging. When I was a little girl, several of my family members died of different cancers at a young age. They were all practitioners with professional medical backgrounds, yet they still suffered from these diseases.

I wonder: is cancer related to people’s genes? Is cancer the largest community of gene-originated diseases? Which kinds of human diseases form the larger groups that share a common genetic origin?

My childhood memories and curiosity led me to design a visualization of the relationship between human disorders and disease genes.

Inspiration

This visualization looks at the human genome by focusing on genes that are implicated in various diseases. Humans have roughly 20,000 protein-coding genes, and misregulation of or mutations in many of them can cause disease.

Material

  • Gephi wiki: a source of free datasets. I found my dataset, “Diseasome”, in the biological networks category.
  • Gephi: free software that I used to create the network visualization.

Methods

At first, I spent a lot of time looking for the dataset I needed. After finding the ideal dataset on the Gephi wiki, I downloaded it and renamed its columns so that it could be imported into Gephi.
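As a rough illustration of the renaming step, here is a minimal Python sketch. It assumes the data is a CSV edge list; the original headers (`disease`, `gene`) and the sample rows are hypothetical placeholders, not the real Diseasome columns. Gephi's spreadsheet importer recognizes `Source` and `Target` as the edge endpoints, so the sketch maps the headers to those names.

```python
import csv
import io

# Hypothetical original headers mapped to the edge-table names Gephi recognizes.
EDGE_HEADER_MAP = {"disease": "Source", "gene": "Target"}


def rename_headers(csv_text, header_map):
    """Return CSV text whose header row is renamed according to header_map."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Only the first row (the header) changes; unknown headers are kept as-is.
    rows[0] = [header_map.get(name.strip(), name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up sample rows, for illustration only.
sample = "disease,gene\nColon cancer,GENE_A\nLeukemia,GENE_B\n"
renamed = rename_headers(sample, EDGE_HEADER_MAP)
print(renamed)
```

The same renaming can of course be done by hand in a spreadsheet editor; a script is mainly useful when the file is large.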

After the import, Gephi displayed the nodes at random positions in different colors. I wanted clear and beautiful visualizations for my project, so I tried several layouts and finally settled on “Force Atlas 2”, “Fruchterman-Reingold”, and “YifanHu Multilevel”.
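To give a feel for what a force-directed layout does, here is a toy Python sketch of the classic Fruchterman-Reingold idea: every pair of nodes repels, connected nodes attract, and the allowed movement shrinks over time. This is a simplified illustration under my own assumptions (the function name, the unit box, the cooling factor, and the toy graph are all made up), not Gephi's actual implementation, which has extra parameters.

```python
import math
import random


def fruchterman_reingold(nodes, edges, size=1.0, iterations=50, seed=0):
    """Toy force-directed layout: returns {node: (x, y)} inside a size x size box."""
    if not nodes:
        return {}
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # ideal distance between nodes
    temp = size / 10                  # maximum step per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, with strength k^2 / distance.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attraction along every edge, with strength distance^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node by at most `temp` and keep it inside the box.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temp)
            pos[n][0] = min(size, max(0.0, pos[n][0] + dx / length * step))
            pos[n][1] = min(size, max(0.0, pos[n][1] + dy / length * step))
        temp *= 0.95  # cool down so the layout settles
    return {n: (x, y) for n, (x, y) in pos.items()}


# Made-up toy graph: two diseases that share one gene.
toy_nodes = ["Disease A", "Disease B", "GENE_1"]
toy_edges = [("Disease A", "GENE_1"), ("Disease B", "GENE_1")]
layout = fruchterman_reingold(toy_nodes, toy_edges)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Because the starting positions are random, a different seed gives a different picture, and the pairwise repulsion loop grows quadratically with the number of nodes, which is one reason large graphs are slow to lay out.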

Using the “Statistics” feature, I calculated the average path length of the network, and then I set the nodes’ colors and sizes by degree. This step improves the legibility of my visualization. Finally, I went back to the “Layout” function and used “Label Adjust” to keep the text labels from overlapping each other.
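The two measures in this step can be sketched in a few lines of Python. This is only an illustration on a made-up toy graph, assuming an undirected network and averaging over pairs of nodes that can reach each other; the helper names and the size range are my own choices, and Gephi's internal calculation may differ in its details.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set_of_neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degrees(adj):
    """Degree of a node = number of neighbors."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def average_path_length(adj):
    """Mean shortest-path length over all pairs of mutually reachable nodes."""
    total = 0
    pairs = 0
    for start in adj:
        # Breadth-first search gives shortest hop counts from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def sizes_by_degree(deg, min_size=10.0, max_size=40.0):
    """Linearly map degrees onto a node-size range (hypothetical range)."""
    low, high = min(deg.values()), max(deg.values())
    if high == low:
        return {node: min_size for node in deg}
    scale = (max_size - min_size) / (high - low)
    return {node: min_size + (d - low) * scale for node, d in deg.items()}


# Made-up toy network: Disease A - GENE_1 - Disease B - GENE_2.
toy = build_adjacency([
    ("Disease A", "GENE_1"),
    ("Disease B", "GENE_1"),
    ("Disease B", "GENE_2"),
])
deg = degrees(toy)
print(deg)
print(average_path_length(toy))
print(sizes_by_degree(deg))
```

Nodes with more connections get larger sizes, which is why well-connected diseases stand out in the final picture.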

Result

To answer my first question, “Does cancer relate to people’s genes?”, I used Gephi to create my first visualization with only two colors to show the relationship (Figures 1.1 and 1.2).

Figure.1.1
Figure.1.2

The pink nodes represent genes, and the green nodes represent diseases. These images show that the area around “Colon cancer” is denser than any other, and that several other cancers sit nearby. This densest area also contains far more pink nodes than other areas. Combined with the inspiration for this study, this suggests that the human diseases in this area are closely related to the misregulation and mutation of genes.

In the image above (Figure 2), the sizes of the text labels show that “Colon cancer”, “Leukemia”, and “Deafness” have larger groups than other diseases. The orange color, which represents the cancer group, also marks the largest community of human diseases.

Reflection

In the beginning, my dataset was too large to run on my computer, especially when I applied the “Force Atlas 2” layout: it ran for a long time and ended up shutting down. This was the first time I realized how big my dataset is. I also found that the final visualization changes its appearance when I apply another layout on top of the original one. I am very curious about the algorithm behind each layout. In the future, I would like to research these algorithms and create better-looking visualizations in Gephi.

Other Related Posts of Data Visualization

🔗 The Future Assumption of The World Population on Tableau

🔗 Explore Green in New York on Carto

🔗 Big Events in the 20th Century Infographics on TimelineJs

🔗 NYC Bicyclists Life of Data visualization on Tableau
