Defeating Fate

Taking the battle to inherited diseases using data visualization.

Sejal Singh
VisUMD
8 min read · Dec 15, 2022

Image from The New York Times.

“Fear never builds the future, but hope does.”

People often like to think inherited diseases are a matter of fate, which you cannot escape. It is science, after all! While that is correct, and you do not get to choose which genes are passed on to you, one must keep in mind that mankind has seldom accepted defeat at the hands of fate. One of the many wars that researchers, medical professionals, and public health agencies all over the world have been bravely waging for many years is the one against inherited diseases.

Inherited diseases, also known as genetic diseases, are caused by a DNA abnormality and thus run in families. More specifically, they occur when a change in the DNA sequence drives it away from the normal sequence. A harmful change of this kind is known as a mutation, and it raises an individual’s risk of having an inherited disease. While some inherited diseases present symptoms at birth, others may develop later in life.

Depending upon the type of gene mutation as well as the involvement of other factors, inherited diseases can be divided into three broad categories:

  1. Chromosomal genetic disorders are caused by missing or duplicated chromosome material, such as in people diagnosed with Down syndrome or Fragile X syndrome.
  2. Complex genetic disorders occur due to a combination of mutations and other external factors, such as exposure to chemicals, tobacco or alcohol abuse, and inadequate diet. Late-onset Alzheimer’s, arthritis, Autism Spectrum Disorders, cancer and diabetes are some of the common examples of complex genetic disorders.
  3. Single-gene disorders are marked by a mutation in a single gene, such as in cystic fibrosis and Tay-Sachs disease.

Additionally, many rare diseases are inherited; together, rare diseases affect an estimated 25 to 30 million individuals in the US.

Circling back to the context of this article: Complex genetic disorders became the focus of my project due to my own lineage and experiences with this subcategory of inherited diseases. Cancer runs in my family: My paternal grandmother had it, her father had it, and my father has been battling it since 2019. As daunting as this direct line of passage may seem, it prompted me to look into how cancers are passed down through families. I learnt that while some cancers can be inherited, others cannot. This led me further into complex genetic disorders and the multitude of diseases that fall into this family, such as congenital heart diseases, diabetes and hypertension.

As I learnt more about the interplay between genes and the many external factors (social, economic and environmental, among others) that together decide whether an individual will be diagnosed with a complex genetic disorder at some point in their life, I began thinking of ways to help people from all walks of life access this knowledge easily. It all boiled down to one simple answer: Data visualization.

And thus began a semester-long project, with a goal to create a dashboard which makes public health data about prevalence of common complex inherited diseases — and their correlation with other external factors — easily accessible to the public.

The Five Tenets

Before jumping into the design process, I wrote down the design principles which would go on to guide my design process:

  1. Simple: Since the goal of this dashboard is to make preventative information about inherited diseases easily accessible to the general public with different levels of education and technical proficiency — especially populations who are disadvantaged and high-risk — it was imperative that data visualizations be designed in such a way that they are easy to interpret. This meant encouraging easy decision-making through appropriate usage of colors, easy filtering mechanisms, logical labels, and important details provided in an easily digestible format.
  2. Standard: In order to make the dashboards easier and quicker to understand, displaying information in a standard and concrete manner is important. This was achieved using common abbreviations, proper formatting, and identical scaling for charts. Additionally, consistent layouts across all the data visualizations in the scope of this project ensured that insights were conveyed clearly and concisely.
  3. Emphasis: Important pieces of information throughout the dashboard were emphasized through appropriate usage of colors, contrast, size and/or negative space. This would help users quickly see and digest important and relevant information.
  4. Balance: The dashboard utilizes asymmetric balance to organize data visualizations in accordance with their importance. The importance of a particular data visualization — which guides its size and placement on the dashboard — has been computed using the perceived effect of the measure in question on the risk-status of an individual.
  5. Unity/Harmony: Proximity has been used to tie different visualizations together as part of one whole picture, which is conveyed through the dashboard. In the same vein, colors that work well together and consistent fonts have been picked for the visualizations to establish harmony throughout the dashboard.

Two Is Better Than One (ft. Sketching)

I began by listing the different types of data visualizations I could create using the data I had. My biggest mistake: I did not yet know the limitations of Tableau, and went at it with a very open mind.

However, upon learning how Tableau functioned, I narrowed the list down to the visualizations that my datasets and the functionalities of Tableau would together allow. Looking closely at the datasets also led me to narrow my scope down to Maryland, as the state-level data was vast enough for a semester-long endeavor.

I started out by arranging visualizations that would be of value to the representative users in a dashboard layout. Taking into consideration the limitations posed by the datasets I was using, I decided to create visualizations that could be globally filtered by a chosen inherited disease (ex: Arthritis) and a Maryland county (ex: Prince George’s), and conveyed information about one particular metric at county-level at a time, without any chart-specific filters (ex: A visualization designed to show county-level Diabetes cases by race).
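Tableau handles this kind of global filtering without any code, but the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, field names and function names below are hypothetical placeholders, not the project's data or its implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    disease: str
    county: str
    race: str
    cases: int


# Placeholder rows for illustration only; these are not real public health figures.
RECORDS = [
    Record("Diabetes", "Prince George's", "Group A", 10),
    Record("Diabetes", "Prince George's", "Group B", 20),
    Record("Diabetes", "Montgomery", "Group A", 5),
    Record("Arthritis", "Prince George's", "Group A", 7),
]


def apply_global_filters(records, disease=None, county=None):
    """Keep rows matching the chosen disease and county; None means 'All'."""
    return [
        r for r in records
        if (disease is None or r.disease == disease)
        and (county is None or r.county == county)
    ]


def cases_by_race(records):
    """Aggregate a single metric (cases) along a single dimension (race)."""
    totals = {}
    for r in records:
        totals[r.race] = totals.get(r.race, 0) + r.cases
    return totals


# The same two global selections feed every chart; each chart then shows one metric.
selected = apply_global_filters(RECORDS, disease="Diabetes", county="Prince George's")
print(cases_by_race(selected))
```

Because the filters are applied once, before any chart-level aggregation, every visualization on the sketch stays in sync with the chosen disease and county.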

Initial sketch.

While these data visualizations provided all the relevant information, I realized they could be better categorized to simplify the dashboard and make it easier for users to gain insights through fewer, logically categorized visualizations. I also felt introducing filters to each visualization would make them more beneficial to the end user. Lastly, after I cleaned and aggregated the data, I also realized that the Cases vs Environment visualization was not feasible, as I did not have any reliable data source for county-level number of cases over the years.

With these takeaways and limitations in mind, I drew a set of visualizations which served as an inspiration for my final designs.

Revised Sketch 1: Disease prevalence.
Revised Sketch 2: Demographic impact.
Revised Sketch 3: Environmental impact.

The Public Knows Better

After creating my data visualizations in Tableau and stitching them together into a dashboard, I recruited three representative users to test my dashboard. This would help me not only verify whether the dashboard was truly helpful to the general public, but also gather much-needed feedback to improve it.

While all of the participants loved the concept and found the dashboard useful, they also had some great critiques to share through not just remarks, but also their behavior. I prioritized improvements on the basis of the magnitude of issues as well as time (or lack thereof), and implemented them promptly.

Final Reveal

After all the tweaks, the final dashboard was up and running in all its glory, with all four of its views.

View 1: Disease prevalence.

Disease Prevalence: The user initially lands in this view, which includes four data visualizations that together communicate the county-wise number of cases of inherited diseases in 2015, as well as the prevalence of disease in different counties. The choropleth map on top additionally acts as a filter for three of the visualizations. Users can choose to see county-wise number of cases and disease prevalence for all diseases, or choose just one disease from the global filter provided on the right.
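Showing both case counts and prevalence matters because the two can rank counties differently. The sketch below assumes prevalence means cases as a percentage of the county population; the exact formula used in the dashboard is not stated here, and the county names and numbers are made-up placeholders.

```python
def prevalence_percent(cases: int, population: int) -> float:
    """Cases as a percentage of the population (assumed definition)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * cases / population


# Placeholder (cases, population) pairs; not real county figures.
county_counts = {
    "County A": (1200, 60000),
    "County B": (450, 15000),
}

prevalence = {
    name: round(prevalence_percent(cases, population), 2)
    for name, (cases, population) in county_counts.items()
}
print(prevalence)
```

In this toy example County A has more cases, yet County B has the higher prevalence (3.0 versus 2.0), which is the kind of difference a case count alone would hide.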

View 2: Demographic impact.

Demographic Impact: This view pairs demographic data, such as race, gender and age, with number of cases and disease prevalence, with a goal to communicate how prevalent a certain disease is in a certain population. Filters on the right allow users to choose an inherited disease to see relevant charts.

View 3: Environmental impact.

Environmental Impact: The 3rd view features a single visualization, which plots county-wise number of disease cases against median AQI — arranged in descending order — to allow users to understand how air quality affects the prevalence of inherited diseases.
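The ordering behind this chart can be sketched in Python as follows. This assumes each county has a list of AQI readings from which a median is taken; the readings, case counts and the function name are hypothetical placeholders rather than the dashboard's data.

```python
from statistics import median

# Placeholder AQI readings and case counts; not real measurements.
aqi_readings = {
    "County A": [40, 55, 61],
    "County B": [30, 35, 90],
    "County C": [70, 72, 68],
}
cases = {"County A": 120, "County B": 80, "County C": 200}


def rows_by_median_aqi(aqi_readings, cases):
    """Return (county, median AQI, cases) rows, highest median AQI first."""
    rows = [
        (county, median(values), cases[county])
        for county, values in aqi_readings.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for county, median_aqi, case_count in rows_by_median_aqi(aqi_readings, cases):
    print(county, median_aqi, case_count)
```

The median is less sensitive to a single bad-air day than the mean: County B's one reading of 90 does not move it up the ordering.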

View 4: Lifestyle impact.

Lifestyle Impact: Finally, the last view features one visualization, which plots habits against disease prevalence. The filters allow users to choose one out of the 5 habits, and the status filter is updated to show relevant sub-filters for the particular habit (ex: When user picks “Smoking” as habit, the status filter choices are updated to show 4 options, namely “All”, “Never smoked”, “Current smoker” and “Former smoker”). The user can further filter the chart using these status sub-filters to see disease prevalence against each status under the habit.
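The dependent filter described above can be sketched in Python. The smoking statuses come from the example in the text; the second habit, the rows and the function names are hypothetical placeholders added only to show the status options changing.

```python
# Status options per habit. "All" is prepended by status_choices().
STATUS_OPTIONS = {
    "Smoking": ["Never smoked", "Current smoker", "Former smoker"],
    # Hypothetical second habit, included only for illustration.
    "Example habit": ["Status 1", "Status 2"],
}


def status_choices(habit):
    """Options the status filter should show once a habit is picked."""
    return ["All"] + STATUS_OPTIONS[habit]


def filter_rows(rows, habit, status="All"):
    """Keep rows for the habit, narrowed to one status unless 'All' is chosen."""
    if status not in status_choices(habit):
        raise ValueError(f"{status!r} is not a valid status for {habit!r}")
    return [
        row for row in rows
        if row["habit"] == habit and (status == "All" or row["status"] == status)
    ]


# Placeholder rows; the prevalence values are not real figures.
rows = [
    {"habit": "Smoking", "status": "Never smoked", "prevalence": 1.0},
    {"habit": "Smoking", "status": "Current smoker", "prevalence": 2.0},
    {"habit": "Example habit", "status": "Status 1", "prevalence": 3.0},
]

print(status_choices("Smoking"))
print(filter_rows(rows, "Smoking", "Current smoker"))
```

Validating the status against the chosen habit keeps a stale selection, left over from a previously picked habit, from silently producing an empty chart.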

Where Do We Go Now?

While the dashboard has a very limited scope and is in a very nascent stage, it is interesting to note that no such platform exists in the present day. In the future, the scope of the dashboard could be widened to include not just other US states, but also other countries. Other external factors, such as income and education data, could also be integrated to introduce new dimensions for comparison. More diseases could also be introduced in the future to make the platform more inclusive and useful.

Additionally, I’d also like to take this project out of Tableau and implement it on another platform that allows for more flexibility and better speed. Introducing general health suggestions based on an individual’s interactions with the data visualizations would be an interesting feature to integrate.

In the meantime, check out the current dashboard here.

Improving user experiences by the day, narrating my own by the night.