Data-driven Resources for Down Syndrome Families

First steps towards re-imagining the DS-Connect dataset.

RE Wood
VisUMD
9 min read · Dec 12, 2022

Illustrations I made of People with Down Syndrome! Please contact me if you want to use them!

Chromosomes directly impact how a person develops. They are packages of DNA that carry the genes guiding how the body forms and functions. Down Syndrome (DS) is caused by a full or partial extra copy of chromosome 21, which affects how the body develops (CDC, 2022).

Figure 1: 23 pairs of chromosomes are passed down to the child from their parents.
Figure 2: Trisomy 21 (left): a full extra copy of chromosome 21; Translocation (middle): an extra full or partial copy of chromosome 21 attached to another chromosome; Mosaic (right): a mixture of cells, some with an extra copy of chromosome 21 and some without.

Because of this, people with DS often have complicated medical histories with multiple health and psychiatric conditions, which affect them at a higher rate than most typically developing (TD) people (Bull, 2020). People with intellectual and developmental disabilities (IDDs), like Down Syndrome, often experience greater health disparities than their TD peers (Alshammari et al., 2018). One potential contributing factor is the higher incidence of these multiple chronic conditions (MCCs).

How I am Exploring the Dataset

Using the NIH’s publicly available DS-Connect dataset, which includes some demographic information and the various diagnoses of over 3,300 individuals with DS (DCC, 2022; NIH, 2022b), I sought to answer two questions:

  1. How representative is the DS-Connect participant database compared to the US population?
  2. Using network analysis for a multi-morbid approach to the data, what conditions commonly occur and are the most influential?

Intended Audience

The INCLUDE datasets were made for medical researchers. However, I imagined the DS-Connect Participant Database as a resource for families with loved ones with DS. I wanted to visualize information about the DS population in an easy-to-understand way.

Figure 3: Various body systems are viewed as cross-sections on glass slides, using the metaphor of a book you can page through. The side panel includes basic stats about conditions and systems, along with filters shown as people with DS who represent two sexes and various race and ethnicity combinations.

Initial Designs

I drew cross-sections of body systems, like glass slides, that could layer like a full-body scan. However, I abandoned a Sankey version of this design (fig. 4) because I was concerned the relationships between systems and conditions would be lost.

Figure 4: The Slide sketch represented as a Sankey Diagram.

Figure 5 shows the filters, the count, and percent by sex, race, ethnicity, and DS-specific filters: All DS Types, Generic/Unknown DS Type, Trisomy 21, Translocation DS, or Mosaic DS.

Figure 5: Person-centered filters on top of a social network approach to visualizing body systems and related conditions.

I wanted my visualization to be oriented around the people the data was supposed to represent. Figure 6 was a mockup of the social network approach to the dataset with illustrations of people with DS as filters. Body systems have simple illustrations labeled as the medical term and a plain text explanation of the system.

Figure 6: Initial Mockup in Illustrator of social network analysis using color to organize conditions by body system.

Co/Multi-Morbid Approaches to Health Data

Past MCC visualizations often use comorbid approaches. Comorbidity is when a condition occurs in addition to a primary condition (here, DS) (Feinstein, 1970). Unfortunately, a comorbid-focused approach can skew data for several reasons, for example because some conditions tend to cluster with other conditions (Harrison et al., 2021).

Because of this limitation, I used a multi-morbid approach when analyzing the DS-Connect dataset. Multimorbidity is when multiple chronic conditions coexist (Boyd & Fortin, 2010). Multimorbid approaches are also more person-centered because they can identify patterns and clusters across all co-occurring conditions simultaneously (Harrison et al., 2021).
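One way to read the difference in data terms is sketched below. This is only an illustration of the two definitions, not the processing used for this project: the function names and the toy records are hypothetical. The comorbid version counts only pairs that include the primary condition, while the multi-morbid version counts every pair of conditions that appear in the same person.

```python
from collections import Counter
from itertools import combinations


def comorbid_pairs(records, index_condition):
    """Count only pairs that include the index (primary) condition."""
    counts = Counter()
    for conditions in records:
        if index_condition in conditions:
            for other in conditions - {index_condition}:
                counts[(index_condition, other)] += 1
    return counts


def multimorbid_pairs(records):
    """Count every pair of conditions that co-occur in the same person."""
    counts = Counter()
    for conditions in records:
        # Sorting gives each unordered pair one consistent key.
        for a, b in combinations(sorted(conditions), 2):
            counts[(a, b)] += 1
    return counts


# Toy records (hypothetical): one set of conditions per person.
records = [
    {"DS", "sleep apnea", "hypothyroidism"},
    {"DS", "sleep apnea", "hearing loss"},
    {"DS", "hearing loss"},
]
comorbid = comorbid_pairs(records, "DS")
multimorbid = multimorbid_pairs(records)
```

In the toy data, the link between hearing loss and sleep apnea only shows up in the multi-morbid counts, which is the kind of relationship a DS-centered view would miss.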

Data Clean Up

One record was removed: a “test” account with all 108 available conditions.
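A minimal sketch of this kind of clean-up rule is shown below. The record layout (an `id` plus a set of condition names) and the three-condition toy list are assumptions for illustration; the real DS-Connect export is structured differently and has 108 conditions.

```python
def drop_all_condition_records(records, all_conditions):
    """Return the records that do NOT report every available condition."""
    total = len(all_conditions)
    return [
        r for r in records
        if len(set(r["conditions"]) & all_conditions) < total
    ]


# Toy example: 3 conditions stand in for the 108 in the dataset.
ALL_CONDITIONS = {"sleep apnea", "hearing loss", "myopia"}
records = [
    {"id": "p1", "conditions": {"sleep apnea"}},
    {"id": "test", "conditions": {"sleep apnea", "hearing loss", "myopia"}},
    {"id": "p2", "conditions": {"hearing loss", "myopia"}},
]
cleaned = drop_all_condition_records(records, ALL_CONDITIONS)
```

A rule like this flags candidates for review; whether a flagged record is really a test account still needs a human check.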

Beta Network Visualization

The beta release of the network made me realize I had processed the data in a co-morbid way. Upon this realization, I re-processed the data to be multi-morbid. This would help me identify clusters of conditions more easily.

Figure 7: Beta Version of network analysis of DS-Connect Diagnoses. Color is used to organize conditions by body system. Node size is determined by the total number of people in the data who have the condition.

Results

Representativeness Estimations

Researchers estimate that 1 in every 758 babies born in the US has DS (De Graaf, Buckley & Skotko, 2019; Mai et al., 2019). Most of the DS-Connect dataset consisted of people who reported having Complete Trisomy 21. This is unsurprising, as 95% of diagnoses are Complete T21, 3% are Translocation, and 2% are Mosaic (CDC, 2022). The latter two DS types follow a similar proportion in the dataset. However, the records with an unknown DS type make the exact makeup of the dataset difficult to determine.

Table 1: Count and percent of people with the various kinds of Down Syndrome in DS-Connect.

Health disparities increase further across racial and ethnic groups. To determine if the dataset was close to being representative of the wider US population, I used the 2020 US Census data (U.S. Census Bureau, n.d.).

Table 2: Count and percent of men and women by their self-identified race & ethnicities.
Figure 8: National percentages in black font, DS-Connect dataset in red font. The Native & Indigenous category includes Native Americans, Alaskan Natives, Native Hawaiians, and other Pacific Islanders.

Most racial and ethnic groups were either under- or over-represented in the dataset compared to the wider US population. Unfortunately, the individuals in the dataset do skew white. The exception was people who identified as more than one race, who were the closest to being proportionally representative of the wider US population. There were also more female participants (53.35%) in the dataset than male participants (46.64%).
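The comparison behind this kind of estimate can be sketched as a simple percentage-point gap per group. The function name and all numbers below are placeholders for illustration only; they are not DS-Connect counts or Census figures.

```python
def representation_gaps(sample_counts, reference_percent):
    """Return {group: (sample %, reference %, difference in points)}.

    A positive difference means the group is over-represented in the
    sample relative to the reference population.
    """
    total = sum(sample_counts.values())
    gaps = {}
    for group, count in sample_counts.items():
        sample_pct = 100.0 * count / total
        ref_pct = reference_percent[group]
        gaps[group] = (
            round(sample_pct, 2),
            ref_pct,
            round(sample_pct - ref_pct, 2),
        )
    return gaps


# Placeholder numbers only: NOT DS-Connect counts or Census figures.
sample_counts = {"group_a": 80, "group_b": 15, "group_c": 5}
reference_percent = {"group_a": 60.0, "group_b": 30.0, "group_c": 10.0}
gaps = representation_gaps(sample_counts, reference_percent)
```

With real data, the group definitions in the sample and in the reference source would need to line up before the gaps mean anything.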

Network Analysis to Explore Relationships between Diagnoses

To explore the data, I used multiple network analysis approaches. I created both a correlation matrix and an adjacency matrix in Tableau. The matrices were 108 x 108 conditions.

Video 1: Correlation & Adjacency Matrices

Correlation Matrix

The entire correlation matrix (0:00:13) indicates some grouping of conditions. However, only a small percentage of conditions showed a moderate correlation.

Adjacency Matrix

The adjacency matrix (0:00:20) signaled some potential clusters. There is some banding in conditions that frequently occur together. These also appear to connect with other conditions consistently. I interpreted this banding as indicating potentially influential conditions.
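For readers who want to see what the two matrices contain, here is a small sketch in plain Python. The matrices in this project were built in Tableau, so this is not the original workflow: it assumes a person-by-condition table of 0/1 flags, uses co-occurrence counts for the adjacency matrix, and uses Pearson correlation on the 0/1 columns for the correlation matrix. The toy table is hypothetical.

```python
from math import sqrt


def cooccurrence_matrix(rows):
    """rows: one list of 0/1 flags per person. Returns pairwise counts."""
    k = len(rows[0])
    matrix = [[0] * k for _ in range(k)]
    for row in rows:
        present = [i for i, flag in enumerate(row) if flag]
        for i in present:
            for j in present:
                if i != j:  # leave the diagonal at zero
                    matrix[i][j] += 1
    return matrix


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: undefined, reported as 0 here
    return cov / sqrt(var_x * var_y)


def correlation_matrix(rows):
    """Correlate every condition column with every other column."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]


# Toy table: 4 people (rows) by 3 conditions (columns).
rows = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
adjacency = cooccurrence_matrix(rows)
correlation = correlation_matrix(rows)
```

With 108 conditions, both functions return the 108 x 108 grids described above. The adjacency counts can then feed a network tool as weighted edges.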

Video 2: Circle Networks

Circle Network Emphasizing Readability

This video shows the next iteration of the circle network. In the first half, the network is still color-encoded by the 14 systems within the body (e.g. circulatory, sensory, lymphatic). Node size is set to the number of individuals diagnosed with that condition. The edges are directional and weighted according to the strength of their connections with each other.

Circle Network Emphasizing Insight

The second half (0:00:22) also keeps the system colors and size encoding. However, the order of nodes is determined by each node’s Eigenvector Centrality. Eigenvector Centrality measures node prestige and influence within a network. Starting at the 12 o’clock position, the order goes from the most important node (sleep apnea) to the least important. Interestingly, DS types ranked the lowest.
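The idea behind Eigenvector Centrality is that a node scores highly when it is connected to other high-scoring nodes. A minimal sketch using power iteration is below. It assumes an undirected, weighted adjacency matrix (the network above is directional, so this is a simplification), and the four-node toy graph is hypothetical.

```python
def eigenvector_centrality(adjacency, iterations=200, tol=1e-9):
    """Approximate eigenvector centrality by power iteration.

    adjacency: symmetric matrix of non-negative edge weights.
    Scores are scaled so the most central node has a score of 1.
    """
    n = len(adjacency)
    scores = [1.0] * n
    for _ in range(iterations):
        # Adding the node's own score (iterating on A + I) keeps the
        # iteration from oscillating on bipartite-like graphs.
        new = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = max(new) or 1.0
        new = [value / norm for value in new]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


# Toy graph: A links to B, C, and D; B and C also link to each other.
names = ["A", "B", "C", "D"]
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(adjacency)
ranked = sorted(names, key=lambda name: scores[names.index(name)], reverse=True)
```

Sorting nodes by this score, as in `ranked`, is the same ordering idea used to place nodes around the circle starting at 12 o’clock.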

Video 3: Cluster Exploration

Networks Aimed at Cluster Identification

I explored the clusters of diagnoses in the network using the force-based Yifan Hu Proportional algorithm. The beginning shows how tightly clustered everything is. So, I proportionally expanded the overlapped nodes (0:00:03) to be more readable. System color remains. However, I used Eigenvector Centrality for the node size. The most prestigious nodes group in the center. Unlike the radial layout, the Complete Trisomy 21 node is more centrally located.

Then, I wanted to examine the clusters of conditions (0:00:23). Size is again set to count, and color to body system. I used the Louvain community detection algorithm. The 3 arms are the clusters. They are ordered by node prestige using Eigenvector Centrality.

The last cluster exploration used the circle pack layout (0:00:56). The first hierarchy uses the Louvain Community Detection algorithm. The second hierarchy uses Blondel’s Modularity (1.0 resolution, 0.05 score). Although the system color encoding remained, the node sizes used Eigenvector Centrality to determine prestige.
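Louvain-style community detection searches for a grouping of nodes that maximizes a modularity score. The sketch below does not run that search; it only scores a grouping that is already given, to show what the number measures. The `resolution` parameter follows one common convention and may not match how the tool used here defines it, and the toy graph is hypothetical.

```python
def modularity(adjacency, communities, resolution=1.0):
    """Modularity Q of a partition of an undirected weighted graph.

    adjacency: symmetric matrix of edge weights.
    communities: community label for each node, in node order.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                expected = resolution * degrees[i] * degrees[j] / two_m
                q += adjacency[i][j] - expected
    return q / two_m


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by one edge (2-3).
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_two_groups = modularity(adjacency, [0, 0, 0, 1, 1, 1])
q_one_group = modularity(adjacency, [0, 0, 0, 0, 0, 0])
```

Scores near zero mean the grouping has about as many internal links as chance would produce, so a low score suggests the clusters are only weakly separated.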

Most Frequently Occurring Conditions (Other than DS)

  1. Speech Disorder (63.2%)
  2. Intellectual Disability (54.8%)
  3. Hearing Loss (38.3%)
  4. Hypothyroidism (32.5%)
  5. Sleep Apnea (30.2%)
  6. Specific Language Impairment (28.8%)
  7. Atrial Septal Defect (21.3%)
  8. Myopia (19.5%)
  9. Ventricular Septal Defect (19.2%)
  10. Hyperopia (19.0%)

Most Influential Conditions (Other than DS)

  1. Sleep Apnea
  2. Speech Disorder
  3. Hearing Loss
  4. Intellectual Disability
  5. Hypothyroidism
  6. Specific Language Impairment
  7. Myopia
  8. Strabismus
  9. Hyperopia
  10. Atrial Septal Defect

Network Analysis Takeaways

Comparing the top 10 most frequently occurring conditions with the top 10 most influential conditions (other than DS) shows that the most frequent conditions were not necessarily the most influential upon other conditions.
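The two top-10 lists above can be compared directly. The sketch below copies both lists as given and computes how far each shared condition moves between them; the helper name `rank_shifts` is just for illustration.

```python
most_frequent = [
    "Speech Disorder", "Intellectual Disability", "Hearing Loss",
    "Hypothyroidism", "Sleep Apnea", "Specific Language Impairment",
    "Atrial Septal Defect", "Myopia", "Ventricular Septal Defect",
    "Hyperopia",
]
most_influential = [
    "Sleep Apnea", "Speech Disorder", "Hearing Loss",
    "Intellectual Disability", "Hypothyroidism",
    "Specific Language Impairment", "Myopia", "Strabismus",
    "Hyperopia", "Atrial Septal Defect",
]


def rank_shifts(by_frequency, by_influence):
    """Map each shared condition to (frequency rank, influence rank, shift).

    A positive shift means the condition ranks higher by influence
    than it does by frequency.
    """
    freq_rank = {name: i + 1 for i, name in enumerate(by_frequency)}
    infl_rank = {name: i + 1 for i, name in enumerate(by_influence)}
    return {
        name: (freq_rank[name], infl_rank[name], freq_rank[name] - infl_rank[name])
        for name in by_frequency
        if name in infl_rank
    }


shifts = rank_shifts(most_frequent, most_influential)
only_frequent = [c for c in most_frequent if c not in most_influential]
only_influential = [c for c in most_influential if c not in most_frequent]
```

Sleep Apnea makes the largest jump, from 5th by frequency to 1st by influence, while Atrial Septal Defect drops from 7th to 10th. Ventricular Septal Defect appears only in the frequency list, and Strabismus only in the influence list.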

Participant Reflections

Three people informally evaluated the visualizations and provided feedback. Evaluators included a Computer Science professor with nearly two decades of experience conducting technology research with the DS community (P1), a target user who is a parent of a child who is in the DS-Connect dataset (P2), and a peer in our graduate-level data visualization class (P3).

The researcher (P1) wanted to be able to filter by age and to see additional information when mousing over the edges, to learn more about the connections between conditions. The graduate student peer (P3) reported wanting to toggle between the highly readable layouts and the ones that showed more detailed relationships between conditions. They also reflected that “I might browse if I had a specific question about a condition I had, like COVID. I need a question before I jump in rather than just interact or explore it.” P2, the only participant who was also part of the intended audience for the visualization because they had a loved one with DS, echoed P3’s sentiment. P2 also wanted to select only the conditions related to the diagnoses their loved one had, to know “what we should be on the look out for.” They imagined the possibilities of using DS-Connect as an information resource and even a potential recruitment tool to help hesitant families, “especially in marginalized groups,” see an immediate value in medical research participation.

Conclusion

The diversity issue of the dataset is a concern, which may also have larger implications for the INCLUDE project. These very preliminary and rough estimates signal a need for direct engagement with historically underrepresented groups by actively partnering with grassroots DS organizations that serve those specific communities. It may also be worth engaging such groups to find out how medical research, and the DS-Connect database, can directly benefit them now, for example by providing useful tools and the means of connecting with others like them, as suggested by one parent of a child with DS who is part of the dataset.

Additionally, toggling between readable layouts (i.e., nodes encoded by system and incidence) and multi-morbid network layouts (i.e., nodes encoded by a condition’s influence and cluster) may balance the readability and the insight that participants wanted. Furthermore, adding more robust filtering and new features to the existing dataset in an easily digestible platform could encourage hesitant families to engage more directly in medical research and contribute to the NIH’s datasets. The parent participant concluded that the dataset platform currently “doesn’t have an interface that is designed for regular people to use. More people would sign up [if] they can see [a] benefit … especially in marginalized groups.” In conclusion, re-imagining how a participant database could be visualized and leveraged by the people it is intended to serve could, at the same time, address the issues with representativeness identified in this project. Future work will investigate how to make such a tool more useful by addressing the identified limitations, including the suggested features, and directly engaging with more of the intended audience as this project progresses.

References

  1. CDC. (2022). Facts about Down Syndrome. https://www.cdc.gov/ncbddd/birthdefects/downsyndrome.html
  2. Bull, M. J. (2020). Down syndrome. New England Journal of Medicine, 382(24), 2344–2352.
  3. Alshammari, M., Doody, O., & Richardson, I. (2018). Barriers to the Access and use of Health Information by Individuals with Intellectual and Developmental Disability IDD: A Review of the Literature. 2018 IEEE International Conference on Healthcare Informatics (ICHI). 294–298.
  4. NIH: INCLUDE Project. (2022). INCLUDE Data Coordinating Center: INCLUDE Data Hub. https://portal.includedcc.org/login?redirect_path=/dashboard
  5. NIH: INCLUDE About. (2022). INCLUDE Data Coordinating Center: Our Mission. https://includedcc.org/about
  6. Feinstein, A. R. (1970). The pre-therapeutic classification of co-morbidity in chronic disease. Journal of chronic diseases, 23(7), 455–468.
  7. Harrison, C., Fortin, M., van den Akker, M., Mair, F., Calderon-Larranaga, A., Boland, F., & Smith, S. (2021). Comorbidity versus multimorbidity: Why it matters. Journal of Multimorbidity and Comorbidity, 11, 2633556521993993.
  8. Boyd, C. M., & Fortin, M. (2010). Future of multimorbidity research: how should understanding of multimorbidity inform health system design?. Public health reviews, 32(2), 451–474.
  9. De Graaf, G., Buckley, F., & Skotko, B. (2019). People living with Down syndrome in the USA: Births and population. Down Syndrome Education International. https://dsuri.net/us-population-factsheet
  10. Mai, C.T., Isenburg, J.L., Canfield, M.A., Meyer, R.E., Correa, A., Alverson, C.J., Lupo, P.J., Riehle-Colarusso, T., Cho, S.J., Aggarwal, D. (2019). National population-based estimates for major birth defects, 2010–2014. Birth Defects Research, 111(18), 1420–1435.
  11. U.S. Census Bureau. (n.d.). U.S. Census Bureau quick facts: United States. U.S. Census Bureau. https://www.census.gov/quickfacts/fact/table/US/POP010220
