</ CANCER>

Visualization is the latest weapon in the war against cancer.

Anuja Bendre
VisUMD
5 min read · Dec 6, 2022


By Anuja Bendre and Devanshi Shah

Image from Unsplash.

Cancer is the second most common cause of death in the U.S. The National Cancer Institute estimated that roughly 1.9 million people in the United States would be diagnosed with cancer in 2022.

In this article, we explain how we used data visualization to generate new insights for cancer research. We used Tableau to present cancer statistics so that researchers and the general public can derive useful insights and find inspiration for discoveries that improve the quality of life of cancer patients and, possibly, the survival rate of people diagnosed with cancer.

We created an information system to help improve the treatment provided to cancer patients and to explore possible ways of increasing their survival rate. By visualizing different types of cancer and how they affect the population, we wanted to understand how demographic factors such as ethnicity, age, sex, and lifestyle affect cancer patients, and to gain insights from these observations.

We created a series of visualizations based on publicly available datasets from the CDC, the U.S. government, and the International Agency for Research on Cancer (WHO), along with other miscellaneous datasets available on Kaggle.

We started by looking at cancer incidence across the globe. We created a choropleth map that ranks countries by the number of cancer incidences recorded in each country. Australia ranks highest in the world. This finding is consistent with a recent article from Faye D’Souza, which states that 2 in 3 people in Australia will develop common skin cancers as cancer incidence rises.

Fig 1: Country Rankings based on Cancer Incidences
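Outside Tableau, the ranking behind a map like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rank_countries`, the tie-handling rule, and the sample values are hypothetical placeholders, not the data or workbook behind Fig 1.

```python
from typing import Dict, List, Tuple


def rank_countries(incidence: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Return (rank, country, value) tuples, highest incidence first.

    Countries with equal values share a rank, and the next rank is
    skipped (for example 1, 2, 2, 4).
    """
    # sorted() is stable, so tied countries keep their input order.
    ordered = sorted(incidence.items(), key=lambda item: item[1], reverse=True)
    ranked = []
    previous_value = None
    previous_rank = 0
    for position, (country, value) in enumerate(ordered, start=1):
        rank = previous_rank if value == previous_value else position
        ranked.append((rank, country, value))
        previous_value, previous_rank = value, rank
    return ranked


# Placeholder values for illustration only; these are NOT real statistics.
sample = {"Country A": 120.0, "Country B": 310.5, "Country C": 310.5, "Country D": 95.2}
for rank, country, value in rank_countries(sample):
    print(rank, country, value)
```

A choropleth then only needs to map each rank (or the underlying value) to a color for the matching country shape.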

We then decided to narrow our focus to cancer incidence, mortality, and survival for patients in the United States.

Fig 2: Top 10 cancer deaths in the U.S.
Fig 3: Cancer survival rate based on cancer type and gender.

According to the National Cancer Institute, an estimated 287,850 women and 2,710 men will be diagnosed with breast cancer by the end of 2022, which makes it the most common cancer diagnosis. What amazed us was that breast cancer is also the most common type of cancer among female cancer survivors.

Fig 4: Linear regression model for breast cancer diagnosis based on mammography records.
Fig 5: Breast cancer statistics based on race.

We then drilled down into breast cancer statistics to look at the survival rate of breast cancer patients by the type of surgery performed as part of their treatment. We focused on high-risk surgeries such as lumpectomy, modified radical mastectomy, and simple mastectomy.

Fig 6: Survival rate based on surgery type.
Fig 7: Survival rate based on the stage of diagnosis.

We visualized data from Avalere to observe the relationship between early diagnosis and survival rate among cancer patients. We observed that earlier cancer detection improves quality of life and patient outcomes: patients diagnosed at an earlier stage (stage I-II) generally have a higher likelihood of recovery than those diagnosed at a later stage (stage III-IV).
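The early-versus-late comparison can be illustrated with a small Python sketch. It assumes a hypothetical record layout of `(stage, survived)` pairs and a hypothetical helper named `survival_rate_by_group`; neither reflects the actual structure of the Avalere data, and the sample records are made up purely to show the calculation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Stages treated as "early" in this sketch, following the stage I-II grouping.
EARLY_STAGES = {"I", "II"}


def survival_rate_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the share of surviving patients for early and late stage groups.

    Each record is a (stage, survived) pair, e.g. ("II", True).
    Any stage not in EARLY_STAGES is counted as late.
    """
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for stage, survived in records:
        group = "early (I-II)" if stage in EARLY_STAGES else "late (III-IV)"
        totals[group] += 1
        if survived:
            survivors[group] += 1
    return {group: survivors[group] / totals[group] for group in totals}


# Made-up records for illustration only; NOT real patient data.
sample_records = [("I", True), ("II", True), ("III", False), ("IV", True)]
print(survival_rate_by_group(sample_records))
```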

Lastly, we developed a linear regression model to predict the number of cancer incidences and cancer deaths based on the year of occurrence.

Fig 8: Prediction model for cancer incidence.

This visual shows the linear regression models that we built to predict the number of cancer incidences and cancer-related deaths from historical CDC data covering the past 20 years. The user can select a future year, and the model predicts the number of cancer incidences or deaths in the United States for that year.
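For readers who want to see what a year-based linear regression involves, here is a minimal Python sketch of an ordinary least squares fit. It is an illustration, not the Tableau model described above: the function names `fit_line` and `predict` are hypothetical, and the sample series is synthetic rather than CDC data.

```python
from typing import Sequence, Tuple


def fit_line(years: Sequence[float], counts: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of counts = slope * year + intercept."""
    if len(years) != len(counts) or len(years) < 2:
        raise ValueError("need at least two (year, count) pairs of equal length")
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    # Sum of squared deviations of the years from their mean.
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    # Sum of cross deviations between years and counts.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(year: float, slope: float, intercept: float) -> float:
    """Evaluate the fitted line at a given year."""
    return slope * year + intercept


# Synthetic series for illustration only; NOT real incidence counts.
years = [2000, 2001, 2002, 2003]
counts = [100.0, 110.0, 120.0, 130.0]
slope, intercept = fit_line(years, counts)
print(predict(2010, slope, intercept))
```

A straight-line fit extrapolates the historical trend unchanged, so predictions far beyond the observed years should be read with caution.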

Key Takeaways

We were inspired by the idea of a Cancer Ecosystem as proposed by The Vice President’s National Cancer Moonshot Initiative. We created a data visualization tool that serves as an information system for healthcare practitioners and the general public. It helps them understand cancer statistics, dispel myths about cancer, and start new discussions on how to change current trends to ensure a better tomorrow.

The World Economic Forum’s article on missing links in cancer research described how incomplete data harms research: two cancer research institutes using the same technology reached different results because of discrepancies in data obtained from the same source. This motivated us to make data readily available in a format that even a general audience can digest easily, and to provide uniformity across multiple data sources. It led us to create a visualization platform that helps ‘connect the dots’, because each cancer tells a different story.

Future Scope

Beyond the scope of this project, we would like to investigate how genetic and hereditary information relates to the prediction, early detection, and survival rate of cancer, and whether any of these relationships are causal. The types of cancer we would like to focus on are breast, ovarian, skin, and prostate cancer. Due to time limitations, we were not able to visualize how lifestyle changes affect cancer incidence and mortality, which would be a good addition to this project.
