Tumor Size and Chances of Malignancy

Anapeshku
INST414: Data Science Techniques
5 min read · Feb 10, 2024

Breast cancer is one of the most common forms of cancer worldwide. According to the Centers for Disease Control and Prevention, roughly 240,000 cases of breast cancer are diagnosed annually in women and about 2,100 in men in the United States. The American Cancer Society predicts that this number will rise to 310,720 cases in 2024, with an estimated 42,250 of those cases resulting in death. The most effective way to lower the mortality rate for this condition is to monitor and treat the affected area during the earliest stages. To do so, patients need to notice any changes or discomfort in their chest area and report them to their primary care provider. If the abnormality is suspected to be cancer, a mammogram and further testing can then be done to reach a diagnosis. Sometimes a lump found in that area is just a benign growth that will not spread and has a low likelihood of causing serious issues. If a growth is found to be malignant, more intensive treatment is typically needed. The question this analysis attempts to answer is whether there is a correlation between the size of a lump and the chance of it being malignant.

Since the first step in determining whether a lump is cancerous is typically self-observation, some women may disregard a small lump and only prioritize larger abnormalities in the breast tissue. However, this mentality could allow a malignant tumor to progress to a later stage, which results in a higher chance of death. Additionally, breast cancer can affect people regardless of gender and has multiple stages that are typically categorized by tumor size, among other factors. Finding out whether there is a relationship between size and malignancy can help inform people about when to get a lump or abnormality checked and when to be concerned about finding these growths. With this knowledge, people can make more informed health decisions, especially when doing self-assessments before visiting an oncologist.

The answer to the proposed question can be explored through this dataset from Kaggle: https://www.kaggle.com/datasets/yasserh/breast-cancer-dataset. It includes columns stating whether the case was malignant or benign, the radius of the lobes, the mean surface texture (represented numerically), and the perimeter, area, smoothness (represented numerically), compactness, concavity, and symmetry of the tumor. For the purposes of this question, size is measured using radius. This keeps the analysis consistent with pathological testing, where tumors are typically measured by diameter, which is simply twice the radius. The dataset includes 570 cases.

The average radius of malignant tumors was ~15.55, while benign tumors had an average of 13.28. Malignant tumors also had a larger standard deviation, 4.131 compared with 2.79 for benign tumors. The largest malignant tumor in this dataset had a radius of 28.11, while the largest benign tumor had a radius of 21.16. The smallest sizes found were 7.691 for malignant tumors and 6.981 for benign tumors.

These values were found by creating two lists of dictionaries, with each dictionary representing a different breast cancer case. The lists were split into benign and malignant cases. After these lists were created, only the radius value was taken into account, and the values above were computed from those radii. More information on the code can be found at this GitHub link: https://github.com/anapetsmart/INST414/blob/main/module1.py

An issue that may occur when performing this analysis is creating the lists incorrectly. If the data is loaded incorrectly or someone accidentally splits on the wrong dictionary value, any subsequent analysis would be inaccurate, since it would include cases that do not belong in that group. Additionally, the data is sorted in order to easily find the highest and lowest size in each category. Someone unfamiliar with sorting by dictionary values in this manner could run into issues when comparing the largest and smallest tumors. The easiest way to avoid these issues is to practice manipulating lists of dictionaries before dealing with larger datasets such as this one. A table of all the values mentioned is posted below:

Radius statistic | Malignant | Benign
Average | ~15.55 | 13.28
Standard deviation | 4.131 | 2.79
Largest | 28.11 | 21.16
Smallest | 7.691 | 6.981
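The list-of-dictionaries approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: the column names `diagnosis` and `radius_mean`, the `M`/`B` labels, the helper function names, and the tiny inline sample are all assumptions made for the example, and the sample values are made up. It also shows one way to guard against the pitfalls mentioned, by rejecting unexpected labels and checking that no case is lost in the split.

```python
import csv
import io
import statistics

# Tiny made-up sample in the assumed CSV layout; values are illustrative only.
SAMPLE_CSV = """id,diagnosis,radius_mean
1,M,17.99
2,B,13.54
3,M,20.57
4,B,9.50
5,B,12.45
6,M,11.42
"""


def load_cases(handle):
    """Read each CSV row into a dictionary, converting the radius to a float."""
    cases = []
    for row in csv.DictReader(handle):
        row["radius_mean"] = float(row["radius_mean"])
        cases.append(row)
    return cases


def split_by_diagnosis(cases):
    """Return (malignant, benign) lists, rejecting any unexpected label."""
    malignant, benign = [], []
    for case in cases:
        if case["diagnosis"] == "M":
            malignant.append(case)
        elif case["diagnosis"] == "B":
            benign.append(case)
        else:
            raise ValueError(f"unexpected diagnosis: {case['diagnosis']!r}")
    return malignant, benign


def summarize(cases):
    """Sort cases by radius, then report count, mean, stdev, min, and max."""
    ordered = sorted(cases, key=lambda case: case["radius_mean"])
    radii = [case["radius_mean"] for case in ordered]
    return {
        "count": len(radii),
        "mean": statistics.mean(radii),
        "stdev": statistics.stdev(radii),  # sample standard deviation
        "min": radii[0],
        "max": radii[-1],
    }


cases = load_cases(io.StringIO(SAMPLE_CSV))
malignant, benign = split_by_diagnosis(cases)

# Sanity check: every loaded case ended up in exactly one of the two lists.
assert len(malignant) + len(benign) == len(cases)

for label, group in (("malignant", malignant), ("benign", benign)):
    print(label, summarize(group))
```

To run this against the real file, `io.StringIO(SAMPLE_CSV)` would be replaced with an opened CSV file. Note that `statistics.stdev` computes the sample standard deviation; `statistics.pstdev` gives the population version, and the two will differ slightly.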

These values show that malignant tumors, on average, tend to be larger than benign tumors. In every category measured, the malignant value was larger. The most surprising difference was between the largest cases: the largest malignant tumor had a radius 6.95 greater than the largest benign tumor (28.11 versus 21.16). The outcome of this analysis suggests that people should be more concerned if they have a larger tumor, because there is a higher likelihood of it being malignant. This conclusion does not mean that smaller tumors should be ignored. Instead, it implies the opposite: smaller tumors should be examined and treated early, before they grow larger.
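As a quick check on the comparison, the gap between the two groups can be computed for each statistic. The sketch below simply hard-codes the radius figures reported in this article; the dictionary layout and the `gaps` helper are hypothetical names chosen for illustration.

```python
# Radius summary values as reported in the article: (malignant, benign).
reported = {
    "mean": (15.55, 13.28),
    "standard deviation": (4.131, 2.79),
    "largest": (28.11, 21.16),
    "smallest": (7.691, 6.981),
}


def gaps(stats):
    """Return malignant minus benign for each statistic, rounded to 3 places."""
    return {
        name: round(malignant - benign, 3)
        for name, (malignant, benign) in stats.items()
    }


differences = gaps(reported)
for name, diff in differences.items():
    print(f"{name}: malignant exceeds benign by {diff}")
```

A positive gap for every statistic is what supports the statement that the malignant value was larger in each category measured.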

While analyzing this dataset does lead to the conclusion that malignant tumors tend to be larger, this does not mean that smaller abnormalities in breast tissue should be ignored. As the data shows, there are malignant tumors on the smaller end of the spectrum. Also, even though benign growths tend to need less extensive treatment than malignant ones, someone should still seek treatment if the result is benign.

There were some limitations to this analysis. Radius was used to stay consistent with how tumors are measured in pathology labs, but other factors were not taken into account since this question focused on size. When doing both self-assessments and clinical testing for tumors, other factors that contribute to overall health should be considered. The dataset did not include information such as vital signs, whether a patient participated in activities that would increase their risk for breast cancer, whether the patient had breast cancer previously, or whether the patient has any other medical conditions. Stage four cancer is also categorized by the ability to metastasize and affect other organs in the body, which can be hard to determine from size alone. Even with these limitations, it is still important to get a breast abnormality checked before it gets larger, since larger size correlates with a higher chance of malignancy.
