Analysis of Menstrual Cycle Symptoms

How do various menstrual symptoms relate to menstrual disorders?

Colleen Wang
INST414: Data Science Techniques
Apr 29, 2024

Stakeholders/Decisions informed:

Despite affecting over half of the world’s population, women’s menstrual cycles are an undervalued and under-researched area of study. Understanding the symptoms and patterns of menstrual cycles can lead to advancements in women’s health and improve the diagnosis of menstrual disorders. Analyzing menstrual cycle data in context can help answer the question of how various menstrual symptoms relate to menstrual disorders. The stakeholders asking this question are medical professionals and researchers in the field of women’s health. Exploring menstrual symptom data will help these stakeholders recognize menstrual disorders and make better-informed decisions about diagnosis and treatment.

Investing in research on menstrual symptoms and disorders could also inform significant decisions that address the ambiguities of women’s health. Analyses of menstrual cycle data can yield crucial insights for medical professionals and researchers making decisions about education, policy, diagnosis, and treatment. Answering how different menstrual symptoms relate to menstrual disorders could promote a shift toward more inclusive women’s healthcare and enhance the quality of life for individuals with menstrual disorders around the world.

Data:

The data that could answer this question should describe menstrual cycles and the symptoms experienced, including their duration and strength. Relevant fields include the length of the cycle, the strength of bleeding on peak days, physical pain or discomfort, and height and weight. These fields matter because analyzing the patterns and frequencies of these symptoms can lead to better diagnosis of menstrual disorders, and together they can show whether there is a relationship between menstrual symptoms, biometric data, and menstrual disorders. I collected a subset of this data from Kaggle, a free resource for open data sets. This data set contains some fields relevant to this question, along with general fields, such as income, that may offer useful context but are not necessary for this analysis. The fields contained in this data set are:

  • Number_of_peak
  • Age
  • Length_of_cycle
  • Estimated_day_of_ovulation
  • Length_of_luteal_phase
  • Length_of_menses
  • Unusual_bleeding
  • Height
  • Weight
  • Income
  • BMI
  • Mean_length_of_cycle
  • Menses_score
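The analysis itself was done in Tableau, but the column selection described above can also be sketched in plain Python. This minimal example assumes the Kaggle file is a CSV whose headers match the field names listed above exactly, and it keeps only the fields relevant to the question (dropping Income). The two sample rows and the `load_relevant` helper are hypothetical, made up for illustration; they are not values from the data set or code from the linked repository.

```python
import csv
import io

# Fields kept for the analysis. Income (and other general fields) are dropped.
# Assumption: the CSV headers match these names exactly.
RELEVANT_FIELDS = [
    "Number_of_peak", "Age", "Length_of_cycle", "Estimated_day_of_ovulation",
    "Unusual_bleeding", "Height", "Weight", "BMI",
]

def load_relevant(csv_text):
    """Read CSV text and keep only the relevant columns of each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{field: row[field] for field in RELEVANT_FIELDS} for row in reader]

# Hypothetical two-row sample, invented for illustration (not from the Kaggle file).
SAMPLE = """Number_of_peak,Age,Length_of_cycle,Estimated_day_of_ovulation,Length_of_luteal_phase,Length_of_menses,Unusual_bleeding,Height,Weight,Income,BMI,Mean_length_of_cycle,Menses_score
2,19,30,16,14,5,no,5'4,55,0,20.8,29,3
1,25,28,14,14,4,yes,5'6,62,0,22.1,28,2
"""

rows = load_relevant(SAMPLE)
print(rows[0]["Length_of_cycle"])  # 30
print("Income" in rows[0])         # False
```

Note that every value is read as a string; the later sketches convert to numbers only at the point where they aggregate.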

Data Analysis/Figures:

For this data analysis, I used Tableau to create data visualizations and to draw conclusions. These visualizations can help medical professionals and researchers answer the question of how various menstrual symptoms relate to menstrual disorders. Each figure compares menstrual cycle data with biometric information such as age and weight to inform decisions surrounding this question.

Figure 1: Average Length of Cycle by Age

Figure 1 shows the average length of menstrual cycles by age; the longest average cycles occur among women aged 18–21. I used the variables length_of_cycle and age to give an overview of how cycle length changes as age increases. Conclusions from this comparison can help medical professionals and researchers consider the impact of menstrual symptoms, such as cycle length, when diagnosing a menstrual disorder.
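The aggregation behind Figure 1 is a group-by-and-average: collect every Length_of_cycle value for each Age and take the mean, which is roughly what Tableau's AVG aggregation does when age is on one axis. The sketch below is a hypothetical Python equivalent, not the Tableau workbook; the function name and the three sample rows are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def average_cycle_length_by_age(rows):
    """Average Length_of_cycle for each Age, similar to Tableau's AVG aggregation."""
    cycles_by_age = defaultdict(list)
    for row in rows:
        cycles_by_age[int(row["Age"])].append(float(row["Length_of_cycle"]))
    # Sort by age so the result reads like the x-axis of Figure 1.
    return {age: mean(lengths) for age, lengths in sorted(cycles_by_age.items())}

# Hypothetical rows for illustration only; not values from the data set.
rows = [
    {"Age": "19", "Length_of_cycle": "31"},
    {"Age": "19", "Length_of_cycle": "29"},
    {"Age": "25", "Length_of_cycle": "28"},
]
averages = average_cycle_length_by_age(rows)
print(averages)                         # {19: 30.0, 25: 28.0}
print(max(averages, key=averages.get))  # 19
```

The last line picks the age with the longest average cycle, which is the kind of reading Figure 1 supports (for the real data, the 18–21 range).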

Figure 2: Average Number of Peak by Weight

Figure 2 compares the average number of peak bleeding days with weight. The effect of weight on the number of peak bleeding days is ambiguous; however, inferences can still be drawn about which weight groups experience the most peak days during their menstrual cycle. This analysis can help draw conclusions about how weight relates to peak bleeding days and other menstrual symptoms, and can inform medical professionals and researchers about the significance of this relationship. Predicting peak days during a menstrual cycle can aid in recognizing menstrual disorders and in decisions about treatment.
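Because individual weight values are spread thinly, one way to read "which weight groups experience the most peak days" is to bin weight and average Number_of_peak within each bin. This sketch assumes weight is recorded in kilograms (the data set uses the metric system) and uses a 10 kg bin width purely as an illustrative choice; Figure 2 may instead plot each weight value directly. The sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def average_peaks_by_weight_bin(rows, bin_width=10):
    """Average Number_of_peak within fixed-width weight bins (bin_width is an assumption)."""
    peaks_by_bin = defaultdict(list)
    for row in rows:
        weight = float(row["Weight"])
        bin_start = int(weight // bin_width) * bin_width  # e.g. 58 kg -> the 50 kg bin
        peaks_by_bin[bin_start].append(float(row["Number_of_peak"]))
    return {start: mean(peaks) for start, peaks in sorted(peaks_by_bin.items())}

# Hypothetical rows for illustration only; not values from the data set.
rows = [
    {"Weight": "52", "Number_of_peak": "2"},
    {"Weight": "58", "Number_of_peak": "3"},
    {"Weight": "64", "Number_of_peak": "1"},
]
print(average_peaks_by_weight_bin(rows))  # {50: 2.5, 60: 1.0}
```

Changing `bin_width` changes which rows are pooled together, so an apparent trend (or the lack of one) can depend on this choice.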

Figure 3: Average Estimated day of Ovulation by Length of Cycle and Unusual Bleeding

Figure 3 explores how the symptom of unusual bleeding relates to a woman’s cycle length and estimated day of ovulation. For this analysis I used the variables unusual_bleeding, length_of_cycle, and estimated_day_of_ovulation. By cross-comparing this figure with Figure 2, conclusions can be drawn about whether the number of peak bleeding days is associated with unusual bleeding. The relationship between unusual bleeding and estimated day of ovulation can help medical professionals and researchers make decisions about unusual menstrual cycle symptoms and possible menstrual disorder diagnoses.
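Figure 3 groups on two fields at once, so the same averaging idea applies with a composite key of (Length_of_cycle, Unusual_bleeding). The sketch below is a hypothetical Python equivalent of that two-dimensional view; it assumes Unusual_bleeding is stored as a text label such as "yes"/"no", and the sample rows are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def average_ovulation_day(rows):
    """Average Estimated_day_of_ovulation per (Length_of_cycle, Unusual_bleeding) pair."""
    days = defaultdict(list)
    for row in rows:
        key = (int(row["Length_of_cycle"]), row["Unusual_bleeding"])
        days[key].append(float(row["Estimated_day_of_ovulation"]))
    return {key: mean(values) for key, values in sorted(days.items())}

# Hypothetical rows for illustration only; not values from the data set.
rows = [
    {"Length_of_cycle": "28", "Unusual_bleeding": "no", "Estimated_day_of_ovulation": "14"},
    {"Length_of_cycle": "28", "Unusual_bleeding": "no", "Estimated_day_of_ovulation": "15"},
    {"Length_of_cycle": "28", "Unusual_bleeding": "yes", "Estimated_day_of_ovulation": "17"},
]
print(average_ovulation_day(rows))  # {(28, 'no'): 14.5, (28, 'yes'): 17.0}
```

Comparing the "yes" and "no" averages at the same cycle length is the side-by-side reading Figure 3 is meant to support.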

Data cleaning/Bugs:

To clean this data set, I used Tableau to extract the variables I wanted to compare and to delete some unnecessary variables. One bug others may encounter is interpreting the meaning of each column’s variable name. Some variables are variations of each other, such as length_of_cycle and mean_length_of_cycle, so others could confuse the value stored in a variable with the value the analysis is showing. To avoid this, I used the length_of_cycle variable as the standard value and used Tableau’s average aggregation to calculate the averages. The height variable is also ambiguous in how it records feet and inches. Some entries do not match the formatting used in the rest of the data set, so I excluded those values from the analysis. Finally, the data set uses the metric system and units such as INR (Indian rupees) for income, which could cause confusion, so I labeled the units of measurement in the visualizations.
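Excluding height entries that do not match the rest of the data set can be expressed as a parse-or-reject filter. The exact height format is not stated here, so this sketch assumes, hypothetically, that valid entries look like `5'4` (5 ft 4 in) and treats anything else, including an inches value of 12 or more, as malformed. The pattern, helper names, and sample values are all illustrative assumptions, not the cleaning steps actually performed in Tableau.

```python
import re

# Hypothetical format: feet and inches written like 5'4 (5 ft 4 in).
# This is an assumed stand-in for "the formatting for the rest of the data set".
HEIGHT_PATTERN = re.compile(r"^(\d)'(\d{1,2})$")

def height_in_inches(value):
    """Return height in inches, or None if the entry does not match the assumed format."""
    match = HEIGHT_PATTERN.match(value.strip())
    if match is None:
        return None
    feet, inches = int(match.group(1)), int(match.group(2))
    if inches >= 12:  # e.g. 5'14 is not a valid feet-and-inches value
        return None
    return feet * 12 + inches

def drop_malformed_heights(rows):
    """Keep only rows whose Height parses, excluding mismatched entries."""
    return [row for row in rows if height_in_inches(row["Height"]) is not None]

# Hypothetical entries for illustration only; not values from the data set.
rows = [{"Height": "5'4"}, {"Height": "5.4"}, {"Height": "5'14"}, {"Height": " 5'11 "}]
kept = drop_malformed_heights(rows)
print(len(kept))                                       # 2
print([height_in_inches(row["Height"]) for row in kept])  # [64, 71]
```

Rejecting rather than guessing (for example, reading `5.4` as 5 ft 4 in) keeps ambiguous entries out of the averages, at the cost of a smaller sample.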

Limitations/Bias:

The symptoms and behaviors of menstrual cycles are widely under-researched, so there are few readily available data sets to explore. Analysis of this data set would also be more beneficial if it included more data on specific menstrual symptoms rather than only the overall menstrual cycle. Another limitation is that some measured variables, such as length_of_luteal_phase and income, might not provide significant insight into the proposed question. Research on women’s health and menstrual cycles also presents many opportunities for bias, both in how the data is collected and in any analyses that reference it. Overall, women’s health is undervalued in the medical environment, which creates limitations and many possibilities for bias.

Here is a link to a GitHub repository that contains the code I have developed for this assignment: https://github.com/cwangg/INST414-Modules/tree/main/module-1
