Analyzing the Difference in Age at Death Between Right-Handers and Left-Handers

Abhishek Shettigar
Oct 29, 2023


The difference in age at death between right-handers and left-handers is an intriguing topic that researchers have studied for many years. Some studies have found that left-handers tend to die younger than right-handers, while others have found no significant difference.

There are several possible explanations for this reported difference in age at death. One is that left-handers are more likely to engage in risky behaviors, such as smoking and drinking, which can lead to premature death. Another is that left-handers are more likely to be exposed to environmental toxins, such as lead and pesticides, which can also shorten lifespan.

Finally, there may be a biological difference between left-handers and right-handers that makes left-handers more susceptible to certain diseases, such as heart disease and Alzheimer’s disease.

More research is needed to determine the exact cause of the observed difference in death age between right-handers and left-handers. However, the findings of existing studies suggest that it is a complex issue with multiple contributing factors.

Significance of the topic

Studying the difference in age at death between right-handers and left-handers is significant for several reasons. First, it can help us better understand the factors that contribute to premature death. That knowledge can then inform interventions to prevent premature death in both right-handers and left-handers.

Second, it can help us learn more about the biology of handedness. Understanding the biological differences between left-handers and right-handers may help in developing new treatments for diseases that are more common in one group than the other.

Finally, it can raise awareness of the unique challenges left-handers face in society. Understanding any health risks associated with left-handedness can help us build a more inclusive society that supports everyone, regardless of handedness.

The project follows the six phases of data analysis: Ask, Prepare, Process, Analyze, Share, and Act.

Solution

Ask

The Ask phase is the start of the data analysis cycle. It involves clearly defining the scope of the project and the problem to be solved. It also involves identifying stakeholders and their expectations by asking SMART (Specific, Measurable, Action-oriented, Relevant, Time-bound) questions.

The following tasks guide the analysis:

  1. Load the handedness data from the National Geographic survey and create a scatter plot of “Left-handed Rate” vs. “Age”.
  2. Add two new columns, one for birth year and one for mean left-handedness, then plot the mean as a function of birth year.
  3. Create a function that will return P(LH | A) for particular ages of death in a given study year.
  4. Load death distribution data for the United States and plot it.
  5. Create a function called P_lh() which calculates the overall probability of left-handedness in the population for a given study year.
  6. Write a function to calculate P_A_given_lh().
  7. Write a function to calculate P_A_given_rh().
  8. Plot the probability of being a certain age at death given that you’re left- or right-handed for a range of ages.
  9. Find the mean age at death for left-handers and right-handers.
  10. Redo the calculation from Task 8, setting the study_year parameter to 2018.

Prepare

The Prepare phase involves identifying the data sources for the analysis and making sure each source is reliable, original, comprehensive, current, and cited. It also involves ensuring the data is free from bias in how it was collected and respecting data ethics throughout.

This notebook uses two datasets: death distribution data for the United States from 1999, and rates of left-handedness digitized from a figure in a 1992 paper by Gilbert and Wysocki.

The data’s credibility and integrity can be assessed using the ROCCC framework.

  • Reliability: The data is reliable due to its sample size (approx. 7 million people).
  • Originality: The datasets come from original sources. The death distribution data was collected by the CDC, and the handedness data comes from the 1986 National Geographic survey.
  • Comprehensiveness: The data is comprehensive, with features such as nationality, handedness, age, sex, and race.
  • Current: The handedness survey was conducted in 1986 and the death data is from 1999, so the data is outdated.
  • Cited: The data is cited by the CDC and in the 1992 paper by Gilbert and Wysocki.

Process

The Process phase covers all the steps taken to clean the data before analyzing it. The goal is to make sure the data has integrity, meaning it is accurate, complete, consistent, and trustworthy. This phase also involves aligning the data with the business objective and verifying it. This includes checking for misspellings, inconsistent capitalization, and typos; checking for duplicate entries and blank cells; and checking that each column uses a consistent data format.

Since the data comes from PDF tables that are already clean, little data cleaning is needed. The project is written in Python using Jupyter Notebook. The pandas, NumPy, and matplotlib.pyplot libraries are used to analyze the datasets and solve the tasks.

Analyze and Share

  1. Load the handedness data from the National Geographic survey and create a scatter plot of the “Left-handed Rate” vs. “Age”.

In this notebook, we will explore this phenomenon using age distribution data to see if we can reproduce a difference in average age at death purely from the changing rates of left-handedness over time, refuting the claim of early death for left-handers. This notebook uses pandas and Bayesian statistics to analyze the probability of being a certain age at death given that you are reported as left-handed or right-handed.

A National Geographic survey in 1986 resulted in over a million responses that included age, sex, and hand preference for throwing and writing. Researchers Avery Gilbert and Charles Wysocki analyzed this data. They noticed that rates of left-handedness were around 13% for people younger than 40 but decreased with age to about 5% by the age of 80. Based on an analysis of a subgroup of people who throw left-handed but write right-handed, they concluded that this age dependence was primarily due to the changing social acceptability of left-handedness. This means the rates depend not on age itself but on the year you were born. If the same study were done today, we should expect a shifted version of the same distribution as a function of age. Ultimately, we’ll see what effect this changing rate has on the apparent mean age at death of left-handed people, but let’s start by plotting the rates of left-handedness as a function of age.
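A minimal sketch of Task 1 follows. It assumes the digitized survey rates sit in a table with columns Age, Male, and Female, with left-handed rates in percent. Those column names and the function name plot_lh_rate_vs_age are assumptions for illustration, not taken from the original notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_lh_rate_vs_age(lefthanded_data: pd.DataFrame):
    """Scatter-plot the left-handed rate against age, one series per sex.

    Assumes (hypothetically) columns 'Age', 'Male' and 'Female',
    with rates given in percent.
    """
    fig, ax = plt.subplots()
    ax.scatter(lefthanded_data["Age"], lefthanded_data["Female"], marker="o", label="Female")
    ax.scatter(lefthanded_data["Age"], lefthanded_data["Male"], marker="x", label="Male")
    ax.set_xlabel("Age")
    ax.set_ylabel("Left-handed Rate (%)")
    ax.legend()
    return fig, ax
```

With the data loaded into a DataFrame (for example via pd.read_csv on a local copy), calling plot_lh_rate_vs_age(data) and then plt.show() displays the scatter plot.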

Figure: Left-handed rate vs. age.

2. Add two new columns, one for birth year and one for mean left-handedness, then plot the mean as a function of birth year.
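Since the survey took place in 1986, a respondent’s birth year is 1986 minus their age. The sketch below adds that column, plus a mean rate, and plots one against the other. It assumes the same hypothetical Age, Male, and Female columns as before. The new column names Birth_year and Mean_lh are also assumptions. The mean is an unweighted average of the male and female rates, a simplification that ignores any difference in the number of male and female respondents.

```python
import matplotlib.pyplot as plt
import pandas as pd

SURVEY_YEAR = 1986  # year of the National Geographic survey


def add_birth_year_and_mean(lefthanded_data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with 'Birth_year' and 'Mean_lh' columns added.

    Birth year is the survey year minus the respondent's age; Mean_lh is
    the unweighted mean of the female and male rates (in percent).
    """
    df = lefthanded_data.copy()  # leave the caller's DataFrame untouched
    df["Birth_year"] = SURVEY_YEAR - df["Age"]
    df["Mean_lh"] = df[["Female", "Male"]].mean(axis=1)
    return df


def plot_mean_lh_vs_birth_year(df: pd.DataFrame):
    """Line plot of mean left-handedness as a function of birth year."""
    fig, ax = plt.subplots()
    ax.plot(df["Birth_year"], df["Mean_lh"])
    ax.set_xlabel("Birth year")
    ax.set_ylabel("Mean left-handedness (%)")
    return fig, ax
```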

Figure: Mean left-handedness vs. birth year.

3. Create a function that will return P(LH | A) for particular ages of death in a given study year.

The probability of dying at a certain age given that you’re left-handed is not equal to the probability of being left-handed given that you died at a certain age. This inequality is why we need Bayes’ theorem, a statement about conditional probability which allows us to update our beliefs after seeing evidence.

We want to calculate the probability of dying at age A given that you’re left-handed. Let’s write this in shorthand as P(A | LH). We also want the same quantity for right-handers: P(A | RH).

Here’s Bayes’ theorem for the two events we care about, left-handedness (LH) and dying at age A:

P(A | LH) = P(LH | A) P(A) / P(LH)

P(LH | A) is the probability that you are left-handed given that you died at age A. P(A) is the overall probability of dying at age A, and P(LH) is the overall probability of being left-handed. We will now calculate each of these three quantities, beginning with P(LH | A).

To calculate P(LH | A) for ages that might fall outside the original data, we will need to extrapolate the data to earlier and later years. Since the rates flatten out in the early 1900s and late 1900s, we’ll use a few points at each end and take the mean to extrapolate the rates on each end. The number of points used for this is arbitrary, but we’ll pick 10 since the data looks flattish until about 1910.
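Below is one way to implement P(LH | A) with the 10-point extrapolation described above. It is a sketch, not the author’s original code. The key step is that someone who died at age A in the study year was born in study_year − A, so we look up that birth year’s rate. The signature is an assumption: the table is passed in explicitly, and the n_edge parameter sets how many end points are averaged. The sketch expects one row per birth year with the hypothetical Birth_year and Mean_lh columns.

```python
import numpy as np
import pandas as pd


def P_lh_given_A(ages_of_death, study_year, lefthanded_data, n_edge=10):
    """Return P(LH | A) for each age of death A in the given study year.

    Assumes lefthanded_data has one row per birth year, with columns
    'Birth_year' and 'Mean_lh' (left-handed rate in percent). Birth years
    before (after) the surveyed range get the mean rate of the n_edge
    earliest (latest) birth years. Gaps inside the range stay NaN.
    """
    rates = (
        lefthanded_data.sort_values("Birth_year")
        .set_index("Birth_year")["Mean_lh"]
        / 100.0  # percent -> probability
    )
    early_rate = rates.iloc[:n_edge].mean()  # flat stretch in the early 1900s
    late_rate = rates.iloc[-n_edge:].mean()  # flat stretch in the late 1900s

    # Someone who died at age A in study_year was born in study_year - A.
    birth_years = study_year - np.asarray(ages_of_death)

    # Look up each birth year; years the survey does not cover become NaN.
    p = rates.reindex(birth_years).to_numpy(dtype=float)
    p = np.where(birth_years < rates.index.min(), early_rate, p)
    p = np.where(birth_years > rates.index.max(), late_rate, p)
    return p
```

The function sorts by birth year before taking the end means. Because of this, the result does not depend on the row order of the input table.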

4. Load death distribution data for the United States and plot it.

To estimate the probability of living to an age A, we can use data that gives the number of people who died in a given year and how old they were. This gives a distribution of ages at death. If we normalize the counts by the total number of people who died, we can treat this data as a probability distribution that gives the probability of dying at age A. The data we’ll use is for the entire US for the year 1999, the closest I could find for the time range we’re interested in.

In this block, we’ll load in the death distribution data and plot it. The first column is the age, and the other columns are the number of people who died at that age.
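The normalization described above turns death counts N(A) into P(A) by dividing each count by the total. The sketch below does this and also plots the raw counts. It assumes an Age column and a count column named "Both Sexes". That name is a hypothetical choice, so adjust it to the actual file. If the file is tab-separated, pd.read_csv(path, sep="\t") can read it.

```python
import matplotlib.pyplot as plt
import pandas as pd


def death_age_distribution(death_distribution_data: pd.DataFrame,
                           column: str = "Both Sexes") -> pd.Series:
    """Return P(A): the share of all deaths that occurred at each age.

    Assumes an 'Age' column plus a count column (hypothetically named
    'Both Sexes'); rows with a missing count are dropped first.
    """
    clean = death_distribution_data.dropna(subset=[column])
    counts = clean.set_index("Age")[column]
    return counts / counts.sum()


def plot_death_distribution(death_distribution_data: pd.DataFrame,
                            column: str = "Both Sexes"):
    """Plot the number of people who died at each age."""
    clean = death_distribution_data.dropna(subset=[column])
    fig, ax = plt.subplots()
    ax.plot(clean["Age"], clean[column])
    ax.set_xlabel("Age at death")
    ax.set_ylabel("Number of people who died")
    return fig, ax
```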

Figure: Death distribution by age, United States, 1999.

5. Create a function called P_lh() which calculates the overall probability of left-handedness in the population for a given study year.

In the previous step we loaded data to give us P(A), and now we need P(LH). P(LH) is the probability that a person who died in our particular study year is left-handed, assuming we know nothing else about them. This is the average left-handedness among deceased people. To calculate it, we sum the left-handedness probability for each age, weighted by the number of people who died at that age. We then divide by the total number of deceased people to get a probability. In equation form, where N(A) is the number of people who died at age A (given by the dataframe death_distribution_data), this is:

P(LH) = Σ_A P(LH | A) N(A) / Σ_A N(A)
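The weighted average above translates almost directly into NumPy. In this sketch, P_lh takes N(A) and P(LH | A) as aligned arrays rather than a study year. That signature is an assumption, chosen so the formula stands on its own. np.average(p, weights=n) would compute the same value.

```python
import numpy as np


def P_lh(deaths_at_age, p_lh_given_a):
    """Overall probability that a deceased person was left-handed.

    Implements P(LH) = sum_A P(LH | A) N(A) / sum_A N(A), where
    deaths_at_age holds N(A) and p_lh_given_a holds P(LH | A) for the
    same ages, in the same order.
    """
    n = np.asarray(deaths_at_age, dtype=float)
    p = np.asarray(p_lh_given_a, dtype=float)
    return float(np.sum(p * n) / np.sum(n))
```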

6. Write a function to calculate P_A_given_lh().

Now we have the means of calculating all three quantities we need: P(A), P(LH), and P(LH | A). We can combine all three using Bayes’ rule to get P(A | LH), the probability of being age A at death (in the study year) given that you’re left-handed. To make this answer meaningful, though, we also want to compare it to P(A | RH), the probability of being age A at death given that you’re right-handed.
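Bayes’ rule, P(A | LH) = P(LH | A) P(A) / P(LH), can be applied element-wise across all ages at once. The sketch below takes the three quantities as precomputed inputs, which is an assumption about how the pieces are wired together. Here P(LH) is the P(A)-weighted average of P(LH | A). As a result, the output sums to 1 over the ages supplied.

```python
import numpy as np


def P_A_given_lh(p_lh_given_a, p_a, p_lh):
    """Bayes' rule: P(A | LH) = P(LH | A) * P(A) / P(LH), element-wise over ages."""
    return np.asarray(p_lh_given_a, dtype=float) * np.asarray(p_a, dtype=float) / p_lh
```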

7. Write a function to calculate P_A_given_rh().
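The right-handed version follows the same pattern. This sketch assumes that everyone who is not left-handed counts as right-handed, so ambidextrous people are ignored. Under that assumption, P(RH | A) = 1 − P(LH | A) and P(RH) = 1 − P(LH).

```python
import numpy as np


def P_A_given_rh(p_lh_given_a, p_a, p_lh):
    """Bayes' rule for right-handers.

    Treats everyone who is not left-handed as right-handed, so
    P(RH | A) = 1 - P(LH | A) and P(RH) = 1 - P(LH).
    """
    p_rh_given_a = 1.0 - np.asarray(p_lh_given_a, dtype=float)
    return p_rh_given_a * np.asarray(p_a, dtype=float) / (1.0 - p_lh)
```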

8. Plot the probability of being a certain age at death given that you’re left- or right-handed for a range of ages.

Now that we have functions to calculate the probability of being age A at death given that you’re left-handed or right-handed, let’s plot these probabilities for a range of ages of death from 6 to 120.
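A sketch of the comparison plot follows. It assumes the two distributions have already been computed as arrays aligned with the ages array. The function name plot_age_at_death is hypothetical. The age range is taken as 6 through 120 inclusive.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_age_at_death(ages, p_a_given_lh, p_a_given_rh):
    """Overlay P(A | LH) and P(A | RH) as functions of age at death."""
    fig, ax = plt.subplots()
    ax.plot(ages, p_a_given_lh, label="Left-handed")
    ax.plot(ages, p_a_given_rh, label="Right-handed")
    ax.set_xlabel("Age at death")
    ax.set_ylabel("Probability of being age A at death")
    ax.legend()
    return fig, ax


ages = np.arange(6, 121)  # ages of death 6 through 120
```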

Notice that the left-handed distribution has a bump below age 70: of the pool of deceased people, left-handed people are more likely to be younger.

Figure: Probability of being age A at death for left-handers and right-handers, ages 6 to 120.

9. Find the mean age at death for left-handers and right-handers.

Finally, let’s compare our results with the original study that found that left-handed people were nine years younger at death on average. We can do this by calculating the mean of these probability distributions in the same way we calculated P(LH) earlier, weighting the probability distribution by age and summing over the result.
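The mean is the sum of each age multiplied by its probability. The sketch uses np.nansum so that any NaN entries, such as uncovered birth years in P(LH | A), are skipped rather than turning the whole result into NaN. This is a design choice, not necessarily the author’s. The age range is truncated to 6 through 120, so the probabilities may not sum exactly to 1. Dividing by np.nansum(p) would be a reasonable alternative if that matters.

```python
import numpy as np


def mean_age_at_death(ages, p_a_given_x):
    """Expected age at death: sum over A of A * P(A | X), skipping NaNs."""
    ages = np.asarray(ages, dtype=float)
    p = np.asarray(p_a_given_x, dtype=float)
    return float(np.nansum(ages * p))
```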

10. Redo the calculation from Task 8, setting the study_year parameter to 2018.

We got a pretty big age gap between left-handed and right-handed people purely as a result of the changing rates of left-handedness in the population. That is good news for left-handers: you probably won’t die young because of your sinisterness. The reported rates of left-handedness have increased from just 3% in the early 1900s to about 11% today. This means older people are much more likely to be reported as right-handed than left-handed, so a sample of recently deceased people will contain more old right-handers.

Our number is still less than the 9-year gap measured in the study. It’s possible that some of the approximations we made are the cause:

  1. We used death distribution data from almost ten years after the study (1999 instead of 1991), and we used death data from the entire United States instead of California alone (which was the original study).
  2. We extrapolated the left-handedness survey results to older and younger age groups, but it’s possible our extrapolation wasn’t close enough to the true rates for those ages.

One thing we could do next is figure out how much variability we would expect to encounter in the age difference purely because of random sampling: if you take a smaller sample of recently deceased people and assign handedness with the probabilities of the survey, what does that distribution look like? How often would we encounter an age gap of nine years using the same data and assumptions? We won’t do that here, but it’s possible with this data and the tools of random sampling.

To finish off, let’s calculate the age gap we’d expect if we did the study in 2018 instead of in 1990. The gap turns out to be much smaller, since rates of left-handedness haven’t increased for people born after about 1960. Both the National Geographic study and the 1990 study happened at a unique time: the rates of left-handedness had been changing across the lifetimes of most people alive, and the difference in handedness between old and young was at its most striking.
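Only P(LH | A) depends on the study year, so the whole calculation can be wrapped in one function and rerun with a new P(LH | A). The sketch below assumes the same death distribution N(A) is reused for 2018. The function name age_gap and its array-based signature are hypothetical. One property falls out directly: if P(LH | A) is the same at every age, the gap is exactly zero. In this model, any gap comes entirely from handedness rates that vary with birth year.

```python
import numpy as np


def age_gap(ages, deaths_at_age, p_lh_given_a):
    """Mean age at death of right-handers minus that of left-handers.

    ages, deaths_at_age (N(A)) and p_lh_given_a (P(LH | A) for the chosen
    study year) must be aligned arrays. Rerunning for another study year
    only requires recomputing p_lh_given_a.
    """
    ages = np.asarray(ages, dtype=float)
    n = np.asarray(deaths_at_age, dtype=float)
    p_lh_a = np.asarray(p_lh_given_a, dtype=float)

    p_a = n / n.sum()                          # P(A)
    p_lh = np.sum(p_lh_a * p_a)                # P(LH)
    p_a_lh = p_lh_a * p_a / p_lh               # P(A | LH)
    p_a_rh = (1 - p_lh_a) * p_a / (1 - p_lh)   # P(A | RH), assuming RH = not LH
    return float(np.sum(ages * p_a_rh) - np.sum(ages * p_a_lh))
```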


Abhishek Shettigar

Making a career switch from Biomedical engineering to Data analytics. Recently completed Google Data Analytics Professional Certificate.