“You Are Not The Father” - Maury Povich

Sainjeev Srikantha
Published in Human Systems Data
Apr 12, 2017

Have you ever found yourself watching daytime television? If so, you have most likely stumbled upon Maury, one of the messier tabloid talk shows. In the show's most popular segment, a couple comes on to take a paternity test to determine whether the man is the father of the child. If you have seen the show, you already know how entertaining this can be; if not, do yourself a favor and watch some clips on YouTube.

As you can imagine, a standard paternity test is administered, but have you ever wondered why a paternity test reports a 99.99% probability and not 100%? This is due to Bayes' theorem. Bayes' theorem is a mathematical relationship between prior, conditional, and posterior probabilities: it describes how the probability assigned to a parameter value changes once the observed data are combined with prior information (Kruschke, 2010). If this does not make any sense, then you are in luck, because you are about to learn today.

Before diving into Bayesian data analysis, let's go over traditional data analysis first. Data analysis, or statistical inference, is the process of drawing a conclusion from data that is subject to random variation, such as observational errors and sampling variation (What-is-Statistical-Inference, 2017). This is typically done with null hypothesis significance testing: after the data is collected, a summary statistic is computed along with its p value, the probability of obtaining a result at least that extreme if the null hypothesis were true. If the probability is low, for example p < 0.05, the null hypothesis is rejected. Null hypothesis significance testing summarizes the data with a point estimate, such as the mean or standard deviation. The point estimate is the parameter value that is most consistent with the observed data. Point estimates are used to determine the power of a study and how likely it is to be replicated (Kruschke, 2010). Statistical power is the likelihood that an effect will be detected if there is an effect to be detected (What is Statistical Power, 2017). This seems pretty straightforward, but some issues arise.

The problem with traditional p value testing is that it does not always give an accurate representation of the research being analyzed. The p value can be affected by the amount of data being analyzed. For example, if there are not enough participants in a study, the p value can be higher than 0.05, so the result is reported as not significant when the real cause may simply be a lack of subjects. Researchers sometimes turn to corrections such as the Tukey test, but this can lead them to look at only a few comparisons in the data. The second issue is the point estimate. Point estimates do not provide information about any other parameter values that are consistent with the data. To get around this, confidence intervals are used. A confidence interval is a range of parameter values, but it can be an issue because its limits depend on what the researcher intends to compare. The statistical power of a study is determined from the point estimate and depends heavily on the sample size of the study. This means that point estimates are an uncertain basis for estimating confidence intervals and statistical power, which in turn makes research difficult to replicate with similar results (Kruschke, 2010).
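To make the sample-size point concrete, here is a minimal sketch in Python. It assumes a simple one-sample z-test with a known standard deviation, and the effect size and sample sizes are made-up numbers for illustration, not values from any study cited here. The helper name `two_sided_p_value` is hypothetical.

```python
import math


def two_sided_p_value(mean_diff, sd, n):
    """Two-sided p value for a one-sample z-test with known standard deviation."""
    # The standard error shrinks as the sample grows, so z grows with sqrt(n).
    z = mean_diff / (sd / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed effect (hypothetical numbers), two different sample sizes.
for n in (10, 50):
    p = two_sided_p_value(mean_diff=0.5, sd=1.0, n=n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n={n}: p={p:.4f} ({verdict} at the 0.05 level)")
```

With these illustrative numbers, the identical effect misses the 0.05 cutoff with 10 participants and clears it with 50, which is the kind of sample-size dependence described above.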

Bayesian analysis, on the other hand, incorporates prior knowledge in order to shape future analysis (Kruschke, 2010). Current knowledge about the model parameters is expressed as a probability distribution over those parameters, called the prior distribution. The new data enters through the likelihood, which is proportional to the distribution of the observed data given the model parameters. This information is combined with the prior to create a new probability distribution called the posterior distribution. The posterior distribution is the basis of Bayesian inference (Bayes-Explained, 2017). The advantage of this method is that it incorporates past information about a parameter and builds it into a prior distribution for analyzing future data. When new data is collected, the posterior distribution can be used as the prior (Intro to Bayesian Analysis, 2017). This means that, thanks to the prior, the analyzed data can have higher statistical power and the results can be easier to replicate.
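The idea that a posterior can serve as the next prior can be sketched with a small Python example. It assumes a Beta prior with binomial (success/failure) data, a textbook pairing in which the update reduces to adding counts; this model, the helper names, and the counts are illustrative assumptions and do not come from the sources cited above.

```python
def update_beta(prior, successes, failures):
    """Return the Beta posterior parameters (a, b) after observing binomial data."""
    a, b = prior
    # With a Beta prior and binomial data, updating just adds the observed counts.
    return (a + successes, b + failures)


def beta_mean(params):
    """Mean of a Beta(a, b) distribution."""
    a, b = params
    return a / (a + b)


prior = (1, 1)  # flat prior: no initial preference for any success rate

# First study: 7 successes and 3 failures (hypothetical data).
after_first = update_beta(prior, successes=7, failures=3)

# Second study: the first posterior is reused as the prior.
after_second = update_beta(after_first, successes=12, failures=8)

# Analyzing both studies in a single step gives the same posterior.
all_at_once = update_beta(prior, successes=19, failures=11)

print(after_second, all_at_once, beta_mean(after_second))
```

Updating in two steps and updating once with the pooled data land on the same posterior, which is why carrying a posterior forward as the prior is a consistent way to accumulate evidence.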

Let's go back and look at paternity tests. In a paternity test, a combined paternity index (CPI), a ratio that measures shared alleles between individuals, is converted into a probability using Bayes' theorem. If the prior probability of someone being the father is 0.5, the equation is CPI/(CPI + 1). Because the denominator is always larger than the numerator, the probability reported by a paternity test will never be 100%. To take this further, the probability of paternity (POP) equals the conditional probability P(B|A), the likelihood of the genetic data if the man is the father, multiplied by the prior probability P(A) and divided by P(B), the overall probability of the data, which also accounts for the possibility that it came from a randomly selected man. The formula looks like this: POP = P(B|A) x P(A)/P(B) (Accuracy vs. Probability, 2017).
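The CPI/(CPI + 1) relationship can be checked with a few lines of Python. This is a sketch under stated assumptions: the CPI is treated as a likelihood ratio, the version with a general prior is the standard odds form of Bayes' theorem rather than a formula quoted from the cited source, and the function name and CPI values are hypothetical.

```python
def probability_of_paternity(cpi, prior=0.5):
    """Posterior probability of paternity from a combined paternity index.

    Assumes the CPI acts as a likelihood ratio: data given "father"
    versus data given "randomly selected man".
    """
    # Bayes' theorem in odds form; with prior = 0.5 this reduces to cpi / (cpi + 1).
    return (cpi * prior) / (cpi * prior + (1.0 - prior))


# Illustrative CPI values (not taken from a real test report).
for cpi in (99, 9_999, 1_000_000):
    print(f"CPI={cpi}: probability of paternity = {probability_of_paternity(cpi):.6f}")
```

However large the CPI becomes, the denominator stays larger than the numerator, so the result approaches 1 without reaching it. That is why a report reads 99.99% rather than 100%.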

Who knew we would be using Bayesian data analysis while watching Maury?

Works Cited

Accuracy vs. Probability in Paternity Testing. (2016, August 10). Retrieved April 11, 2017, from https://dnatesting.com/accuracy-vs-probability-in-paternity-testing/

Kruschke, J. K. (2010). What to believe: Bayesian methods for data analysis. Trends in Cognitive Sciences, 14(7), 293–300. doi:10.1016/j.tics.2010.05.001

Bayes explained. (n.d.). Retrieved April 11, 2017, from https://bayesian.org/Bayes-Explained

What is statistical inference? (n.d.). Retrieved April 11, 2017, from http://christophergandrud.github.io/BasicBayesianPresent/#/what-is-statistical-inference

Intro to Bayesian analysis. (2010, April 30). Retrieved April 11, 2017, from https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_introbayes_sect006.htm

What is statistical power? (2010, June 01). Retrieved April 11, 2017, from https://effectsizefaq.com/2010/05/31/what-is-statistical-power/
