Dispelling Misconceptions Surrounding Maximum Likelihood Estimation Through R

Dr Shikhar Tyagi
10 min read · May 3, 2024


Maximum Likelihood Estimation (MLE) is a statistical method used for estimating the parameters of a probability distribution or a statistical model. Its roots trace back to the early 20th century, with foundational contributions from several prominent statisticians and mathematicians.

Origins and Development

  1. Ronald Fisher (1890–1962): Fisher is often credited with formalizing the concept of MLE in his seminal 1922 paper “On the Mathematical Foundations of Theoretical Statistics”. He introduced the likelihood function and demonstrated its properties, including its role as a basis for estimation. Fisher’s work laid the foundation for MLE as a systematic approach to parameter estimation.
  2. Karl Pearson (1857–1936): Pearson, a pioneer in the development of statistical methods, also made significant contributions to the theory of estimation. He introduced the method of moments, which predates MLE and shares similarities with it. Pearson’s work provided insights into the principles of estimation that influenced the development of MLE.
  3. Jerzy Neyman (1894–1981) and Egon Pearson (1895–1980): Neyman and Pearson further developed the theory of estimation, introducing concepts such as confidence intervals and hypothesis testing. While they did not specifically formulate MLE, their contributions to statistical theory provided a broader context for understanding estimation methods.

Key Concepts and Principles

  1. Likelihood Function: The likelihood function represents the probability of observing the data given a specific set of parameter values. MLE seeks to find the parameter values that maximize this likelihood function, making them the most plausible given the observed data.
  2. Optimization: MLE involves optimizing the likelihood function with respect to the parameters of interest. This optimization can be carried out numerically, for example with Newton-Raphson or other gradient-based algorithms (a short R sketch follows this list).
  3. Asymptotic Properties: MLE possesses desirable asymptotic properties, such as consistency, efficiency, and asymptotic normality, under certain regularity conditions. These properties make MLE a powerful and widely used estimation method in statistics.
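
Below is a minimal, illustrative sketch of the optimization step using base R's optim(). Since optim() minimizes by default, we supply the negative log-likelihood; the simulated data, starting values, and log-scale parameterization of sigma are assumptions made purely for demonstration.

# Simulate illustrative data from a normal distribution
set.seed(42)
x <- rnorm(50, mean = 5, sd = 1.5)

# Negative log-likelihood of a normal model; params = c(mu, log_sigma)
# (sigma is parameterized on the log scale so it stays positive during optimization)
nll <- function(params) {
  mu <- params[1]
  sigma <- exp(params[2])
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# Quasi-Newton optimization from rough starting values
fit <- optim(c(mean(x), log(sd(x))), nll, method = "BFGS")
c(mu_hat = fit$par[1], sigma_hat = exp(fit$par[2]))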

Impact and Applications

MLE has had a profound impact on various fields, including:

  • Biostatistics: Estimating parameters in epidemiological studies and clinical trials.
  • Econometrics: Estimating parameters in economic models and time series analysis.
  • Machine Learning: Parameter estimation in probabilistic models such as Gaussian mixture models and hidden Markov models.

Maximum Likelihood Estimation (MLE) is a cornerstone of statistical modeling, offering straightforward and efficient parameter estimates. However, misconceptions about MLE abound, leading to confusion among learners and practitioners. In this blog post, we’ll debunk these misconceptions and offer a comprehensive understanding of MLE.

Can Likelihood Values be Negative?

Consider a scenario where you’re estimating the parameters of a logistic regression model to predict the probability of a customer clicking on an online advertisement based on demographic information. The likelihood function here measures how well the model’s predicted probabilities match the actual outcomes (click or no click) observed in the data.

Now, suppose your model initially predicts probabilities that are far off from the observed outcomes. In this case, the likelihood function will assign low probabilities to the observed outcomes, resulting in a low likelihood value. Since likelihood measures the goodness of fit, a lower likelihood value indicates a poorer fit of the model to the data.

As you refine your model, the predicted probabilities move closer to the observed outcomes, leading to higher likelihood values. It is tempting to conclude that a sufficiently bad model must produce a negative likelihood, but this is exactly the misconception: the likelihood is a product of probabilities (or densities), all of which are non-negative, so the likelihood itself can never be negative. What can be, and usually is, negative is the log-likelihood, because the logarithm of a probability below 1 is negative. A poorly fitting model therefore shows up as a likelihood close to zero and a strongly negative log-likelihood, not as a negative likelihood.
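
A tiny numerical illustration (with made-up click outcomes and predicted probabilities) makes the distinction concrete:

# Five observed click outcomes (1 = click, 0 = no click) and made-up predicted probabilities
y <- c(1, 0, 1, 1, 0)
p_hat <- c(0.7, 0.2, 0.6, 0.8, 0.4)

# Likelihood: product of Bernoulli probabilities -- always non-negative, here about 0.16
prod(dbinom(y, size = 1, prob = p_hat))

# Log-likelihood: sum of log-probabilities -- negative, because each probability is below 1
sum(dbinom(y, size = 1, prob = p_hat, log = TRUE))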

Likelihood Values Between 0 and 1

Imagine you’re analyzing the effectiveness of a new drug in a clinical trial. The trial measures whether patients experience a particular side effect after taking the medication. You want to estimate the probability of experiencing the side effect based on the dosage administered.

As you collect data, you fit a logistic regression model to predict the probability of experiencing the side effect based on the dosage level. If the model accurately captures the relationship between dosage and side effects, the likelihood function will assign high probabilities to the observed outcomes, indicating a good fit.

Now, suppose the dosage levels in the trial are such that most patients either clearly experience the side effect or clearly don’t, with very few borderline cases. In this case, the model can assign each observed outcome a probability close to 1, and the likelihood, being the product of these probabilities, stays near its upper bound of 1, indicating a strong fit.

For discrete outcomes like these, the likelihood is a product of probabilities, so it always falls between 0 and 1; its exact value reflects how well the model’s predictions align with the observed outcomes, and it typically shrinks toward 0 as the number of observations grows.

Likelihood Values More Than 1

Consider a scenario where you’re estimating the parameters of a probability distribution to model daily temperature fluctuations in a city over several years. You choose a normal distribution to represent the temperature data, assuming it follows a bell-shaped curve.

After fitting the normal distribution to the temperature data, you calculate the likelihood function to measure how well the distribution fits the observed temperatures. If the observed temperatures closely follow the bell-shaped curve predicted by the normal distribution, the fitted density assigns high values to the observed temperatures, indicating a good fit.

Now, suppose the temperature data exhibits extremely tight clustering around the mean, with very little variability. In this case, the fitted normal density is tall and narrow, and the density evaluated at each observation can exceed 1. Because the likelihood for continuous data is a product of density values, not probabilities, the product across the dataset can easily exceed 1.
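
A short illustration (with made-up temperature values) shows how densities, unlike probabilities, can push the likelihood above 1:

# Tightly clustered temperatures around 20 degrees (made-up values)
temps <- c(20.01, 19.98, 20.02, 20.00, 19.99)

# With a small sigma, the normal density at each observation exceeds 1
dnorm(temps, mean = 20, sd = 0.02)

# The likelihood (product of densities) is therefore well above 1,
# and the log-likelihood is positive
prod(dnorm(temps, mean = 20, sd = 0.02))
sum(dnorm(temps, mean = 20, sd = 0.02, log = TRUE))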

By examining these real-life examples, we see how misconceptions about Maximum Likelihood Estimation can arise and impact statistical analysis. Understanding that likelihood values are never negative (although log-likelihoods usually are), that they lie between 0 and 1 for discrete data, and that they can exceed 1 for continuous data is crucial for properly interpreting the results of statistical models and making informed decisions based on data.

A Worked Example in R

install.packages("maxLik")
# Load necessary library
library(maxLik)

# Generate sample data
set.seed(123)
data <- rnorm(100, mean = 10, sd = 2)

# Define the negative log-likelihood function for normal distribution
neg_log_likelihood <- function(params) {
mu <- params[1]
sigma <- params[2]
n <- length(data)
-(n/2) * log(2 * pi * sigma^2) - (1/(2 * sigma^2)) * sum((data - mu)^2)
}

# Maximize the likelihood
mle_result <- maxLik(neg_log_likelihood, start = c(mean(data), sd(data)))

# Print MLE results
summary(mle_result)
Maximum Likelihood estimation
Newton-Raphson maximisation, 2 iterations
Return code 8: successive function values within relative tolerance limit (reltol)
Log-Likelihood: -201.5839
2 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
[1,] 10.1808 0.1816 56.06 <2e-16 ***
[2,] 1.8165 0.1284 14.14 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The code above performs Maximum Likelihood Estimation (MLE) for the parameters of a normal distribution using the maxLik package in R. Let's break down the results and provide an explanation:

MLE Results

Log-Likelihood

  • The log-likelihood value represents the maximum log-likelihood achieved by the estimation process.
  • In this case, the log-likelihood is approximately -201.5839.

Parameter Estimates

  • The MLE estimates two parameters: the mean (μ) and the standard deviation (σ) of the normal distribution.
  • For the mean (μ), the estimate is approximately 10.1808 with a standard error of 0.1816.
  • For the standard deviation (σ), the estimate is approximately 1.8165 with a standard error of 0.1284.

Significance

  • The ‘t value’ and ‘Pr(>t)’ columns provide information about the significance of each parameter estimate.
  • The ‘t value’ represents the ratio of the estimated parameter to its standard error.
  • The ‘Pr(>t)’ value indicates the probability of observing such a large t value if the true parameter value were zero (the null hypothesis).
  • In this case, both parameter estimates are highly significant, indicated by the ‘***’ symbols.

Explanation

The MLE results indicate that the estimated mean of the normal distribution is approximately 10.1808, and the estimated standard deviation is approximately 1.8165. These estimates represent the most likely values of the parameters given the observed data.

The log-likelihood value of approximately -201.5839 summarizes the fit of the estimated model to the data. When comparing models fitted to the same data, a higher (less negative) log-likelihood suggests a better fit.

The significance tests (t values and p-values) show that both parameter estimates are many standard errors away from zero. Together with the small standard errors, this suggests that the mean and standard deviation of the underlying normal distribution are estimated precisely.

Histogram

A Histogram is a graphical representation of the distribution of numerical data. It divides the range of values into intervals (bins) and counts the number of observations that fall into each interval. In the context of Maximum Likelihood Estimation (MLE), a histogram can help visualize the distribution of the observed data and assess how well it conforms to the assumed probability distribution.
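
Continuing with the example above, one possible way to draw such a histogram in R, overlaying the fitted normal density (using the estimates stored in mle_result), is sketched below:

# Parameter estimates from the maxLik fit above
mu_hat <- mle_result$estimate[1]
sigma_hat <- mle_result$estimate[2]

# Histogram of the observed data with the fitted normal density overlaid
hist(data, freq = FALSE, breaks = 15,
     main = "Observed data vs fitted normal density", xlab = "Value")
curve(dnorm(x, mean = mu_hat, sd = sigma_hat), add = TRUE, lwd = 2)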

Interpretation

  • The x-axis represents the range of values observed in the data.
  • The y-axis represents the frequency or density of observations in each interval (bin).
  • The height of each bar represents the number of observations in that interval.
  • The shape of the histogram provides insights into the distribution of the data. For example:
  • Normal Distribution: A bell-shaped curve with a symmetric distribution.
  • Skewed Distribution: A distribution with a long tail on one side.
  • Bimodal Distribution: A distribution with two distinct peaks.
  • The width of the bins can affect the appearance of the histogram. Narrower bins can reveal finer details of the distribution, while wider bins can provide a smoother representation.

In the context of MLE

Model Assessment

  • The histogram allows visual comparison between the observed data distribution and the assumed distribution of the model.
  • If the histogram closely resembles the assumed distribution, it suggests that the model fits the data well.

Model Diagnostics

  • Deviations from the assumed distribution in the histogram can indicate potential shortcomings or violations of assumptions in the model.
  • Outliers or unusual patterns in the histogram may require further investigation to understand their implications for the model.

Contour Plot

A Contour Plot is a graphical representation of a three-dimensional surface on a two-dimensional plane. It visualizes the relationship between two continuous variables and a third variable represented by contour lines. In the context of Maximum Likelihood Estimation (MLE), a contour plot is often used to visualize the likelihood surface, showing how the likelihood of the model varies across different values of the parameters.
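
For the normal example above, a contour plot of the log-likelihood surface over a grid of mu and sigma values could be sketched as follows (the grid limits are arbitrary choices around the estimates):

# Grid of candidate values for mu and sigma (limits chosen around the estimates)
mu_grid <- seq(9.5, 11, length.out = 100)
sigma_grid <- seq(1.4, 2.4, length.out = 100)

# Log-likelihood of the normal model at each grid point
ll_surface <- outer(mu_grid, sigma_grid, Vectorize(function(mu, sigma) {
  sum(dnorm(data, mean = mu, sd = sigma, log = TRUE))
}))

# Contour plot of the log-likelihood surface; the innermost contours enclose the MLE
contour(mu_grid, sigma_grid, ll_surface, nlevels = 30,
        xlab = expression(mu), ylab = expression(sigma))
points(mle_result$estimate[1], mle_result$estimate[2], pch = 19)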

Interpretation

  • The x-axis and y-axis represent the two parameters of interest (e.g., mean and standard deviation for a normal distribution).
  • The contour lines represent regions of equal likelihood or log-likelihood values. Each contour line corresponds to a specific likelihood value, with higher contours indicating higher likelihood values.
  • Peaks or valleys in the contour plot indicate regions where the likelihood is maximized or minimized, respectively.
  • The shape of the contour plot provides insights into the curvature and complexity of the likelihood surface.
  • Concave Surface: A smooth, downward-curving (dome-shaped) surface with a single peak, indicating a well-behaved likelihood function whose peak is the MLE.
  • Flat or Ridge-Shaped Surface: A nearly flat region or long ridge, indicating that the data provide little information to pin down one or more parameters.
  • Irregular Surface: A jagged or multi-peaked surface, indicating potential challenges in estimating the parameters.

In the context of MLE:

Parameter Estimation

  • Contour plots help identify the maximum likelihood estimates (MLE) of the parameters by locating the peak of the likelihood surface.
  • The point that the innermost contour lines close in around represents the estimated values of the parameters that maximize the likelihood of the model.

Model Comparison

  • Contour plots allow comparison between different models by visualizing their likelihood surfaces.
  • Models with higher likelihood values and smoother likelihood surfaces are preferred, indicating better fit to the data.

Uncertainty Assessment

  • The shape and spread of the contour lines provide insights into the uncertainty associated with the parameter estimates.
  • Closely spaced contour lines around the peak indicate a sharply peaked likelihood and therefore low uncertainty in the estimates, while widely spaced contour lines indicate a flat likelihood surface and high uncertainty.

Likelihood Profile Plot

A Likelihood Profile Plot illustrates the relationship between the likelihood function and a parameter of interest while holding other parameters fixed at their maximum likelihood estimates. It helps in understanding how changes in the parameter of interest affect the likelihood of the model.
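
A simple profile for the mean of the normal example, holding sigma at its estimate from mle_result, might look like this:

# Log-likelihood as a function of mu, with sigma fixed at its MLE
sigma_hat <- mle_result$estimate[2]
mu_grid <- seq(9, 11.5, length.out = 200)
profile_ll <- sapply(mu_grid, function(mu) sum(dnorm(data, mean = mu, sd = sigma_hat, log = TRUE)))

# The curve peaks at the MLE of mu
plot(mu_grid, profile_ll, type = "l",
     xlab = expression(mu), ylab = "Log-likelihood",
     main = "Likelihood profile for the mean")
abline(v = mle_result$estimate[1], lty = 2)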

Interpretation

  • The x-axis represents the parameter of interest (e.g., mean for a normal distribution).
  • The y-axis represents the negative log-likelihood (or log-likelihood) of the model.
  • If the y-axis shows the log-likelihood, the peak of the curve marks the parameter value that fits the data best; if it shows the negative log-likelihood, the best-fitting value is at the valley (minimum) of the curve.

Sensitivity Analysis Plot

A Sensitivity Analysis Plot examines the sensitivity of the likelihood function (or other measures of model fit) to variations in a particular parameter. It helps in understanding how changes in the parameter affect the overall fit of the model.
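
For the running example, the sensitivity of the fit to the standard deviation could be sketched by varying sigma with mu fixed at its estimate:

# Negative log-likelihood as sigma varies, with mu fixed at its MLE
mu_hat <- mle_result$estimate[1]
sigma_grid <- seq(1, 3.5, length.out = 200)
nll_sigma <- sapply(sigma_grid, function(sigma) -sum(dnorm(data, mean = mu_hat, sd = sigma, log = TRUE)))

# The minimum of the curve occurs at the MLE of sigma; the steepness around it
# shows how sensitive the fit is to changes in sigma
plot(sigma_grid, nll_sigma, type = "l",
     xlab = expression(sigma), ylab = "Negative log-likelihood",
     main = "Sensitivity of fit to sigma")
abline(v = mle_result$estimate[2], lty = 2)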

Interpretation

  • The x-axis represents the parameter of interest (e.g., standard deviation for a normal distribution).
  • The y-axis represents the negative log-likelihood (or log-likelihood) of the model.
  • The plot shows how the fit changes as the parameter varies: with the negative log-likelihood on the y-axis, the minimum of the curve marks the parameter value that fits the data best, and the steepness of the curve around that minimum shows how sensitive the fit is to the parameter.

Predicted vs Observed Plot

A Predicted vs. Observed Plot compares the observed values with the values predicted by the model. It helps in assessing the predictive performance of the model and identifying any systematic discrepancies between observed and predicted values.
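
For a fitted distribution (rather than a regression model), one reasonable version of this plot compares the sorted observations with the quantiles the fitted normal distribution predicts at the same plotting positions; the sketch below assumes the mle_result fit from earlier:

# Observed values vs quantiles predicted by the fitted normal distribution
mu_hat <- mle_result$estimate[1]
sigma_hat <- mle_result$estimate[2]

observed <- sort(data)
predicted <- qnorm(ppoints(length(data)), mean = mu_hat, sd = sigma_hat)

plot(observed, predicted,
     xlab = "Observed values", ylab = "Predicted values",
     main = "Predicted vs observed")
abline(0, 1, lty = 2)  # points near the y = x line indicate good agreement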

Interpretation

  • The x-axis represents the observed values.
  • The y-axis represents the predicted values from the model.
  • Ideally, the points should fall close to the diagonal line (y = x), indicating a good agreement between observed and predicted values. Deviations from the diagonal line suggest discrepancies between the model and the observed data.


Dr Shikhar Tyagi

Dr. Shikhar Tyagi, Assistant Professor at Christ Deemed to be University, specializes in Probability Theory, Frailty Models, Survival Analysis, and more.